Orphaned Post Meta and Dead Revisions

I’m adding some occasional housekeeping to sites, and looking for facts and opinions.

Post Metadata should be associated with posts – but quite often it gets unlinked – orphaned. WordPress notoriously accumulates these, and after years of working with WP, I only found out about this recently.
It looks like some orphaned post meta records in my sites are related to Smush.

To see it, load the WP-Sweep plugin, then from Tools>Sweep look at the count of Orphaned Post Meta. Click “Details” to see the list.

To DEV :

1) Please advise (blog?) on whether doing a mass delete of orphaned post meta is something that we can do without concern, maybe after hours, or if you believe this should be approached more carefully. The WP-Sweep plugin doesn’t offer selection by date. I’m wondering if we can/should remove records with a modification date older than x days?

2) Please check your code to eliminate accumulation of orphaned post meta.

3) Please consider a Branda Option to do housekeeping to remove these appropriately.

Revisions

Similarly WP can accumulate post revisions. Check WP-Sweep to see yours. Standard handling of revisions is to limit it to some count per-post or to disable it entirely.

I would prefer to delete revisions that have a modification date older than x days. This can easily be done with SQL and scheduled with cron, but …

To DEV :

4) What do you think of adding this kind of housekeeping to Branda?

5) Simply deleting post-type ‘revision’ is a commonly published remedy but I’m concerned about creating post-meta orphans and leaving other relational anomalies. Can someone there comment on whether this is actually the right way to clear revisions? (While I spend time in core code, I haven’t looked at the code that does this yet.)

Bottom line : Let’s try to end this thread with a definitive statement on exactly how to cleanly purge old revisions and orphaned post_meta.

Thanks! ( OK OK – so this is really the bottom line :grin: )

  • Patrick Freitas
    • FLS

    Hi Tony G

    I hope you are doing well.

    I only found out about this recently.
    It looks like some orphaned post meta records in my sites are related to Smush.

    Are those from the screenshot? If so, seems related to Hummingbird, but could be related to https://wpmudev.com/docs/wpmu-dev-plugins/hummingbird/#orphaned-asset-optimization-metadata

    1) Please advise (blog?) on whether doing a mass delete of orphaned post meta is something that we can do without concern, maybe after hours, or if you believe this should be approached more carefully. The WP-Sweep plugin doesn’t offer selection by date. I’m wondering if we can/should remove records with a modification date older than x days?

    This is a good idea for an article, we do have a similar one for https://wpmudev.com/blog/optimizing-your-wordpress-database-a-complete-guide/ but I believe we can extend it or update the information.

    About a safe period, to be honest I only worry about autoload, revisions and transients, for revision I just schedule Hummingbird once per week https://wpmudev.com/docs/wpmu-dev-plugins/hummingbird/#database-cleanup and for autoload I recommend https://supervisorwp.com/

    Have you faced any case where cleaning the orphaned post meta brought a huge difference in terms of performance?

    2) Please check your code to eliminate accumulation of orphaned post meta.

    3) Please consider a Branda Option to do housekeeping to remove these appropriately.

    4) What do you think of adding this kind of housekeeping to Branda?

    5) Simply deleting post-type ‘revision’ is a commonly published remedy but I’m concerned about creating post-meta orphans and leaving other relational anomalies. Can someone there comment on whether this is actually the right way to clear revisions? (While I spend time in core code, I haven’t looked at the code that does this yet.)

    I believe it would be more of a Hummingbird feature as we already have some database optimization features there.

    For revisions, it should be fine to clear it.

    I do suggest using Hummingbird with scheduled action, I don’t remember facing any anomalies, indeed, I did a look using WP-Sweep on one of my sites, and I had the same situation where WP-Sweep reported “Orphaned Post Meta” > 35 on count column, then I checked Hummingbird > Plugin health and could confirm it had the same 35 reports, so all my orphaned data is from Hummingbird which I don’t need to worry.

    Best Regards
    Patrick Freitas

  • Tony G
    • Mr. LetsFixTheWorld

    Patrick Freitas (my hero) there are a number of things to unpack here.

    1)
    I just opened the Hummingbird Dashboard and found this helpful notice:

    [attachments are only viewable by logged-in members]

    I thought something like this would appear on the WP admin dashboard, but OK, I’ll check plugin dashboards more often.

    2)
    I’m guessing the cleanup there is only for metadata specific to Hummingbird? It would be nice if that cleaned up anything related to WPMU DEV plugins.

    3)

    > About a safe period, to be honest I only worry about autoload, revisions and transients, for revision I just schedule Hummingbird once per week

    What I don’t like about automating a revision delete with Hummingbird and all other plugins that I’ve seen that do this kind of housekeeping, is that the selection for revisions is only based on record type, not time. Someone might be on at the time this cleanup is run and lose all of their backups. Some posts take days to develop. I would much prefer a selection date be added to the query:

    WHERE ‘post_type’= “revision” AND ‘post_modified’ < now() – interval 1 week;

    (EDIT: Ugh, this forum still doesn’t process backticks properly!)

    For exactly the same reason, I don’t feel comfortable simply deleting all transients. One or more might be actively in use by an online client, and required by a plugin, or a performance helper for someone coming back in a few hours. I would really prefer that the selection for transients select anything that’s older than 2 days, or a week.

    Similarly for autoloads – developers put them there for performance and it seems like an anti-pattern to delete them without judicious review.

    SELECT LENGTH(option_value) as autoload_size, ‘option_name’, ‘option_value’ FROM ‘wp_options’ WHERE autoload=’yes’;

    On a small new site I have about 500k of autoloads in 383 records, many of which are from core and the others are from plugins. Without metadata we have no idea how old these are, which ones are valid, or whether we’re actually decreasing performance by deleting them and forcing more DB queries.

    4)
    With all of that said, however, a scheduled cleanup, maybe once per week on a Sunday night, does not seem unreasonable – it’s just sub-optimal as the processes are right now.

    I’m even tickled that Hummingbird has these features and I don’t feel a need to load Sweep or any other plugin for this. Go Team! :tada:

    5) I had no idea that Hummingbird had this cleanup feature. I dunno if I forgot (age?) or if I missed this in recent announcements. I don’t see a cleanup option in any of the Asset Optimization settings.

    6)
    I do see that we can schedule optimization through Notifications. OK, I need to comment on this. The pattern has already been established that “Notifications” actually means “Do something … and then generate a Report”. That’s not good UX. Operations and Reporting should be separate. I didn’t think to look under Notifications to schedule a database operation, though in hindsight I guess this isn’t unusual with DEV plugins.

    I really think that Notifications should be “Scheduling”, that Performance Tests and Database Cleanup should be on that page as “Scheduled Optimizations”, and that “Notifications” should only contain settings for, um, notifications and reporting. Or move the scheduling from Notifications up to Asset Optimization where it makes more sense.

    I know none of this will change – I’m just saying, I didn’t think to look in Notifications for more Asset Optimization options.

    —-

    We might have exhausted this topic – I don’t have anything to add right now. But it would be nice if one of the DEV Pros could look more closely at this stuff and blog on it. I think it’s a worthwhile area of inquiry which is not addressed as cleanly across the industry as so many other concepts.

    Thanks.

  • Patrick Freitas
    • FLS

    Hi Tony G

    I thought something like this would appear on the WP admin dashboard, but OK, I’ll check plugin dashboards more often.

    Not all the time, just when the plugin notices the cleanup is necessary, as by default it should clear automatically:

    Data for optimized assets is stored in hidden custom post types with multiple metadata entries in the wp_postmeta table. While clearing Asset Optimization cache should delete both the hidden post types and associated metadata, metadata can sometimes remain and bloat the database

    I’m guessing the cleanup there is only for metadata specific to Hummingbird? It would be nice if that cleaned up anything related to WPMU DEV plugins.

    Yes, just for the plugin, we can forward this ticket as a feature request to extend the database cleanup module and include deeper cleanup in this matter.

    For exactly the same reason, I don’t feel comfortable simply deleting all transients. One or more might be actively in use by an online client, and required by a plugin, or a performance helper for someone coming back in a few hours. I would really prefer that the selection for transients select anything that’s older than 2 days, or a week.

    That makes sense, my website is a simple site with no content being added, I wouldn’t like to have a customer adding articles and losing important content.

    I am not sure if this can be considered as you also said, most of plugins will follow the cron job approach but I also forwarded this along with the metadata cleanup request.

    5) I had no idea that Hummingbird had this cleanup feature. I dunno if I forgot (age?) or if I missed this in recent announcements. I don’t see a cleanup option in any of the Asset Optimization settings.

    If I recall correctly we didn’t implement any modal, but have been there for a while, but the flow of assets optimization is to clear the hidden posts and remove the old asset when re-checking or detecting a file update.

    I know none of this will change – I’m just saying, I didn’t think to look in Notifications for more Asset Optimization options.

    We are always looking to hear the feedbacks.

    Before we had the notifications split across the plugins module, so in each module you would find a notification, from a point we moved everything to a notification center as it would make more sense to keep things in the same place, but it doesn’t mean we can’t modify this in future.

    Best Regards
    Patrick Freitas

  • Tony G
    • Mr. LetsFixTheWorld

    Thanks as always. I’m just confirming enhancement requests that we’ve resolved from this exercise :

    – Add a date option on revision removal. To avoid the expense of a UI change, I’d be happy with a simple config “define” for “number of days old” for the SQL selection: “interval N days”.

    – Attempt to remove transients for all WPMU DEV plugins. Maybe prefix all transient keys with the plugin slug for easy recognition. Ensure all transients have an expiration. See the Transients Manager plugin as a model for options. Maybe adopt that or incorporate it into Hummingbird with proper attributions. Note option to delete by name and delete expired transients. Maybe offer options to include or exclude persistent or unexpired records.

    – New options on the removal of old revisions and transients should be available for both immediate operations and scheduled.

    – Consider moving Hummingbird functional components from Notifications to Asset Optimization, and reserve Notifications area purely for reporting.

    – Consider processing autoloads. See WP-Sweep

    No confirmation requested until the dev team looks at this. Thanks!!