Removing large backup history with sp_delete_backuphistory

Problem

In SQL Server, the sp_delete_backuphistory stored procedure is is the standard and supported method to purge old data from the backup and restore history tables in the msdb database:

  • backupfile
  • backupfilegroup
  • backupmediafamily
  • backupmediaset
  • backupset
  • restorefile
  • restorefilegroup
  • restorehistory

Recently, I encountered an issue running sp_delete_backuphistory on servers that hosted a large number of databases with frequent log backup & restore operations. The clean up task hadn’t been scheduled and the history tables had grown very large over several months. The msdb databases was also hosted on a volume with limited IOPs.

Attempting to run sp_delete_backuphistory under these conditions you will likely encounter these issues:

  • The operation will take a long time
  • Log backup and restore operations are blocked during the process
  • Your msdb log file will grow large if you try to remove too much data at once
  • If you kill off the process at any point, the whole thing will rollback.

The sp_delete_backuphistory stored procedure gets the job done in most cases but it’s not very well optimized if you have a large amount of history to clear.

Optimized Solution for Backup History Cleanup

To clear a large amount of history without resorting to schema modifications or unsupported methods we can take an incremental approach to removing data.

The script below will check how much retention we currently have. It will then call sp_delete_backuphistory in a loop, removing data 1hr (configurable) at a time until we hit the target retention. We can also specify a maximum runtime for the process.

This approach won’t necessarily speed things up but it provides the following benefits:

  • Data is cleared as a series of smaller transactions rather than a single large transaction
  • Smaller transactions allow for log truncation and prevent the log file from growing.
  • We can interrupt the process without having to rollback all the progress that has been made. (just the current batch)
  • Log backups/restores might still be blocked, but only for the duration it takes to complete a single loop – not for the entire time it takes to clear the history. This can still impact RPO/RTO though.
  • We can control how long the process runs for. We can keep it running outside business hours or allow for breaks to prevent log backup/restores from falling too far behind.
  • The script will print progress information.

Links:

These related links might be interesting:

https://www.brentozar.com/archive/2018/02/the-annals-of-hilariously-bad-code-part-1/
https://www.brentozar.com/archive/2018/02/annals-hilariously-bad-code-part-2/
https://dba.stackexchange.com/questions/233375/how-to-minimize-plan-ahead-the-effect-of-running-sp-delete-backuphistory-measur