Removing large backup history with sp_delete_backuphistory

Problem

In SQL Server, the sp_delete_backuphistory stored procedure is the standard and supported method to purge old data from the backup and restore history tables in the msdb database:

  • backupfile
  • backupfilegroup
  • backupmediafamily
  • backupmediaset
  • backupset
  • restorefile
  • restorefilegroup
  • restorehistory

Recently, I encountered an issue running sp_delete_backuphistory on servers that hosted a large number of databases with frequent log backup and restore operations. The cleanup task hadn’t been scheduled and the history tables had grown very large over several months. The msdb database was also hosted on a volume with limited IOPS.

If you attempt to run sp_delete_backuphistory under these conditions, you will likely encounter these issues:

  • The operation will take a long time
  • Log backup and restore operations are blocked during the process
  • Your msdb log file will grow large if you try to remove too much data at once
  • If you kill the process at any point, the whole thing will roll back

The sp_delete_backuphistory stored procedure gets the job done in most cases, but it’s not well optimized if you have a large amount of history to clear.

Optimized Solution for Backup History Cleanup

To clear a large amount of history without resorting to schema modifications or unsupported methods, we can take an incremental approach to removing data.

The script below will check how much retention we currently have. It will then call sp_delete_backuphistory in a loop, removing data one hour (configurable) at a time until we hit the target retention. We can also specify a maximum runtime for the process.

This approach won’t necessarily speed things up but it provides the following benefits:

  • Data is cleared as a series of smaller transactions rather than a single large transaction
  • Smaller transactions allow for log truncation and prevent the msdb log file from growing excessively.
  • We can interrupt the process without rolling back all the progress that has been made (only the current batch is rolled back).
  • Log backups/restores might still be blocked, but only for the duration it takes to complete a single loop – not for the entire time it takes to clear the history. This can still impact RPO/RTO though.
  • We can control how long the process runs for. We can keep it running outside business hours or allow for breaks to prevent log backup/restores from falling too far behind.
  • The script will print progress information.
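
A minimal sketch of the looping logic is shown below. The parameter names, defaults and the use of backup_finish_date to track progress are illustrative and may differ from the full script:

  DECLARE @TargetRetentionDays int = 30;   -- history to keep
  DECLARE @BatchHours int = 1;             -- how much history to remove per loop iteration
  DECLARE @MaxRuntimeMins int = 120;       -- stop looping after this long

  DECLARE @StartTime datetime = GETDATE();
  DECLARE @TargetDate datetime = DATEADD(DAY, -@TargetRetentionDays, GETDATE());
  DECLARE @OldestBackup datetime;
  DECLARE @PurgeBefore datetime;
  DECLARE @Msg varchar(30);

  SELECT @OldestBackup = MIN(backup_finish_date) FROM msdb.dbo.backupset;

  WHILE @OldestBackup < @TargetDate
    AND DATEDIFF(MINUTE, @StartTime, GETDATE()) < @MaxRuntimeMins
  BEGIN
      -- Remove one batch of history, capped at the target retention date
      SET @PurgeBefore = DATEADD(HOUR, @BatchHours, @OldestBackup);
      IF @PurgeBefore > @TargetDate SET @PurgeBefore = @TargetDate;

      EXEC msdb.dbo.sp_delete_backuphistory @oldest_date = @PurgeBefore;

      -- Print progress as we go
      SET @Msg = CONVERT(varchar(30), @PurgeBefore, 120);
      RAISERROR('Backup history cleared up to %s', 0, 1, @Msg) WITH NOWAIT;

      SELECT @OldestBackup = MIN(backup_finish_date) FROM msdb.dbo.backupset;
  END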

Links

These related links might be interesting:

https://www.brentozar.com/archive/2018/02/the-annals-of-hilariously-bad-code-part-1/
https://www.brentozar.com/archive/2018/02/annals-hilariously-bad-code-part-2/
https://dba.stackexchange.com/questions/233375/how-to-minimize-plan-ahead-the-effect-of-running-sp-delete-backuphistory-measur

T-SQL Tuesday #154 Invitation – SQL Server 2022

Introduction

This is my first time participating in T-SQL Tuesday. The invitation from Glenn Berry asks us to write about what we’ve been doing with SQL Server 2022.

Getting DBA Dash ready for SQL 2022

I was eager to test DBA Dash with SQL 2022 and started testing with the first public release (CTP 2.0). I added a SQL 2022 instance to my lab environment, created using AutomatedLab.

Unlike my other lab machines, the 2022 instance was installed by running through the installer manually, both to get up and running quickly and to test the new installer.

DBA Dash only needed some minor modifications to work with SQL 2022. The first and most important step was to add version 16 to the hardcoded list of supported versions and to update some static data for SQL Server 2022. I hardcode the list of supported versions so that I have the opportunity to test new versions first and fix any issues.

This initial support for SQL 2022 was added in May. I’ve been running DBA Dash in my lab environment since then and it works well.

DBA Dash – support for new features in SQL 2022

QAT_DEFLATE Backup Compression

Version 2.22.0 adds support for capturing the compression algorithm used on SQL 2022.

Glenn Berry has some good articles on this new backup compression algorithm here and here. I’m getting similar results to Glenn with my own testing using QAT_DEFLATE (software mode) – faster backups, higher compression and lower resource utilization.
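
As a rough illustration (assuming QAT integration has already been enabled on the instance; the database name and file path are placeholders), a backup can specify the algorithm and the new compression_algorithm column in msdb.dbo.backupset shows which algorithm was used:

  -- Take a backup using the QAT_DEFLATE algorithm (SQL Server 2022)
  BACKUP DATABASE MyDatabase
  TO DISK = N'D:\Backups\MyDatabase.bak'
  WITH COMPRESSION (ALGORITHM = QAT_DEFLATE), STATS = 10;

  -- Check which compression algorithm recent backups used
  SELECT TOP (10) database_name, backup_finish_date, compression_algorithm
  FROM msdb.dbo.backupset
  ORDER BY backup_finish_date DESC;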

sys.databases

SQL 2022 adds some new columns to sys.databases:

  • is_data_retention_on
  • is_ledger_on
  • is_change_feed_enabled

I’m planning to add these to the DBA Dash collection. Only is_ledger_on is currently documented.
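
For reference, a quick way to look at these flags on a SQL 2022 instance:

  -- New database-level flags added in SQL Server 2022
  SELECT name,
         is_data_retention_on,
         is_ledger_on,
         is_change_feed_enabled
  FROM sys.databases;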

Other

On the to-do list is testing DBA Dash with contained availability groups. DBA Dash has monitoring for agent jobs and availability groups, so it’s likely some changes will be required in this area.

What I’m excited about in SQL 2022

The intelligent query processing features in SQL 2022 and the tempdb concurrency enhancements are very interesting. There are also some really useful language enhancements like GENERATE_SERIES, DATE_BUCKET, GREATEST and LEAST that I can see myself using. Null handling also gets easier with IS [NOT] DISTINCT FROM. I can’t use any of this for DBA Dash as I want to keep the repository compatible with SQL 2016 for now. The language enhancements will be useful for some of the SaaS databases I manage once we upgrade, though.
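
A few illustrative snippets of these language enhancements (my own examples; some require database compatibility level 160):

  -- GENERATE_SERIES: produce a sequence of numbers without a numbers table
  SELECT value FROM GENERATE_SERIES(1, 5);

  -- DATE_BUCKET: group dates into fixed-size buckets (here, 1-week buckets)
  SELECT DATE_BUCKET(WEEK, 1, CAST('2022-11-16' AS date)) AS week_start;

  -- GREATEST / LEAST: row-wise max/min without CASE expressions
  SELECT GREATEST(10, 20, 5) AS greatest_value, LEAST(10, 20, 5) AS least_value;

  -- IS [NOT] DISTINCT FROM: NULL-safe comparison
  DECLARE @Filter int = NULL;
  SELECT name FROM sys.databases WHERE source_database_id IS NOT DISTINCT FROM @Filter;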

The ability to failover back and forth between Azure managed instance and SQL 2022 is also a game changer. It certainly takes some of the risk out of the process if you are considering using managed instances.

The feature I’m most excited about? Backup/restore to S3. If you host SQL instances on AWS, you have to figure out how to get your backups to S3 as there isn’t a standard way to do this. Many people back up to an EBS volume first and then push the files to S3, which isn’t ideal: EBS storage is expensive, and the two-step process adds time and complexity. Restores are also slower and more complex as you need to pull the backups from S3 before you can restore them.

The ability to back up directly to S3 is a huge win. It also has value outside of AWS as there are a number of other S3-compatible storage providers.
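
For anyone who hasn’t seen it yet, the syntax looks roughly like this. The endpoint, bucket, keys and database name are placeholders:

  -- Credential for the S3 bucket. The IDENTITY must be 'S3 Access Key' and the
  -- SECRET is the access key id and secret key separated by a colon.
  CREATE CREDENTIAL [s3://s3.us-east-1.amazonaws.com/my-backup-bucket]
  WITH IDENTITY = 'S3 Access Key',
       SECRET = '<AccessKeyID>:<SecretAccessKey>';

  -- Back up directly to S3, no intermediate EBS volume required
  BACKUP DATABASE MyDatabase
  TO URL = 's3://s3.us-east-1.amazonaws.com/my-backup-bucket/MyDatabase.bak'
  WITH COMPRESSION, STATS = 10;

  -- Restore straight from S3
  RESTORE DATABASE MyDatabase
  FROM URL = 's3://s3.us-east-1.amazonaws.com/my-backup-bucket/MyDatabase.bak'
  WITH REPLACE;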

How to handle SQL Agent Jobs with Always On Availability Groups?

Introduction

When you configure your SQL Servers for high availability, you want failover to be as seamless as possible. If you use availability groups, mirroring or log shipping as part of your high availability strategy, you will need to take care of anything that lives outside the context of your databases. That might include logins, SQL Server Agent jobs and ensuring consistent server-level configuration across your availability nodes. This article will discuss how to handle SQL Server Agent jobs.

Not so seamless failover of agent jobs

If you have any SQL Agent jobs that run in the context of your user databases, these will need special handling to ensure a seamless failover. Ideally, Microsoft would have added some improvements to SQL Agent to help with this, perhaps giving us agent jobs that can live within the context of a user database instead of msdb.

As it stands, you will need to keep your agent jobs in sync between the primary and secondaries, and figure out a mechanism to activate the jobs on the new primary following a failover and disable them on the old primary.

A Solution

There are a few ways to handle this, but my approach is to run all the jobs in the context of the master database. The job step then checks whether the user database is accessible and executes the command using a three-part name. See the script below for an example. This approach works for availability groups, mirroring and log shipping.
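
As a simplified illustration of the idea (the database and procedure names are placeholders, and the actual script may implement the accessibility check differently):

  -- Job step runs in the context of master. It only does work if MyUserDB is
  -- the writeable (primary) copy on this server.
  IF DATABASEPROPERTYEX('MyUserDB', 'Updateability') = 'READ_WRITE'
  BEGIN
      EXEC MyUserDB.dbo.MyMaintenanceProc;  -- three-part name into the user database
  END
  ELSE
  BEGIN
      PRINT 'MyUserDB is not writeable on this server - nothing to do.';
  END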

With this approach you can create your jobs and have them active on your primary and secondaries. You don’t need to worry about enabling/disabling jobs on failover – the jobs are already running, but they only do any work if the database is primary. This makes the failover totally seamless. The downside is that the jobs will report that they ran successfully even if they didn’t do any work, which could cause confusion.

The next problem is keeping the jobs in sync between your servers. You can script out the jobs using SSMS easily enough, but this could get tedious if you have a large number of jobs. I’ve created a PowerShell script below to help with this. You can use it to report whether your jobs are out of sync and also to perform a synchronization. See the comments at the top of the script for some usage examples.

T-SQL Proper Case Function

This function converts a string to proper case.

e.g.
Convert this: A LONG TIME AGO IN A GALAXY FAR, FAR AWAY
To this: A Long Time Ago In A Galaxy Far, Far Away

Note: It might be better to do this in the application rather than the database, and it could be better written as a CLR function. This is useful if you need a T-SQL solution for a one-off purpose.
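
A minimal sketch of one way to write such a function (the original script may handle additional word delimiters and edge cases):

  CREATE OR ALTER FUNCTION dbo.ProperCase (@Input nvarchar(4000))
  RETURNS nvarchar(4000)
  AS
  BEGIN
      DECLARE @Output nvarchar(4000) = LOWER(@Input);
      DECLARE @i int = 1;

      WHILE @i <= LEN(@Output)
      BEGIN
          -- Capitalize the first character and any character that follows a space
          IF @i = 1 OR SUBSTRING(@Output, @i - 1, 1) = ' '
              SET @Output = STUFF(@Output, @i, 1, UPPER(SUBSTRING(@Output, @i, 1)));
          SET @i += 1;
      END

      RETURN @Output;
  END
  GO

  -- Example usage
  SELECT dbo.ProperCase(N'A LONG TIME AGO IN A GALAXY FAR, FAR AWAY');
  -- Returns: A Long Time Ago In A Galaxy Far, Far Away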