Introduction
All monitoring tools have some overhead associated with them but generally, the overhead is low compared to the value they deliver. How much load does DBA Dash add to a server?
To find out I setup an extended event to capture the queries ran by the DBA Dash agent. I left the trace running for over 8hrs (8:23:33, ~504mins). This was done on a real production server – one of our busiest servers that supports a SaaS application. This was done during peak load. The results are valid for this server running version 2.13.1 of DBA Dash.
TLDR: The overhead of running DBA Dash is very small and it delivers great value!
Results
Query | Samples | Avg Duration (ms) | Avg CPU (ms) | Avg Reads | Max Duration (ms) | Max CPU (ms) |
Instance | 1520 | 0 | 0 | 12 | 2 | 16 |
RunningQueries Changed to capture every 30sec instead of the default 1min on this server. Session waits are collected. | 1008 | 92 | 69 | 50 | 500 | 157 |
Plan Collection Plan collection is off by default but is enabled on this server. It only runs if there are plans to collect. | 942 | 30 | 24 | 2036 | 236 | 141 |
Text Collection Collects text for running queries and only runs if there is text to be collected. | 818 | 12 | 9 | 232 | 38 | 32 |
SlowQueries This is disabled by default. Query has a 1 second waitfor so look at CPU time | 504 | 1467 | 295 | 3521 | 1793 | 500 |
PerformanceCounters | 504 | 41 | 28 | 551 | 140 | 94 |
CPU | 504 | 29 | 25 | 32 | 94 | 63 |
ObjectExecutionStats | 504 | 17 | 14 | 1006 | 191 | 47 |
Waits | 504 | 3 | 3 | 0 | 28 | 16 |
AvalabilityReplicas | 504 | 11 | 2 | 39 | 52 | 16 |
DatabasesHADR | 504 | 2 | 2 | 362 | 4 | 16 |
AvailabilityGroups | 504 | 21 | 2 | 34 | 121 | 16 |
MemoryUsage | 504 | 1 | 1 | 31 | 3 | 16 |
IOStats | 504 | 0 | 0 | 0 | 2 | 16 |
JobHistory | 504 | 0 | 0 | 12 | 60 | 31 |
DBFiles | 8 | 6115 | 523 | 12199 | 16585 | 797 |
Backups | 8 | 279 | 215 | 12625 | 510 | 281 |
Custom Check Note: This runs custom check that is NOT part of DBA Dash and is unique to this server. This is YOUR code (Or mine in this case) | 8 | 1053 | 188 | 299241 | 1312 | 203 |
Databases | 8 | 139 | 107 | 21564 | 275 | 141 |
ServerExtraProperties | 8 | 598 | 41 | 1061 | 1088 | 62 |
DBConfig | 8 | 83 | 37 | 1843 | 183 | 78 |
LogRestores | 8 | 22 | 16 | 247 | 48 | 47 |
LastGoodCheckDB | 8 | 30 | 16 | 590 | 76 | 47 |
DBTuningOptions | 8 | 18 | 14 | 300 | 31 | 31 |
Corruption | 8 | 23 | 10 | 706 | 47 | 32 |
SysConfig | 8 | 3 | 8 | 308 | 8 | 16 |
ServerProperties | 8 | 2 | 2 | 1 | 5 | 15 |
TraceFlags | 8 | 0 | 0 | 0 | 0 | 0 |
DatabaseMirroring | 8 | 1 | 0 | 43 | 2 | 0 |
OSInfo | 8 | 3 | 0 | 146 | 6 | 0 |
Alerts | 8 | 8 | 0 | 206 | 13 | 0 |
Max date modified from sysjobs | 8 | 0 | 0 | 30 | 1 | 0 |
The queries with 504 samples are been run every 1min with RunningQueries sampled 1008 times on this server as it was changed to collect every 30 seconds instead of every 1min.
Of these frequently ran query captures, ironically it’s the SlowQueries capture query that is the slowest. The query has a 1-second delay built in while it starts a new extended event session before flushing the ring_buffer of the old one. It’s better to look at the average CPU time of 295ms for this query. It’s still the heaviest of the queries that are collected every 1min.
This is quite a busy production server and SlowQueries captures all queries taking longer than 1 second to run. This collection isn’t enabled by default and you can configure the threshold for collection as required.
The next slowest of the frequently executed queries is the RunningQueries capture with a 92ms average execution time. I’ve previously compared this to sp_WhoIsActive. In a lab environment with 12 active queries the average duration was only 15ms. So results will vary depending on your workload. The Plan Collection and Text Collection also form part of the RunningQueries capture with plan collection been optional.
Of the hourly collections, it’s the DBFiles collection that is the slowest. The overhead of this query will likely be different on your server depending on the databases and files on your SQL instance.
Other Caveats
- The Slow Query capture doesn’t take into account the overhead of the associated extended event.
- The repository database is on a different server. If you host both on the same server you will have additional overhead.
- The DBA Dash agent is ran on a different server. If you run the agent on your monitored SQL instance this will also add additional overhead.
- The results above are valid for this server with version 2.13.1 of DBA Dash.
- See here for a list of what DBA Dash captures and when. The analysis above doesn’t include collections that are ran daily as they didn’t fall into the 8hr window of this test.
- You have full control over the collection schedule. Disable some collections or change the frequency as required!
How to test
I’ve posted the results above to give an indication of the overhead of running DBA Dash. If you want to monitor the activity on your own server, this extended events session will get you started:
1 2 3 4 5 6 7 8 9 10 |
CREATE EVENT SESSION [DBADashAgent] ON SERVER ADD EVENT sqlserver.rpc_completed( ACTION(sqlserver.client_app_name) WHERE ([sqlserver].[like_i_sql_unicode_string]([sqlserver].[client_app_name],N'DBADash%') AND [object_name]<>N'sp_reset_connection')), ADD EVENT sqlserver.sql_batch_completed( ACTION(sqlserver.client_app_name) WHERE ([sqlserver].[like_i_sql_unicode_string]([sqlserver].[client_app_name],N'DBADash%'))) ADD TARGET package0.event_file(SET filename=N'DBADashAgent') WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=30 SECONDS,MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=OFF,STARTUP_STATE=OFF) GO |
I saved the results to a table and ran a quick and dirty query to group the results.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
SELECT grp, COUNT(*) Samples, AVG(duration/1000) AvgDurationMs, AVG(cpu_time/1000) AvgCPU, AVG(logical_reads) AvgReads, MAX(duration/1000) MaxDurationMs, MAX(cpu_time/1000) MaxCPUMs FROM dbo._DBADashTrace OUTER APPLY(SELECT ISNULL(batch_text,statement) AS txt) calc1 OUTER APPLY(SELECT CASE WHEN txt LIKE '%@plan%' THEN '{Plan Collection}' WHEN txt LIKE '%@handles%' THEN '{Text Collection}' WHEN txt LIKE '%sysjobhistory%' THEN '{JobHistory}' ELSE txt END AS grp) calc2 GROUP BY grp |