

Oh what a difference a year makes! Well, to be more precise, 14 months and change… But I digress. This past year has been an exhilarating one for me. I have been very busy on just about every front I could be.

  • My son has moved on from Cub Scouting to Boy Scouting, and I am in the process of getting leadership training so that I can help to facilitate his transition into the awkward teen years.
  • I got engaged over the Christmas holiday, to my best friend, the woman whom I have dated for the past two years, who is the absolute love of my life.
  • I have gone from helping support the OPASS Orlando chapter of PASS to working with the SCSUG here in Brevard County, FL.
  • Last but not least, I have gone from working with one of the biggest healthcare providers in Central Florida to a Federal Government contractor providing health care to our country’s finest in Iraq, Afghanistan, Africa, and here at home.

Part of my professional transition that I am feeling most excited about is going from working with some of the most outspoken and prolific SQL Server professionals back in ’07, to working on a highly qualified team (some 20+ certifications among them) spanning the realms of data, storage, and system administration, to flying totally solo.

I have to admit, it is a little humbling when you walk into a shop all by your lonesome, coming from years of working on teams with several other folks - some of the best in the business - in the same arena as you, and sit down to get to work. The place I am at now has had very bad luck with DBAs. So much so, I found myself apologizing to the hiring manager during the interview because DBAs had given them such a horrible experience. I vowed right then and there to turn that impression around. To make their impression of what we do one of adulation and appreciation. It hasn’t been easy. To say that they have had bad luck is really being somewhat modest. When I sat down to tackle this environment and rein in the rampant security violations and the configuration horrors, I realized a large part of the reason that they had such a hard time. It looked to me like nothing had ever been done to tame the data beast! Nothing!

Enter everything I have learned over the last six to seven years of being a DBA. Everything.

I cannot tell you how much I appreciate the skills I have been taught by my mentors. I have had to rely on every single ounce of knowledge I have gained, not only from the people I have worked with, but also the people who write amazing blogs, like Jonathan Kehayias, Grant Fritchey, Thomas LaRock, or Ola Hallengren. The ones who pen articles for SQL Server Pro, for Simple-Talk, or for SQL Server Central. The ones who write SQL Server books like “SQL Server 2008 Internals”, “SQL Server MVP Deep Dives” (Vols. 1 and 2), or “Pro SQL Server 2012 Reporting Services” (by some very dear friends of mine). These folks have all had a serious impact on my career, and for that I am eternally grateful.

So, to get back to the title, “I’ve Missed You…” – I have. And I will be looking forward to blogging my experiences with SQL and the characters that I come across in my adventures.

After reading a blog post from Tom LaRock (b|t|g+) from about this time last year, and another post in which Brad Schulz (b) expounded on outliers, I found that both posts did very well to explain how to get information on long-running jobs that have completed.

However, I was tasked to alert on long-running jobs that are currently running. Because the sysjobhistory table will not get updated with data until a job step completes, the above approaches would not work in whole for my purpose. They are a great start, and a necessary insight to finding long-running jobs, but they only look at the past. I needed to look at the present.

There can be many reasons why a SQL Server Agent job hangs. Some of the issues resulting from a hung job are outlined here:

  • Resources can be consumed for the duration
  • Object blocking/locking due to open transactions
  • No notifications
  • The list goes on….

On to the solution –

To get the currently running jobs on a server, I executed the sp_help_job system stored procedure in the msdb database:
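The call itself is a one-liner:

```sql
-- Return all Agent jobs that are not idle or suspended
EXEC msdb.dbo.sp_help_job @execution_status = 0;
```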

The @execution_status = 0 parameter will bring back all jobs that are not idle or suspended. Refer to the sp_help_job documentation to see the different status values.

Now that I know which jobs are executing (if any), how do I know what their typical execution time might be, or what the current elapsed execution time is? Enter the sysjobactivity and sysjobhistory msdb system tables. The sysjobactivity table is populated with all currently defined Agent jobs each time the Agent service is started. This table feeds the job activity monitor. The sysjobhistory table contains information on executed jobs. As stated previously, this table is only updated after a job step completes. Perfect for finding the average duration of successful execution of jobs. With this in mind, you can join these tables to get all the information you need to determine current and historical data.
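A rough sketch of that join (I pull the job name from sysjobs here for readability; the filter on step_id = 0 limits the history to whole-job outcome rows rather than individual steps):

```sql
-- Current activity alongside whole-job history rows
SELECT j.name,
       ja.start_execution_date,
       jh.run_status,
       jh.run_duration
FROM msdb.dbo.sysjobactivity AS ja
JOIN msdb.dbo.sysjobs        AS j  ON j.job_id  = ja.job_id
JOIN msdb.dbo.sysjobhistory  AS jh ON jh.job_id = ja.job_id
WHERE jh.step_id = 0;  -- step_id 0 = job outcome, not individual steps
```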

In my case, the sysjobactivity table had two records for each Agent job on one of my servers. This can happen when the SQL Server Agent service terminates while jobs are running. To grab the correct rows, the sysjobactivity table has a session_id column that corresponds to the same column in the syssessions table, which is updated with a new session_id and agent_start_date each time the SQL Server Agent service is restarted.  I joined the sysjobactivity table and the syssessions table, using MAX(session_id) to ensure I got the correct activity record.
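That filter, sketched out, looks something like this:

```sql
-- Keep only activity rows from the most recent Agent session
SELECT ja.job_id, ja.start_execution_date, ja.stop_execution_date
FROM msdb.dbo.sysjobactivity AS ja
JOIN (SELECT MAX(session_id) AS session_id
      FROM msdb.dbo.syssessions) AS s
  ON ja.session_id = s.session_id;
```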

Next, I wanted to utilize a “central management” server to run this script. Sure, you could monitor each server separately, but that just adds more jobs for me to watch, and report on in case of failure. Using linked servers and OPENROWSET, I can cycle through each of my SQL Servers, joining a result set from sp_help_job, sysjobactivity, and sysjobhistory. The only question I had here was how to get the server names\instance names that I wanted to cycle through. I declared a table variable, @Servers, and used hard-coded values to insert the data into that table for each server that I wanted to spin through. That method, however, is not exactly scalable. To fix that issue, I pulled the server names\instance names from the sys.servers system view in the master database.
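The scalable version of that population step is just an INSERT…SELECT against sys.servers (the is_linked filter is an assumption; adjust it to match how your linked servers are defined):

```sql
DECLARE @Servers TABLE (Server_ID int IDENTITY(1,1), ServerName sysname);

-- Pull linked server\instance names instead of hard-coding them
INSERT INTO @Servers (ServerName)
SELECT name
FROM master.sys.servers
WHERE is_linked = 1;   -- adjust this filter to match your environment
```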

So, now to put it all together, the script looks something like this:

  • Declare all the variables you will need.

Notice the execution of sp_configure to turn on the advanced option “Ad Hoc Distributed Queries.” Security measures in our shop dictate that this option should be turned off. This block of script will enable the option just for the duration of the job; it will be turned off again at the conclusion.
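The enabling block is the standard sp_configure pattern:

```sql
-- Enable OPENROWSET support for the duration of this run only
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'Ad Hoc Distributed Queries', 1;
RECONFIGURE;
```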

  • Populate the @Servers table variable, and get the first Server_ID for iteration. Use a while loop to spin through all the servers in the @Servers table variable, setting the @ServerName variable for use within the server context.
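The loop skeleton, as a sketch (variable names are illustrative):

```sql
DECLARE @Server_ID int, @ServerName sysname;

-- Get the first Server_ID for iteration
SELECT @Server_ID = MIN(Server_ID) FROM @Servers;

WHILE @Server_ID IS NOT NULL
BEGIN
    -- Set the server context for this pass
    SELECT @ServerName = ServerName
    FROM @Servers
    WHERE Server_ID = @Server_ID;

    /* ... per-server work goes here ... */

    -- Advance to the next server
    SELECT @Server_ID = MIN(Server_ID)
    FROM @Servers
    WHERE Server_ID > @Server_ID;
END
```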

  • With the server context set, create the connection string for the OPENROWSET call, and create the SQL script that will grab all the data you will need to make your comparisons.

The script uses the run_duration column of the sysjobhistory table to calculate an average execution time of successful job runs, and adds a standard deviation to the average. Other posts talked about adding more than one STDEV, but in the case of a backup of a VLDB, I have seen the job take 10 – 13 hrs. Adding two standard deviations would allow for too much variance. The calculations have to be done remotely because 1) the historical time is an aggregate, and 2) the current time could be in a different time zone than my management server. Also, I am grabbing the job name from the sp_help_job system stored procedure, to limit the number of joins in the query. In the WHERE clause, I am limiting the AVG to successful executions, and times for the entire job. Finally, insert the results into the @RunningJobTimes table variable to spin through the jobs/server to compare times, one at a time.
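The inner query handed to OPENROWSET might look roughly like this. Note that run_duration is encoded as HHMMSS, so it is converted to seconds before aggregating, and that run_status = 1 with step_id = 0 restricts the average to successful, whole-job executions. For readability, this sketch joins sysjobs for the job name rather than using the sp_help_job result set the original took it from:

```sql
-- Per job: current elapsed seconds vs. AVG + 1 STDEV of past successful runs
SELECT j.name AS JobName,
       DATEDIFF(SECOND, ja.start_execution_date, GETDATE()) AS CurrentRunSec,
       hist.AvgPlusStdevSec AS HistoricalRunSec
FROM msdb.dbo.sysjobactivity AS ja
JOIN msdb.dbo.sysjobs AS j
  ON j.job_id = ja.job_id
CROSS APPLY (
    SELECT AVG(1.0 * sec.RunSec) + STDEV(sec.RunSec) AS AvgPlusStdevSec
    FROM msdb.dbo.sysjobhistory AS jh
    CROSS APPLY (SELECT (jh.run_duration / 10000) * 3600
                      + (jh.run_duration / 100 % 100) * 60
                      +  jh.run_duration % 100 AS RunSec) AS sec  -- HHMMSS -> seconds
    WHERE jh.job_id = ja.job_id
      AND jh.run_status = 1   -- successful runs only
      AND jh.step_id = 0      -- whole-job duration rows
) AS hist
WHERE ja.stop_execution_date IS NULL;  -- still running
```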

  • If there were rows inserted into the @RunningJobTimes table variable, this means that there are jobs running, and we want to compare the current execution time(s) vs. the historical time calculated in the script above.

Here, we use the @@ROWCOUNT built-in system function to check whether any rows were inserted into the @RunningJobTimes table variable. If yes, run the comparison; if not, skip it. The comparison is simple: if the current execution time is greater than the calculated historical time, build an email and send it using the sp_send_dbmail stored procedure. In order to make the email easily readable on a mobile device, the body of the email is short and sweet. It includes the name of the server first in the subject line, so the DBA can begin triage without having to open the email. The body includes the job name, the time it has been executing, and the amount of overrun.
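The per-job comparison and alert, sketched out (the variable names, mail profile, and recipient address are all placeholders for whatever your shop uses):

```sql
-- Hypothetical variables populated from @RunningJobTimes inside the loop
IF @CurrentRunSec > @HistoricalRunSec
BEGIN
    -- Server name first in the subject, so triage can start from the preview
    SET @Subject = @ServerName + N' - long-running job alert';
    SET @Body    = @JobName
                 + N' has been running for '
                 + CONVERT(nvarchar(12), @CurrentRunSec)
                 + N' sec (overrun: '
                 + CONVERT(nvarchar(12), @CurrentRunSec - @HistoricalRunSec)
                 + N' sec).';

    EXEC msdb.dbo.sp_send_dbmail
        @profile_name = N'DBA Mail Profile',      -- placeholder profile
        @recipients   = N'dba@yourcompany.com',   -- placeholder address
        @subject      = @Subject,
        @body         = @Body;
END
```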

  • Clear out the @RunningJobTimes table variable, and move on to the next server. After the last server has been iterated, turn off the “Ad Hoc Distributed Queries” advanced option.
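The cleanup at the bottom of the loop, and the teardown after it, look like this:

```sql
-- Reset for the next server...
DELETE FROM @RunningJobTimes;

-- ...and, once the last server has been processed, put security back as it was
EXEC sp_configure 'Ad Hoc Distributed Queries', 0;
RECONFIGURE;
EXEC sp_configure 'show advanced options', 0;
RECONFIGURE;
```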

This script can easily be turned into a stored procedure, and an Agent job. All you have to do is remove the advanced options blocks, and put CREATE PROCEDURE usp_<procedure_name> AS BEGIN at the top, with a matching END at the bottom. Be sure to remove the GO at the bottom of the script as well. If your shop is like mine, you can instead put the advanced options blocks into the Agent job as steps one and three, with the stored procedure execution being step two.
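The wrapper amounts to this (the procedure name here is hypothetical):

```sql
-- Hypothetical name; the body is the script above, minus the
-- sp_configure blocks and the trailing GO
CREATE PROCEDURE usp_CheckLongRunningJobs
AS
BEGIN
    /* script body here */
END
GO
```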