
After reading a blog post from Tom LaRock (b|t|g+) from about this time last year, and then another post in which Brad Schulz (b) expounded on it with regard to outliers, I found that both posts do a great job of explaining how to get information on long-running jobs that have completed.

However, I was tasked with alerting on long-running jobs that are currently running. Because the sysjobhistory table is not updated until a job step completes, the above approaches would not fully work for my purpose. They are a great start, and a necessary insight into finding long-running jobs, but they only look at the past. I needed to look at the present.

There can be many reasons why a SQL Server Agent job hangs. Some of the issues resulting from a hung job are outlined here:

  • Resources can be consumed for the duration
  • Object blocking/locking due to open transactions
  • No notifications
  • The list goes on….

On to the solution –

To get the currently running jobs on a server, I executed the sp_help_job system stored procedure in the msdb database:
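    EXEC msdb.dbo.sp_help_job @execution_status = 0;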

The @execution_status=0 parameter will bring back all jobs that are not idle or suspended. Refer to the link above to see the different status values.

Now that I know which jobs are executing (if any), how do I know what their typical execution time might be, or what the current elapsed execution time is? Enter the sysjobactivity and sysjobhistory msdb system tables. The sysjobactivity table is populated with all currently defined Agent jobs each time the Agent service is started; this table feeds the Job Activity Monitor. The sysjobhistory table contains information on executed jobs. As stated previously, it is only updated after a job step completes, which makes it perfect for finding the average duration of successful job executions. With this in mind, you can join these tables to compare current activity against historical data.

In my case, the sysjobactivity table had two records for each Agent job on one of my servers. This can happen when the SQL Server Agent service terminates while jobs are running. To grab the correct rows, I used the session_id column of the sysjobactivity table, which corresponds to the same column in the syssessions table; syssessions gets a new session_id and agent_start_date each time the SQL Server Agent service is restarted. I joined sysjobactivity to syssessions, using MAX(session_id) to ensure I got the correct activity record.
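A minimal sketch of that join might look like this:

    SELECT ja.job_id, ja.start_execution_date
    FROM msdb.dbo.sysjobactivity AS ja
    JOIN msdb.dbo.syssessions AS s
        ON s.session_id = ja.session_id
    WHERE s.session_id = (SELECT MAX(session_id) FROM msdb.dbo.syssessions);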

Next, I wanted to utilize a “central management” server to run this script. Sure, you could monitor each server separately, but that just adds more jobs for me to watch, and report on in case of failure. Using linked servers and OPENROWSET, I can cycle through each of my SQL Servers, joining a result set from sp_help_job, sysjobactivity, and sysjobhistory. The only question I had here was how to get the server names\instance names that I wanted to cycle through. I declared a table variable, @Servers, and used hard-coded values to insert the data into that table for each server that I wanted to spin through. That method, however, is not exactly scalable. To fix that issue, I pulled the server names\instance names from the sys.servers system view in the master database.
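A sketch of that population step (filter sys.servers however fits your environment):

    DECLARE @Servers TABLE (Server_ID int IDENTITY(1,1), ServerName sysname);

    INSERT INTO @Servers (ServerName)
    SELECT name
    FROM master.sys.servers;  -- e.g. add WHERE is_linked = 1 to grab only linked servers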

So, now to put it all together, the script looks something like this:

  • Declare all the variables you will need.

Notice the execution of sp_configure to turn on the advanced option “Ad Hoc Distributed Queries.” Security measures in our shop dictate that this option be turned off. This block of script enables the option just for the duration of the job; it will be turned off again at the conclusion.
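The enabling block is the standard sp_configure dance:

    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sp_configure 'Ad Hoc Distributed Queries', 1;
    RECONFIGURE;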

  • Populate the @Servers table variable, and get the first Server_ID for iteration. Use a while loop to spin through all the servers in the @Servers table variable, setting the @ServerName variable for use within the server context.
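Continuing the sketch above, the loop skeleton looks something like this:

    DECLARE @Server_ID int, @ServerName sysname;

    SELECT @Server_ID = MIN(Server_ID) FROM @Servers;

    WHILE @Server_ID IS NOT NULL
    BEGIN
        SELECT @ServerName = ServerName
        FROM @Servers
        WHERE Server_ID = @Server_ID;

        -- per-server work goes here

        SELECT @Server_ID = MIN(Server_ID)
        FROM @Servers
        WHERE Server_ID > @Server_ID;
    END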

  • With the server context set, create the connection string for the OPENROWSET call, and create the SQL script that will grab all the data you will need to make your comparisons.
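The OPENROWSET call takes a provider, a connection string, and the pass-through query. Stripped down to its shape (the variable names are my own placeholders):

    DECLARE @SQL nvarchar(MAX);

    -- @SQL is later executed with INSERT ... EXEC to land the results
    -- in the @RunningJobTimes table variable.
    SET @SQL = N'SELECT *
    FROM OPENROWSET(''SQLNCLI'',
                    ''Server=' + @ServerName + N';Trusted_Connection=yes;'',
                    ''EXEC msdb.dbo.sp_help_job @execution_status = 0'')';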

The script uses the run_duration column of the sysjobhistory table to calculate an average execution time of successful job runs, and adds one standard deviation to the average. Other posts talked about adding more than one STDEV, but in the case of a backup of a VLDB, I have seen the job take 10 – 13 hrs; adding two standard deviations would allow for too much variance. The calculations have to be done remotely because 1) the historical time is an aggregate, and 2) the current time could be in a different time zone than my management server. Also, I am grabbing the job name from the sp_help_job system stored procedure, to limit the number of joins in the query. In the WHERE clause, I am limiting the AVG to successful executions, and to times for the entire job. Finally, insert the results into the @RunningJobTimes table variable to spin through the jobs per server and compare times, one at a time.
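The historical side of that calculation, as a standalone sketch (run_duration is stored as an HHMMSS-style integer, so it has to be converted to seconds before averaging):

    SELECT jh.job_id,
           AVG(x.run_seconds) + STDEV(x.run_seconds) AS threshold_seconds
    FROM msdb.dbo.sysjobhistory AS jh
    CROSS APPLY (SELECT (jh.run_duration / 10000) * 3600
                      + ((jh.run_duration / 100) % 100) * 60
                      +  (jh.run_duration % 100) AS run_seconds) AS x
    WHERE jh.run_status = 1  -- successful executions only
      AND jh.step_id = 0     -- the whole-job outcome row
    GROUP BY jh.job_id;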

  • If there were rows inserted into the @RunningJobTimes table variable, this means that there are jobs running, and we want to compare the current execution time(s) vs. the historical time calculated in the script above.

Here, we use the @@ROWCOUNT built-in system function to check whether any rows were inserted into the @RunningJobTimes table variable. If yes, run the comparison; if not, skip it. The comparison is simple: if the current execution time is greater than the calculated historical time, build an email and send it using the sp_send_dbmail stored procedure. In order to make the email easily readable on a mobile device, the body is short and sweet. The subject line leads with the name of the server, so the DBA can begin triage without having to open the email. The body includes the job name, the time it has been executing, and the amount of overrun.
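A minimal sketch of the alert itself; the variables holding the job name and times, along with the mail profile and recipient, are placeholders for whatever your script and shop use:

    DECLARE @Subject nvarchar(255),
            @Body    nvarchar(MAX);

    SET @Subject = @ServerName + N' - Long-running job: ' + @JobName;
    SET @Body    = @JobName + N' has been executing for ' + @ElapsedTime
                 + N', exceeding its historical threshold by ' + @Overrun + N'.';

    EXEC msdb.dbo.sp_send_dbmail
         @profile_name = N'DBA Mail',         -- placeholder profile name
         @recipients   = N'dba@example.com',  -- placeholder recipient
         @subject      = @Subject,
         @body         = @Body;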

  • Clear out the @RunningJobTimes table variable, and move on to the next server. After the last server has been iterated, turn off the “Ad Hoc Distributed Queries” advanced option.
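In code, the cleanup amounts to:

    -- Inside the loop: reset the working table before the next server.
    DELETE FROM @RunningJobTimes;

    -- After the last iteration: put the advanced options back the way we found them.
    EXEC sp_configure 'Ad Hoc Distributed Queries', 0;
    RECONFIGURE;
    EXEC sp_configure 'show advanced options', 0;
    RECONFIGURE;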

This script can easily be turned into a stored procedure and an Agent job. All you have to do is remove the advanced-options blocks, put CREATE PROCEDURE usp_<procedure_name> AS BEGIN at the top, and close with a matching END. Be sure to remove the GO at the bottom of the script as well. If your shop is like mine, you can instead put the advanced-options blocks into the Agent job as steps one and three, with the stored procedure execution being step two.
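A skeleton of that wrapper (the procedure name is just an example):

    CREATE PROCEDURE dbo.usp_AlertLongRunningJobs
    AS
    BEGIN
        -- body of the script above, minus the advanced-options blocks
        -- and the trailing GO
    END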

I love to learn new things! As Albert Einstein once said,

“Learn from yesterday, live for today, hope for tomorrow. The important thing is not to stop questioning.”

Not to stop questioning… Hmmm, I think I can live with that.

Speaking of questioning, my learning experience today was actually a question a colleague of mine asked, but I was not sure of the answer. The question was, “If you have an SSIS package with configurations enabled, and you are using an XML configuration file, do you have to specify that config file in the Agent job?” I don’t like saying, “I don’t know”, so my answer was, “Give me a few minutes.”

I created a test case, because I had several questions that I wanted to answer. First, I created a quick database with one table: an integer identity column (force of habit to create a key!), a column to hold a string whose value would show where it came from, and a column for an import date (again, out of force of habit).
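The test objects looked something like this (the database name and column names are illustrative placeholders; only the Config_Test table name matters below):

    CREATE DATABASE ConfigTest;
    GO
    USE ConfigTest;
    GO
    CREATE TABLE dbo.Config_Test (
        ID          int IDENTITY(1,1) PRIMARY KEY,  -- force-of-habit key
        SourceValue varchar(100) NOT NULL,          -- records where the value came from
        ImportDate  datetime NOT NULL               -- force-of-habit import date
    );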

Next, I created a simple SSIS package. One object, an Execute SQL Task, with an insert routine (INSERT INTO Config_Test VALUES (?, GetDate())), using a package variable as the only parameter.
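For clarity, the task's statement with the parameter mapping spelled out (the variable name User::TestValue is a placeholder):

    -- SQLStatement for the Execute SQL Task; the ? placeholder is mapped to the
    -- package variable (e.g. User::TestValue) on the task's Parameter Mapping page.
    -- The identity column is skipped automatically, so two values suffice.
    INSERT INTO Config_Test VALUES (?, GETDATE());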

The next thing was to create an Agent job to execute the package. I have not defined any configurations yet; I want results that I am familiar with and expect before I try to retrieve results that I am less familiar with. As with the SSIS package, the Agent job is simple: a one-step job that executes the SSIS package, nothing more.

When I execute the job, I get the results I expected. The Package Variable’s value is inserted into the test table, and all is well with the world.

So, now that I have the result I expected, let’s see what happens when I enable configurations on the package and not the Agent job…

I enabled configurations (In Control Flow context, SSIS -> Package Configurations…) and then I designated an XML file for my configuration. I only exported the Package Variable’s value to the config file. I saved the SSIS package, and now I want to edit that XML file so that the variable’s value will reflect where the value is coming from.
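For reference, a package configuration file generated this way has roughly the following shape; the package and variable names here are placeholders:

    <DTSConfiguration>
      <DTSConfigurationHeading>
        <DTSConfigurationFileInfo GeneratedFromPackageName="ConfigTest" />
      </DTSConfigurationHeading>
      <Configuration ConfiguredType="Property"
                     Path="\Package.Variables[User::TestValue].Properties[Value]"
                     ValueType="String">
        <ConfiguredValue>Value from the package config file</ConfiguredValue>
      </Configuration>
    </DTSConfiguration>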

I do not have to make any changes to the Agent job. When I execute the job again, will the package use the config file, even though it is not designated in the job?

Yes! Even though I did not specify a config file in the Agent job, the package used the configuration file that I enabled within the package itself. So what is the configuration tab in the Agent job for? I made another configuration file, based on the first, and updated the value accordingly.

This time, I updated the Agent job to use the new, updated XML configuration file, while leaving the configuration file enabled in the package.

In essence, I have two different configuration files designated for this package: the first in the package, and the second in the Agent job. Now this is a scenario I am curious about. Which configuration will the SSIS package use?

That settles it! If configurations are enabled in an SSIS package, they are not required in the Agent job that runs the package. Furthermore, if a configuration file is specified in the Agent job, that configuration is used in preference to the one enabled in the SSIS package itself. This could be useful for testing different configurations through an Agent job without changing the configuration designated in the package; it can even be a totally different file.

I love to learn new things!