PowerShell: Check all servers for failed jobs.

This is a two part process.

First we use our [master_admin].[dbo].[rp_JobFailures] stored procedure to report any failures on the server. Here’s the guts of our procedure.

DECLARE @nDaysAgo INT
SELECT  @nDaysAgo = 1
SELECT  CAST([j].[name] AS VARCHAR(128)) [FAILED_job_name],
        LEFT(DATENAME(dw, CAST([run_date] AS CHAR(8))) + SPACE(1)
             + CAST([master_admin].[dbo].[fn_INTtoDateTime]([h].[run_date], [h].[run_time]) AS VARCHAR), 30) [RunDate],
        CASE [run_status]
          WHEN 0 THEN 'Failed'
          WHEN 1 THEN 'Succeeded'
          WHEN 2 THEN 'Retry'
          WHEN 3 THEN 'Canceled'
          WHEN 4 THEN 'In progress'
        END [job_status],
        (ISNULL(MAX([s].[step_id]), '')) [StepID],
        (ISNULL(MAX([s].[step_name]), '')) [StepName],
        (ISNULL(MAX([command]), MAX([h].[message]))) [StepCommand],
        MAX([operator_id_emailed]) [NotifiedID]
FROM    [msdb]..[sysjobhistory] [h]
JOIN    [msdb]..[sysjobs] [j]
ON      [j].[job_id] = [h].[job_id]
JOIN    [msdb]..[syscategories] [c]
ON      [j].[category_id] = [c].[category_id]
LEFT OUTER JOIN (
                 -- All jobsteps that have a final history job outcome of
	SELECT  [s].[step_name],
            [s].[command],
            [s].[step_id],
            [s].[job_id],
            [last_run_outcome]
    FROM    [msdb]..[sysjobsteps] [s] WITH (NOLOCK)
    LEFT OUTER JOIN (
                     SELECT [job_id]
                     FROM   [msdb]..[sysjobhistory] [h]
                     WHERE  [run_status] = 0
                            AND [h].[run_date] >= REPLACE(CONVERT(VARCHAR(10), DATEADD(DAY, -@nDaysAgo, GETDATE()), 121),
                                                          '-', '')
                    ) [okay]
    ON      [okay].[job_id] = [s].[job_id]
    WHERE   [okay].[job_id] IS NOT NULL
            AND [last_run_date] >= REPLACE(CONVERT(VARCHAR(10), DATEADD(DAY, -@nDaysAgo, GETDATE()), 121), '-', '')
                ) [s]
ON      [h].[job_id] = [s].[job_id]
        AND [h].[step_id] = [s].[step_id]
WHERE   [h].[run_date] >= REPLACE(CONVERT(VARCHAR(10), DATEADD(DAY, -@nDaysAgo, GETDATE()), 121), '-', '')
        AND [run_status] NOT IN (1, 4)
GROUP BY [j].[name],
        [run_date],
        [run_time],
        [run_status]
ORDER BY [h].[run_date],
        [h].[run_time],
        [j].[name] 

Then we use PowerShell to run that procedure on each of our SQL Servers, stored in $AllServersList.

# ------------------------------------------------------------------------------------------------------------------------
# What jobs failed since yesterday?

$TSQL = 'EXEC [master_admin].[dbo].[rp_JobFailures] @nDaysAgo=1, @vcSkipJob=NULL'

$VerbosePreference="Continue"
$bDEBUG = $true
#	$bDEBUG = $false ; $VerbosePreference="SilentlyContinue" ; $dt=@()
$AllServersList | %{
	$CurrentServer = $_
	Write-Host -ForegroundColor Blue "$(Get-Date -Format g)	[$CurrentServer]"
	Write-Verbose "	Invoke-sqlcmd2 -ConnectionTimeout 10 -ServerInstance $_ -Query ""$TSQL"""; 

	if (!($ExclusionList -contains $CurrentServer)) {
		if ($bDEBUG -eq $false) {
			$dr = Invoke-sqlcmd2 -ConnectionTimeout 10 -ServerInstance $CurrentServer -Query "$TSQL" -As DataRow
			if ($dr -ne $null) {
				$dr | add-member -name ServerName -type noteproperty -value "$CurrentServer"
				$dt += $dr
			}
		}
	} else {
		Write-Host -ForegroundColor Red "Skipping $CurrentServer $_"
	}
}

$dt |Select-Object ServerName, RunDate, Failed_Job_Name, StepID, StepName, StepCommand |ft -AutoSize

You can see I like to run Foreach without doing anything initially to make sure I got it right. Then highlighting from the commented $bDEBUG line to the end gets us what we want. I also use an exclusion list which is simply an array of server names to bypass.

The SQL purists will suggest we just run it within SSMS. Just create server groups for all your servers and right-click and run the query.

Comments are closed.

Post Navigation