Tag Archives: Health Check
Since I published the OpsMgr 2012 Data Warehouse Health Check Script last week, the responses I have received from the community have been overwhelming!
As I mentioned in the post, there might be issues when executing the script in an environment where the Data Warehouse DB is hosted on a named SQL instance. Many people have reached out to me and confirmed this is indeed the case.
Over the last few days, I have been busy updating this script to address all the issues identified by the community. Version 1.1 is now ready.
I have addressed the following issues in this release:
- Fixed the issues with named SQL instances and SQL instances using non-default ports.
- Fixed the issue where the script failed to get management group default settings when executed in PowerShell version 5 preview.
- Fixed the error where an incorrect Buffer Cache Hit Ratio counter was presented in the report.
- Additional pre-requisite check for the PowerShell version. This script requires version 3.0 as a minimum.
- Additional pre-requisite check to test WinRM and remote WMI connectivity to each management server
- Fixed minor typos in the reports
- Additional optional parameter “-OutputDir”. You can now have the script write reports to a folder of your choice. This folder must already exist. If the specified folder is not valid or this parameter is not used, the script will write the report files to the script root folder.
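The fallback behaviour described for “-OutputDir” can be sketched as follows. This is only an illustration of the rule stated above, not the code used in the script; the function name Resolve-ReportDirectory and its parameters are hypothetical.

```powershell
# Hypothetical helper: the name and parameters are illustrative and are not
# taken from SCOMDWHealthCheck.ps1.
function Resolve-ReportDirectory {
    param(
        [string]$OutputDir,                          # value passed via -OutputDir (may be empty)
        [Parameter(Mandatory)][string]$ScriptRoot    # folder that contains the script
    )

    # Use the requested folder only if it was supplied and already exists as a directory.
    if (-not [string]::IsNullOrWhiteSpace($OutputDir) -and
        (Test-Path -LiteralPath $OutputDir -PathType Container)) {
        return $OutputDir
    }

    # Otherwise fall back to the script root folder.
    return $ScriptRoot
}
```

For example, `Resolve-ReportDirectory -OutputDir 'C:\Temp\Reports' -ScriptRoot $PSScriptRoot` would return C:\Temp\Reports only when that folder already exists.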
I have updated the original post, and the updated version of the script can now be downloaded from the original link.
I’d like to thank everyone who tested and provided valuable feedback to me. This project is truly a wonderful community effort!
Note (19/06/2015): This script has been updated to version 1.1. You can find the details of version 1.1 here: http://blog.tyang.org/2015/06/19/opsmgr-2012-data-warehouse-health-check-script-updated/. The download link at the end of this post has been updated too.
I’m sure you would all agree that OpsMgr database performance is a very common issue in many OpsMgr deployments when the databases have not been designed and configured properly. The folks at Squared Up certainly feel the pain: when the OpsMgr Data Warehouse database is not performing at the optimal level, it impacts the performance of the Squared Up dashboard, since Squared Up relies heavily on the Data Warehouse database.
So Squared Up asked me to build a Health Check tool specific to OpsMgr data warehouse databases, in order to help customers identify and troubleshoot performance-related issues with the data warehouse DB. Over the last few weeks, I have been working on such a script, which focuses on the data warehouse component and produces an HTML report at the end.
We have decided to make this tool available not only to Squared Up customers, but also to the broader community, free of charge. So a BIG thank-you to Squared Up for their generosity.
Before I dive into the details, I’d like to show you what the report looks like. You can access the sample report generated against my lab MG here:
As shown in this sample, the report consists of the following sections:
Management Group Information
- Management group name and version
- server names for RMS Emulator, Operational DB SQL Server, Data Warehouse SQL server
- Operational DB name, Data Warehouse DB name
- Number of management servers, Windows agents, Unix agents, managed network devices and agentless managed computers
- Current SDK connection count (total among all management servers)
Data Warehouse SQL Server information
- Server hardware spec and OS version
- SQL server version and collation
- Minimum and Maximum assigned memory to the SQL server
Data Warehouse SQL DB information
- DB Name, creation date, collation, recovery mode
- Current state, is broker enabled, is auto-shrink enabled
- Current DB size (both data and logs), free space %
- Growth settings, last backup date and backup size
Temp DB configuration
- File size, max size and growth settings for each file used by Temp DB
SQL Service Account Configuration
- If the SQL Service account has “Perform volume maintenance tasks” and “Lock Pages in Memory” rights
Data Warehouse Dataset Configuration
- Dataset retention setting
- Retention setting for each dataset
- current row count, size and % of total size of each dataset
- Dataset aggregation backlog
- Staging Table Row Count for the following tables:
Key SQL and OS performance counters
- SQL performance counters
- SQLServer:Buffer Manager\Buffer cache hit ratio
- Operating System performance counters
- Logical Disk(_total)\Avg. disk sec/Read
- Logical Disk(_total)\Avg. disk sec/Write
- Processor Information (_total)\% Processor Time
Collect Data Warehouse performance related events from each management server
- Event ID: 2115
- Event ID: 8000
- Event ID: 31550-31559
Since each environment is different, I didn’t want to create a fixed set of rules to flag any of the items listed above as good or bad. Instead, at the end of each section, I have included some articles that can help you evaluate your environment and identify any discrepancies.
This script has the following pre-requisites:
- The user account that is running the script (or the alternative credential passed into the script) must have the following privileges:
- local administrator rights on the Data Warehouse SQL server and all Management servers
- A member of the OpsMgr Administrator role
- SQL sysadmin rights on the Data Warehouse SQL server
- WinRM (PowerShell Remoting) must be enabled on the Data Warehouse SQL Server
- The OpsMgr SDK Assemblies must be available on the computer running the script:
- The script can be executed on an OpsMgr management server, web console server, or a computer that has the OpsMgr operations console installed
- OR, manually copy the 3 DLLs from “<management server install dir>\SDK Binary” folder to the folder where the script is located.
Executing the script
The only required parameter is -SDK <OpsMgr Management Server name>, where you specify one of your management servers (it doesn’t matter which one). Additionally, if you use the -OpenReport switch, the HTML report will be opened in your default browser at the end. If you use -OutputDir to specify a directory, the reports will be saved to this directory instead of the script root directory. If the directory you’ve specified is not valid, the script will save the reports to the script root directory instead (updated 19/06/2015). You can also use the -Verbose switch to see the verbose output:
.\SCOMDWHealthCheck.ps1 -SDK "OpsMgrMS01" -OutputDir C:\Temp\Reports\ -OpenReport -Verbose
Or if you need to specify alternative credential:
$password = ConvertTo-SecureString -String "password12345" -AsPlainText -Force
.\SCOMDWHealthCheck.ps1 -SDK "OpsMgrMS01" -Username "domain\SCOM.Admin" -Password $password -OpenReport -Verbose
The report outputs the following files:
- Main HTML report
- Main Report in XML format
- Windows Event export from each management server in a separate HTML page
- Windows Event export from each management server in a separate CSV file
Note: The XML file is produced so that anyone who wants to develop their own tools to analyse the data for their environment can easily read the data from the XML file.
The script writes the list of the files it generated as output:
Possible Areas for Improvement
Due to the limited environments that I have access to, I was unable to test this script in environments where the Data Warehouse DB is installed on a named SQL instance or in a SQL Always-On setup. So if your environment is set up this way, please contact me and let me know what’s working and what’s not. This issue is now fixed in version 1.1 (updated 19/06/2015).
I couldn’t have done this by myself. I’d like to thank the following people (in random order) who helped me with testing and provided feedback:
Folks from Squared Up: Glen Keech, Richard Benwell
SCCDM MVPs: Marnix Wolf, David Allen, Daniele Grandini, Cameron Fuller, Simon Skinner, Scott Moss, Fleming Riis
And, the legendary Kevin Holman
I’d also like to thank all the people who have indirectly contributed to this tool (I have included links to their awesome articles and publications in the report). Some of them are already listed above, but here are a few more: Paul Keely (author of the SQL Server Guide for System Center 2012 whitepaper), Michel Kamp, Bob Cornelissen, Stefan Stranger and Oleg Kapustin.
You can download the script from the link below. Please place the 2 files in the zip file in the same directory:
Lastly, as always, please feel free to contact me if you’d like to provide feedback.
I have just updated the SCCM Health Check Script from version 3.3 to 3.5.
Version 3.4 was finished a while back but I never got time to publish it on this blog. I only emailed 3.4 to a few people who contacted me through my blog. Now that I’ve updated it again to 3.5, I thought I’d just publish version 3.5.
What’s Changed Since 3.3?
- Added site system name under ‘site systems with issues’ section
- Detect site components that are missing heartbeats.
- Changed function Validate-DNSRecord to use Win32_ComputerSystem.caption rather than DNSHostname to retrieve computer name as DNSHostName is not available on computers before Windows 2008.
A new item has been added to the configuration XML (Health-Check.xml):
As the name suggests, the script flags any site system as problematic if it has not sent a heartbeat for more than the number of hours that you configured in the XML (in my example, it’s 24 hours).
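The threshold comparison itself is simple and can be sketched as below. This is a minimal illustration that assumes PowerShell 3.0 or later; the function name, the property names (Name, LastHeartbeat) and the sample data are hypothetical, and the actual script reads the heartbeat data from the SCCM site rather than from in-memory objects.

```powershell
# Hypothetical sketch: the function name and the Name / LastHeartbeat properties
# are illustrative only.
function Get-StaleHeartbeat {
    param(
        [Parameter(Mandatory)][object[]]$Component,
        [Parameter(Mandatory)][int]$ThresholdHours,   # the value configured in the XML
        [datetime]$Now = (Get-Date)
    )

    # Anything that last reported before the cutoff is considered problematic.
    $cutoff = $Now.AddHours(-$ThresholdHours)
    $Component | Where-Object { $_.LastHeartbeat -lt $cutoff }
}

# Made-up sample data for illustration.
$now = Get-Date -Year 2012 -Month 1 -Day 30 -Hour 12 -Minute 0 -Second 0
$sample = @(
    [pscustomobject]@{ Name = 'SMS_SITE_COMPONENT_MANAGER'; LastHeartbeat = $now.AddHours(-2) }
    [pscustomobject]@{ Name = 'SMS_MP_CONTROL_MANAGER';     LastHeartbeat = $now.AddHours(-30) }
)

# With a 24-hour threshold, only the component last heard from 30 hours ago is returned.
$stale = Get-StaleHeartbeat -Component $sample -ThresholdHours 24 -Now $now
```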
You may keep the old XML that you have already configured for your environment as long as you add the following lines in the Health-Check.XML:
You can download version 3.5 HERE.
I have updated the SCCM Health Check Script again. The latest version is now 3.3.
Below is what’s been updated since my last post for version 3.1:
- Fixed the bug where, when using DOTNET to send emails to multiple recipients, the script only sent to the first recipient in the list.
- It now compresses the txt attachment into a zip file before sending it. This is to improve performance and avoid sending large attachments.
- Added functionality to check all currently active package distributions
- Able to create exemptions for the DNS suffix check. This can be configured in the XML. (This is required at work, where a HOST record is created for the central site server in another forest because there are no forwarders set up between the 2 forests.)
- Improved DNS checks
- Fixed the bug when SQL DB is not running under default instance. The script now reads SQL DB location from primary site server’s registry.
The script package now contains an additional file, ICSharpCode.SharpZipLib.dll. This is an open source project from sharpdevelop.net. This file is used to zip the txt attachment.
The script now contains the following files:
I’ve also been told the DNS check does not work well when SQL DB is on a cluster. I don’t have access to a SQL cluster where I can diagnose the problem. So please just be aware.
The script can be downloaded here. Please remember to customise the “Health-Check.XML” file before running it.
- The script can now utilise PowerShell Remoting to check inbox sizes. It requires PS Remoting to be enabled on all SCCM site servers. This dramatically reduces the execution time of the script in a multi-tier environment. In a production environment that I support, it reduced the execution time from 1.5 to 2 hours down to around 35 minutes! You can configure which method to use via the XML file: set <PSRemoting><Value> to Enabled to use PowerShell Remoting, or to Disabled if you want to use the old Diruse.exe method.
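As a rough illustration of why remoting helps, the file counting can run locally on each site server so that only the totals travel over the network. The sketch below is not the code from the health check script; the script block name, the server name and the inbox path are placeholders.

```powershell
# Hypothetical sketch: counts the files under each immediate subfolder of an
# inboxes root. It runs wherever the script block is invoked.
$countInboxFiles = {
    param([string]$InboxRoot)

    Get-ChildItem -LiteralPath $InboxRoot | Where-Object { $_.PSIsContainer } | ForEach-Object {
        $dir = $_
        # Count files only (not folders), including those in nested subfolders.
        $files = @(Get-ChildItem -LiteralPath $dir.FullName -Recurse | Where-Object { -not $_.PSIsContainer })
        New-Object psobject -Property @{ Inbox = $dir.Name; FileCount = $files.Count }
    }
}

# Local run:
#   & $countInboxFiles 'D:\SMS\inboxes'
# Remote run (server name and path are placeholders; PS Remoting must be enabled):
#   Invoke-Command -ComputerName 'SITESERVER01' -ScriptBlock $countInboxFiles -ArgumentList 'D:\SMS\inboxes'
```

The returned objects can then be compared against the file count threshold on the machine running the health check.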
30/01/2012: This script has been updated to version 3.5. Details can be found HERE. The download link on this article has also been updated to version 3.5.
26/05/2011: Version 3.3 has been posted here. The download link to the script on this post is also updated to the new version 3.3.
21/04/2011: Please be advised that I have posted a newer version of the script here. The existing script download link on this page has also been updated to point to the newer version. For the changes in the newer version, please refer to my updated post.
Over the last few months, I have been working on a PowerShell script to perform some health check activities for a customer’s entire SCCM environment. It provides a point-in-time snapshot of the health state of some elements of the SCCM environment, since there is no SCOM in that environment to monitor SCCM at this stage.
The script checks the following:
- Ping check all servers in the SCCM infrastructure
- If the first ping fails, the script waits for a number of seconds (defined in the XML file) and then attempts to ping a few more times (the number of retries is defined in the XML file).
- If any of the pings is successful, the ping test is classified as a success.
- DNS name resolution check for all servers in SCCM infrastructure
- forward lookup check
- reverse lookup check
- compare DNS A record with the FQDN that’s set on the server
- All site systems in warning or critical state
- All site components in warning or critical state
- All package distributions with issues
- Checks all Non-PXE boot image packages in PXE DP share
- Checks for any inboxes that contain more files than the threshold (the threshold is set in the XML file)
- Checks availability of Inbox folders on all primary site servers
- Checks SCCM site backups on all primary sites within the “DaysToCheck” that’s set in XML file.
- Checks any errors in SQL server and SQL agent logs
- Checks Application logs on SQL servers for any SQL related errors.
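The ping check with retries listed above can be sketched along the following lines. This is an illustration of the described logic only, not the function from the script; the function name, parameter names and default values are hypothetical, and in the real script the delay and retry count come from the XML file.

```powershell
# Hypothetical sketch of the ping-with-retries logic; names and defaults are illustrative.
function Test-ServerPing {
    param(
        [Parameter(Mandatory)][string]$ComputerName,
        [int]$RetryCount = 3,
        [int]$RetryDelaySeconds = 5,
        # The probe is a parameter so the logic can be exercised without a network.
        [scriptblock]$Probe = { param($Name) Test-Connection -ComputerName $Name -Count 1 -Quiet }
    )

    # First attempt: succeed immediately if the server answers.
    if (& $Probe $ComputerName) { return $true }

    # The first ping failed: wait, then retry. Any single success counts as a pass.
    Start-Sleep -Seconds $RetryDelaySeconds
    for ($i = 1; $i -le $RetryCount; $i++) {
        if (& $Probe $ComputerName) { return $true }
    }
    return $false
}
```

For example, `Test-ServerPing -ComputerName 'SITESERVER01' -RetryCount 4 -RetryDelaySeconds 10` sends one ping, and only if that fails does it wait 10 seconds and try up to 4 more times (the server name is a placeholder).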
What’s included in this script:
- SCCM-HealthCheck.ps1: the actual PowerShell script
- Health-Check.xml: contains all configuration settings for the script. This file needs to be modified to suit your environment before running the script.
- DIRUSE.exe: This is from the Windows 2000 Resource Kit (http://support.microsoft.com/kb/927229). It is used to retrieve SCCM inbox information. I have chosen to use this rather than the native PowerShell cmdlet Get-ChildItem because DIRUSE.EXE retrieves the information from remote servers much faster than Get-ChildItem.
Configuring the script:
The health check script reads all the settings from Health-Check.xml which is located in the same folder as the script.
Note: If you are having trouble reading the text on the above image, the image can be downloaded here
- The script has the option to email out the health check report (can be switched on and off in XML file)
- The email body is in HTML format that contains the overall status of each check.
- The detailed report is in TXT format and is attached to the email. It is also saved, with a timestamp, in the same folder as the script. If emailing is turned off, the detailed report can be found there.
Below is a sample HTML email body generated from my test environment:
- The PowerShell execution policy on the computer that’s running the script needs to be set to RemoteSigned or Unrestricted.
- The account used to run this script needs to have:
- local admin rights on Central site server, Central site provider server (if not the site server itself)
- sysadmin rights on all SQL servers, or at least access to the master DB, to be able to read SQL server and agent logs.
- SMS admin access on all primary sites
- NTFS read permission to “inboxes” folders on all primary site servers.
- Scheduling the script in Windows Task Scheduler:
- The “Allow log on as batch job” right is required for the user account to run scheduled jobs.
- If scheduling in Windows 2008 or later, please make sure “Run with highest privileges” is ticked to bypass UAC (User Account Control)
- The operating system for SQL servers has to be Windows 2008 or later. Get-WinEvent is used to read the event logs instead of Get-EventLog because Get-EventLog does not support server-side filtering, so Get-WinEvent performs better when reading remote event logs. However, Get-WinEvent only works on Windows Vista and later versions of Windows.
- PowerShell Version 2 is required to run the script.
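To illustrate the server-side filtering mentioned above for Get-WinEvent, here is a minimal sketch that builds a filter hashtable for the -FilterHashtable parameter. The function name, the look-back window and the provider name are assumptions (MSSQLSERVER is the event source of a default SQL instance; a named instance uses a different source), so treat it as a starting point rather than the query used in the script.

```powershell
# Hypothetical sketch: the function name and defaults are assumptions; adjust
# ProviderName to match the event source of your SQL instance.
function New-SqlErrorEventFilter {
    param(
        [int]$DaysToCheck = 1,
        [string]$ProviderName = 'MSSQLSERVER'
    )

    # Level 2 = Error. The filter is applied by the event log service on the
    # target computer, so only matching events are sent back.
    @{
        LogName      = 'Application'
        ProviderName = $ProviderName
        Level        = 2
        StartTime    = (Get-Date).AddDays(-$DaysToCheck)
    }
}

$filter = New-SqlErrorEventFilter -DaysToCheck 2

# Example call against a remote SQL server (the server name is a placeholder):
#   Get-WinEvent -ComputerName 'SQL01' -FilterHashtable $filter -ErrorAction SilentlyContinue
```

By contrast, filtering Get-EventLog output with Where-Object happens on the machine running the script, after the events have already been transferred.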
I’m planning to rewrite some parts of the script to give us the option to utilise PowerShell Remoting wherever it is suitable. This will greatly improve the performance of the script (especially when gathering inbox information across a WAN link). When this is done, Get-ChildItem can be executed locally on each site server, eliminating the need for DIRUSE.EXE.
I’ll get this done in the next few weeks and post it here once it’s done.