Category Archives: SCOM
I have been refreshing my lab servers to Windows Server 2016. I’m using the non-GUI version (Server Core) wherever possible.
When working on Server Core servers, I found it troublesome that I can’t access the Microsoft Monitoring Agent applet in Control Panel:
Although I can use PowerShell and the MMA agent COM object AgentConfigManager.MgmtSvcCfg, sometimes it is easier to use the applet.
After some research, I found the applet can be launched from the command line:
C:\Program Files\Microsoft Monitoring Agent\Agent\AgentControlPanel.exe
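Because the path contains spaces, typing it unquoted at a PowerShell prompt will not run it. The following is a minimal sketch, assuming the default install path shown above. The COM call at the end is the console-only alternative mentioned earlier; the GetManagementGroups() method is an assumption about what the installed agent version exposes.

```powershell
# Path as given above; adjust it if the agent is installed elsewhere.
$applet = 'C:\Program Files\Microsoft Monitoring Agent\Agent\AgentControlPanel.exe'

if (Test-Path -LiteralPath $applet) {
    # The path contains spaces, so invoke it through the call operator.
    & $applet
}
else {
    Write-Warning "Applet not found at '$applet'."
}

# Console-only alternative: list the management groups through the COM object.
# GetManagementGroups() is assumed to be available on the installed agent version.
try {
    $mma = New-Object -ComObject 'AgentConfigManager.MgmtSvcCfg' -ErrorAction Stop
    $mma.GetManagementGroups()
}
catch {
    Write-Warning "Could not query the agent COM object: $($_.Exception.Message)"
}
```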
OpsLogix has recently released a new product called “EZalert”. It learns the operator’s alert handling behaviour and can then automatically update alert resolution states based on what it has learned. You can find more information about this product here: http://www.opslogix.com/ezalert/. I was given a trial license for evaluation and review. Today I installed it on a dedicated VM and connected it to my lab OpsMgr management group.
Once installed, I could see a new dashboard view added in the monitoring pane, and this is where we tune all the alerts:
From this view, I can see all the active alerts, and I can start tuning them either one at a time, or I can multi-select and set the desired state in bulk. Once I have gone through all the alerts on the list, I can choose to save the configuration under the Settings tab:
Once this is done, any new alert that has previously been trained will be updated automatically when it is generated. For example, I created a test alert and trained EZalert to set the resolution state to Closed. As you can see below, it was created at 9:44:57 AM and modified by EZalert 2 seconds later:
Once the initial training process is completed and saved, the training tab will become empty. Any new alerts generated will show up in the training tab, and you can see if there’s a suggested state assigned, and you can also modify it by assigning another state:
And all previously trained alerts can be found in the history tab:
You can also create exclusions. If you want EZalert to skip certain alerts for certain monitoring objects (e.g. a disk space alert generated on C:\ on Server A), you can do so by creating exclusions:
In my opinion, this is a very good practice when tuning alerts. When setting alert resolution states, you only need to do it once; EZalert learns your behaviour and repeats your action for you in the future. It will be a huge time saver for all your OpsMgr operators over time. It will also come in very handy for alert tuning in the following situations:
- When you have just deployed a new OpsMgr management group
- When you have introduced new management packs in your management group
- When you have updated existing management packs to the newer versions
EZalert vs Alert Update Connector
Before EZalert, I had been using the OpsMgr Alert Update Connector (AUC) from Microsoft (https://blogs.technet.microsoft.com/kevinholman/2012/09/29/opsmgr-public-release-of-the-alert-update-connector/). I really struggled when configuring AUC, so I developed my own solution to configure AUC in an automated fashion (http://blog.tyang.org/2014/04/19/programmatically-generating-opsmgr-2012-alert-update-connector-configuration-xml/) and I also developed a management pack to monitor it (http://blog.tyang.org/2014/05/31/updated-opsmgr-2012-alert-update-connector-management-pack/). In my opinion, AUC is a solid solution. It has been around for many years and is used by many customers. But I do find it has some limitations:
- Configuration process is really hard
- Configuration is based on rules and monitors, not alerts. So it’s easy to incorrectly configure rules and monitors that don’t generate alerts (e.g. perf / event collection rules, aggregate / dependency monitors, etc).
- Modifying existing configuration causes a service interruption due to the service restart
- When running in a distributed environment (on multiple management servers), you need to make sure configuration files are consistent across these servers and only one instance is running at any given time.
- No way to easily view the current configurations (without reading XML files)
I think EZalert has definitely addressed some of these shortcomings:
- Alert training process is performed on the OpsMgr console
- No need to restart services and reload configuration files after new alerts are added or when existing alerts are modified
- Configurations are saved in a SQL database, not text based files
- Current configurations are easily viewable within the SCOM console
However, AUC has the following advantages over EZalert:
- AUC supports assigning different values to different groups or individual objects. In EZalert, exceptions can only be created for individual monitoring objects, and it doesn’t seem like you can assign a different value for the object; it’s simply an on/off exception.
- Other than alert resolution state, AUC can also be used to update other alert properties (e.g. custom fields, Owner, ticket ID, etc.). EZalert doesn’t seem to be able to update other alert fields.
Things to Consider
When using EZalert, in my opinion, there are a few things you need to consider:
1. It does not replace requirements for overrides
If you are training EZalert to automatically close an alert when it’s generated, then you should ask yourself: do you really need this alert to be generated in the first place? Unless you want to see these alerts in the alert statistics report, you should probably disable this alert via overrides. EZalert should not be used to replace overrides. If you don’t need this alert, disable it! It saves the resources the SCOM servers and agents spend processing the alert, and the database space needed to store it.
2. Training Monitor generated alerts
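As we all know, we shouldn’t manually close monitor generated alerts, because closing the alert does not reset the monitor’s health state and the monitor will not raise a new alert until its state changes. So when you are training monitor alerts, make sure you don’t train EZalert to update the resolution state to “Closed”. Consider using other states such as “Resolved”.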
3. Create Scoped roles for normal operators in order to hide the EZalert dashboard view
You may not want normal operators to train alerts, so instead of using the built-in operators role, it is better to create your own scoped role and hide the EZalert dashboard view from normal operators.
I believe EZalert has some strong use cases. Unless you have a very complicated alert flow automation process that leverages other alert fields such as custom fields, owner, etc. (e.g. for generating tickets) and you are currently using AUC for this particular reason, I think EZalert gives you a much more user-friendly experience for ongoing alert tuning.
I have personally implemented AUC in a few places, and I still get calls every now and then from those places asking for help with AUC configuration, even though it has been a few years since it was implemented. Also, I’m not entirely sure whether AUC is officially supported by Microsoft, because it was originally developed by an OpsMgr PFE in his spare time (maybe someone from MSFT can confirm). EZalert, on the other hand, is a commercial product, and the vendor OpsLogix provides full support for it.
Lastly, if you have any questions about EZalert, please feel free to contact OpsLogix directly.
Squared Up is set to release version 3 of their dashboard next week at Ignite North America. One of the key features in the v3 release is called “Visual Application Discovery & Analysis” (aka VADA).
VADA utilises OpsMgr agent tasks and the netstat.exe command to discover the other TCP/IP endpoints the agents are communicating with. You can learn more about this feature from a short YouTube video Squared Up has published recently: https://www.youtube.com/watch?v=DJK_3SritwY
I was given a trial copy of v3 for my lab. After I installed it and imported the required management pack, I was able to start discovering the endpoints that are communicating with my OpsMgr agents in a matter of a few clicks:
As we all know, OpsMgr natively lacks the capability to discover Distributed Applications automatically. Customers used to integrate 3rd party applications such as BlueStripe FactFinder with OpsMgr for this capability. However, now that BlueStripe has been acquired by Microsoft and is being fitted under the OMS banner as the Application Dependency Monitor (ADM) solution, customers can no longer purchase it for OpsMgr. It is good to see that Squared Up has released something with similar capabilities because, at this very moment, this seems to be a gap in the OpsMgr space.
Having said that, I don’t think the OMS ADM solution is too far away from the public preview release.
One of the biggest differences I can see (after spending a couple of hours on Squared Up v3) is that Squared Up VADA collects ad-hoc data at the time VADA is launched (which triggers the agent task), whereas OMS ADM has its own agents and collects data continuously.
Additionally, it looks like Squared Up VADA only supports Windows agents at this stage, whereas OMS ADM will also support Linux agents.
At this stage, since we don’t know if BlueStripe will be made available to OpsMgr in the future, and Squared Up is releasing this awesome addition to their already-popular OpsMgr web console / dashboard product, why not give it a try and see what you can produce? I guess since the data collection is ad-hoc, it will make more sense to start the discovery in VADA during peak hours when the system is fully loaded and all components are actively communicating with each other, so you don’t miss any components.
Lastly, if you are going to attend Ignite NA next week and want to learn more about this new feature in Squared Up V3, please make sure you go find them at their booth.
OMS Network Performance Monitor (NPM) reached public preview a few weeks ago. Unlike other OMS solutions, NPM requires additional configuration on each agent that you wish to enrol in this solution. The detailed steps are documented in the solution documentation.
The product team has provided a PowerShell script to configure the MMA agents locally (link included in the documentation). In order to make the configuration process easier for the OpsMgr users, I have created a management pack that contains several agent tasks:
- Enable OMS Network Performance Monitor
- Disable OMS Network Performance Monitor
- Get OMS Network Performance Monitor Agent Configuration
Note: Since this is an OpsMgr management pack, you can only use these tasks against agents that are enrolled to OMS via OpsMgr, or direct OMS agents that are also reporting to your OpsMgr management group.
These tasks target the Health Service class. If you are also using my OpsMgr 2012 Self Maintenance MP, you will have a “Health Service” state view, and you will be able to access these tasks from the task pane of this view:
I can use the “Get OMS Network Performance Monitor Agent Configuration” task to check if an agent has been configured for NPM.
For example, before an agent is configured, the task output shows it is not configured:
Then I can use the “Enable OMS Network Performance Monitor” task to enable NPM on this agent:
Once enabled, if I run the “Get OMS Network Performance Monitor Agent Configuration” task again, the task output will show it’s enabled and also display the configured port number:
Shortly after, you will be able to see the newly configured node in the OMS NPM solution:
If you want to remove the configuration, simply run the “Disable OMS Network Performance Monitor” task:
A few weeks ago, the OMS product team made a very nice change to the Near Real Time (NRT) Performance data: the data aggregation has been removed! I’ve been waiting for the official announcement before posting this on my blog. Now Leyla from the OMS team has finally broken the silence and made this public: Raw searchable performance metrics in OMS.
I’m really excited about this update. Before this change, we were only able to search 30-minute aggregated data via Log Search. This behaviour brought some limitations:
- It’s difficult to calculate average values based on other intervals (e.g. 5-minute or 10-minute)
- Performance based alert rules can be really outdated, because the search result is based on the aggregated value over the last 30 minutes. In a critical environment, this can be a bit too late!
By removing the data aggregation and making the raw data searchable (and living a longer life), the limitations listed above are resolved.
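To illustrate why raw samples remove the first limitation, here is a minimal PowerShell sketch that averages raw readings over an arbitrary interval. The sample data is made up, the function name Get-BucketedAverage is hypothetical, and the property names TimeGenerated and CounterValue are assumptions about how the exported records are shaped.

```powershell
function Get-BucketedAverage {
    param(
        [Parameter(Mandatory)] [object[]] $Sample,
        [int] $BucketMinutes = 5
    )

    # Length of one bucket expressed in .NET ticks.
    $bucketTicks = [timespan]::TicksPerMinute * $BucketMinutes

    $Sample |
        Group-Object -Property {
            # Round the timestamp down to the start of its bucket.
            $ticks = $_.TimeGenerated.Ticks
            [datetime]::new($ticks - ($ticks % $bucketTicks)).ToString('s')
        } |
        Sort-Object -Property Name |
        ForEach-Object {
            [pscustomobject]@{
                BucketStart = $_.Name
                Average     = ($_.Group | Measure-Object -Property CounterValue -Average).Average
            }
        }
}

# Hypothetical raw samples: one reading per minute for 10 minutes.
$start = [datetime]'2016-09-01T09:00:00'
$samples = 0..9 | ForEach-Object {
    [pscustomobject]@{ TimeGenerated = $start.AddMinutes($_); CounterValue = 10 * $_ }
}

$result = Get-BucketedAverage -Sample $samples -BucketMinutes 5
$result
```

Changing -BucketMinutes to 10 gives a 10-minute average from the same samples, which was not possible when only the 30-minute aggregate was searchable.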
Another advantage this update brings is that it greatly simplifies the process of authoring your own OpsMgr performance collection rules for OMS NRT Perf data. Before this change, NRT perf rules came in pairs: each perf counter you wanted to collect had to have 2 rules (with identical data source module configurations), one for collecting raw data and another for collecting the 30-minute aggregated data. This has been discussed in great detail in Chapter 11 of our Inside Microsoft Operations Management Suite book (TechNet, Amazon). Now, we no longer need to write 2 rules for each perf counter. We only need to write one rule, for the raw perf data.
The sample OpsMgr management pack below collects the “Log Cache Hit Ratio” counter for SQL Databases. It targets the Microsoft.SQLServer.Database class, which is the seed class for pre-SQL 2014 databases (2005, 2008 and 2012):
<?xml version="1.0" encoding="utf-8"?>
<ManagementPack SchemaVersion="2.0" ContentReadable="true" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<Rule ID="OMS.NRT.Perf.Collection.Demo.SQL.Log.Cache.Hit.Ratio.Perf.Rule" Target="SQL!Microsoft.SQLServer.Database" Enabled="true" Remotable="false" ConfirmDelivery="false" Priority="Normal" DiscardLevel="100">
<DataSource ID="DS" TypeID="IPTypes!Microsoft.IntelligencePacks.Performance.DataProvider">
<CounterName>Log Cache Hit Ratio</CounterName>
<WriteAction ID="WA" TypeID="IPTypes!Microsoft.SystemCenter.CollectCloudPerformanceData_PerfIP" />
<LanguagePack ID="ENU" IsDefault="true">
<Name>OMS NRT Perf Collection Demo</Name>
<Name>OMS NRT Performance Collection Demo SQL Log Cache Hit Ratio Perf Rule</Name>
As you can see from the above sample MP, the rule that collects aggregated data is no longer required.
So if you have written rules collecting NRT perf data for OMS in the past, you may want to revisit them and remove the aggregated data collection rules.
I’m teaming up with Infront Consulting, Australia and will deliver a 4-day in-person instructor-led SCOM 2012 bootcamp in Melbourne, Australia. The content of this bootcamp was developed by Infront Consulting Group and it has been very popular internationally.
This bootcamp is designed for SCOM administrators and operators. If you are running SCOM (or planning to implement SCOM) in your environment, I strongly recommend you enrol in this bootcamp and spend 4 days with me and the other folks attending.
Here are the details of this training event:
SCOM 2012 Bootcamp – Australia
Date: 20 – 23 June 2016
Saxons Training Facilities Melbourne
500 Collins Street
Melbourne VIC 3000
Please join us for the first Infront Consulting SCOM 2012 Bootcamp in Australia! Tao Yang is a well-known author, speaker, blogger and SCOM expert who will be guiding you in person in the SCOM 2012 R2 Bootcamp.
This four-day Bootcamp is a mix of in-depth instructor led training and hands-on labs where you will learn how to administer System Center Operations Manager 2012. This course will provide students with an understanding of the Operations Manager 2012 Architecture, features and how to administer and maintain Operations Manager 2012.
Cost: $3,600 AUD + GST per student, includes course materials and access to Hands on Labs.
Session 1: Overview of System Center Operations Manager 2012
Session 2: Operations Manager 2012 Architecture
Session 3: Installing Operations Manager 2012
Session 4: Installing the Gateway Server Role
Session 5: Configuring Operations Manager Security
Session 6: Agent Deployment and Configuration
Session 7: Alert Notification and Incident Remediation
Session 8: Management Pack Tuning and Targeting Best Practices
Session 9: Tuning of the Core Microsoft MPs
Session 10: Application Performance Monitoring
Session 11: Network Monitoring in Operations Manager 2012
Session 12: Working in the Operations Manager Shell
Session 13: Building Custom Monitoring Solutions & Distributed Applications
Session 14: Reporting & Dashboards
Session 15: Third Party Extensions
Hope to see you there!
My next webinar with OpsLogix will take place on Wednesday 6th April 2016. In this webinar, I will demonstrate how to configure the OpsLogix VMware management pack, and provide an overview of this MP.
If you are interested in this MP, or looking for a solution for monitoring your VMware infrastructure, please make sure you attend this webinar because there are only limited places available.
You can find more details about this webinar from OpsLogix’s blog: http://www.opslogix.com/opslogix-vmware-mp-overview-with-tao-yang/
The registration is via Eventbrite:
I’m looking forward to seeing you then!
This is the 20th installment of the Automating OpsMgr series. Previously in this series:
- Automating OpsMgr Part 1: Introducing OpsMgrExtended PowerShell / SMA Module
- Automating OpsMgr Part 2: SMA Runbook for Creating ConfigMgr Log Collection Rules
- Automating OpsMgr Part 3: New Management Pack Runbook via SMA and Azure Automation
- Automating OpsMgr Part 4: Creating New Empty Groups
- Automating OpsMgr Part 5: Adding Computers to Computer Groups
- Automating OpsMgr Part 6: Adding Monitoring Objects to Instance Groups
- Automating OpsMgr Part 7: Updated OpsMgrExtended Module
- Automating OpsMgr Part 8: Adding Management Pack References
- Automating OpsMgr Part 9: Updating Group Discoveries
- Automating OpsMgr Part 10: Deleting Groups
- Automating OpsMgr Part 11: Configuring Group Health Rollup
- Automating OpsMgr Part 12: Creating Performance Collection Rules
- Automating OpsMgr Part 13: Creating 2-State Performance Monitors
- Automating OpsMgr Part 14: Creating Event Collection Rules
- Automating OpsMgr Part 15: Creating 2-State Event Monitors
- Automating OpsMgr Part 16: Creating Windows Service Monitors
- Automating OpsMgr Part 17: Creating Windows Service Management Pack Template Instance
- Automating OpsMgr Part 18: Second Update to the OpsMgrExtended Module (v1.2)
- Automating OpsMgr Part 19: Creating Any Types of Generic Rules
OK, it has been 6 months since my last post in this blog series. I simply didn’t have time to continue, but I know this is far from over. I am spending A LOT of time on OMS these days. Some of you may have heard of (or have already read) our newly published book Inside Microsoft Operations Management Suite (TechNet, Amazon). I’m hoping you have all played with OMS and maybe even started thinking about which workloads you can move to OMS.
As we all know, we can pretty much categorise SCOM data into the following 4 categories:
- Performance Data
- Event Data
- Alert Data
- State Data
Since OMS, unlike SCOM, does not use classes, there are no classes, relationships or state data in OMS, but for the other 3 types, we can easily get them over to OMS. For the SCOM alert data, you can simply enable the Alert solution after you have connected your SCOM management group to your OMS workspace. OMS also has its own alerting and remediation capability. For all existing performance collection and event collection rules, we can easily recreate them using a different Write Action module to store the data in OMS. In this post, I will show you how we can gather all performance collection rules from an existing OpsMgr management pack and re-create them for OMS (stored as PerfHourly data in OMS). But before we dive into it, let’s quickly go through the performance data in OMS.
OMS Performance Data
There are 2 types of performance data in OMS. The PerfHourly data was introduced with the Capacity Planning solution. As the name suggests, PerfHourly data is the hourly aggregated performance data. It does not store any raw perf data in OMS.
Another type of performance data is called Near-Real Time (NRT) performance data. NRT perf data can be accessed using queries such as Type=Perf. Unlike the PerfHourly data, NRT perf data can be collected as frequently as every 10 seconds, and the aggregation interval is every half hour. Both raw and aggregated NRT perf data are stored in OMS; raw data is stored for 14 days, and OMS search queries only return aggregated data.
From the management pack point of view, it is a lot more complicated writing perf collection rules for NRT perf data. With the NRT perf data, we must always author 2 rules for every counter that we are going to collect, one for the raw data and one for the aggregated data. Secondly, for NRT perf data, when mapping performance data, the object name must always follow the format “\\<Computer FQDN>\<Object Name>”. Lastly, the collection rule that collects the aggregated data must use a Condition Detection module called “Microsoft.IntelligencePacks.Performance.PerformanceAggregator”.
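As a small illustration of the object name format described above, the following sketch builds the string from its two parts. The function name Format-NrtObjectName and the sample computer and object names are hypothetical.

```powershell
function Format-NrtObjectName {
    param(
        [Parameter(Mandatory)] [string] $ComputerFqdn,
        [Parameter(Mandatory)] [string] $ObjectName
    )
    # Two literal backslashes, the FQDN, one backslash, then the object name.
    '\\{0}\{1}' -f $ComputerFqdn, $ObjectName
}

$objectName = Format-NrtObjectName -ComputerFqdn 'sql01.corp.contoso.com' -ObjectName 'SQLServer:Databases'
$objectName
```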
Since an OpsMgr rule can only have up to one (1) condition detection member module, converting existing OpsMgr perf collection rules that already have a condition detection member module to OMS NRT perf rules may not be that straightforward. In this case, we may need to create some additional module types and things can get very complicated. It is certainly not something that we can use a generic script to achieve.
Therefore, in order to make the script work with any existing OpsMgr performance collection rules, I have chosen to store the perf data in OMS as PerfHourly data because it has far less “red tape”. Having said that, please keep in mind it is still possible to re-create OpsMgr perf collection rules as OMS NRT perf collection rules; it’s just not something we can develop as a generic automated solution.
If you want to learn more about performance data in OMS, or how to author OMS based collection rules in SCOM using VSAE, please refer to Chapter 5: Working with Performance Data and Chapter 11: Custom Management Pack Authoring of the Inside OMS book I mentioned in the beginning of this post.
PowerShell Script: Copy-PerfRulesToOMS.ps1
In the previous posts of this blog series, I simply placed the scripts / runbooks within the post itself. I have decided to use GitHub from now on. So the script Copy-PerfRulesToOMS.ps1 can be found in one of my public GitHub repositories: https://github.com/tyconsulting/OpsMgr-SDK-Scripts/blob/master/OMS%20Related%20Scripts/Copy-PerfRulesToOMS.ps1
This script reads the configuration of all performance collection rules in a particular OpsMgr management pack, and then recreates these rules with the same configuration but stores the performance data as PerfHourly data in your OMS workspace. The OMS perf collection rules created by this script will be stored in a brand new unsealed MP with the name ‘<Original MP name>.OMS.Perf.Collection’ and display name ‘<Original MP display name> OMS PerfHourly Addon’.
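The naming convention for the new MP can be sketched as follows. This is only an illustration of the pattern described above, not code taken from the script; the function name Get-OmsAddonMpIdentity and the sample MP names are hypothetical.

```powershell
function Get-OmsAddonMpIdentity {
    param(
        [Parameter(Mandatory)] [string] $SourceName,
        [Parameter(Mandatory)] [string] $SourceDisplayName
    )
    [pscustomobject]@{
        # '<Original MP name>.OMS.Perf.Collection'
        Name        = '{0}.OMS.Perf.Collection' -f $SourceName
        # '<Original MP display name> OMS PerfHourly Addon'
        DisplayName = '{0} OMS PerfHourly Addon' -f $SourceDisplayName
    }
}

$addon = Get-OmsAddonMpIdentity -SourceName 'Contoso.Demo.Monitoring' -SourceDisplayName 'Contoso Demo Monitoring'
$addon
```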
This script has the following pre-requisites:
- OpsMgrExtended PS module loaded on the machine where you are executing the script.
- An account with OpsMgr administrative rights
- OpsMgr management group must be connected to OMS
The script takes the following input parameters:
- ManagementServer – Specify the name of an OpsMgr management server that you wish to connect to. This is a mandatory parameter.
- Credential – Specify an alternative credential that has admin rights to the OpsMgr management group. This is an optional parameter.
- ManagementPackName – Specify the source MP whose perf collection rules you want to copy to OMS. This is not the display name but the actual MP name. In the OpsMgr console, when you open the management pack properties, it is the ‘ID’ field. For example, since I’m going to use the OpsLogix VMware management pack as an example in this post, the name for this MP is “OpsLogix.IMP.VMWare.Monitoring”:
Executing the script:
I have added many verbose messages in the script, so you can use the optional -Verbose switch when executing the script.
$cred = Get-Credential
.\Copy-PerfRulesToOMS.ps1 -ManagementServer "ManagementServerName" -Credential $cred -ManagementPackName "OpsLogix.IMP.VMWare.Monitoring" -Verbose
The script first connects to the management group, reads the source MP, then retrieves all performance collection rules from this MP. If the source MP contains any perf collection rules, it creates a new unsealed MP and starts creating a corresponding OMS PerfHourly collection rule for each original OpsMgr perf collection rule. The OMS PerfHourly collection rules have the same properties and input parameters, as well as the same data source and condition detection member modules, as the original OpsMgr perf collection rules. But they are configured to use another Write Action member module to send the perf data to OMS.
- The script detects OpsMgr perf collection rules from the source MP by examining the actual write action member modules. If any of the write action member modules is either ‘Microsoft.SystemCenter.CollectPerformanceData’ (used to write perf data to OpsMgr operational DB) or ‘Microsoft.SystemCenter.DataWarehouse.PublishPerformanceData’ (used to write perf data to OpsMgr DW DB), then the script will consider the rule as a perf collection rule.
- When the source MP is unsealed, the script will fail under the following circumstances:
- a perf collection rule in the source MP is targeting a class defined in the source MP
- a perf collection rule in the source MP uses any data source or condition detection module types that are defined in the source MP
- The script does not disable any existing perf collection rules from the source MP
- The script copies all attributes from the source perf collection rule to the new OMS PerfHourly rule, including the ‘Enabled’ property. So if the source perf collection rule is disabled by default, then the newly created OMS PerfHourly rule will also be disabled by default.
- Depending on the number of OpsMgr perf collection rules to be processed, this script can take some time to finish because it writes new OMS PerfHourly rules to the destination MP one at a time. I purposely coded the script this way (rather than writing everything at once) so that if a particular rule fails MP verification, it does not impact the creation of other rules.
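The detection and one-at-a-time behaviour listed above can be sketched with plain objects. This is a simplified illustration and not the script's actual code: the rule objects, the WriteActionTypes property and the New-DemoOmsRule function are hypothetical stand-ins, and only the two performance write action type names come from the description above.

```powershell
# Write action module types that mark a rule as a perf collection rule.
$perfWriteActions = @(
    'Microsoft.SystemCenter.CollectPerformanceData',
    'Microsoft.SystemCenter.DataWarehouse.PublishPerformanceData'
)

# Hypothetical stand-ins for rules read from the source management pack.
$rules = @(
    [pscustomobject]@{ Name = 'Demo.Cpu.Perf.Rule';    WriteActionTypes = @('Microsoft.SystemCenter.CollectPerformanceData', 'Microsoft.SystemCenter.DataWarehouse.PublishPerformanceData') }
    [pscustomobject]@{ Name = 'Demo.Event.Rule';       WriteActionTypes = @('Microsoft.SystemCenter.CollectEvent') }
    [pscustomobject]@{ Name = 'Demo.Broken.Perf.Rule'; WriteActionTypes = @('Microsoft.SystemCenter.DataWarehouse.PublishPerformanceData') }
)

# A rule qualifies if any of its write actions is one of the perf types.
$perfRules = @($rules | Where-Object {
    @($_.WriteActionTypes | Where-Object { $perfWriteActions -contains $_ }).Count -gt 0
})

# Hypothetical placeholder for writing a single rule to the destination MP.
function New-DemoOmsRule {
    param([Parameter(Mandatory)] $Rule)
    if ($Rule.Name -like '*Broken*') { throw "MP verification failed for $($Rule.Name)" }
    '{0}.OMS' -f $Rule.Name
}

# Create the rules one at a time so a single failure does not stop the rest.
$created = @()
$failed = @()
foreach ($rule in $perfRules) {
    try {
        $created += New-DemoOmsRule -Rule $rule
    }
    catch {
        $failed += $rule.Name
        Write-Warning $_.Exception.Message
    }
}
```

Processing each rule in its own try/catch is slower than a single bulk write, but it keeps a verification failure isolated to the rule that caused it.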
When the execution is completed, you will see a new unsealed MP created in your management group:
and if I export it to XML and open it in MPViewer, I can see all the newly created OMS PerfHourly collection rules:
At this stage, I don’t need to do anything else and all the performance data collected by the source MP (OpsLogix VMware MP in this example) will be stored not only in OpsMgr, but also in OMS.
Because the original OpsMgr perf collection rules and the corresponding OMS PerfHourly rules share the exact same data source modules with the same configuration, this does not add overhead to the OpsMgr agents, thanks to the OpsMgr Cook Down feature. However, please keep in mind that from now on, if you need to apply overrides to either rule, it’s best to apply the same override to both rules (so you don’t break Cook Down).
Although the PerfHourly data will not appear in your OMS workspace straightaway (due to the aggregation process), you should be able to see it within a few hours:
As you can see in the above screenshot, I now have all the VMware related counters defined in the OpsLogix VMware MP in my OMS workspace. The RootObjectName ‘VCENTER01’ is the vCenter server in my lab, and the ObjectDisplayName ‘exs01.corp.tyang.org’ is the VMware ESX host in my lab.
In this post, I have shared a script and demonstrated how to use this script to migrate your existing OpsMgr performance collection rules to OMS. We can easily write a very similar script for migrating existing event collection rules (maybe a blog topic for another day). I have demonstrated how to use this script to collect VMware related counters originally defined in the OpsLogix VMware MP.
In the next post of this series, I will demonstrate how to use the OpsMgrExtended module, SharePointSDK module, Azure Automation, Hybrid Workers and SharePoint Online to build a portal for scheduling OpsMgr maintenance mode. This is based on one of the demos in my Azure Automation session with Pete Zerger from SCU 2016 APAC & Australia.
Until next time, happy automating!