OMS Alerting Walkthrough

Written by Tao Yang

Introduction

Earlier today, the OMS product team announced that the OMS Alerting feature has entered Public Preview. This is exciting news, and another good example of how hard Microsoft is working to close the gaps between OMS and its existing on-premises monitoring solution, System Center Operations Manager. Alex Frankel from the OMS product team has already given a brief introduction to this feature in the announcement blog post. In this post, I will demonstrate how I used this feature to alert on and auto-remediate an issue detected in my lab environment.

Background

A few months ago, I lost my lab OpsMgr management group completely due to hardware failures. After I replaced the faulty hardware and built a brand new management group, I reconfigured all the servers in my lab to report to the new management group. However, I then started getting many “Failed to enable Advisor Connector on the computer” alerts in my OpsMgr environment:

[Screenshot: “Failed to enable Advisor Connector on the computer” alerts in the OpsMgr console]

These alerts were raised because I did not unregister these agent computers from the previous management group (I couldn’t, because it was dead), as explained in this forum thread. To fix this issue, on each OpsMgr agent I must delete several registry keys, reset a registry key value and then restart the health service.

Since my OpsMgr management group is connected to an OMS workspace, and I have enabled the Alert Management solution (so all OpsMgr alerts are also uploaded into OMS), I have configured the new OMS alerting feature to automatically remediate this error for me.

In order to configure the alerting and remediation for this OpsMgr alert, I need the following components:

  • OpsMgr management group connected to OMS
  • OMS Alert Management solution enabled
  • OMS Automation solution (Azure Automation) enabled
  • At least one Azure Automation Hybrid Worker configured (because I need to target the remediation runbook at on-premises lab servers)
  • OMS Alerting and Alert remediation feature enabled

Creating Azure Automation Runbook

So first things first: I must create and publish the remediation runbook in the Azure Automation account before I can select it when creating the OMS alert. Although we cannot configure which parameters are passed into the runbook, the OMS alert passes the search result and some metadata into the runbook in JSON format (I will show it later). Based on my experience, to make runbooks reusable, we can define some optional input parameters for the runbook and, inside the runbook, check whether any of these optional parameters are null; if so, retrieve the value from elsewhere (e.g. Azure Automation variable and credential assets).
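The optional-parameter pattern described above can be sketched as follows. This is only an illustration in Python, not the runbook itself (which is PowerShell). The `AUTOMATION_ASSETS` dictionary stands in for Azure Automation variable and credential assets, and every name in it, along with `resolve_parameter` and `run_remediation`, is hypothetical.

```python
# Stand-in for Azure Automation variable/credential assets.
# All asset names and values here are hypothetical.
AUTOMATION_ASSETS = {
    "DefaultCredentialName": "LabAdminCredential",
    "DefaultRetryCount": 3,
}


def resolve_parameter(value, asset_name, assets=AUTOMATION_ASSETS):
    """Return the caller-supplied value, or the named asset when value is None."""
    if value is not None:
        return value
    if asset_name not in assets:
        raise KeyError(f"No value supplied and no asset named {asset_name!r}")
    return assets[asset_name]


def run_remediation(webhook_data, credential_name=None, retry_count=None):
    """Resolve optional inputs first, so the same code works with or without them."""
    credential_name = resolve_parameter(credential_name, "DefaultCredentialName")
    retry_count = resolve_parameter(retry_count, "DefaultRetryCount")
    # A real runbook would now act on webhook_data; here we only report the settings.
    return {"credential": credential_name, "retries": retry_count}


# Started by the alert webhook: only the webhook data is supplied.
print(run_remediation("{}"))
# Started manually: the caller overrides one of the defaults.
print(run_remediation("{}", credential_name="OtherCredential"))
```

Checking for `None` rather than for a falsy value means an explicit `0` or empty string supplied by the caller is still respected.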

In this case, I have created a PowerShell-based runbook called Remove-SCAdvisorRegistration. The code is listed below:

Now, let’s fast forward a little and look at what the input parameter from the OMS alert looks like. When Alert remediation is configured during OMS alert creation, a webhook for the runbook is automatically created. OMS uses this webhook to start the runbook. It passes a parameter called “WEBHOOKDATA”, which is in JSON format, into the runbook. You can see the actual input by clicking on the INPUT tile in the runbook job execution history:

[Screenshot: the WEBHOOKDATA input shown on the INPUT tile of the runbook job]

If you copy and paste this input into a text editor such as Notepad++ and format it as a JSON document, it looks like this:

[Screenshot: the WEBHOOKDATA input formatted as a JSON document]

As you can see, the “SearchResults” contains 3 elements:

  • id
  • __metadata
  • Value

The Value property is where you can retrieve the search result, and it is defined as an array. When I was writing the remediation runbook, I was able to get the offending OpsMgr agent computer name from the “SourceDisplayName” field of each item in the “Value” array.
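To make the payload shape concrete, here is a minimal Python sketch that pulls the computer names out of such a document. It is for illustration only (the runbook itself is PowerShell) and assumes the request body has already been extracted as a JSON string. The sample payload is made up; only the names “SearchResults”, “id”, “__metadata”, “Value” and “SourceDisplayName” come from the description above. Because JSON keys are case-sensitive in Python, the lookup ignores case rather than assuming the exact spelling of “Value”.

```python
import json


def get_ci(mapping, key, default=None):
    """Look up a key while ignoring case (e.g. "Value" vs. "value")."""
    for candidate, value in mapping.items():
        if candidate.lower() == key.lower():
            return value
    return default


def offending_computers(request_body):
    """Return the distinct SourceDisplayName values, in order of first appearance."""
    payload = json.loads(request_body)
    search_results = get_ci(payload, "SearchResults", {})
    records = get_ci(search_results, "Value", [])
    names = []
    for record in records:
        name = get_ci(record, "SourceDisplayName")
        if name and name not in names:
            names.append(name)
    return names


# Hypothetical payload; a real one carries many more fields per record.
sample_body = json.dumps({
    "SearchResults": {
        "id": "hypothetical-search-id",
        "__metadata": {},
        "value": [
            {"SourceDisplayName": "SERVER01.lab.local"},
            {"SourceDisplayName": "SERVER02.lab.local"},
            {"SourceDisplayName": "SERVER01.lab.local"},
        ],
    }
})

print(offending_computers(sample_body))
```

De-duplicating the names avoids remediating the same agent twice when several alert records for one computer fall inside the same search window.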

Now that the runbook is created, make sure it is published before heading back to the OMS portal to start creating the alert. Please note that we will have to come back and revisit this runbook after the alert is created.

Creating OMS Alert

The search query that I’m using for this alert is:

Type=Alert AlertState=New AlertName="Failed to enable Advisor Connector on the computer."

[Screenshot: the search query and its results in the OMS portal]

I’m creating the alert with the following parameters:

  • Name: Alert – Failed to enable Advisor on computer
  • Schedule: every 15 minutes
  • Generate Alert when: Greater than 0
  • Over the time window: 15 minutes
  • Send Email Notification: Yes
  • Email Subject: Failed to enable Advisor on computer alert
  • Email Address: <Your email address>
  • Enable Remediation: Yes
  • Remediation Runbook: Remove-SCAdvisorRegistration
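Read together, the schedule, time window and threshold settings amount to roughly the check below. This Python sketch only illustrates the semantics of the settings; it is not how OMS implements alert evaluation, and the function name, the sample timestamps and the inclusive window boundaries are all assumptions.

```python
from datetime import datetime, timedelta


def should_alert(record_times, now, window_minutes=15, threshold=0):
    """True when more than `threshold` records fall within the last `window_minutes`."""
    window_start = now - timedelta(minutes=window_minutes)
    # Boundary handling is an assumption: both ends of the window are included.
    hits = [t for t in record_times if window_start <= t <= now]
    return len(hits) > threshold


# Hypothetical evaluation time and alert record timestamps.
now = datetime(2016, 3, 1, 12, 0)
recent = datetime(2016, 3, 1, 11, 50)   # inside the 15-minute window
stale = datetime(2016, 3, 1, 11, 30)    # outside the window

print(should_alert([recent, stale], now))  # one record in the window
print(should_alert([stale], now))          # no records in the window
```

With the schedule and the time window both set to 15 minutes, each run looks at the period since the previous run, so a record should normally be picked up by one evaluation.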

[Screenshot: the alert configuration with the parameters above]

After the alert is saved, you will be able to see it in the Settings/Alerts page:

[Screenshot: the new alert listed on the Settings/Alerts page]

Reconfiguring Runbook Webhook

In this example, because the runbook must be executed on a Hybrid Worker group (as we are targeting computers on the on-prem network), I must reconfigure the webhook (created by the OMS alert) to target a Hybrid Worker group (instead of the default configuration, which targets Azure workers). You can do so by going to the webhook parameters section and choosing the Hybrid Worker group from the drop-down list:

[Screenshot: choosing the Hybrid Worker group in the webhook parameters]

Note:

Please do not modify any other input parameters for the webhooks created by OMS alerts. If you do, the changes you’ve made won’t be saved in Azure Automation. Based on my experience, the only setting you can change for the webhook is the “Run on” parameter (Azure vs. Hybrid Worker).

From now on, this alert will be executed every 15 minutes and will search for results (based on the search query) created within the last 15 minutes. If the number of records returned from the search is greater than 0 (as we configured), you will get an email similar to this one:

[Screenshot: the alert notification email]

The OMS alert will also kick off the remediation runbook via the webhook. Because I have enabled verbose logging for this runbook, I was able to see some additional verbose messages:

[Screenshot: verbose messages in the runbook job output]

Additional Resources

Test-OMSAlertRemediation Runbook

I have also written a test runbook called Test-OMSAlertRemediation that you can use for any OMS alert. It extracts information from the JSON input and sends it to you via email. It should be very helpful when you are authoring real remediation runbooks (so you know what kind of input data you can play with). I will publish it in the next blog post, as it’s getting close to midnight now.
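Until that runbook is published, the idea of dumping the alert input so you can see which fields are available can be sketched in a few lines of Python. This is not the Test-OMSAlertRemediation runbook. The function name and the sample payload are hypothetical, and sending the resulting text by email is left out.

```python
import json


def describe_payload(request_body):
    """Build a plain-text summary of every field in every search result record."""
    payload = json.loads(request_body)
    search_results = payload.get("SearchResults", {})
    # Accept either spelling of the array property.
    records = search_results.get("value", search_results.get("Value", []))
    lines = [f"Records returned: {len(records)}"]
    for index, record in enumerate(records, start=1):
        lines.append(f"Record {index}:")
        for field in sorted(record):
            lines.append(f"  {field} = {record[field]}")
    return "\n".join(lines)


# Hypothetical payload with a single alert record.
sample_body = json.dumps({
    "SearchResults": {
        "id": "hypothetical-search-id",
        "__metadata": {},
        "value": [
            {"SourceDisplayName": "SERVER01.lab.local", "AlertState": "New"},
        ],
    }
})

print(describe_payload(sample_body))
```

The returned string could then be used as the body of a notification email.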

New OMS Ebook – Inside the MS Operations Management Suite

Over the last few months, I have been working with Pete Zerger, Stanislav Zhelyazkov and Anders Bengtsson on a free ebook for OMS. OMS Alerting is also explained in more detail in this book. It will be released very soon, so stay tuned!

[Image: OMS ebook announcement]

Comments on “OMS Alerting Walkthrough”

  1. This is nice, but let’s be honest, OMS is very far from closing the gap with SCOM. I’ve messed around with OMS and there is zero third-party support, and monitoring is incredibly basic. It’s not worth an investment now. Maybe in two or three years. Even then, I am skeptical. What it does excel at is the non-monitoring value-add features, like the dashboards. SquaredUp has superior dashboards, but they are light-years ahead of the SCOM web console dashboards.
