No matter how careful you are. No matter how many times you’ve checked your parameters and state conditions. No matter how many times you tested it out of production.
It happens.
Some misconfigured rule – or even an event that happens much differently in a production environment than in test – begins firing off alerts. Maybe you don’t notice it right away – perhaps you haven’t set up notifications for this particular rule.
But then, one day you start getting emails from your RMS with scary event IDs like 2115 (Data source not receiving a response), 25017 (Backlogged event processing) and 29202 (Inconsistent database state).
So you decide to investigate, and open up the Operations Console.
Only… wow, it’s running a lot slower than it normally does. Insanely slow.
You click on Monitoring > Active Alerts – and then wait. And wait some more, as our once friendly green progress bar seems to start taunting you. So you lock the desktop and go chat up that new girl they hired. Wow, she’s pretty amazing right? Funny and smart as a whip, too.
Feeling happy and content after working your suave IT skills on her, you literally float back to your desk and unlock your desktop. Wasn’t there something bothering you before? Oh well, must have not been all that important. You peek up from your cube and catch a glimpse at her, then move those eyes down and see your still open Operations Console. The evil green bar still chugging away. But then you also see why…
And your nemesis, the green progress bar, still keeps going. That number is rising faster than your blood pressure right now.
Must be a bug, eh? Ok, well, we’ll just check it via SQL to be sure – so you open SQL Studio and run
    SELECT COUNT(*) FROM Alert
And then you’re informed, without any of the gentleness of a WWII nurse as depicted in the movies, that you have a lot of open alerts.
Wow! You better fix this!
And you better do it on the RMS, because it’s taking forever from your desktop.
You already have a general idea of which rule did it – that active alerts panel should be filled with it. So your first stop is to get back to the Authoring pane and either disable that rule or set up some proper alert suppression. Then we just have to deal with cleanup.
You turn, as always, to our friend PowerShell to help us out. Surely the easiest and most obvious solution to this problem is to run
    $alerts = Get-Alert | Where-Object {$_.Name -match "MyRule"}
    ForEach($alert in $alerts) {
        $alert.ResolutionState = 255 # Close the alert
        $alert.Update("")
    }
Then just wait for the nightly alert grooming to happen, or nudge it along in SQL with exec p_AlertGrooming.
Only, when you try to do it, you get an OutOfMemory exception.
Now what to do?! The console is cripplingly slow – if you had to close the alerts that way your company would have gone bankrupt during the Dot Com Re-Burst of 2799! And when you try with PowerShell, you’ve run out of memory!
That’s where I was, until I talked to an unnamed friend (if you want to be named, just let me know; better to err on the side of caution and all that) from MS who really helped me out. That, combined with hindsight, allows me to help you out as well!
How To Clean Up an Alert Storm
- Try the console. We’re going to assume it’s running slower than <Insert joke about large celebrities in the 1980s doing something they’re known for>, so we’ll move on.
- Try the Command Shell:
    $alerts = Get-Alert | ? {($_.ResolutionState -eq 0) -and ($_.Name -match "Rule name if you know the naughty one")}
  Running out of memory still?
- Try the same command, only instead of piping it to Where-Object, use the built-in -criteria filter:
    $alerts = Get-Alert -criteria 'ResolutionState = 0 AND Name LIKE ''%Rule Name%'''
- Still OOM? Try running both of those commands on the RMS, or another management server. Pick the one with the most memory, and hope for the best.
- Still receiving Out of Memory exceptions? Let’s stop using the OS to manage our memory. Open RegEdit and navigate to HKLM\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\Config Service – change the value of Should Manage Memory from False to True. Stop and restart the config service.
- Now try your command again. It will take some time, but it will complete. Now let’s try running the whole script to fix this:
    $alerts = Get-Alert | Where-Object {($_.ResolutionState -eq 0) -and ($_.Name -match "MyRulename")}
    ForEach($alert in $alerts) {
        $alert.ResolutionState = 255
        $alert.Update("")
    }
  (Alternately, you can use the Resolve-Alert cmdlet, but from testing it’s not quite fast enough to keep up with the next step.)
- When you ran that script, it probably gave you a lot of errors when attempting to update the alerts. That’s because there’s a small window of freshness to your alert object, and if you don’t update it within that window it becomes stale and unable to be used. To fix that, change the ForEach to look like this:
    ForEach($alert in $alerts) {
        $freshAlert = Get-Alert $alert.id
        $freshAlert.ResolutionState = 255
        $freshAlert.Update("")
    }
  That will grab a fresh version of that alert and update it.
- But what if you have thousands upon thousands of alerts? The above solutions could conceivably take days to run. Don’t worry, there’s a way around that, too.
  Before I show you, please note that this METHOD IS NOT SUPPORTED BY MICROSOFT and use of this method could possibly BLACKLIST YOUR OpsMgr INSTALL. It is the answer given out occasionally though, much to the dismay of the product group, so use that information how you’d like.
- Connect to your Operations Manager database and run the following update. This one closes open alerts from every rule, but you could narrow it down with an additional condition such as AND AlertName = 'My Rule Name':
    UPDATE Alert
    SET ResolutionState = 255
    WHERE ResolutionState = 0 AND TimeResolved IS NULL
- When that’s completed, you’ll need to update TimeResolved via:
    UPDATE Alert
    SET TimeResolved = '20-06-20 00:00:00.000'
    WHERE ResolutionState = 255 AND TimeResolved IS NULL
  Make TimeResolved some day in the past so the grooming job will groom them out.
- Either wait overnight until the grooming jobs kick off or run:
    EXEC p_AlertGrooming
- You’re done. Now don’t do it again!
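
If you'd rather keep the supported PowerShell route in one place, here is a rough sketch that strings the earlier steps together: filter with -criteria, hang on to only the alert ids, and re-fetch each alert right before closing it. The function names New-OpenAlertCriteria and Close-StormAlerts are made up for this example, and it assumes the OpsMgr Command Shell running on PowerShell 2.0 or later (for try/catch). It also assumes the criteria string is accepted without a leading WHERE. Treat it as a starting point, not a tested tool.

```powershell
# Hypothetical helper: builds a Get-Alert criteria string that matches open
# alerts (ResolutionState 0) whose name contains $RuleName.
function New-OpenAlertCriteria {
    param([string]$RuleName)
    # Double any single quotes so the name cannot break the criteria string.
    $escaped = $RuleName.Replace("'", "''")
    return "ResolutionState = 0 AND Name LIKE '%$escaped%'"
}

# Hypothetical wrapper around the steps above. Get-Alert only exists in the
# OpsMgr Command Shell, so this function will not work in a plain console.
function Close-StormAlerts {
    param([string]$RuleName)

    $criteria = New-OpenAlertCriteria -RuleName $RuleName
    $closed = 0
    $failed = 0

    # Keep only the ids rather than the full alert objects.
    $ids = @(Get-Alert -Criteria $criteria | ForEach-Object { $_.Id })

    foreach ($id in $ids) {
        try {
            # Re-fetch right before updating so the object is not stale.
            $fresh = Get-Alert $id
            $fresh.ResolutionState = 255   # 255 = Closed
            $fresh.Update("")
            $closed++
        }
        catch {
            # Count the failure and move on to the next alert.
            $failed++
        }
    }

    "Closed $closed alerts, $failed failures"
}
```

From the Command Shell you would call it as Close-StormAlerts -RuleName "MyRulename". Holding only ids may ease memory pressure a little, but it is no guarantee against the OutOfMemory exception, so the memory steps above still apply.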





Thanks for posting this, handy escape route if the console’s throwing a wobbly.
I notice that “SELECT Count(*) FROM Alert” you suggested here shows all alerts that haven’t been groomed regardless of open/closed state. The result of that query came out around the 6 million alert mark for me (99.99% were closed).
I’ve manually kicked off the grooming stored proc to see whether it permanently cuts the numbers there and something just went awry in the nightly cleanup.
To figure out what to tune I’ve ended up running “select alertname, count(*)as x from alert(nolock) group by alertname order by x” to locate the worst offenders.
Ah, good point AJ!
If you want the same alert information in a more ‘OpsMgr MS acceptable’ method, you can also run the “Most Common Alerts” report under the Reporting tab.