Reducing the ticket volume in your helpdesk system

When integrating the network monitoring system with an external helpdesk, you do not want to be flooded with tickets. Learn how to use alert escalation scripts, monitoring dependencies, conditional alerts, and other NetCrunch features to convert only persistent problems into tickets.

The concept is described using NetCrunch integrated with a JIRA Cloud installation, but it applies equally to the other integrations included in NetCrunch.

Enabling integration with JIRA

NetCrunch has built-in, ready-to-use integrations with several ticketing systems. One of the most popular is JIRA. You can easily use NetCrunch to redirect all or selected events to JIRA.

Let's quickly recap how to integrate NetCrunch with JIRA. First, you have to create an integration profile:

  1. Use Top Menu > Monitoring > Integration Profiles
  2. Select JIRA from the list
  3. Fill in the required fields
  4. Save settings
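Under the hood, an integration like this creates issues through the JIRA Cloud REST API. As an illustration of what the profile automates (not NetCrunch's actual code), here is a minimal sketch of the request body JIRA expects when creating an issue; the site URL and project key are placeholders:

```python
import json

# Hypothetical JIRA Cloud site -- replace with your own instance URL.
JIRA_URL = "https://your-site.atlassian.net/rest/api/2/issue"

def build_issue_payload(summary, description, project_key="NETADM"):
    """Build the JSON body the JIRA Cloud REST API expects for a new issue."""
    return {
        "fields": {
            "project": {"key": project_key},   # placeholder project key
            "summary": summary,
            "description": description,
            "issuetype": {"name": "Task"},
        }
    }

payload = build_issue_payload(
    "Node is DOWN: core-switch-01",
    "Alert forwarded from NetCrunch monitoring.",
)
print(json.dumps(payload, indent=2))
```

An authenticated HTTP POST of this payload to `JIRA_URL` is what actually opens the ticket; the integration profile stores the credentials and fills in these fields for you.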

In JIRA you can create your own ticket category to be used for tickets coming from NetCrunch. In this example we will convert switch-related problems into tickets, so we have decided to create a Network admins category in JIRA. Problems related to switches that become tickets in JIRA will be automatically assigned this category.

Defining alert escalation script for events to be forwarded to JIRA

After defining the integration profile, you can create an Alert Escalation Script that will describe the set of actions related to creating tickets in JIRA. The alert escalation script will be applied to events that you want to be forwarded and converted into tickets in JIRA:

  1. Use Top Menu > Monitoring > Alert Escalation Scripts
  2. Click Add Alerting Script
  3. Enter Script name, for example, Ticket in JIRA
  4. Click + Add button
  5. Select Action to Run Immediately
  6. From the left pane, select Integrations
  7. Double-click on JIRA Service Desk Ticket
  8. Fill in all necessary fields and select the JIRA integration profile
  9. Save settings

Now you can add the created alert escalation script to the selected alerts. You can do this at different granularity levels: edit alerts for a specific node or group of nodes, for a specific view, or for one or more monitoring packs.

For example, if you want to add an alert escalation script to CPU alert for all Cisco Switches, you should do it on the monitoring pack level:

  1. Use Top Menu > Monitoring > Monitoring Packs and Policies
  2. Select Operating Systems > Cisco CPU
  3. Right-click on e.g. % Processor Utilization > 90%
  4. Select Assign Predefined Alerting Script
  5. Select your script from the list
  6. Save settings

From now on, all such alerts will be forwarded to JIRA.

Keeping the number of tickets low: forwarding only critical, persistent problems

NetCrunch, as an intelligent monitoring system, is equipped with several algorithms that prevent alert floods and, consequently, floods of NetCrunch tickets in JIRA. These include:

Monitoring dependencies

When one of the switches fails, you probably do not care that NetCrunch has lost access to the nodes behind that switch (it can't monitor them). For the administrator, the most important information is that the switch is down. Similarly, when an ESXi host crashes, it is obvious that the failing host takes down all 100 virtual machines hosted on it; you do not need 100 alerts about it.

Some teams prioritize their work based on the sheer number of alerts related to a problem, but with NetCrunch you can be smarter than that. To drill down and isolate the source of the problem, NetCrunch is equipped with the Monitoring Dependencies and Event Suppression features. They silence the secondary alerts and single out the one coming from the source of the problem, instantly guiding you to the place where the problem started.
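Conceptually, dependency-based suppression can be sketched in a few lines. The topology and node names below are hypothetical, and this is only the idea, not NetCrunch's implementation: each node maps to the device it is reachable through, and an alert is forwarded only when no upstream dependency is also down.

```python
# Hypothetical topology: each node -> the device it depends on.
DEPENDS_ON = {
    "vm-01": "esxi-host",
    "vm-02": "esxi-host",
    "esxi-host": "core-switch",
    "core-switch": None,  # monitored directly, no upstream dependency
}

def root_causes(down_nodes):
    """Return only the down nodes whose upstream dependency is still up.

    Nodes behind a failed device are suppressed as secondary alerts;
    the remaining nodes are the likely sources of the problem.
    """
    down = set(down_nodes)
    return sorted(n for n in down if DEPENDS_ON.get(n) not in down)

print(root_causes(["vm-01", "vm-02", "esxi-host"]))  # ['esxi-host']
```

Here the two VM alerts are silenced because their host is down, so only the ESXi host would become a ticket.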

Option to disregard temporary (short) peaks

Do you want an alert generated every time CPU usage exceeds an 80% threshold? Sometimes usage spikes, only to return to normal levels after a few seconds. Simple threshold-based alerts would produce hundreds of alerts a day about such peaks. Instead, you can decide to be notified only about a CPU that stays high for longer and may lead to permanent performance degradation; that may signal a crashed process, or someone mining cryptocurrency on company infrastructure. NetCrunch includes ready-to-use alert conditions that help you narrow down the alert definition and reduce the alert noise caused by temporary spikes. One of them averages the value over a selected number of minutes or samples. But there are more.
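The averaging condition boils down to a sliding-window check. A minimal sketch (the 80% threshold and 5-sample window are arbitrary example values; NetCrunch implements this internally):

```python
from collections import deque

def make_average_trigger(threshold, samples):
    """Alert only when the average of the last `samples` readings
    exceeds `threshold`, so short spikes are ignored."""
    window = deque(maxlen=samples)

    def check(value):
        window.append(value)
        return len(window) == samples and sum(window) / samples > threshold

    return check

check = make_average_trigger(threshold=80.0, samples=5)
# One short spike (95), then sustained high load.
readings = [20, 95, 30, 25, 22, 90, 92, 95, 91, 94]
alerts = [check(v) for v in readings]
print(alerts)  # only the last reading triggers: the spike alone never does
```

The single 95% spike never raises an alert because the surrounding samples pull the average down; only the sustained run at the end pushes the 5-sample average above 80%.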

Decreasing alert sensitivity

Another smart option to explore is the time range restriction. By default, all nodes are monitored all the time. However, some nodes may have maintenance hours, e.g. once a week on Sundays between 1 am and 4 am. You know the node may shut down during these hours, so why be informed about such events unnecessarily? In NetCrunch you can set an exclusion for these hours.
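The exclusion amounts to a simple time-window check before an alert is raised. A sketch using the Sunday 1 am to 4 am window from the example above:

```python
from datetime import datetime

def in_maintenance_window(ts):
    """True during the assumed maintenance window: Sundays, 1 am - 4 am.
    Alerts raised in this window would be suppressed."""
    return ts.weekday() == 6 and 1 <= ts.hour < 4  # weekday() 6 == Sunday

print(in_maintenance_window(datetime(2024, 1, 7, 2, 30)))  # Sunday 02:30 -> True
print(in_maintenance_window(datetime(2024, 1, 8, 2, 30)))  # Monday 02:30 -> False
```

In NetCrunch you configure this window in the UI rather than in code; the sketch just shows why a "Node is DOWN" event at Sunday 2:30 am never becomes a ticket.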

And what if we want to forward the ticket to JIRA only when the node remains down for at least 30 minutes or was down 5 times within the last 4 hours? For this purpose, you can use the Trigger Alerting Actions On settings:

  1. Add the alert you want (e.g. Node is DOWN)
  2. Right-click on the alert and select Modify Event Rule
  3. Navigate to Trigger Alerting Actions On and expand the dropdown
  4. Select the condition you want
  5. Save settings and assign the alert escalation script

The last example allows generating an alert only if specific conditions are met. For example, in the case of an HA cluster, when 1 out of 3 hosts is down, we still have some redundancy. However, if 2 out of 3 hosts are down, the situation is more serious and a relevant alert should be generated. This is where the Correlations feature comes to the rescue:

  1. Open the Monitoring Packs and Policies window
  2. Find and open Correlations monitoring pack
  3. From the Trigger alert when dropdown, select All alerts are active
  4. Click Add Alert button
  5. Select the node and alert to be correlated (Node is DOWN)
  6. Repeat steps 4-5 for the second and all other nodes in the HA cluster
  7. Save settings
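The correlation logic itself is straightforward: fire one combined alert when all (or at least a quorum of) the watched per-node alerts are active at the same time. A sketch with a hypothetical 3-host cluster:

```python
def correlated_alert(active_alerts, watched, mode="all"):
    """Raise one correlated alert when all (mode="all") or at least
    `mode` (a number, e.g. 2 of 3) of the watched alerts are active."""
    hits = sum(1 for a in watched if a in active_alerts)
    if mode == "all":
        return hits == len(watched)
    return hits >= mode  # numeric quorum

watched = ["ha-1 down", "ha-2 down", "ha-3 down"]  # hypothetical HA hosts
print(correlated_alert({"ha-1 down"}, watched))                         # False: redundancy left
print(correlated_alert({"ha-1 down", "ha-2 down"}, watched, mode=2))    # True: 2 of 3 down
```

One failed host stays quiet; losing a second host crosses the quorum and produces the single, meaningful alert.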

Automatic remote remediation actions executed in response to an event

An intelligent monitoring system attempts to solve the problem itself before notifying you or creating tickets in the ticketing system. In NetCrunch, you can select from a list of several predefined basic, control, logging, or integration actions. One of them even allows running your own script to perform any commands remotely on the node that generated the event.

Imagine a situation where some essential, but non-critical, node is down. Wouldn't it be nice if the monitoring program tried to turn it on before passing the 'Node down' alert to the ticketing system? Why bother a busy team when you can first attempt to fix the problem automatically and alert them only if automatic remediation fails? Let's go back to Alert Escalation Scripts at this point:

  1. Use Top Menu > Monitoring > Alert Escalation Scripts
  2. Click Add Alerting Script or edit an existing one.
  3. (Enter Script name if creating a new one)
  4. Click + Add button

At this point, you can define a more advanced condition, so that one action is performed immediately (attempting to turn on the node), and the other one is triggered, for example, after 5 minutes (unless the node goes up):

  1. Select Action to Run Immediately
  2. From the left pane, select Control
  3. Double-click on Wake on LAN
  4. Select Node to wake up = <Node Causing Event>
  5. Click OK button
  6. Click + Add button
  7. Select Action to Run After
  8. From the left pane, select Integrations
  9. Double-click on JIRA Service Desk Ticket
  10. In the Run after (min) field (top-right corner) enter e.g. 5
  11. Fill in the necessary fields
  12. Save settings

In this case, when the node goes down, the program will automatically try to wake it with Wake on LAN. If that fails and the alert is still active 5 minutes later (i.e. the node is still down), a ticket will be created. Using NetCrunch this way ensures that you are informed only about events that require intervention and could not be fixed automatically.
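The Wake on LAN action relies on a standard "magic packet": 6 bytes of 0xFF followed by the target MAC address repeated 16 times, usually sent as a UDP broadcast to port 9. A sketch of the protocol (illustrative only, not NetCrunch's internal code; the MAC address is made up):

```python
import socket

def magic_packet(mac):
    """Build a Wake-on-LAN magic packet: 6 x 0xFF + MAC repeated
    16 times, 102 bytes in total."""
    raw = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(raw) != 6:
        raise ValueError("MAC address must be 6 bytes")
    return b"\xff" * 6 + raw * 16

def wake(mac, broadcast="255.255.255.255", port=9):
    """Broadcast the magic packet over UDP (port 9 is the usual choice)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        s.sendto(magic_packet(mac), (broadcast, port))

pkt = magic_packet("00:11:22:33:44:55")  # hypothetical MAC
print(len(pkt))  # 102
```

The NIC of the sleeping machine watches for its own MAC in this pattern and powers the host on, which is why the action works even when the OS is down.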


All the above methods let you significantly reduce the number of NetCrunch-generated tickets in your ticketing system, saving your administrators time, money, and eye strain.

NetCrunch. Answers not just pictures

Maps → Alerts → Automation → Intelligence