Alerting, Escalation and Event Log Management in NetCrunch

NetCrunch can act as a log server for external event sources. It stores incoming events in the NetCrunch Event Log database and performs defined alert actions (e.g. notifications) in response to alerts.

Alert Sources

Performance Metric Triggers

NetCrunch can track thousands of performance metrics. Regardless of the origin of the metric, users can always use the same set of conditions to trigger alerts on actual or average metric values.

Besides simple thresholds, NetCrunch offers more advanced triggers, including Baseline Triggers, which compare actual data to baseline data collected for each hour and each day of the week.

Another useful trigger is the State Trigger, which allows you to track changes of discrete values (for example a change in value from 0 to 1). This is useful when the counter represents the status of a service or device.

Available Trigger Types:

  • Threshold
  • Deviation Threshold
  • Baseline Threshold
  • State Trigger
  • Flat Value
  • Value Missing/Exists
  • Delta
  • Range
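A few of the trigger types listed above can be sketched in a handful of lines. This is an illustrative sketch only, not NetCrunch's implementation: the function names, the relative `tolerance` parameter and the idea of passing baseline samples as a list are all assumptions made for the example.

```python
from statistics import mean


def threshold_trigger(value, limit):
    """Fire when the value exceeds a fixed limit."""
    return value > limit


def state_trigger(previous, current):
    """Fire when a discrete value changes (for example 0 -> 1)."""
    return previous != current


def range_trigger(value, low, high):
    """Fire when the value falls outside the closed interval [low, high]."""
    return not (low <= value <= high)


def baseline_trigger(value, baseline_samples, tolerance):
    """Fire when the value deviates from the baseline mean by more than
    `tolerance`, expressed as a fraction of the baseline (assumed rule)."""
    baseline = mean(baseline_samples)
    return abs(value - baseline) > tolerance * abs(baseline)


# Hypothetical baseline samples for one hour of one weekday (Monday, 09:00).
monday_9am = [40.0, 42.0, 38.0]
```

For instance, `baseline_trigger(70.0, monday_9am, 0.5)` fires because 70 is more than 50% away from the baseline mean of 40, while a reading of 45 does not.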

Event Triggers

Status Alerts

NetCrunch tracks the status of many monitored objects, such as nodes, interfaces, services and Windows services. These alerts are automatically correlated.

Sensors

NetCrunch uses sensors for more complex monitoring tasks such as monitoring file content, emails and web pages, and checking HTTP responses.

Windows Event Logs

NetCrunch can remotely gather, filter and analyze event log data from multiple Windows machines.

It allows you to define alert filters to convert event log events into NetCrunch alerts. Additionally, the program groups events if the same event is generated within several seconds. This protects the system from alert floods.
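The grouping idea can be illustrated with a short sketch. The window length, the `(timestamp, key)` event shape and the function name `group_events` are assumptions for the example; the source only says that identical events arriving within several seconds are grouped.

```python
def group_events(events, window_seconds=5.0):
    """Collapse repeats of the same event key that arrive within
    `window_seconds` of the first event in their group.

    `events` is an iterable of (timestamp_seconds, key) sorted by timestamp.
    Returns a list of [first_timestamp, key, count] entries.
    """
    groups = []
    open_group = {}  # key -> index of the most recent group for that key
    for ts, key in events:
        idx = open_group.get(key)
        if idx is not None and ts - groups[idx][0] <= window_seconds:
            groups[idx][2] += 1  # same event inside the window: just count it
        else:
            open_group[key] = len(groups)
            groups.append([ts, key, 1])  # start a new group
    return groups


sample = [(0.0, "A"), (1.0, "A"), (2.0, "B"), (4.5, "A"), (10.0, "A")]
grouped = group_events(sample)
```

With the sample above, the three "A" events in the first five seconds become one entry with a count of 3, so a burst of identical events would raise one alert rather than three.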

@@event-log-query.png Windows Event Log Query Builder

Syslog, SNMP Traps & Text Logs

NetCrunch receives SNMPv1, SNMPv2 and SNMPv3 traps. It can also forward all received traps to another SNMP manager.

NetCrunch can work as a syslog server. You can define filters for incoming messages and assign the proper actions to each one.
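To make the filtering idea concrete, here is a minimal sketch of matching a syslog line against rules. It relies on the standard syslog convention that a message starts with `<PRI>`, where PRI = facility * 8 + severity. The rule format, the action names and the function names are hypothetical and do not reflect how NetCrunch stores its filters.

```python
import re

PRI_PATTERN = re.compile(r"^<(\d{1,3})>(.*)$", re.DOTALL)


def parse_priority(raw):
    """Split a syslog line into (facility, severity, rest) using the
    leading <PRI> field; return None if the field is absent."""
    match = PRI_PATTERN.match(raw)
    if match is None:
        return None
    pri = int(match.group(1))
    return pri // 8, pri % 8, match.group(2)


def select_action(raw, rules):
    """Return the action of the first matching rule, else None.

    `rules` is a list of (max_severity, substring, action) tuples.
    Lower severity numbers are more urgent.
    """
    parsed = parse_priority(raw)
    if parsed is None:
        return None
    _, severity, text = parsed
    for max_severity, substring, action in rules:
        if severity <= max_severity and substring in text:
            return action
    return None


# Hypothetical rules: urgent failures notify, everything else is only logged.
rules = [(3, "failed", "notify-admins"), (7, "", "log-only")]
```

A line such as `<34>Oct 11 22:14:15 host su: 'su root' failed` has facility 4 and severity 2, so it matches the first rule here.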

Web Messages

NetCrunch can receive and filter messages (events) sent through a simple HTTP REST API. Senders can use the POST and GET HTTP methods.

Alerts by Example

All incoming traps and syslog messages (even from nodes not monitored in the Atlas) are visible in the External Events window. With a single click, users can convert them into alerts (the node is added to the Atlas if necessary). This means NetCrunch allows you to define alerts for traps "by example".

Monitoring Text Logs

The NetCrunch file sensor can monitor text log files, including files on Linux machines accessed over FTP/FTPS or HTTP/HTTPS.

External Data

NetCrunch offers several ways of delivering external data to the program. The data can be typical performance counters or status values representing the state of some external object. In both cases NetCrunch offers triggers to create alerts on these values.

Open Monitor

Alert Processing

Pending Alert Correlation

All internal alerts are automatically correlated, so NetCrunch knows when an alert begins and when it is finished (closed).

External alerts (such as syslog messages, SNMP traps and Windows events) can be correlated by adding closing events to the alert definition. This allows you to focus only on unresolved issues. Since alerts can also execute actions when they are closed, it allows for simple integration with external systems (e.g. a helpdesk).
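The open/close correlation described here can be sketched as a small state table. The class name, the `definitions` mapping and the event names (`linkDown`, `linkUp`) are assumptions chosen for illustration, not NetCrunch identifiers.

```python
class PendingAlerts:
    """Track which alerts are open, keyed by (node, alert name)."""

    def __init__(self):
        self.pending = {}  # (node, alert_name) -> timestamp when opened

    def handle(self, node, event, definitions, timestamp):
        """Process one incoming event.

        `definitions` maps alert_name -> (opening_event, closing_event).
        Returns a list of ("open" | "close", node, alert_name) tuples, which
        a caller could use to run open or close actions.
        """
        actions = []
        for name, (open_ev, close_ev) in definitions.items():
            key = (node, name)
            if event == open_ev and key not in self.pending:
                self.pending[key] = timestamp
                actions.append(("open", node, name))
            elif event == close_ev and key in self.pending:
                del self.pending[key]
                actions.append(("close", node, name))
        return actions


definitions = {"link": ("linkDown", "linkUp")}
tracker = PendingAlerts()
```

A repeated opening event does not open a second alert, and a closing event with nothing pending is ignored, so the pending table holds only unresolved issues.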

@@3pending-alerts.png Pending Alerts View

Advanced Correlation

NetCrunch (PremiumXE only) contains a global Monitoring Pack with correlation events, allowing you to correlate events from multiple nodes. This can be helpful when you want to raise an alert only if alternate resources have also failed (for example, redundant connections).

Alerts can be triggered when all events are in a pending state (all events must have pending correlation), or by defining a time frame in which they have to occur. These correlated alerts can be for any events previously defined on any node in the Atlas.
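The two correlation modes, "all pending" and "all within a time frame", can be sketched as follows. The data shapes (a set of pending pairs and a dictionary of last-seen timestamps) and the node names are assumptions for the example.

```python
def all_pending(pending, required):
    """True when every required (node, event) pair is currently pending."""
    return all(item in pending for item in required)


def all_within(last_seen, required, now, frame_seconds):
    """True when every required (node, event) pair occurred within the
    last `frame_seconds` seconds before `now`."""
    return all(
        item in last_seen and 0 <= now - last_seen[item] <= frame_seconds
        for item in required
    )


# Hypothetical redundant links: alert only if both are down.
required = [("router-a", "linkDown"), ("router-b", "linkDown")]
```

With one link down the correlated alert stays quiet; it fires only once both `(node, event)` pairs satisfy the chosen mode.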

Conditional Alerts

NetCrunch allows you to define additional conditions for each defined alert, regardless of whether it is a node status, an event log alert or an SNMP trap. These conditions allow you to trigger an action even when an expected event has not occurred, for example when there is no log entry confirming an operation (e.g. a backup). NetCrunch can also receive heartbeat events and notify you if one is missing. Other conditions allow you to suppress alert execution for some time (as the alert won't be triggered, close actions won't execute either).

Available conditions

  • On event
  • If event happens after x time
  • If event happens more than x times
  • Only if time in range
  • Only if time not in range
  • If event does not happen in given time range
  • If event does not happen after x time
  • If event is pending for more than x
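Three of the conditions above (time range, missing event and pending too long) can be sketched with plain datetime arithmetic. The function names and parameters are hypothetical; the sketch only shows the kind of check each condition implies.

```python
from datetime import datetime, time


def in_time_range(moment, start, end):
    """True when the time of day of `moment` is in [start, end).
    Ranges that cross midnight (start > end) are handled."""
    t = moment.time()
    if start <= end:
        return start <= t < end
    return t >= start or t < end


def heartbeat_missing(last_seen, now, max_gap_seconds):
    """True when no heartbeat was ever seen, or the last one is too old."""
    return last_seen is None or (now - last_seen).total_seconds() > max_gap_seconds


def pending_too_long(opened_at, now, limit_seconds):
    """True when an alert has been pending for longer than the limit."""
    return (now - opened_at).total_seconds() > limit_seconds


noon = datetime(2024, 1, 1, 12, 0)
ten_past = datetime(2024, 1, 1, 12, 10)
```

For example, with a maximum gap of 300 seconds, a heartbeat last seen at 12:00 counts as missing at 12:10.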

NetCrunch supports alerting rules ranging from simple time range rules to complex schemes.

@@time-range-scheme.png Complex Time Range Scheme

Alerting Actions

Actions

As a response to an event, NetCrunch can execute a sequence of actions. Actions can also be executed when an alert ends (on close). NetCrunch contains various actions including Notifications, Logging, Control Actions and Remote Scripts.

Notifications are very flexible and can be controlled by user profiles and groups. Additionally, they can be combined with node group (atlas view) membership, so it's possible to send notifications to different groups based on network node location or some other relationship.

Predefined Actions

  • Basic Actions: Play Sound, Display Desktop Notification Window, Add Traceroute to Alert Message, Add Network Service Status to Message, Notify User or Group, Email, SMS Text Message (via Email), SMS Text Message via Mobile Phone
  • Computer Control Actions: Run Windows Program, Run Windows Script, Run SSH Script, Restart Computer, Shutdown Computer, Set SNMP Variable, Terminate Windows Process, Control Windows Service, Wake on LAN
  • NetCrunch Control Actions: Change Node Monitoring State, Modify Node Issue List, Set Event Arrived Issue, Clear Event Arrived Issue
  • Local Logging Actions: Write to File, Write to Windows Event Log, Write to Unique File
  • Remote Logging Actions: Send SNMP Trap, Send Syslog Message, Trigger WebHook
  • Linux Remote Scripts: Shutdown, Reboot, Restart SNMP Daemon, Mount CD-ROM, Dismount CD-ROM
  • Windows: Run Disk Defragmenter, Start SNMP Service, Stop SNMP Service

Action Escalation & Conditional Execution

Actions can be executed immediately or with a delay (if the alert is not finished), and the last action can be repeated. Additionally, you can specify actions to be executed automatically when an alert is closed.

For example, you can decide to send a notification to some person and then, after some time, execute a server restart operation.
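The escalation behaviour can be sketched as a schedule computed from a list of delayed steps. This is a sketch under stated assumptions: delays are in minutes, delayed steps are skipped once the alert is closed, and the function name and parameters are hypothetical.

```python
def escalation_schedule(steps, horizon, closed_at=None, repeat_every=None, on_close=()):
    """List the (minute, action) pairs that run within `horizon` minutes
    of an alert opening.

    steps        -- list of (delay_minutes, action), sorted by delay
    closed_at    -- minute at which the alert closed, or None if still open
    repeat_every -- if set, repeat the last step at this interval
    on_close     -- actions to run when the alert closes
    """
    def still_open(minute):
        return closed_at is None or minute < closed_at

    schedule = []
    for delay, action in steps:
        # Immediate steps always run; delayed ones only while the alert is open.
        if delay == 0 or (delay <= horizon and still_open(delay)):
            schedule.append((delay, action))

    if steps and repeat_every is not None and repeat_every > 0:
        last_delay, last_action = steps[-1]
        minute = last_delay + repeat_every
        while minute <= horizon and still_open(minute):
            schedule.append((minute, last_action))
            minute += repeat_every

    if closed_at is not None and closed_at <= horizon:
        schedule.extend((closed_at, action) for action in on_close)

    return sorted(schedule, key=lambda item: item[0])


steps = [(0, "notify operator"), (15, "restart server")]
```

If the alert closes after 10 minutes, the restart at minute 15 never runs and only the close action is added, which matches the idea that delayed actions apply only to alerts that are not yet finished.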

@@sample-script.png Sample Alerting Script

The script above sends notifications only for critical alerts, and restarts the node that caused the event if it is a Windows Server node.

Event Log Views

Pending Alerts

This separate view shows only current alerts, so administrators do not have to browse the event log, which holds the history of all alerts. Event log views can be synchronized with the Atlas Tree window. When you click a specific view such as a location or node group (e.g. servers), the pending alerts for that view are displayed automatically.

Summary

The Summary view shows alert statistics for a given view. The statistics are grouped by monitoring category and also by custom views. This gives you a quick overview of what types of alerts happened in a given time range.

@@event-summary.png Event Summary for Last 24h

Custom Event Log Views

NetCrunch offers many predefined event log views and allows you to create custom views using an intuitive query builder. Views can be saved and used for any node group in the Atlas.

@@custom-view.png Query Builder and Date Range Selection

Event Details

For each event in the event log, NetCrunch offers a Details view containing all alert details and parameters. This window shows all executed actions and also the event that closed a given alert.

If the alert was triggered on a performance counter value, the Details view also displays a chart showing the values at the time of the alert.

@@event-details.png Event Details