Alerting, Escalation and Event Log Management

NetCrunch can act as a log server for external event sources. It stores incoming events in the NetCrunch Event Log database and performs the defined alert actions (e.g., notifications) in response to alerts.

Alert Sources

Performance Metric Triggers

NetCrunch can track thousands of performance metrics. Regardless of the origin of the metric, users can always use the same set of conditions to trigger alerts on actual or average metric values.

Besides simple thresholds, NetCrunch offers more advanced triggers, including Baseline Triggers, which compare actual data to baseline data collected for each hour and each day of the week.

Another useful trigger is the State Trigger, which tracks changes in discrete values (for example, a change in value from 0 to 1). It applies when the counter represents the status of a service or device.

Available Trigger Types:

  • Threshold
  • Deviation Threshold
  • Baseline Threshold
  • State Trigger
  • Flat Value
  • Value Missing/Exists
  • Delta
  • Range
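
To make the difference between trigger types concrete, here is a minimal Python sketch of three of them. It is illustrative only and is not NetCrunch code; the function names and the percentage tolerance used for the baseline comparison are assumptions.

```python
from statistics import mean

def threshold_trigger(value, limit):
    """Fire when the actual value exceeds a fixed limit."""
    return value > limit

def baseline_trigger(value, baseline_samples, tolerance_pct):
    """Fire when the value deviates from the baseline average by more
    than tolerance_pct percent.

    baseline_samples stands for values collected earlier for the same
    hour and day of the week (an assumption about how a baseline is kept).
    """
    baseline = mean(baseline_samples)
    if baseline == 0:
        return value != 0
    return abs(value - baseline) / abs(baseline) * 100 > tolerance_pct

def state_trigger(previous, current, from_state, to_state):
    """Fire when a discrete value changes from one given state to another."""
    return previous == from_state and current == to_state
```

For example, state_trigger(0, 1, 0, 1) fires on a 0 to 1 transition, while a threshold trigger only looks at the current value.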

Event Triggers

Status Alerts

NetCrunch tracks the status of many monitored objects such as nodes, interfaces, services, Windows services and more. These alerts are automatically correlated.

Sensors

NetCrunch uses sensors for more complex monitoring tasks, such as monitoring file content, emails and web pages, or checking HTTP responses, database queries, WMI and much more.

Windows Event Logs

NetCrunch can remotely gather, filter and analyze event log data from multiple Windows machines.

It allows you to define alert filters to convert event log events into NetCrunch alerts. Additionally, the program groups repeated occurrences of the same event if they are generated within several seconds. This grouping protects the system from alert floods.
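
The grouping idea can be sketched in a few lines of Python. This is a simplified illustration, not NetCrunch's implementation: the 5-second window, the function name and the rule that each repeat extends the group are all assumptions.

```python
def group_events(events, window_seconds=5.0):
    """Collapse repeats of the same event that arrive close together.

    events: iterable of (timestamp_seconds, event_key), sorted by timestamp.
    Returns a list of [first_timestamp, event_key, count] groups.
    """
    groups = []
    open_group = {}  # event_key -> index of its most recent group
    last_seen = {}   # event_key -> timestamp of its most recent occurrence
    for ts, key in events:
        idx = open_group.get(key)
        if idx is not None and ts - last_seen[key] <= window_seconds:
            groups[idx][2] += 1          # same event again: count it, no new alert
        else:
            groups.append([ts, key, 1])  # first occurrence or window expired
            open_group[key] = len(groups) - 1
        last_seen[key] = ts
    return groups
```

With this scheme a burst of identical events produces one group with a counter instead of one alert per event.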

@@event-log-query.png Windows Event Log Query Builder

Syslog, SNMP Traps & Text Logs

NetCrunch receives SNMPv1, SNMPv2c, and SNMPv3 traps. It can also forward all received traps to another SNMP manager.

NetCrunch can work as a syslog server. You can define filters for incoming messages so that you can assign the proper actions to each message.

Web Messages

NetCrunch can receive and filter messages (events) sent via a simple HTTP REST API. Users can send messages with either the POST or GET HTTP method.

Alerts by Example

All incoming traps and syslog messages (even from nodes not being monitored in the Atlas) are visible in the External Events window. With a single click, users can convert them into alerts (the node will be added to the Atlas if necessary). This means NetCrunch allows you to define alerts for traps "by example."

Monitoring Text Logs

The NetCrunch file sensor can monitor text log files, including Linux files accessed using FTP/s or HTTP/s, Windows/SMB and SSH/Bash. The sensor can process logs remotely without downloading them (Windows/SMB and SSH). NetCrunch provides parsers for common log formats out of the box and allows you to write your own parsing expressions using various methods (regular expressions, JavaScript).
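
As an illustration of what a regular-expression parsing expression does, the following Python sketch splits a log line into named fields and selects the entries that should raise an alert. The line format, field names and alert levels are hypothetical; NetCrunch's own expression syntax may differ.

```python
import re

# Hypothetical line format: "<date> <time> <LEVEL> <message>"
LINE_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) "
    r"(?P<message>.*)$"
)

def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LINE_RE.match(line)
    return match.groupdict() if match else None

def find_alert_lines(lines, levels=("ERROR", "CRITICAL")):
    """Yield parsed entries whose level is one that should raise an alert."""
    for line in lines:
        entry = parse_line(line.rstrip("\n"))
        if entry and entry["level"] in levels:
            yield entry
```

Lines that do not match the expression are skipped rather than treated as errors, which keeps the parser tolerant of banners and blank lines.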

External Data

NetCrunch offers several ways of delivering external data into NetCrunch. The essential data are performance counters and status values, which represent the performance and state of external objects. NetCrunch provides triggers to create alerts on these values.

Sending Data to NetCrunch

Alert Processing

Pending Alert Correlation

All internal alerts are automatically correlated, so NetCrunch knows when an alert begins and when it is finished (closed).

External alerts (like syslog, SNMP traps, Windows Events) can be correlated by adding closing events to the alert definition. This correlation allows you to focus only on unresolved issues. Because alerts can execute actions when closed, it also allows for simple integration with external systems (such as a helpdesk).
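
The open/close bookkeeping behind this kind of correlation can be sketched as follows. This is a minimal illustration, not NetCrunch code; the class name and the event names in the comments are hypothetical.

```python
class PendingAlerts:
    """Track alerts that have been opened but not yet closed."""

    def __init__(self, closing_events):
        # Maps a closing event name to the alert it closes,
        # e.g. {"linkUp": "linkDown"} (hypothetical names).
        self.closing_events = closing_events
        self.pending = {}  # (node, alert_name) -> timestamp when opened

    def handle(self, node, event, timestamp):
        """Process one event; return 'opened', 'closed' or 'ignored'."""
        if event in self.closing_events:
            key = (node, self.closing_events[event])
            if key in self.pending:
                del self.pending[key]
                return "closed"   # a close action could run here
            return "ignored"      # nothing pending to close
        key = (node, event)
        if key in self.pending:
            return "ignored"      # already pending, do not open twice
        self.pending[key] = timestamp
        return "opened"
```

The pending dictionary is what a view of unresolved issues would display: an entry appears when the alert opens and disappears when its closing event arrives.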

@@3pending-alerts.png Pending Alerts View

Advanced Correlation

NetCrunch contains a global Monitoring Pack with correlation events, allowing you to correlate events from multiple nodes. This type of correlation can be helpful when you want to raise an alert only if alternate resources have failed as well (for example, redundant connections).

Alerts can be triggered when all events are pending (all events must have pending correlation), or by defining a time frame in which they have to occur. These correlated alerts can be for any events previously defined on any node in the Atlas.
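
The two correlation modes described above can be expressed as two small checks. The sketch below is an assumption about the logic, written in Python for illustration; the function names and the data shapes are hypothetical.

```python
def all_pending(required, pending):
    """True when every required (node, alert) pair is currently pending."""
    return all(item in pending for item in required)

def all_within(required, occurrences, window_seconds):
    """True when every required event occurred and all of them fall
    within window_seconds of each other.

    occurrences maps (node, alert) to the timestamp of its latest occurrence.
    """
    if not all(item in occurrences for item in required):
        return False
    times = [occurrences[item] for item in required]
    return max(times) - min(times) <= window_seconds
```

For redundant connections, required would list the link-down alert of each connection, so the correlated alert fires only when both links are down.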

Conditional Alerts

NetCrunch allows you to define additional conditions for each defined alert, regardless of whether it is a node status, an event log alert or an SNMP trap. These conditions allow you to trigger an action even when an expected event has not occurred, for example, when there is no log entry confirming an operation (e.g., a backup). NetCrunch can also receive heartbeat events and notify you if one is missing. Other conditions allow you to suppress alert execution for some time (because the alert won't be triggered, its close actions won't execute either).

Available conditions

  • On event condition (default)
  • When the event happened at least after
  • When the event happened more than...
  • Only if the time between
  • Only if the time not between
  • When the event not happened in the time between
  • When the event not happened after
  • When the event pending for more than...
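
Conditions that fire on the absence of an event, such as a missing heartbeat or a backup confirmation that never appeared, can be sketched like this. The Python below is an illustration under assumed semantics; the function names and parameters are hypothetical.

```python
from datetime import datetime, time

def heartbeat_missing(last_seen, now, max_gap_seconds):
    """True when no heartbeat arrived within the last max_gap_seconds."""
    if last_seen is None:
        return True
    return (now - last_seen).total_seconds() > max_gap_seconds

def event_missing_between(event_times, start, end):
    """True when none of the event timestamps falls inside [start, end].

    start and end are times of day on the same day (start <= end).
    """
    return not any(start <= ts.time() <= end for ts in event_times)
```

A check such as event_missing_between(backup_log_times, time(1, 0), time(3, 0)) would be evaluated after the window ends, since the condition can only be confirmed once the whole window has passed.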

NetCrunch supports alerting rules ranging from simple time range rules to complex schemes.

@@time-range-scheme.png Complex Time Range Scheme

Alerting Actions

Actions

In response to an event, NetCrunch can execute a sequence of actions. Actions can also be executed when an alert ends (on close). NetCrunch contains various actions including Notifications, Logging, Control Actions and Remote Scripts.

Notifications are very flexible and can be controlled by user profiles and groups. Additionally, they can be combined with node group (Atlas view) membership, so it's possible to send notifications to different groups based on network node location or some other relationship.

Predefined Actions

  • Basic Actions: Play Sound, Display Desktop Notification Window, Add Traceroute to Alert Message, Add Network Service Status to Message, Notify user or group, email, SMS Text Message (via email), SMS Text Message via Mobile Phone
  • Computer Control Actions: Run Windows Program, Run Windows Script, Run SSH Script, Restart Computer, Shutdown Computer, Set SNMP Variable, Terminate Windows Process, Control Windows Service, Wake on LAN
  • NetCrunch Control Actions: Change Node Monitoring State, Modify Node Issue List, Set Event Arrived Issue, Clear Event Arrived Issue
  • Local Logging Actions: Write to File, Write to Windows Event Log, Write to Unique File
  • Remote Logging Actions: Send SNMP Trap, Send Syslog Message, Trigger WebHook
  • Linux Remote Scripts: Shutdown, Reboot, Restart SNMP Daemon, Mount CD-ROM, Dismount CD-ROM
  • Windows: Run Disk Defragmenter, Start SNMP Service, Stop SNMP Service

Action Escalation & Conditional Execution

Actions can be executed immediately or with a delay (if the alert is not finished), and the last action can be repeated. Additionally, you can specify actions to be executed automatically when an alert is closed.

For example, you can decide to send a notification to some person and then, after some time, execute a server restart operation.
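
The escalation behavior can be modeled as a list of delayed steps, where a delayed step runs only if the alert is still open when its delay expires. The following Python sketch is an assumption about that logic, not NetCrunch's implementation; the function name, step format and action names are hypothetical.

```python
def actions_run(steps, now, closed_at=None):
    """Return the actions that have run by time `now`.

    steps: list of (delay_seconds, action_name), measured from the moment
    the alert started (t = 0). A delay of 0 means "immediately".
    closed_at: time at which the alert closed, or None if still open.
    A step runs only if its delay elapsed while the alert was open.
    """
    end = now if closed_at is None else min(now, closed_at)
    return [action for delay, action in steps if delay <= end]

# Notify immediately, restart the server if still unresolved after 10 minutes.
steps = [(0, "notify admin"), (600, "restart server")]
```

With these steps, an alert that closes after five minutes only ever sends the notification; the restart is skipped because the alert finished before its delay elapsed.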

@@sample-script.png Sample Alerting Script

The script above sends notifications only for critical alerts and restarts the node that caused the event if it is a Windows Server node.

Event Log Views

Pending Alerts

This separate view shows only current alerts, so administrators do not have to browse the event log, which holds a history of all alerts.

Event log views can be synchronized with the Atlas Tree Window. This means that when you click on a specific view such as a location or node group (e.g., servers), pending alerts are automatically displayed for this view.

Summary

The Summary view shows alert statistics for a given view. The statistics are grouped by monitoring category and also by custom views. This dashboard gives you a quick overview of what types of alerts happened in a given time range.

@@event-summary.png Event Summary for the Last 24h

Custom Event Log Views

NetCrunch offers many predefined event log views and allows you to create custom views using an intuitive query builder. Views can be saved and used for any node group in the Atlas.

@@custom-view.png Query Builder and Date Range Selection

Event Details

For each event in the event log, NetCrunch offers a Details view containing all alert details and parameters. This window shows all executed actions and also the event that closed a given alert.

If the alert was triggered on a performance counter value, the Details view displays a chart showing values at the time of the alert.

@@event-details.png Event Details