Best Practices for Monitoring Switches (Part I)
Discover essential insights for maintaining the health and performance of your network switches, exploring best practices for monitoring and diagnosing issues.
Network switches are the cornerstone of modern organizations' connectivity, ensuring seamless data flow between devices. In this comprehensive guide, we'll delve into best practices for monitoring switches, covering various aspects such as switch health, technologies employed, the distinction between performance health and traffic monitoring, and more.
Introduction to Switch Health Monitoring
Monitoring Technologies
Monitoring switch performance and creating comprehensive visualizations of network infrastructure rely on various technologies and data sources, each serving specific purposes and providing critical information:
- SNMP (Simple Network Management Protocol) is a foundational protocol for collecting real-time data from network devices, including switches. It supplies the traffic statistics used for performance monitoring, such as bandwidth utilization, packet loss rates, and network traffic patterns. SNMP also reports the operational status of switch ports and hardware components, along with metrics such as error rates and resource utilization, giving insight into the overall health of the switch.
- Syslog Messages serve as event and error logs generated by network devices, offering a historical perspective on network events, errors, and issues, including configuration changes, switch port status changes, or network topology updates. They may also report on errors, warnings, or critical issues within the switch or the network, providing context for identifying and addressing problems.
- Protocols such as ICMP, SNMP, and LLDP are used to identify and document network devices and their connections, including IP addresses, MAC addresses, and more. This data makes it possible to build accurate network topology maps, VLAN views, interface views, and routing maps that help you diagnose and locate network problems.

These technologies and data sources work in tandem to provide network administrators with the insights required to ensure efficient switch performance and network functionality.
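As an illustration of how SNMP traffic counters become a bandwidth utilization figure, here is a minimal Python sketch. It assumes you already poll an interface octet counter (for example a 64-bit one such as ifHCInOctets) and know the interface speed; the function names and sample numbers are hypothetical, and the SNMP polling itself is left out.

```python
def counter_delta(previous: int, current: int, bits: int = 64) -> int:
    """Difference between two counter samples, allowing for one wrap-around."""
    if current >= previous:
        return current - previous
    # The counter wrapped past its maximum value between the two polls.
    return (2 ** bits - previous) + current


def utilization_percent(prev_octets: int, curr_octets: int,
                        interval_s: float, speed_bps: float,
                        bits: int = 64) -> float:
    """Average utilization of a link over one polling interval, in percent."""
    if interval_s <= 0 or speed_bps <= 0:
        raise ValueError("interval and speed must be positive")
    octets = counter_delta(prev_octets, curr_octets, bits)
    # Octets are bytes, so multiply by 8 to compare against bits per second.
    return octets * 8 / (interval_s * speed_bps) * 100


if __name__ == "__main__":
    # Hypothetical samples: 75 MB transferred in 60 s on a 100 Mbit/s port.
    print(round(utilization_percent(0, 75_000_000, 60, 100_000_000), 2))
```

Handling the wrap-around matters mostly for 32-bit counters, which can roll over within a single polling interval on fast interfaces.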
Performance Health vs. Traffic Monitoring
Distinguishing between performance health monitoring and traffic monitoring is essential for a comprehensive understanding of switch operations:
Performance Health Monitoring: This focuses on the physical and operational well-being of the switch itself, ensuring it functions optimally. Performance health can be influenced by hardware failures, resource exhaustion, or configuration errors.
Traffic Monitoring: This involves assessing how effectively the switch handles network traffic. It includes considerations like Quality of Service (QoS) settings, traffic analysis, and routing efficiency. Inefficient traffic handling can lead to performance degradation, including packet loss and latency.
Aspects to Monitor
When it comes to monitoring switch health, a thorough understanding of the following aspects is crucial:
Hardware Health: Hardware issues pertain to the physical components and environmental conditions of switches. Monitoring hardware health includes checking power supplies, temperature, and fans, and watching for component failures. Changes in hardware health can indicate environmental problems, excessive heat, or hardware wear and tear.
Symptoms may include a sudden rise in temperature readings beyond the safe range, sharp voltage fluctuations, frequent power supply failures, or fan errors reported in syslog messages.
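A simple way to act on such symptoms is to compare sensor readings against warning and critical thresholds. The Python sketch below shows the idea; the sensor names and threshold values are hypothetical and should be replaced with the limits documented for your hardware.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    warning: float
    critical: float


def classify(value: float, threshold: Threshold) -> str:
    """Map a sensor reading to 'ok', 'warning', or 'critical'."""
    if value >= threshold.critical:
        return "critical"
    if value >= threshold.warning:
        return "warning"
    return "ok"


def check_hardware(readings: dict, thresholds: dict) -> dict:
    """Classify every reading that has a configured threshold."""
    return {
        name: classify(value, thresholds[name])
        for name, value in readings.items()
        if name in thresholds
    }


if __name__ == "__main__":
    # Hypothetical limits; real safe ranges depend on the switch model.
    limits = {"temperature_c": Threshold(warning=60, critical=70)}
    print(check_hardware({"temperature_c": 64.5}, limits))
```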

Performance Status: Performance issues primarily refer to problems related to the operational efficiency and resource utilization of network switches. These issues often arise due to factors such as high CPU or memory utilization, excessive network traffic, or misconfigurations. Switch performance metrics encompass a wide range of data points. These include CPU and memory utilization, port statistics (e.g., errors, collisions), and bandwidth usage.
Symptoms may include rapidly increasing CPU and memory utilization, high temperatures reported by hardware sensors, alerts for resource exhaustion, and syslog messages indicating system performance issues.
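"Rapidly increasing" can be made measurable by estimating how fast a metric grows over recent samples. The following Python sketch fits a least-squares line through (timestamp, value) pairs and flags a steep rise; the function names and the limit of 5 percentage points per minute are assumptions chosen for illustration.

```python
def slope_per_minute(samples):
    """Least-squares slope of (timestamp_seconds, value) samples, per minute."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    numerator = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    denominator = sum((t - mean_t) ** 2 for t, _ in samples)
    if denominator == 0:
        raise ValueError("samples must have distinct timestamps")
    # The slope is in units per second; convert it to units per minute.
    return numerator / denominator * 60


def is_rising_fast(samples, limit_per_minute=5.0):
    """True when the metric grows faster than the given limit."""
    return slope_per_minute(samples) > limit_per_minute


if __name__ == "__main__":
    # Hypothetical CPU utilization samples taken one minute apart.
    cpu = [(0, 20.0), (60, 30.0), (120, 40.0)]
    print(is_rising_fast(cpu))
```

Fitting a line over several samples is less sensitive to a single noisy reading than comparing only the first and last values.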
Firmware and Configuration Changes: Keeping an eye on firmware versions and configuration changes is essential. Software and firmware issues can introduce instability into network switches, leading to various symptoms and problems that affect network performance and reliability.
Symptoms may include log entries showing configuration changes that administrators did not authorize, which may cause network instability, security breaches, or connectivity issues. Incorrect changes to switch configurations can disrupt network operations and security.
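One way to notice configuration changes is to compare the current configuration with a stored baseline. The Python sketch below assumes both are already available as text (how you retrieve them from the switch is out of scope here) and uses only the standard library; the function names and sample configuration lines are hypothetical.

```python
import difflib
import hashlib


def fingerprint(config_text: str) -> str:
    """Stable hash of a configuration, useful for a quick equality check."""
    return hashlib.sha256(config_text.encode("utf-8")).hexdigest()


def config_changes(baseline: str, current: str) -> list:
    """Return removed ('-') and added ('+') lines between two configs."""
    if fingerprint(baseline) == fingerprint(current):
        return []
    diff = list(difflib.unified_diff(
        baseline.splitlines(), current.splitlines(),
        fromfile="baseline", tofile="current", lineterm="", n=0,
    ))
    # Skip the two file header lines, then keep only added or removed lines.
    return [line for line in diff[2:] if line.startswith(("+", "-"))]


if __name__ == "__main__":
    old = "hostname sw1\nvlan 10\n"
    new = "hostname sw1\nvlan 20\n"
    for change in config_changes(old, new):
        print(change)
```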
Diagnosing and Resolving Problems in a Switch: An Example
- Imagine you receive an alert about high-temperature readings for one of your switches. This raises a red flag—there might be a hardware issue.
- Log in to the switch and navigate to the hardware status section. Confirm the high-temperature readings and check for fan errors. If both are present, the switch is struggling to keep its temperature in check.
- For a thorough investigation, physically check the switch and its surroundings. Ensure proper ventilation and cooling mechanisms are in place. If needed, replace malfunctioning fans or consider relocating the switch to a cooler area.
- To resolve the hardware issue, address the root cause. If it's a fan malfunction, replace the fan. If it's an overheating problem due to poor ventilation, optimize the switch's placement for better airflow.
- Follow up by reviewing the syslog messages of this switch, which record its events and errors. Look for any entries related to hardware or temperature alerts recorded before the critical alert. You may come across earlier logs reporting rising temperatures and fan degradation, making it clear that the switch has been under hardware strain for some time.
- Based on this knowledge, you can create new warning alerts or adjust existing ones so that you are informed when similar symptoms appear again. This lets you prevent such problems proactively.
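The syslog review in the steps above can be sketched in Python. Syslog messages start with a priority value in angle brackets, where priority = facility * 8 + severity, and lower severity numbers are more serious (4 is warning). The keywords and sample messages below are hypothetical; real message text varies by vendor.

```python
import re

PRI_PATTERN = re.compile(r"^<(\d{1,3})>")
KEYWORDS = ("temperature", "fan")  # assumed search terms for this example


def severity(message: str):
    """Extract the severity (0-7) from the leading <PRI> field, if present."""
    match = PRI_PATTERN.match(message)
    if match is None:
        return None
    return int(match.group(1)) % 8


def early_warnings(messages, keywords=KEYWORDS, max_severity=4):
    """Messages at warning level or worse that mention any keyword."""
    hits = []
    for message in messages:
        level = severity(message)
        if level is None or level > max_severity:
            continue
        text = message.lower()
        if any(keyword in text for keyword in keywords):
            hits.append(message)
    return hits


if __name__ == "__main__":
    # Hypothetical log lines; 188 = local7 (23) * 8 + warning (4).
    log = [
        "<190>Interface Gi0/1 changed state to up",
        "<188>Fan 2 speed below expected range",
        "<188>Temperature sensor 1 above warning threshold",
    ]
    for line in early_warnings(log):
        print(line)
```

A filter like this can serve as the basis for a warning alert that fires before readings reach the critical range.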
Remember, this process of identifying hardware issues is applicable to all aspects of switch monitoring. Identify, investigate, resolve, and keep your switches in top-notch condition.
Stay tuned for the second part of this article, where we will delve into tracking traffic on switches and understanding the difference between traffic monitoring and flow monitoring.