Keeping a Watchful Eye: Simple Ways to Gather Intelligence to Support Your Broadband Network
May 20, 2020
Clearcable is responsible for monitoring and maintaining a wide range of ISP and telecommunication networks throughout North America and beyond. To accomplish this, we rely on a variety of network management protocols and open-source tools to keep those networks stable and reliable.
Simple Network Management Protocol (SNMP) is used to monitor individual network devices such as switches, routers, and Cable Modem Termination Systems (CMTS). SNMP is based on a pull model: a monitoring system contacts each device and pulls the information it needs. This is typically done at five-minute intervals so as not to overwhelm resources on the device being monitored. One of the drawbacks of this method is that a lot can change within a five-minute interval. For example, the bandwidth utilization on a port could spike for one minute and return to normal before the next poll. The SNMP monitor, in this case, would show at most a somewhat higher five-minute average and would have no record of the spike itself.
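To make the polling-interval problem concrete, here is a minimal sketch in Python. It assumes the monitor derives bandwidth from a cumulative 64-bit octet counter (in the style of `ifHCInOctets`); the traffic figures and the `average_bps` helper are hypothetical and chosen only for illustration.

```python
COUNTER64_MAX = 2**64


def average_bps(octets_start, octets_end, seconds, counter_max=COUNTER64_MAX):
    """Average bits per second between two octet-counter readings."""
    # Modular subtraction tolerates a single counter wrap between polls.
    delta_octets = (octets_end - octets_start) % counter_max
    return delta_octets * 8 / seconds


# Hypothetical traffic: a 100 Mbps baseline with a one-minute burst
# to 900 Mbps during the third minute of a five-minute window.
per_second_bps = [100_000_000] * 300
for second in range(120, 180):
    per_second_bps[second] = 900_000_000

# Build the cumulative octet counter the device would expose.
counter = [0]
for bps in per_second_bps:
    counter.append(counter[-1] + bps // 8)

# One poll every five minutes sees only the average over the window.
five_minute_average = average_bps(counter[0], counter[300], 300)

# Polling every minute (or streaming) reveals the burst.
one_minute_averages = [
    average_bps(counter[start], counter[start + 60], 60)
    for start in range(0, 300, 60)
]

print(f"5-minute view: {five_minute_average / 1e6:.0f} Mbps")
print("1-minute view:", [f"{v / 1e6:.0f} Mbps" for v in one_minute_averages])
```

With these assumed numbers, the five-minute poll reports 260 Mbps, while the one-minute samples show the port briefly running at 900 Mbps.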
An alternative to SNMP that has recently been gaining traction is Streaming Telemetry. Streaming Telemetry is based on a push model, with subscriptions to the individual packages to be monitored on a given device. In other words, if you are interested in collecting interface statistics on a router, you subscribe to that package. The data can be streamed to a collector in near real time (if desired), which provides the benefits listed below:
- Real-time data: bandwidth utilization can be monitored and alerted on in near real time, which drastically improves response times.
- Less resource intensive: SNMP queries can be very resource demanding. With Streaming Telemetry, the device pushes data as it becomes available, which is much less resource intensive and more scalable.
- Application-friendly: having real-time data available from specific devices opens up possibilities that SNMP cannot offer. For example, network automation can be tied into Streaming Telemetry so that if a port begins dropping packets, the monitoring server catches it and network automation tools reconfigure the device to route around the problem port.
- Configuration changes can be pushed to the monitored device using the Streaming Telemetry model: this keeps the configuration changes being applied and the data being monitored consistent within the same ecosystem.
The image below shows a rudimentary diagram of how automation can be used by receiving Telemetry data and then acting on that data by pushing out a configuration change.
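As a rough code counterpart to that loop, the sketch below simulates a collector that receives pushed interface samples and calls an automation hook when the drop counter jumps. Everything here is hypothetical: `InterfaceSample`, `TelemetryCollector`, the threshold, and the `reroute` placeholder are illustrative names, not part of any telemetry product or of Clearcable's tooling.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class InterfaceSample:
    device: str
    port: str
    dropped_packets: int  # cumulative drop counter reported by the device


class TelemetryCollector:
    """Receives pushed samples and triggers an action when drops jump."""

    def __init__(self, drop_threshold: int,
                 on_problem: Callable[[str, str], None]) -> None:
        self.drop_threshold = drop_threshold
        self.on_problem = on_problem
        self._last_drops: Dict[Tuple[str, str], int] = {}

    def receive(self, sample: InterfaceSample) -> None:
        key = (sample.device, sample.port)
        previous = self._last_drops.get(key)
        self._last_drops[key] = sample.dropped_packets
        if previous is None:
            return  # the first sample only establishes a baseline
        if sample.dropped_packets - previous >= self.drop_threshold:
            self.on_problem(sample.device, sample.port)


actions: List[str] = []


def reroute(device: str, port: str) -> None:
    # Placeholder: a real system would push a configuration change here.
    actions.append(f"reroute traffic away from {device} {port}")


collector = TelemetryCollector(drop_threshold=100, on_problem=reroute)

# Simulated stream of pushed samples; the last one shows a sharp jump.
for drops in (0, 5, 12, 480):
    collector.receive(InterfaceSample("router1", "eth0", drops))

print(actions)
```

The collector compares each sample with the previous one rather than with an absolute value, since drop counters are cumulative and only the rate of increase signals a problem.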
Regardless of which protocol is used to obtain this data, the resources monitored could include:
- Device status (up/down)
- Port status (up/down)
- Port error rate
- BGP peer status (up/down)
- CPU utilization
- Memory utilization
- Power supply status
- Port bandwidth (measured in bits per second)
In addition to SNMP and Streaming Telemetry, another protocol used to monitor network devices is IPFIX, the IETF standard derived from Cisco's NetFlow. Each device can be configured to maintain individual traffic flow records and export them to an associated collection server. This method is similar to the Streaming Telemetry push model; however, the data that is sent is very different. There are several alternatives to IPFIX, some of which are vendor specific. One of the more popular ones today is sFlow.
Flow records are typically used to provide insight into traffic patterns across a network. For example, these records can be used to detect Denial of Service (DoS) attacks, which typically exhibit the same patterns and can be detected and responded to by specialized software.
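As a simplified illustration of that kind of pattern matching, the sketch below flags destinations that receive traffic from an unusually large number of sources within one collection window. The `FlowRecord` fields, the thresholds, and the sample addresses (taken from documentation ranges) are assumptions made for the example; production detection software uses far richer heuristics.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Set


class FlowRecord(NamedTuple):
    src_ip: str
    dst_ip: str
    packets: int
    octets: int


def suspected_dos_targets(flows: Iterable[FlowRecord],
                          min_sources: int,
                          min_packets: int) -> List[str]:
    """Return destinations hit by many distinct sources and many packets."""
    sources: Dict[str, Set[str]] = defaultdict(set)
    packets: Dict[str, int] = defaultdict(int)
    for flow in flows:
        sources[flow.dst_ip].add(flow.src_ip)
        packets[flow.dst_ip] += flow.packets
    return sorted(
        dst for dst in sources
        if len(sources[dst]) >= min_sources and packets[dst] >= min_packets
    )


# Hypothetical window: 200 sources flooding one host, plus normal traffic.
window = [
    FlowRecord(f"198.51.100.{i}", "203.0.113.10", packets=50, octets=3000)
    for i in range(1, 201)
]
window += [
    FlowRecord("192.0.2.7", "203.0.113.20", packets=400, octets=500_000),
    FlowRecord("192.0.2.8", "203.0.113.20", packets=120, octets=90_000),
]

targets = suspected_dos_targets(window, min_sources=100, min_packets=5000)
print(targets)
```

Requiring both many sources and many packets keeps a single busy but legitimate flow, such as a large download, from being flagged.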
Another use case is tracking overall bandwidth and where traffic is coming from and going to on a network. This information can be collected and presented in graphs to aid in making policy decisions. A real-world example would be an automatically generated list of the top 10 autonomous system (AS) numbers whose traffic traverses an ISP’s network to reach its customers. This information can then be used to make policy-based decisions, such as forming new direct peering relationships where available in order to increase capacity and reliability on the network.
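A report like that reduces to summing bytes per source AS and sorting. The sketch below assumes the collector has already reduced each flow to a `(source ASN, octets)` pair; the function name and the sample data, which use private-use AS numbers, are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_source_asns(flows: Iterable[Tuple[int, int]],
                    n: int = 10) -> List[Tuple[int, int]]:
    """Sum octets per source ASN and return the n largest contributors."""
    totals: Counter = Counter()
    for asn, octets in flows:
        totals[asn] += octets
    return totals.most_common(n)


# Hypothetical (source ASN, octets) pairs from exported flow records.
sample_flows = [
    (64512, 500),
    (64513, 1200),
    (64512, 900),
    (64514, 300),
]

top_two = top_source_asns(sample_flows, n=2)
print(top_two)
```

An AS that stays near the top of this list over time is a natural candidate for a direct peering conversation.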
Finally, network devices maintain local logs of the events that occur on the device. These logs can be exported to servers configured to listen for them. Centralized logging servers are an invaluable resource when troubleshooting networks, as they provide a single point from which to analyze activity across the network and correlate timestamps across multiple devices on a single screen.
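The correlation step can be sketched as merging per-device logs into one timeline. The example below assumes each log line starts with an ISO 8601 timestamp and that each device's log is already in time order; the device names and messages are invented for illustration.

```python
import heapq
from datetime import datetime
from typing import Dict, Iterator, List, Tuple

LogEntry = Tuple[datetime, str, str]  # (timestamp, device, message)


def parse(device: str, line: str) -> LogEntry:
    """Split '<ISO timestamp> <message>' into a sortable tuple."""
    timestamp_text, message = line.split(" ", 1)
    return (datetime.fromisoformat(timestamp_text), device, message)


def merged_timeline(device_logs: Dict[str, List[str]]) -> Iterator[LogEntry]:
    """Merge per-device logs, each already in time order, by timestamp."""
    streams = [
        [parse(device, line) for line in lines]
        for device, lines in device_logs.items()
    ]
    return heapq.merge(*streams)


# Hypothetical logs received from two devices.
device_logs = {
    "router1": [
        "2020-05-20T10:15:02 BGP peer 192.0.2.1 down",
        "2020-05-20T10:15:40 BGP peer 192.0.2.1 up",
    ],
    "switch1": [
        "2020-05-20T10:15:01 Interface eth0 link down",
        "2020-05-20T10:15:39 Interface eth0 link up",
    ],
}

timeline = list(merged_timeline(device_logs))
for timestamp, device, message in timeline:
    print(timestamp.isoformat(), device, message)
```

Read as a single timeline, the invented entries show the switch link dropping one second before the BGP session does, which is the kind of cause-and-effect ordering that is hard to see when each device's log is viewed separately. This only works if device clocks are synchronized.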
These are just a few of the ways Clearcable monitors its customers’ networks today. By leveraging SNMP, Streaming Telemetry, flow records, and centralized logging, potential issues are identified before they become larger outages.