A new article in the September issue of the German magazine LANLine (“Automation creates productivity”) summarizes the typical challenges and problems of network monitoring very well and is worth reading. I would like to briefly discuss some of the problems it addresses and explain how our product SIGNL4 was developed to solve exactly these problems.
Problem 1: Slow response
Critical alarms must, of course, be processed promptly, because even small problems can quickly escalate into major failures. Email and dashboards are often the means of choice because the monitoring tool already provides them. However, both methods have limits, and this is where SIGNL4 can help:
- Location-independent, attention-grabbing alerting: Dashboards require constant attention and suitable access, e.g. a PC and monitor, which restricts employees’ freedom of movement. SIGNL4 alerts via smartphone app, phone call and SMS, giving responders more flexibility and mobility.
- Email as an alerting channel has several disadvantages: its notifications are often not very conspicuous, and an important alarm ends up in the same mailbox as an Amazon shipping notification. Tracking whether an alarm has been acknowledged or resolved is cumbersome, and so is handing responsibility for an alarm over to a colleague. SIGNL4 remedies this: alarms are acknowledged in the app, and the acknowledgment is visible to the whole team in real time. Signaling can be adjusted to the severity and other parameters via categories, and the status of each alarm is visible at a glance.
Problem 2: Too many alerts
- Alarms can be filtered. Categories in SIGNL4 act as a whitelist based on a text search: if none of the keywords is found in an alarm, it is not forwarded or triggered. Alternatively, you can build a blacklist by collecting all “unwanted” keywords in one alarm category and then switching off that category’s visibility.
- SIGNL4 can also use categories to control whether and how alarms are signaled. For example, an alarm can be visible in the app without triggering a push notification, SMS or call. For certain types of alarms, a specific sound, color and icon can be defined so that their relevance or criticality is recognized immediately. This helps enormously in achieving a targeted, fast response.
- Categories can also be used to deliver alerts exclusively to specific team members according to their responsibilities and skills (see screenshot). This reduces the alarm load on the rest of the team.
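To make the category logic above more tangible, here is a minimal sketch in Python. It is a simplified, hypothetical model for illustration only, not SIGNL4’s actual implementation: the `Category` class, its fields and the `route` function are our own assumptions. A keyword match decides whether an alarm is shown at all (whitelist), an invisible category acts as a blacklist, a silent category shows the alarm without notifying anyone, and a member list restricts delivery to specific people.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Category:
    """Hypothetical alert category: keywords plus visibility/notification settings."""
    name: str
    keywords: list[str]
    visible: bool = True  # blacklist categories are switched to invisible
    notify: bool = True   # False: alarm is shown in the app, but no push/SMS/call
    members: set[str] = field(default_factory=set)  # empty set means the whole team


def classify(text: str, categories: list[Category]) -> Optional[Category]:
    """Return the first category with a keyword found in the alarm text (case-insensitive)."""
    lowered = text.lower()
    for cat in categories:
        if any(keyword.lower() in lowered for keyword in cat.keywords):
            return cat
    return None


def route(text: str, categories: list[Category], team: set[str]) -> tuple[bool, set[str]]:
    """Return (shown_in_app, people_to_notify) for an incoming alarm text."""
    cat = classify(text, categories)
    if cat is None or not cat.visible:
        return False, set()  # no keyword matched (whitelist) or blacklisted category
    if not cat.notify:
        return True, set()   # visible in the app, but silent
    return True, (set(cat.members) if cat.members else set(team))


# Example setup (hypothetical team and categories)
team = {"anna", "ben", "chris"}
categories = [
    Category("noise", ["backup completed"], visible=False),  # blacklist
    Category("info", ["disk usage"], notify=False),
    Category("network", ["switch", "link down"], members={"ben"}),
    Category("critical", ["down", "unreachable"]),
]

print(route("Link down on switch-03", categories, team))  # (True, {'ben'})
```

In this sketch categories are checked in order and the first match wins, which is why the blacklist category is listed first; how SIGNL4 resolves overlapping categories is not covered here.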
Problem 3: Critical alerts at night
Of course, many important alarms occur outside normal working hours. How do you make sure they are not overlooked without burdening employees unnecessarily? Sending them to the whole team at night is not a sensible approach. Operating a 24/7 NOC is expensive and usually only viable for large companies. On-call duty is an alternative, but it needs a tool like SIGNL4.
- SIGNL4 was developed specifically for IT on-call duty. Duty schedules are planned conveniently in the browser, and alarms are then routed automatically to whoever is currently on call (one or several people) without disturbing other employees in their free time.
- Alerting via multiple channels (push, SMS and voice call) and repeating the signal until the alarm is acknowledged ensure a reliable response to important alarms.
- SIGNL4’s escalation function also ensures that an alarm still reaches someone if the on-call person misses the notification.
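The interplay of duty schedules, repeated signaling and escalation described above can be sketched as follows. Again, this is a hypothetical illustration, not SIGNL4’s internal logic: `Shift`, `on_call`, `signal_with_escalation` and the idea that each repeat uses a more intrusive channel (push, then SMS, then call) are our own assumptions. The `acknowledged` callback stands in for a responder tapping “acknowledge” in the app.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import Callable, Optional

CHANNELS = ("push", "sms", "call")  # assumption: each repeat uses a more intrusive channel


@dataclass
class Shift:
    """Hypothetical on-call shift; end < start means the shift runs past midnight."""
    person: str
    start: time
    end: time

    def is_active(self, t: time) -> bool:
        if self.start <= self.end:
            return self.start <= t < self.end
        return t >= self.start or t < self.end  # overnight shift


def on_call(shifts: list[Shift], now: datetime) -> list[str]:
    """Everyone whose shift covers the given moment; nobody else is disturbed."""
    return [s.person for s in shifts if s.is_active(now.time())]


def signal_with_escalation(
    on_duty: list[str],
    backup: list[str],
    acknowledged: Callable[[str, str], bool],
) -> tuple[list[tuple[str, str]], Optional[str]]:
    """Repeat signaling until someone acknowledges; escalate to backup if nobody on duty does.

    Returns the notification log and the person who acknowledged (or None).
    """
    log = []
    for tier in (on_duty, backup):
        for channel in CHANNELS:  # one repeat round per channel
            for person in tier:
                log.append((person, channel))
                if acknowledged(person, channel):
                    return log, person  # acknowledged: stop signaling
    return log, None  # nobody acknowledged, even after escalation


# Example: day shift for anna, night shift for ben, chris as escalation contact
shifts = [Shift("anna", time(8), time(18)), Shift("ben", time(18), time(8))]
print(on_call(shifts, datetime(2024, 9, 2, 23, 30)))  # ['ben']

# ben misses every notification, so the alarm escalates to chris
log, who = signal_with_escalation(["ben"], ["chris"], lambda person, channel: person == "chris")
print(who)  # chris
```

Only the people returned by `on_call` are signaled in the first tier; the backup tier is contacted only after all repeat rounds for the on-duty staff have gone unanswered.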
Problem 4: Only temporary outages
Network environments in particular often see short-lived failures that clear up on their own after a few seconds. Such alarms can create a lot of work, because checking whether each one is still valid takes considerable effort.
- SIGNL4 offers an elegant solution for this. Its REST API, its email interfaces and ready-to-use two-way connectors (e.g. for Zabbix) allow the monitoring system that triggered an alarm to close it in SIGNL4 as well. When the alarm is reset in the monitoring system, it is set to closed in SIGNL4 and signaling stops. The alarm nonetheless remains visible in the log for later analysis.
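The principle behind this two-way integration can be illustrated with a small, hypothetical model. It does not use the real SIGNL4 REST API; the `Alarm` and `AlarmBoard` classes and the `"problem"`/`"resolved"` event names are assumptions for this sketch. The key idea is that each alarm is keyed by the monitoring system’s own event ID, so a later reset for the same ID can close the matching alarm, stop its signaling and still leave an entry in the log.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alarm:
    external_id: str  # the monitoring system's own event/problem ID
    text: str
    status: str = "open"


class AlarmBoard:
    """Hypothetical alarm store that lets the source system close its own alarms."""

    def __init__(self) -> None:
        self.alarms: dict[str, Alarm] = {}
        self.log: list[tuple[str, str]] = []  # every status change, kept for later analysis

    def on_event(self, external_id: str, event: str, text: str = "") -> Optional[Alarm]:
        if event == "problem":
            alarm = Alarm(external_id, text)  # new (or re-raised) alarm starts open
            self.alarms[external_id] = alarm
        elif event == "resolved":
            alarm = self.alarms.get(external_id)
            if alarm is None:
                return None  # reset for an alarm we never received: nothing to close
            alarm.status = "closed"  # closing stops further signaling
        else:
            raise ValueError(f"unknown event: {event}")
        self.log.append((external_id, alarm.status))
        return alarm

    def signaling(self) -> list[str]:
        """IDs of alarms that are still being signaled to the team."""
        return [a.external_id for a in self.alarms.values() if a.status == "open"]


# Example: a short link flap that the monitoring system resets by itself
board = AlarmBoard()
board.on_event("zbx-4711", "problem", "Link down on switch-03")
print(board.signaling())  # ['zbx-4711']
board.on_event("zbx-4711", "resolved")
print(board.signaling())  # []
print(board.log)  # [('zbx-4711', 'open'), ('zbx-4711', 'closed')]
```

Keying alarms by the source system’s ID rather than by the alarm text is what allows the reset to find the right alarm, even when several alarms have similar descriptions.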
Here is a selection of videos showcasing the features mentioned above: