It is said that network performance can be a beast, but I think an IT team that is properly prepared, with the right tools, can prevail in even the most difficult battles. Whether you are evolving from a disparate set of free tools or replacing an existing monitoring toolset, you will want to focus on the core capabilities you need to tackle the scariest problems. Your team can turn into the knights in shining armor who save users and your company from the frustration and lost productivity caused by intermittent performance problems.
The Alert Storm Dragon
Without the proper tools, your monitoring environment can be plagued by alert storms. One port on a router or switch goes down, making every device behind it invisible to your monitoring tool. This cascade of apparent failures makes it extremely difficult to separate real failures from false positives.
Alert storms delay fault isolation and resolution, which puts a huge drag on performance, availability and user satisfaction.
Alert storms arise when your monitoring tool is not ‘dependency-aware’: it fails to recognize the connections (dependencies) from one port to another. Monitoring tools that recognize dependencies will automatically suppress alerts for devices that only appear to be down because something upstream of them has failed.
Slaying the Alert Storm Dragon
There are two ways a monitoring solution can address this issue. The first involves enabling the manual creation of dependencies. This can be a time-consuming approach, and it can become untenable in network environments that change frequently or involve large numbers of dependencies.
The second and more favourable approach is to automatically create dependencies on discovery. This requires a more sophisticated discovery tool but does not necessarily drive up the price. A dependency-aware solution tackles the alert storm problem at its source. When a monitoring system understands how a network is connected and the dependencies between devices on the network, it will put a halt to alert storms.
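To make the idea of dependency-aware suppression concrete, here is a minimal sketch in Python. It is not how any particular product works. It assumes the discovery step has already produced a simple map from each device to the device directly upstream of it, and the names used (root_cause_alerts, topology, the device labels) are hypothetical. Given the set of devices that currently look unreachable, it keeps only those whose upstream path is otherwise healthy.

```python
from typing import Dict, List, Optional, Set


def root_cause_alerts(parents: Dict[str, Optional[str]], down: Set[str]) -> List[str]:
    """Return only the down devices that have no down device upstream of them."""
    roots: List[str] = []
    for device in sorted(down):
        parent = parents.get(device)
        seen: Set[str] = set()  # guards against accidental loops in the map
        suppressed = False
        # Walk upstream; if any ancestor is also down, this alert is a symptom.
        while parent is not None and parent not in seen:
            if parent in down:
                suppressed = True
                break
            seen.add(parent)
            parent = parents.get(parent)
        if not suppressed:
            roots.append(device)
    return roots


# Hypothetical topology: each device maps to the device it depends on.
topology = {
    "core-router": None,
    "switch-a": "core-router",
    "switch-b": "core-router",
    "server-1": "switch-a",
    "server-2": "switch-a",
    "ap-1": "switch-b",
}

# switch-a fails, so the two servers behind it look down as well.
unreachable = {"switch-a", "server-1", "server-2"}
print(root_cause_alerts(topology, unreachable))  # ['switch-a']
```

Three apparent failures collapse into one alert for switch-a, which is the device that actually needs attention. Real networks have redundant paths, so a production tool would need a richer model than a single upstream parent per device.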
The Angry Users Dragon
Every IT team experiences unplanned service interruptions. It could be the slowdown of a key application, an unexpected outage or even the result of a planned change. The measure of a good operations team, however, is how often it is reacting to performance issues versus proactively addressing them. Users want IT teams that are already working on the problem when they call. The help desk that answers ‘yes, we know and are already working on it’ instills confidence. End-users will remain satisfied longer into a prolonged issue, as they understand some problems are more difficult to resolve than others. But if you first learn of a problem from a user’s complaint, you are more likely to be perceived as taking too long to resolve it. The more often this is the case, the lower the confidence in the IT team.
IT teams that experience alert storms frequently also suffer from alert fatigue. There are so many alarms going off that they grow numb to the fact that some of them may be big problems. By ensuring your network monitoring tool features automatic discovery and mapping that is dependency-aware, you better equip your team to spot issues before the users call.
The Angry Users dragon feeds on IT teams that are always in reactive mode. Teams that also suffer from disjointed troubleshooting tools are a delicacy to it, and it relishes causing them immense agony. Angry Users dragons tend to multiply as troubleshooting drags on without a resolution, often to the point of a feeding frenzy.
Slaying the Angry Users Dragon
Sounds scary, right? Well, the good news is that it isn’t that hard or expensive to protect yourself from the Angry Users dragon. With the right tools in your arsenal, you will rarely even see one. This dragon is opportunistic and would rather feed on the weak than take on a worthy adversary.
The favourite tools of expert Angry Users dragon slayers are proactive alerts, customizable dashboards and user-friendly drill-downs to device detail. Look for a monitoring tool that lets you drill down to quickly pinpoint root causes. You’ll want historical dashboards to identify trends and intermittent performance problems. You’ll also need the ability to trigger scripts and embedded actions that restart services and reboot network devices.
Basically, you want troubleshooting to be automated as much as possible and done the way you would do it.
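As a rough illustration of a proactive alert tied to an automated action, the Python sketch below fires a callback once a metric has stayed above a threshold for several consecutive samples. Everything in it is an assumption made for the example: the ProactiveAlert class, the 200 ms latency threshold and the ‘restart web service’ action are hypothetical, and a real deployment would invoke whatever restart script or embedded action your monitoring tool supports.

```python
from collections import deque
from typing import Callable, Deque, List


class ProactiveAlert:
    """Run an action when a metric breaches a threshold N samples in a row."""

    def __init__(self, threshold: float, consecutive: int,
                 action: Callable[[], None]) -> None:
        self.threshold = threshold
        self.recent: Deque[bool] = deque(maxlen=consecutive)
        self.action = action
        self.fired = False  # prevents re-running the action on every sample

    def observe(self, value: float) -> bool:
        """Record one sample; return True only when the action is triggered."""
        self.recent.append(value > self.threshold)
        breached = len(self.recent) == self.recent.maxlen and all(self.recent)
        if breached and not self.fired:
            self.fired = True
            self.action()
            return True
        if not breached:
            self.fired = False  # re-arm once the metric recovers
        return False


# Hypothetical action: record the step instead of really restarting anything.
actions: List[str] = []
alert = ProactiveAlert(threshold=200.0, consecutive=3,
                       action=lambda: actions.append("restart web service"))

for latency_ms in [120, 250, 260, 270, 280, 150]:
    alert.observe(latency_ms)

print(actions)  # ['restart web service']
```

Requiring several consecutive breaches is one simple way to avoid reacting to a single spike, and firing only once per incident keeps the automation from restarting a service repeatedly while it recovers.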
The Lack of Visibility Dragon
Our final dragon is perhaps the deadliest of all. The Lack of Visibility dragon preys on IT teams that opted for ‘freeware’ and open source solutions to put together a hodge-podge of disparate tools. This dragon loves the confusion caused by attacking IT teams whose defences aren’t integrated. It delights in the pandemonium caused by everyone on the triage team having a different view of the problem.
These teams end up getting multiple reports on why performance is poor, each from a different system and, more often than not, each contradicting the others. It’s no wonder root cause is so elusive. When your IT teams can’t come up with a single, accurate answer they can all agree on, the Lack of Visibility dragon rules.
Slaying the Lack of Visibility Dragon
You should look for a monitoring tool that provides an integrated view of everything you need to manage. Whether that is switches and routers, virtual servers, wireless access devices, servers in the cloud or applications – you can’t troubleshoot effectively unless you have the ability to see things in context.
Make sure you have a monitoring environment that allows you to see everything and miss nothing. Don’t let a vendor force you to rely on partial solutions such as monitoring only a portion of the network.
The author is Senior Vice President, Ipswitch. Views are personal.