A lot of IT managers would like to keep closer tabs on their important systems, but it often takes a crisis to justify the time and expense required.
Jim Hirschauer, technology architecture manager and technical expert for Wachovia’s Corporate and Investment Banking Group, recalls one painful example. “It was a foreign trading application, and we were having intermittent performance issues, which was causing some significant problems for some of our customers,” he says. “They started complaining about delays, and with the speed of that market, delays aren’t good. Those customers can go away very easily, and a couple of them did.”
Hirschauer couldn’t estimate the amount of revenue Wachovia lost due to the defections, some of which were only temporary. But in a business as volatile as investment banking, a single hot trading day can have a significant impact on revenue, and the loss of one large customer can keep a bank out of the rush.
The problem, Hirschauer says, was that Wachovia’s battery of systems-management and monitoring tools reacted too slowly. Like most systems-management software, these tools required IT to set performance thresholds and sent an alarm if they were exceeded.
If capacity on one application server reached 30 percent, for example, or available storage dipped below 50 percent, the systems-management agent might send an alert to a central console.
“The problem with a static threshold is that you tend to set it high to avoid having to react to false positives,” Hirschauer says. “But if you set it too high, you’re already pretty deep in the weeds by the time you know about it. And your values change as your infrastructure changes, so it becomes an administrative nightmare.”
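The trade-off Hirschauer describes can be sketched in a few lines of Python. This is only an illustration, not Wachovia's tooling: the function name `first_alert_index`, the sample response times and both threshold values are hypothetical.

```python
from typing import Optional, Sequence


def first_alert_index(samples: Sequence[float], threshold: float) -> Optional[int]:
    """Return the index of the first sample above a static threshold, or None."""
    for i, value in enumerate(samples):
        if value > threshold:
            return i
    return None


# Hypothetical response times in milliseconds for a service that is degrading.
response_ms = [100, 105, 110, 150, 220, 340, 520, 800]

# A low threshold fires early but would also fire on ordinary spikes.
low = first_alert_index(response_ms, threshold=120)

# A high threshold avoids false positives but fires only after users feel it.
high = first_alert_index(response_ms, threshold=500)
```

With these made-up numbers, the low threshold trips on the fourth sample and the high threshold not until the seventh, by which point response time has already grown fivefold.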
Wachovia, which was in the midst of a larger project to improve systems monitoring, needed a better way to tell when performance was just beginning to affect customers, not just to know when the system was about to crash.
“We were basically in the business of availability monitoring,” Hirschauer says. “We were really good at knowing when a file system would fill up or if a server was up or down. What we didn’t do a good job of traditionally was monitoring of the component level and performance through the various application tiers.”
Symantec’s I3 deep-dive analysis tools helped troubleshoot that particular system, but they couldn’t get Wachovia’s IT crew out of the firefighting business.
The overall solution was a better-instrumented and better-integrated set of systems-management tools from Symantec, Hewlett-Packard and others.
The technology that pushed Wachovia’s response far enough upstream to head off trouble before it arrived was a tool that warns IT when a system or application has stopped behaving normally, whether that means a sudden slowdown, acceleration or complete lack of response.
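One common way to detect a departure from normal behavior is to compare each new measurement against a baseline learned from recent history. The Python sketch below assumes a simple mean-and-standard-deviation baseline. The article does not say how Wachovia's tool works, so the function name `is_abnormal`, the `tolerance` parameter and the sample data are all hypothetical.

```python
from statistics import mean, stdev
from typing import Optional, Sequence


def is_abnormal(history: Sequence[float],
                latest: Optional[float],
                tolerance: float = 3.0) -> bool:
    """Flag a measurement that deviates from the recent baseline.

    history   -- recent measurements taken while the system behaved normally
    latest    -- the newest measurement, or None if the system did not respond
    tolerance -- allowed deviation, in standard deviations (an assumed default)
    """
    if latest is None:
        return True  # a complete lack of response is always abnormal
    if len(history) < 2:
        return False  # too little data to establish a baseline
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        return latest != baseline
    # abs() catches both a sudden slowdown and a sudden acceleration.
    return abs(latest - baseline) > tolerance * spread


# Hypothetical response times in milliseconds during normal operation.
normal_ms = [100, 102, 98, 101, 99]

slowdown = is_abnormal(normal_ms, 140)
speedup = is_abnormal(normal_ms, 60)
no_reply = is_abnormal(normal_ms, None)
steady = is_abnormal(normal_ms, 103)
```

Because the baseline is recomputed from recent history, the alert level moves as the infrastructure changes, which avoids the hand-tuning that static thresholds require.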