A lot of IT managers would like to keep closer tabs on their
important systems, but it often takes a crisis to justify the time and expense
required.
Jim Hirschauer, technology architecture manager and
technical expert for Wachovia’s Corporate and Investment Banking Group, recalls
one telling example of this. "It was a foreign trading application, and
we were having intermittent performance issues, which was causing some
significant problems for some of our customers," he says. "They
started complaining about delays, and with the speed of that market, delays
aren't good. Those customers can go away very easily, and a couple of them
did."
Hirschauer couldn't estimate the amount of revenue Wachovia lost
due to the defections, some of which were only temporary. But in a business as
volatile as investment banking, a single hot trading day can have a significant
impact on revenue, and the loss of one large customer can keep a bank out of
the rush.
The problem, Hirschauer says, was that Wachovia's battery of
systems-management and monitoring tools reacted too slowly. Like most systems-management
software, these tools required IT to set performance thresholds and sent an
alarm if they were exceeded.
If capacity on one application server reached 30 percent, for
example, or available storage dipped below 50 percent, the systems-management agent
might send an alert to a central console.
"The problem with a static threshold is that you tend
to set it high to avoid having to react to false positives," Hirschauer says. "But if
you set it too high, you're already pretty deep in the weeds by the time you
know about it. And your values change as your infrastructure changes, so it
becomes an administrative nightmare."
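To make the mechanism concrete, a static-threshold check of the kind Hirschauer describes might look roughly like the following Python sketch. It is illustrative only and is not Wachovia's or any vendor's implementation: the metric names, the `Threshold` class and the `check` function are hypothetical, and the two limits simply reuse the 30 percent and 50 percent figures from the example above.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    metric: str                    # hypothetical metric name
    limit: float                   # fixed value chosen by an administrator
    alert_when_above: bool = True  # False means "alert when the value dips below"


# Assumed values, taken from the article's example figures.
THRESHOLDS = [
    Threshold("app_server_capacity_pct", 30.0, alert_when_above=True),
    Threshold("storage_available_pct", 50.0, alert_when_above=False),
]


def check(thresholds, sample):
    """Return one alert message per metric that crossed its fixed limit."""
    alerts = []
    for t in thresholds:
        value = sample.get(t.metric)
        if value is None:
            continue  # metric was not reported in this sample
        if t.alert_when_above:
            breached = value >= t.limit
        else:
            breached = value < t.limit
        if breached:
            alerts.append(f"{t.metric}={value} crossed static limit {t.limit}")
    return alerts
```

Every limit in such a list is a number someone has to pick and then revisit whenever the infrastructure changes, which is the administrative burden Hirschauer points to.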
Wachovia, which was in the midst of a larger project to
improve systems monitoring, needed a better way to tell when performance was
just beginning to affect customers, not
just to know when the system was about to crash.
"We were basically in the business of availability
monitoring," Hirschauer says.
"We were really good at knowing when a file system would fill up or if a
server was up or down. What we didn't do a good job of traditionally was
monitoring of the component level and performance through the various
application tiers."
Symantec’s I3 deep-dive analysis tools helped troubleshoot
that particular system, but they couldn't get Wachovia's IT crew out of the
firefighting business.
The overall solution was a better-instrumented and better-integrated
set of systems-management tools from Symantec, Hewlett-Packard and others.
The technology that pushed Wachovia's response far enough
upstream to head off trouble before it arrived was a tool that warns IT when a
system or application has stopped behaving normally, whether that means a sudden slowdown, acceleration or complete
lack of response.
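The article does not say how that tool decides what "normal" is. One common way to approximate the idea is to compare each new measurement against a rolling baseline of recent behavior rather than a fixed limit. The Python sketch below assumes that approach; the `BaselineMonitor` class, its window size and its tolerance of three standard deviations are hypothetical choices for illustration, not a description of the product Wachovia used.

```python
from collections import deque
from statistics import mean, stdev


class BaselineMonitor:
    """Flags response times that stray from a rolling baseline (illustrative)."""

    def __init__(self, window=60, tolerance=3.0, min_samples=10):
        self.history = deque(maxlen=window)      # recent normal samples
        self.tolerance = tolerance               # allowed deviation, in standard deviations
        self.min_samples = max(2, min_samples)   # stdev needs at least two points

    def observe(self, response_ms):
        """Return 'slowdown', 'acceleration', 'no response', or None if normal."""
        if response_ms is None:
            return "no response"  # the application did not answer at all
        verdict = None
        if len(self.history) >= self.min_samples:
            avg = mean(self.history)
            spread = stdev(self.history)
            # With a perfectly flat baseline there is no spread to compare
            # against, so this sketch simply skips the check in that case.
            if spread > 0:
                deviation = (response_ms - avg) / spread
                if deviation > self.tolerance:
                    verdict = "slowdown"
                elif deviation < -self.tolerance:
                    verdict = "acceleration"
        # Only normal samples join the baseline, so an ongoing incident
        # does not quietly become the new definition of normal.
        if verdict is None:
            self.history.append(response_ms)
        return verdict
```

Because the baseline is recomputed from recent samples, nobody has to maintain a table of limits by hand, and a response that is unusually fast is flagged just as an unusually slow one is.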