Behavioral Monitoring
By Kevin Fogarty | Posted 2008-07-08
Wachovia's battery of systems-management and monitoring tools reacted too slowly. Here's how they dealt with the situation.
The tool, Netuitive SI from Netuitive Inc. in Reston, Va., is an agentless monitoring system that sits on the network watching traffic from specific hardware or software.
Behavioral monitoring and self-learning in performance-management tools give IT managers much greater predictability and control over application performance, according to a Yankee Group report, "Optimizing Virtual Environments Requires Self-Learning Performance Management," updated in January 2008.
The ability to adapt quickly to changes in performance and to monitor virtual machines as well as physical servers and applications helps IT establish benchmarks that make predictive analytics much more effective, he says.
Netuitive SI gave solid data on systems performance right away, but the technology's greatest value for Wachovia came after about two weeks of watching and benchmarking systems under various conditions, Hirschauer explains.
"Within the first week, you're getting decent predictive analytics," he says. "After two weeks, we saw the baseline thresholds and algorithms get very, very accurate for us. When we add systems to Netuitive, we look at the data, but take it with a grain of salt for the first two weeks. After that, we take it very seriously when it predicts we'll have a problem with something."
The product proved itself during the feasibility testing by calling a warning on a production database that hadn't caused any performance alerts from Wachovia's other tools for more than 30 days.
Netuitive SI issued a critical alarm after watching CPU utilization go from 10% to close to 50%, with a simultaneous rise in context switching, network activity and other metrics.
"The SQL calls were really slowing down, but none of the other tools said anything; at 50% utilization, the level wasn't high enough to trigger an alarm," Hirschauer says. "It turned out that one of our vendors had dumped a large amount of data into the database without going through the change-control process, and had caused a significant change in performance.
"Customers didn't know what happened. Some saw a slowdown, but it was not that great yet," Hirschauer says. "If this had happened without Netuitive, we wouldn't have known it had happened; if it happened again, we would have had some serious problems with that system."
Dynamic thresholding helps save IT managers time and effort in responding to false alarms and inaccurate performance estimates, according to a Meta Group report by analyst Bob Wallace, who cited research showing that false positives make up as much as 90% of total alerts.
That volume of false alarms not only wastes time, it erodes the credibility of any performance alarm; this "cry-wolf effect" keeps IT managers putting out fires rather than identifying underlying performance problems, Wallace says.
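The difference between a fixed alarm level and a learned baseline can be sketched in a few lines of Python. This is a minimal illustration only, not Netuitive's algorithm, which the article does not describe. The function names, the 90% fixed limit, the 14-sample warm-up, the three-standard-deviation rule and the sample readings are all assumptions made for the example. It replays the database case above, in which CPU utilization moved from about 10% to about 50%.

```python
from statistics import mean, stdev

STATIC_CPU_LIMIT = 90.0  # hypothetical fixed alarm level, in percent


def static_alarm(value, limit=STATIC_CPU_LIMIT):
    """Fixed threshold: fires only when the reading exceeds a preset level."""
    return value > limit


def dynamic_alarm(history, value, k=3.0, min_samples=14):
    """Learned threshold: fires when the reading strays more than k standard
    deviations from the mean of the observed history (all parameters are
    illustrative assumptions)."""
    if len(history) < min_samples:
        return False  # baseline not yet trusted, so stay quiet
    baseline = mean(history)
    # Floor the spread at one percentage point so a very flat history
    # does not turn tiny fluctuations into alarms.
    spread = max(stdev(history), 1.0)
    return abs(value - baseline) > k * spread


# Made-up readings hovering near 10% CPU, standing in for a learned baseline.
history = [9.5, 10.2, 10.8, 9.9, 10.1, 10.4, 9.7,
           10.0, 10.3, 9.8, 10.6, 9.6, 10.2, 10.0]
reading = 50.0  # the jump described in the database incident

print("static alarm: ", static_alarm(reading))
print("dynamic alarm:", dynamic_alarm(history, reading))
```

With these assumed values, the fixed limit stays silent at 50% while the baseline check flags the reading, because 50% is far outside what the system normally does even though it is well below a conventional alarm level. The warm-up guard loosely mirrors Hirschauer's practice of treating early predictions with caution until enough history has accumulated.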
In addition to Netuitive, Wachovia uses Symantec systems-management tools and CoreFirst software from Optier, which tracks business transactions as they flow through the IT infrastructure and collects performance statistics on servers and applications.
The primary management console is BMC Software's Patrol, which integrates and displays performance data from other tools, including Netuitive. Netuitive is doing some custom integration for Wachovia, but is also working on more generic integration code that will enable BMC customers to use it without any external assistance. (BMC and Netuitive have been bundling and integrating their tools since 2003.)
The combination is a solid, easily justified toolset, Hirschauer says.
"If you have a performance impact on an important system, you're doing a disservice to the business if you're not keeping up with it and keeping it from happening," Hirschauer says. "Especially in investment banking, customers will just pick up and move elsewhere."