Monitoring Your Infrastructure's Performance
By Tony Kontzer
Sometime in late June, lightning struck one of Advance Auto Parts' two main data centers. It fried the cards in the primary and backup routers that served as the facility's link to the outside world, and interrupted the processing of replenishment orders needed to restock the company's 4,000-plus auto parts stores.
While the incident could have proved crippling, Advance was able to have the data center back up within an hour, and the replenishment orders were processed as scheduled. The quick recovery can be attributed to the company's attention to a rarely talked-about, yet critical, technology: infrastructure monitoring.
"Our monitoring system alerted all the right people, woke up the right teams, and was able to sort out what broke and why," recalls Brent Paine, manager of network infrastructure for the $6.2 billion-a-year, Roanoke, Va.-based retailer. "Monitoring is a fundamental process and task that has to be done on any network larger than a home network. Without it, you're trying to look at blades of grass in a one-acre yard and spot which one is the weed."
It should come as no surprise that infrastructure monitoring is a big deal for the company, given the scope of Advance's operation: The two data centers support 11 huge distribution centers (each of which covers more than 1 million square feet and has its own small data center), numerous sub-distribution centers and an exhaustive network of stores.
Advance relies on multiple monitoring tools to keep tabs on its varied environment. Software from SolarWinds helps the company keep an eye on its network infrastructure performance; Riverbed keeps it apprised of data traffic flows; OpNet (which Riverbed acquired late last year) tells Advance what its data traffic actually consists of; and a variety of Microsoft tools monitor the firm's virtual infrastructure components.
"We are pretty heavily invested in end-to-end monitoring," says Paine. The company's monitoring tools keep Paine and his team abreast of everything from bandwidth congestion and the status of switches and routers to network latency and application performance—all the way down to the store level.
Though monitoring grows more complicated as Advance adds an average of 200 stores a year, Paine says that expanding the monitoring footprint to accommodate growth shouldn't be painful.
"It's an evolutionary process; it has to grow with the demands of the business," he says. "It shouldn't be an intrusive, burdensome task."
Try telling that to the many companies that have struggled to decide which monitoring tools to use. Colin Fletcher, a research director for IT consultancy Gartner, says such companies have found themselves torn between tools designed to perform very specific functions—such as server monitoring—and those that handle broader monitoring of infrastructure components.
Now, says Fletcher, vendors with broader tools are trying to support narrower needs, while those focused on specific technologies are trying to broaden their capabilities.
"It's leaving most organizations caught in the middle in trying to figure out the best paths for themselves," he says, noting that because the broader tools don't handle the specialized tasks as well, many companies opt for a multivendor environment similar to what Advance has put in place.
Developing Its Own Technologies
Then there are those companies that have decided that the market can't provide anything they can't do better themselves. That's the situation of Visa, which has built out its infrastructure over decades, often developing its own technologies for managing and monitoring that infrastructure in-house.
For example, the company uses its own proprietary system, called Vital Signs, to monitor and assess the performance of its high-performance processing environments. With 10,000 transactions typically running simultaneously through its network, VisaNet, ensuring peak performance is tantamount to protecting itself, credit card issuers and merchants from unacceptable risk.
"People get annoyed if it's not working correctly," says CTO Matt Quinlan of VisaNet.
What really makes Vital Signs stand out is the fact that it doesn't just monitor Visa's environment; it also monitors the environments of its partners. Quinlan explains: "It looks out to external participants to detect problems around the edge of our network and advise our partners when they are potentially experiencing issues."
Doing so allows Visa to protect not just its IT footprint but its entire ecosystem. If a credit card issuer is experiencing a serious performance issue, Visa can have its network perform authorizations on behalf of the bank rather than having to cancel transactions, which is obviously bad for business. Some companies, says Quinlan, have Visa handle approval of small transactions that occur during peaks in their network traffic, rather than building their own infrastructures to accommodate those traffic spikes.
Naturally, as this occurs, Vital Signs is assessing the risk presented by all of these scenarios. "It's instrumented all the way down, not just for performance, but from a security perspective," says Quinlan.
Looking to the Future
As companies like Visa and Advance have learned, how well a company monitors its infrastructure will go a long way toward determining how innovative and forward-thinking it can be. Among the benefits Gartner's Fletcher says infrastructure monitoring delivers are faster resolution of issues, improved quality of service, cost avoidance and, perhaps most important, huge savings of staff time that can then be devoted to revenue-generating activities.
"It doesn't get any more core than infrastructure monitoring," Fletcher says. "If you don't automate this core function, you don't have time to ever get to rethinking your architecture or rolling out a new application. More often than not, how well you do this determines how many resources you have that you can dedicate to new things."
Companies with more mature monitoring setups can take things a step further by measuring the impact incidents have on productivity. For instance, says Fletcher, a small percentage of companies take a weighted percentage of their labor costs, compare it against the average length of their failures, and determine how many employee hours they're losing during outages so they can work on minimizing that impact.
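To make that arithmetic concrete, here is a minimal sketch of one way such an estimate might be computed. It is not Fletcher's or Gartner's formula. The field names, the idea of a single "productivity weight," and all of the figures are assumptions chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class OutageProfile:
    """Hypothetical inputs; every field and value here is an assumption."""
    affected_employees: int     # staff who depend on the failed service
    productivity_weight: float  # share of their work that is blocked (0.0 to 1.0)
    avg_outage_hours: float     # average length of one failure
    outages_per_year: int       # how often failures occur
    hourly_labor_cost: float    # average loaded cost per employee hour


def lost_hours_per_year(p: OutageProfile) -> float:
    """Employee hours lost to outages over a year."""
    return (p.affected_employees * p.productivity_weight
            * p.avg_outage_hours * p.outages_per_year)


def lost_labor_cost_per_year(p: OutageProfile) -> float:
    """Labor cost of those lost hours."""
    return lost_hours_per_year(p) * p.hourly_labor_cost


# Illustrative numbers only, not drawn from any company in this article.
example = OutageProfile(
    affected_employees=200,
    productivity_weight=0.5,
    avg_outage_hours=1.5,
    outages_per_year=4,
    hourly_labor_cost=40.0,
)

print(lost_hours_per_year(example))       # 600.0
print(lost_labor_cost_per_year(example))  # 24000.0
```

Under these made-up inputs, four 90-minute outages a year cost 600 employee hours. Shortening the average outage, which is what faster alerting aims to do, reduces that figure proportionally.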
And there's more: Fletcher says that without effective infrastructure monitoring, companies can't get the full value of their investments in more advanced tools, such as application monitoring. "You need something that is at least giving you an idea of what exactly is having an issue and as much info as possible about that issue," he says.
Advance Auto Parts' Paine is getting a clearer picture these days because of technologies like OpNet's. He says it wasn't long ago that he was only concerned with monitoring the network's behavior, not its actual traffic. But that's not enough anymore.
"We need to know what's going on with these networks," he says. "One poorly behaved application can wreak havoc on everything on the network."
Still, no matter how much insight today's monitoring tools provide to Paine, he knows he needs to stay proactive—ready to constantly up the ante with new monitoring technologies as Advance's network performance becomes more important to the success of the business.
"It's a moving target," he says. "There's no saying, okay, we've done it. Once a company does that, it ends up obsolete in a year and a half."