LAN Horror Stories
By David Strom | Posted 2008-10-30
Network management requires the right combination of skills, tools, and intuition to track down and solve problems.
We all have stories about our worst network-management nightmares. I’m going to share a few choice ones to show you that, sometimes, you should expect the unexpected.
The Case of the Bad Media Filter
Remember when fiber networks were first being implemented and required the use of media filters (converters that connected the fiber drops to a copper connector)? Well, Eric Kimminau, enterprise architect at a major automotive firm, remembers—and he has this tale to tell:
“As a fourth-level field support engineer, I was called onsite to a major petroleum company in Texas to investigate an NFS [Network File System] server performance-related issue. The interesting bit was that there were two groups of workstations: one that could communicate with the main file server in another building and one that couldn’t.
“We discovered that high-speed copper interconnects were used to connect floor-distribution switches to the building core, and media converters were used to go back to fiber for the campus WAN connections to other buildings. These specific media converters had a tendency to maintain ARP [Address Resolution Protocol] cache in the core switches.
“After a lot of troubleshooting with multiple vendors and installers, and spending about a hundred man-hours, we figured out that half the path of a copper-to-fiber media converter was bad. Simply disconnecting the media converter, taking down the link, and forcing an ARP cache reset cleared the problem. All performance issues were instantly eliminated.
“The solution required a combination of load-generation tools, traffic and protocol analyzers, switch OS diagnostics, TCP stack-level kernel diagnostics, and multiple hardware vendors and support personnel. All this was needed to solve a problem caused by a $100 media converter!
“Of course, we then did a thorough quality check of the 40 or so media converters around the building and identified about 20 percent that either were failing to some degree or were contributing to significant degradation of network quality. A decision was made to implement high-quality fiber patch panels and to implement fiber runs from core to patch panels end to end for WAN links, eliminating the media converters from the environment.”
The Case of the Confident Intern
In some cases, the problem is people rather than technology. David O’Berry, the director of IT for the South Carolina Department of Probation, Parole and Pardon Services, tells his tale of woe about “operator error.”
“One of our interns was so top-notch that we had him working on our Cisco routers. One day, as he was making modifications to the routing table groups—doing something he had done hundreds of times before—the intern mistakenly typed the command to delete all the routing tables on the main router.
“I was in my office looking over the network when, all of a sudden, the entire state’s network disappeared. I heard the intern colorfully express his frustration. Luckily, we were able to restore all the routing tables with just a few minutes’ interruption in service. But from that point forward, anytime something blew up, we used the intern’s last name to describe the incident.”