LAN Horror Stories

By David Strom  |  Posted 2008-10-30

We all have stories about our worst network-management nightmares. I’m going to share a few choice ones to show you that, sometimes, you should expect the unexpected.

The Case of the Bad Media Filter

Remember when fiber networks were first being implemented and required the use of media filters (converters that connected the fiber drops to a copper connector)? Well, Eric Kimminau, enterprise architect at a major automotive firm, remembers—and he has this tale to tell:

“As a fourth-level field support engineer, I was called onsite to a major petroleum company in Texas to investigate an NFS [Network File System] server performance-related issue. The interesting bit was that there were two groups of workstations: one that could communicate with the main file server in another building and one that couldn’t.

“We discovered that high-speed copper interconnects were used to connect floor-distribution switches to the building core, and media converters were used to go back to fiber for the campus WAN connections to other buildings. These specific media converters had a tendency to maintain ARP [Address Resolution Protocol] cache in the core switches.

“After a lot of troubleshooting with multiple vendors and installers, and spending about a hundred man-hours, we figured out that half the path of a copper-to-fiber media converter was bad. Simply disconnecting the media converter, taking down the link and forcing an ARP cache reset cleared the problem. All performance issues were instantly eliminated.

“The solution required a combination of load-generation tools, traffic and protocol analyzers, switch OS diagnostics, TCP stack-level kernel diagnostics, and multiple hardware vendors and support personnel. All this was needed to solve a problem caused by a $100 media converter!

“Of course, we then did a thorough quality check of the 40 or so media converters around the building and identified about 20 percent that either were failing to some degree or were contributing to significant degradation of network quality. A decision was made to implement high-quality fiber patch panels and to implement fiber runs from core to patch panels end to end for WAN links, eliminating the media converters from the environment.”

The Case of the Confident Intern

In some cases, the problem is people rather than technology. David O’Berry, the director of IT for the South Carolina Department of Probation, Parole and Pardon Services, tells his tale of woe about “operator error.”

“One of our interns was so top-notch that we had him working on our Cisco routers. One day, as he was making modifications to the routing table groups—doing something he had done hundreds of times before—the intern mistakenly typed the command to delete all the routing tables on the main router.

“I was in my office looking over the network when, all of a sudden, the entire state’s network disappeared. I heard the intern colorfully express his frustration. Luckily, we were able to restore all the routing tables with just a few minutes’ interruption in service. But from that point forward, anytime something blew up, we used the intern’s last name to describe the incident.”

The Case of the Blocked Internet

Sometimes you don’t realize you have a horror story until you look back and see how bad the situation was. The City of Davenport, Iowa, had a network infrastructure that was a mess—largely because of a poorly evolved Internet filtering solution. The city needed to unblock access for particular users at certain times of the day and to support Citrix terminal users without a lot of configuration.

Here’s what happened, according to John Sparks, the city’s information systems supervisor: “We were using a Web blocking tool and had had configuration problems with it for years. Eventually, we were using three servers: one with the logging software, another one with the blocking feature and the third was a proxy server. It was a bear to explain how to use the tool, I didn’t like its reporting and the whole setup was too confusing to explain to our IT staff. On top of that, we were paying $5,000 to $6,000 per year.”

This setup wasn’t working, especially when it came time to manage the city’s Citrix logins and to understand how various employees used Internet bandwidth. In addition, Sparks’ team had to change the filtering policies for employees whose jobs required them, for law enforcement and other professional reasons, to check adult sites or monitor eBay auctions.

“We now pay about half the cost with a Cymphonix solution that we got about a year ago,” Sparks says. “The vendor installed a client on the Citrix servers and redirected the logins so we didn’t need a separate proxy server. Plus, now I have Active Directory groups that are segregated by access policy, so I can just change the group membership, and someone instantly has the appropriate access to do his or her job. It is much easier to maintain and much more flexible.” Sparks also likes the reporting features of Cymphonix.
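
To make the group-based approach Sparks describes a little more concrete, here is a minimal Python sketch of the idea: access follows group membership, so moving a user between groups changes what he or she can reach. The group names, category names and the `is_allowed` helper are all hypothetical illustrations; this is not the Cymphonix product’s configuration format or an Active Directory API.

```python
# Hypothetical mapping of directory groups to filtering policies.
# Group and category names are invented for illustration only.
GROUP_POLICY = {
    "web-standard": {"adult": False, "auctions": False},
    "web-investigations": {"adult": True, "auctions": True},
}


def is_allowed(user_groups, category):
    """Return True if any of the user's groups permits the category.

    Unknown groups and unknown categories default to blocked.
    """
    return any(
        GROUP_POLICY.get(group, {}).get(category, False)
        for group in user_groups
    )


# A detective who needs to monitor auction sites is simply added to
# the investigations group; no per-user filter rules are edited.
clerk_groups = {"web-standard"}
detective_groups = {"web-standard", "web-investigations"}

print(is_allowed(clerk_groups, "auctions"))
print(is_allowed(detective_groups, "auctions"))
```

The design point is that the policy lives with the group, not the person, which is why a single membership change is enough.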

The Case of the Intermittently Slow Network Response

Ethernet port duplex mismatches have caused a lot of grief over the years, as the following story illustrates. Health-care conglomerate Texas Health Resources (THR) in Arlington, Texas, had a problem with users complaining about slow network connections.

“In theory, most network cards automatically sense the speed and duplex configuration, but that is mostly on PCs,” says Greg Essler, manager of THR’s Network Engineering Infrastructure Group. “We have 36,000 devices on the THR network, and about 60 percent of them are clinical patient-care devices, such as IV pumps, medical scanners, monitors and other equipment that doesn’t run any typical PC-based operating system.

“With that many devices spread across 300 miles over our WAN, which connects 13 hospitals and 140 clinics and offices, it is a lot for just 10 people on my staff to cover. We have to maintain more than 3,100 switches, routers and wireless controllers, and that adds up to a lot of ports. Until about a year ago, we monitored the system using manual methods and a lot of legwork.

“Then we got Netcordia’s NetMRI and were able to more efficiently manage our network infrastructure. We now are able to build configuration templates that can be pushed out automatically rather than touching every switch. Plus, we can easily find the one or two missed configurations that always pop up and troubleshoot the little things that impact applications or device performance. It has created a lot of efficiency in our department.”
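
The kind of check that catches a duplex mismatch can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `PortSetting` record, the `find_mismatches` function and the sample device names are hypothetical, and the sketch assumes you already have an inventory of what each end of a link is configured for. It is not how NetMRI works internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PortSetting:
    """Configured speed and duplex for one end of a link (hypothetical record)."""
    device: str
    port: str
    speed_mbps: int
    duplex: str  # "full" or "half"


def find_mismatches(links):
    """Return (end_a, end_b, reason) for every link whose two ends disagree."""
    problems = []
    for end_a, end_b in links:
        reasons = []
        if end_a.duplex != end_b.duplex:
            reasons.append("duplex")
        if end_a.speed_mbps != end_b.speed_mbps:
            reasons.append("speed")
        if reasons:
            problems.append((end_a, end_b, "+".join(reasons)))
    return problems


# Sample inventory with invented device names: a switch port forced to
# full duplex facing a clinical device that is running half duplex.
links = [
    (PortSetting("sw-floor3", "Gi0/12", 100, "full"),
     PortSetting("iv-pump-17", "eth0", 100, "half")),
    (PortSetting("sw-floor3", "Gi0/13", 1000, "full"),
     PortSetting("ws-204", "eth0", 1000, "full")),
]

for end_a, end_b, reason in find_mismatches(links):
    print(f"{end_a.device} {end_a.port} <-> {end_b.device} {end_b.port}: {reason} mismatch")
```

Run against a full port inventory, a check like this turns the legwork Essler describes into a report of the few links that need attention.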

These horror stories make it clear that network management requires the right combination of skills, tools, and intuition to track down and solve problems.