Implementing Reliable Server Management
By Jed Krisch | Posted 2012-07-30
Carilion Clinic’s previous attempts at server management were never 100 percent effective, so it searched for a system that would be more reliable, more effective, and easier to implement and manage.
Carilion Clinic, a nonprofit health care organization, owns and operates hospitals, outpatient specialty centers and advanced primary care practices in the Virginia area. As the technical manager for Carilion, I am responsible for monitoring and managing 983 servers (including Hewlett-Packard ProLiant BL460c, DL380 and DL360 machines) and more than 10,000 endpoint devices in about 150 locations throughout Virginia.
In our industry, uptime and availability are critical: a downed server or system can put lives at risk as well as cost money. Our job is to make sure that all systems are up, running and healthy. Unfortunately, our previous attempts at server management were never 100 percent effective, and we were not always the first to know that a server was down. We had long been looking for a solution that was more reliable, more effective, and easier to implement and manage.
I have seven employees on my staff, and they are responsible for managing the servers and keeping track of hundreds of applications. At a minimum, we want to know, before anyone else does, about a server that's down, a core service failure or a rapidly filling disk. Knowing first lets us address problems proactively and minimize service disruptions.
In the past, we used server management products that were cumbersome and costly, and that did not cover 100 percent of our server environment. Alerts about a downed server or a full disk were buried or lost because the product wasn't communicating with all of the servers, its agents were down, or its notifications had failed. It was simply too much work for our small team. Managing the management system was taking more time than managing the servers!
We started working with SolarWinds Server & Application Monitor (SAM) after seeing one screen in our existing SolarWinds Network Performance Monitor (NPM) deployment. Although NPM is geared toward network infrastructure and components, it can provide basic monitoring and management of Windows servers, a subset of what is available in SAM. After seeing what NPM could do in this role, I knew the more server- and application-centric SAM would provide even greater capability.
Unlike many of the enterprise offerings we had worked with previously, SAM was not bloated with information and had a clean, easy-to-use interface. Within a week and a half of trying out the free trial, we were able to accomplish things with SAM that we had been trying to do with our previous system for three years.
At first, we monitored Windows servers and services to see whether they were up or down. Setting this up took days in the previous system. With SAM, I sat several of my employees down at a keyboard and described a few first steps, and they figured out how to use it in minutes.
SAM also includes built-in, community-based application templates that give us most of the information we need, so we can quickly find and communicate with all our servers. And although we run applications specific to the health care industry, we found it easy and intuitive to build custom application monitors for these medical applications and reuse them throughout the organization.
SAM makes things a lot easier and quicker. For example, creating an up-to-date server inventory for my annual license agreements used to be a two- to three-day process but now takes only seconds. To accomplish this and other tasks, all of our active nodes must be present in our management tools, which was always a challenge in the past. Because we've integrated SAM with our IT asset-management system, a server we purchase from HP is added to SAM and monitored as soon as it is placed in service.
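The article does not describe how the SAM integration works internally. As a rough illustration of the underlying idea, that every in-service server in the asset inventory should also appear in the monitoring tool, here is a minimal sketch of a reconciliation check. It does not call any SolarWinds API; the `Asset` record, the hostnames and the function names are all hypothetical, and the inventory and monitored-node list are assumed to have been exported to plain Python data beforehand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    """Hypothetical record exported from an IT asset-management system."""
    hostname: str
    model: str
    in_service: bool


def unmonitored_servers(assets, monitored_hostnames):
    """Return in-service assets that the monitoring tool does not yet track.

    Hostnames are compared case-insensitively, because inventories and
    monitoring tools often disagree on capitalization.
    """
    monitored = {name.lower() for name in monitored_hostnames}
    missing = [
        a for a in assets
        if a.in_service and a.hostname.lower() not in monitored
    ]
    return sorted(missing, key=lambda a: a.hostname.lower())


def license_inventory(assets):
    """Count in-service servers by model, e.g. for an annual license review."""
    counts = {}
    for a in assets:
        if a.in_service:
            counts[a.model] = counts.get(a.model, 0) + 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample data; decommissioned servers are ignored.
    assets = [
        Asset("app01", "ProLiant DL380", True),
        Asset("APP02", "ProLiant DL360", True),
        Asset("db01", "ProLiant BL460c", True),
        Asset("old01", "ProLiant DL360", False),
    ]
    monitored = ["app01", "app02"]
    for a in unmonitored_servers(assets, monitored):
        print(f"Not monitored: {a.hostname} ({a.model})")
    print(license_inventory(assets))
```

Run on a schedule, a check like this flags servers that slipped past the automated onboarding, and the same inventory data can produce a per-model count for licensing.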
By switching to SAM, we are more proactive than ever before in providing detailed, accurate and historical data on our servers and their activity. Finally, we can monitor 100 percent of our server environment to ensure the health of the critical files and services Carilion Clinic needs to serve our clients.
Jed Krisch is the technical manager at Carilion Clinic.