Scaling Up or Out?

By David F. Carr  |  Posted 2008-06-16

Understanding patients’ needs for medical supplies, treatments and medications is critical for cost-conscious health care providers. If providers can predict prescription refill rates, they can better manage low-cost mail-order drug fulfillment, saving them and their patients money. Surveillance Data Inc. (SDI) was founded on that principle, mining the mountain of data produced by hospitals, health care facilities and pharmacies for trends, patterns and anomalies to help providers optimize their services.

Offering this type of monitoring and analysis requires tremendous computing power and storage. SDI took the approach that has worked so well for Google, Yahoo and Facebook: a distributed—or horizontal—infrastructure composed of low-end commodity servers. Though scale-out made for an efficient computational engine, it eventually pushed the limits of SDI’s performance and storage capacity.

“We hit diminishing returns,” says SDI CIO Don Ragas. “As the volume of data was going up, the throughput of the servers was degrading.”

“The guy who had the job prior to me was basically mimicking the Google strategy,” Ragas continues. “He was doing scale-out on—well, I shouldn’t say cheap servers because they were pretty good—HP 385s and things like that. But he ran the whole data center on that approach, and when a server was maxed out, he would add another server.”

Solving SDI’s capacity problem required scaling up (vertically) the computing infrastructure with high-end, fault-tolerant servers that have multiple processors and boatloads of memory. All the network overhead of multiple database servers communicating to share data was eliminated when SDI consolidated the data warehouse—processing 60 TB of data—onto a single Unisys ES7000/one server.

“We did not have to add staff by going with the scale-up strategy,” Ragas says. “If we had continued with scale-out, we would have had to add more people.”

Many enterprises face the same question at some point about their server infrastructure: Scale up or scale out? While businesses of all sorts have discovered uses for grids, clusters and server farms populated by racks of commodity servers, vertical scaling remains alive and well in the enterprise, where many technology managers still tout the virtues of scaling up one or more high-powered servers.

At the high end, servers with eight or more processor sockets tend to come from IBM, Hewlett-Packard and Sun Microsystems and run mostly Unix. However, Unisys has built a business around offering servers with as many as 32 sockets to support the largest installations of Windows Datacenter Edition. HP’s Superdome servers, built around the Intel Itanium processor, can also run Windows or a combination of Windows, Linux and HP-UX Unix running in separate partitions.

The number of sockets installed may become less significant with the advent of multicore processors, which duplicate the core functions of two or more processors on a single chip. In other words, a 32-socket server fully loaded with dual-core processors would in some ways match the capacity of a 64-processor machine. However, two cores on a single chip may not have the clock speed, memory access and input/output capacity of two processors on separate chips. Still, multicore designs are boosting processing power overall, with quad-core processors now becoming more prevalent. Sun has introduced an eight-core processor, and the trend toward higher numbers of cores is likely to continue.

The number of processor sockets in a server is also only one measure of scalability. In addition, the highest of high-end servers sport greater memory capacity, as well as maintenance, reliability and expansion features you won’t find on the typical two-socket models. For example, some models allow processors, disks and other components to be swapped out while the server is running. These features pay off in reliability and ease of maintenance, in addition to scalability.

Which Way: Up or Out?

At the extremes, the scale-out vs. scale-up decision is a no-brainer. For example, scaling out front-end Web servers makes clear sense because dozens or hundreds of Web servers can handle more incoming network connections than a single high-powered server, and each Web server performs largely independent tasks. “We would not recommend taking one of our 64-processor servers and filling it up with Web servers because there’s no real advantage there,” says Tom Atwood, a Sun systems group manager.
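
Because front-end requests are independent, a trivial dispatcher can spread them across a farm. Here is a round-robin sketch with hypothetical server names, illustrating the principle rather than any vendor's load balancer:

```python
from itertools import cycle

# Hypothetical pool of identical, stateless front-end Web servers.
servers = ["web-01", "web-02", "web-03"]
pool = cycle(servers)

def dispatch() -> str:
    # Any server can take any request, so simply rotate through the pool.
    return next(pool)

assignments = [dispatch() for _ in range(6)]
print(assignments)
# ['web-01', 'web-02', 'web-03', 'web-01', 'web-02', 'web-03']
```

No request depends on another, so adding capacity is as simple as appending another server to the pool—which is why scale-out wins so decisively at this tier.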

At the other end of the spectrum are large operational databases and data warehouses that perform best when the entire database is managed by a single server, sometimes with large indexes or even the entire database loaded into memory. For instance, telecommunications companies that track the billing transactions for millions of customers all making calls nearly simultaneously need higher-end servers rather than a distributed low-end infrastructure for efficiency’s sake.

“There you’re talking about monolithic applications that work really well on scale-up and would probably be a nightmare on scale-out,” says Chuck Walters, high-end server business manager at HP’s Business Critical Systems group.

Google has proven it is possible to handle very large, high-performance data management tasks on a scale-out architecture that spreads the work of compiling a new Web index across hundreds or thousands of servers. Such models, however, require distributed data management and processing. Google developed its own Google File System for distributed storage—with extensive data duplication and recovery features to compensate for the failure of individual servers—along with a suite of parallel programming utilities for spreading computations among multiple machines.
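
The parallel programming model behind that architecture can be illustrated with a toy map/shuffle/reduce word count. This is a single-process sketch of the idea only, not Google's actual file system or utilities:

```python
from collections import defaultdict

def map_phase(doc_id, text):
    # A mapper emits (word, 1) pairs for its own chunk of the input.
    for word in text.lower().split():
        yield word, 1

def reduce_phase(word, counts):
    # A reducer sums every count emitted for one key.
    return word, sum(counts)

def mapreduce(docs):
    # Shuffle step: group intermediate pairs by key before reducing.
    grouped = defaultdict(list)
    for doc_id, text in docs.items():
        for word, count in map_phase(doc_id, text):
            grouped[word].append(count)
    return dict(reduce_phase(w, c) for w, c in grouped.items())

docs = {"d1": "scale up", "d2": "scale out"}
print(mapreduce(docs))  # {'scale': 2, 'up': 1, 'out': 1}
```

In the real system, the map and reduce calls run on hundreds or thousands of machines, and the framework handles the shuffle, the scheduling and recovery when individual servers fail.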

On the other hand, when MySpace tried to scale out the Microsoft SQL Server databases it uses to manage user logins and personalized page content, it ran into many challenges in figuring out how best to divide the work and keep database instances in sync. MySpace had little choice but to meet these challenges head on. While it might have been able to postpone tackling some tough distributed computing challenges by employing a centralized database, ultimately no scale-up server would be large enough to support a single user table with hundreds of millions of rows. So the social networking firm had to scale out, regardless of the challenges.
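
Splitting a huge user table across database instances typically means routing every lookup by a shard key. A minimal hash-based routing sketch follows; the shard count and server names are hypothetical, not MySpace's actual scheme:

```python
NUM_SHARDS = 4  # hypothetical number of database instances

def shard_for(user_id: int) -> int:
    # Route each user to one instance by modulo on the numeric key.
    return user_id % NUM_SHARDS

def connection_string(user_id: int) -> str:
    # Each shard is a separate SQL Server instance; names are illustrative.
    return f"Server=userdb-{shard_for(user_id)};Database=users"

print(connection_string(12345))  # Server=userdb-1;Database=users
```

The hard parts MySpace hit live outside this sketch: queries that span shards, keeping shared reference data in sync across instances, and rebalancing users when one shard fills up.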

In corporate enterprise settings, however, there often is room for debate. Should a database be scaled out across a cluster of servers running Oracle Real Application Clusters or loaded onto a single high-end server? Does it make sense to consolidate many applications currently running on separate servers onto a single server, using virtualization or another partitioning technology to keep them from interfering with each other? Or is there some big number-crunching analysis that can be run quickly and inexpensively across a grid of a hundred servers rather than on a single high-powered server?

The clearest case for choosing vertical scaling is probably the large transactional database, says Linton Ward, an IBM server systems architect. In that case, having everything on one server—with all processors sharing application state information across a single pool of memory—has the edge over a cluster of databases sharing information over a network connection. “This is the kind of thing where you want to be sure that, when someone orders the last T-shirt, the next guy doesn’t get to order it, too,” Ward explains. “Traditionally that’s been done with a monolithic, scale-up architecture.”
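
Ward's T-shirt example is, at bottom, a check-and-decrement that must be atomic. A toy shared-memory sketch, using a Python lock to stand in for the single memory pool of a scale-up server:

```python
import threading

class Inventory:
    # One lock guards the shared count, much as a single server's
    # memory pool gives all processors one consistent view of stock.
    def __init__(self, stock):
        self.stock = stock
        self.lock = threading.Lock()

    def order(self):
        # The check and the decrement happen under one lock, so two
        # buyers cannot both claim the last item.
        with self.lock:
            if self.stock > 0:
                self.stock -= 1
                return True
            return False

inv = Inventory(stock=1)
results = []
threads = [threading.Thread(target=lambda: results.append(inv.order()))
           for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))  # [False, True]: exactly one order succeeds
```

Providing the same guarantee across a cluster requires distributed locking or coordination over the network, which is where the monolithic scale-up approach keeps its edge.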

In addition to the technical advantages, software licensing sometimes favors running a database on one server rather than several. “As many as half of our Superdome customers run Oracle as the database,” HP’s Walters says, and one major reason (which Sun and IBM also mention) is the premium Oracle charges for its clustered database solution. Still, there is a plus side: “If you want to scale up as much as you want, and then scale out, the database will support that,” Walters says.

Scale Diagonally

The scale-up vs. scale-out decision is not always an either/or situation, explains Sun’s Atwood. “There’s nothing that says when you do a cluster it has to be two-processor servers.” Some Sun customers in finance and telecommunications are running clusters of Sun’s big 72-processor servers, he says, and many others are pursuing some variety of what Atwood refers to as “diagonal scaling,” clustering multiple midsize to large servers.

SDI’s Ragas concedes that following a scale-up strategy comes with its own challenges in the Windows environment, where it is not as established. For example, some of the software he runs on the ES7000/one, including the Oracle database and SAS analytics, isn’t tuned to run in a 64-bit Windows environment without customization and workarounds, he says.

On Unix, scaling up makes sense more often than not, says Mark Graham, an independent consultant who specializes in configuration and support of HP Superdome servers. He sees greater potential for virtualization on high-end machines than most companies are willing to embrace, simply because of the value application owners place on having their own separate servers. “It’s really kind of an emotional thing, which is frustrating to someone like me who sees the value in consolidation,” Graham says.

The company that puts one Superdome in place of a dozen other servers will reap savings in power consumption, data center floor space and administration, Graham says, so the total cost of ownership is likely to be better, despite the premium pricing of a high-end server.

Chris Reavis, director of enterprise infrastructure at wind and geothermal power specialist PPM Energy, believes both scale up and scale out have their place. As a former technical marketing manager at high-end server vendor Silicon Graphics, he has seen the issue from both the vendor and customer sides. In fact, one of his roles at SGI was consulting with customers on horizontal vs. vertical scaling strategies.

“Typically, for the back-end database and anything to do with heavy business logic, compute I/O or data I/O, I tend to go with vertical scaling,” Reavis says. “On the other hand, for front-end commodity computing, it tends to make sense to go with a simple rack-and-stack strategy for Web servers, FTP servers and e-mail servers.”

Reavis says he has seen the horizontal strategy work at operations as big as AOL and Yahoo, and it’s not particularly tough to manage for an organization with some competency in using automated deployment and systems management tools. He also has experienced success using Linux-based grid computing for data analysis.

Reavis has resisted the trend toward server virtualization and consolidation, considering it simpler to run applications on individual servers than to deal with the complexities of virtualization.

“I work with a lot of third-party code that may or may not play well with others, so I don’t want to mess around,” he says. “It’s just not worth the business risk. But for things like Oracle or our e-business applications, I go with scale up.”

For Reavis’ midsize enterprise at PPM Energy, four-socket Dell servers with quad-core Intel processors provide plenty of computing power. Even having worked for a vendor of high-end server hardware—and seen the value it brought to the financial services and scientific computing applications where it was employed—he concluded it would be overkill for PPM.

“If I were in banking or insurance, I would take a totally different approach,” Reavis says. “In some ways, we’re a little closer to a Yahoo, where we can afford to throw away a front-end box that dies.”