Which Way: Up or Out?
By David F. Carr | Posted 2008-06-16
Google proved that a large distributed server infrastructure works as well as, if not better than, consolidated high-end servers. But what works for Google may not work for everyone. The decision to scale up or out depends on your organization’s needs.
Sometimes the scale-out vs. scale-up decision is a no-brainer. For example, scaling out front-end Web servers makes clear sense because dozens or hundreds of Web servers can handle more incoming network connections than a single high-powered server, and each Web server is performing largely independent tasks. “We would not recommend taking one of our 64-processor servers and filling it up with Web servers because there’s no real advantage there,” says Tom Atwood, a Sun systems group manager.
At the other end of the spectrum are large operational databases and data warehouses that perform best when the entire database is managed by a single server, sometimes with large indexes or even the entire database loaded into memory. For instance, telecommunications companies that track the billing transactions for millions of customers all making calls nearly simultaneously need higher-end servers rather than a distributed low-end infrastructure for efficiency’s sake.
“There you’re talking about monolithic applications that work really well on scale-up and would probably be a nightmare on scale-out,” says Chuck Walters, high-end server business manager at HP’s Business Critical Systems group.
Google has proven it is possible to handle very large, high-performance data management tasks on a scale-out architecture that spreads the work of compiling a new Web index across hundreds or thousands of servers. Such models, however, require distributed data management and processing. Google developed its own Google File System for distributed storage—with extensive data duplication and recovery features to compensate for the failure of individual servers—along with a suite of parallel programming utilities for spreading computations among multiple machines.
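To make the idea of spreading an indexing job across machines concrete, here is a minimal Python sketch. It is not Google's code: the function names are hypothetical, and a pool of threads on one machine stands in for the separate servers. Each worker indexes its own slice of the documents, and a final step merges the partial results.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def build_partial_index(docs):
    """Index one worker's share of the documents (word -> set of doc IDs)."""
    index = defaultdict(set)
    for doc_id, text in docs:
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def merge_indexes(partials):
    """Combine the per-worker indexes into a single index."""
    merged = defaultdict(set)
    for partial in partials:
        for word, doc_ids in partial.items():
            merged[word] |= doc_ids
    return merged


def build_index(docs, workers=4):
    # Deal the documents out round-robin, one slice per worker.
    # Assumption: threads stand in for what would be separate servers.
    slices = [docs[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(build_partial_index, slices))
    return merge_indexes(partials)
```

The sketch leaves out what makes the real thing hard: moving data between machines and recovering when a worker fails partway through, which is what the duplication and recovery features described above are for.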
On the other hand, when MySpace.com tried to scale out the Microsoft SQL Server databases it uses to manage user logins and personalized page content, it ran into many challenges trying to figure out the best way to divide up the work and keep database instances in sync. MySpace had little choice but to meet these challenges head-on. While it might have been able to postpone tackling some tough distributed computing challenges by employing a centralized database, ultimately no scale-up server would be large enough to support a single user table with hundreds of millions of rows. So the social networking firm had to scale out, regardless of the challenges.
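One common way to divide a huge user table among several database servers is to route each user by a hash of the user ID. The Python sketch below illustrates that general technique only; it is not a description of how MySpace partitioned its data, and the shard names and function names are hypothetical.

```python
import hashlib

# Hypothetical names for four separate database servers.
SHARDS = ["userdb-0", "userdb-1", "userdb-2", "userdb-3"]


def shard_for(user_id, shard_count):
    """Pick a shard number from a stable hash of the user ID."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    # Use the first 8 bytes as an integer, then wrap into the shard range.
    return int.from_bytes(digest[:8], "big") % shard_count


def database_for(user_id):
    """Return the name of the database that holds this user's row."""
    return SHARDS[shard_for(user_id, len(SHARDS))]
```

The same user always lands on the same server, so a login needs to touch only one database. The catch is that adding a server changes the shard count, which sends most users to a different shard under this simple scheme and forces their rows to be moved.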
In corporate enterprise settings, however, there often is room for debate. Should a database be scaled out across a cluster of servers running Oracle Real Application Clusters or loaded onto a single high-end server? Does it make sense to consolidate many applications currently running on separate servers onto a single server, using virtualization or another partitioning technology to keep them from interfering with each other? Or is there some big number-crunching analysis that can be run quickly and inexpensively across a grid of a hundred servers rather than on a single high-powered server?
The clearest case for choosing vertical scaling is probably the large transactional database, says Linton Ward, an IBM server systems architect. In that case, having everything on one server—with all processors sharing application state information across a single pool of memory—has the edge over a cluster of databases sharing information over a network connection. “This is the kind of thing where you want to be sure that, when someone orders the last T-shirt, the next guy doesn’t get to order it, too,” Ward explains. “Traditionally that’s been done with a monolithic, scale-up architecture.”
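Ward's T-shirt example comes down to making the stock check and the decrement a single indivisible step. The Python sketch below shows the idea under simplifying assumptions: the Inventory class is hypothetical, threads stand in for processors sharing one pool of memory, and a plain lock stands in for the far more elaborate concurrency control a real database uses.

```python
import threading


class Inventory:
    """Stock counts kept in one shared pool of memory."""

    def __init__(self, stock):
        self._stock = dict(stock)
        self._lock = threading.Lock()

    def order(self, item):
        """Reserve one unit; return True only if the order succeeded."""
        # Holding the lock makes check-then-decrement one atomic step,
        # so two buyers can never both claim the last unit.
        with self._lock:
            if self._stock.get(item, 0) > 0:
                self._stock[item] -= 1
                return True
            return False
```

On a single server the lock is just a memory operation. In a cluster, the equivalent coordination has to cross the network, which is the cost Ward is pointing to.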
In addition to the technical advantages, software licensing sometimes favors running a database on one server rather than several. “As many as half of our Superdome customers run Oracle as the database,” HP’s Walters says, and one major reason (which Sun and IBM also mention) is the premium Oracle charges for its clustered database solution. Still, there is a plus side. “If you want to scale up as much as you want, and then scale out, the database will support that,” Walters says.