How Google Works: Google's Secrets
By David F. Carr | Posted 2006-07-06
For all the razzle-dazzle surrounding Google, the company must still work through common business problems such as reporting revenue and tracking projects. But it sometimes addresses those needs in unconventional yet highly efficient ways.
For all the papers it has published, Google refuses to answer many questions. "We generally don't talk about our strategy ... because it's strategic," Page told Time magazine when interviewed for a Feb. 20 cover story.
One of the technologies Google has made public, PageRank, is Page's approach to ranking pages based on the interlinked structure of the Web. It has become one of the most famous elements of Google's technology because he published a paper on it, including the mathematical formula. Stanford holds the patent, but through 2011 Google has an exclusive license to PageRank.
Still, Yahoo's research arm was able to treat PageRank as fair game for a couple of its own papers about how PageRank might be improved upon; for example, with its own TrustRank variation based on the idea that trustworthy sites tend to link to other trustworthy sites. Even if competitors can't use PageRank per se, the information Page published while still at Stanford gave competitors a starting point to create something similar.
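For readers who want a feel for what "ranking pages based on the interlinked structure of the Web" means in practice, the following is a minimal sketch of the commonly described iterative form of PageRank. It is an illustration only, not Google's production code: the function name, the four-page toy web, the iteration count, the handling of pages without outbound links and the damping value of 0.85 are all assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank pages by link structure.

    links maps each page to the list of pages it links to.
    damping=0.85 and iterations=50 are illustrative assumptions.
    """
    # Collect every page that appears as a source or a target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal rank everywhere

    for _ in range(iterations):
        # Every page gets a small baseline share of rank.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outbound links spreads its
                # rank evenly over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: "c" is linked to by three other pages,
# while nothing links to "d".
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
```

In this toy web, the page with the most inbound links ends up with the highest score and the page nobody links to ends up with the lowest, which is the basic intuition behind link-based ranking.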
"PageRank is well known because Larry published it—well, they'll never do that again," observes Simson Garfinkel, a postdoctoral fellow at Harvard's Center for Research on Computation and Society, and an authority on information security and Internet privacy. Today, Google seems to have created a very effective "cult of secrecy," he says. "People I know go to Google, and I never hear from them again."
Because Google, which now employs more than 6,800, is hiring so many talented computer scientists from academia—according to The Mercury News in San Jose, it hires on average 12 new employees a day and recently listed 1,800 open jobs—it must offer them some freedom to publish, Garfinkel says. He has studied the GFS paper and finds it "really interesting because of what it doesn't say and what it glosses over. At one point, they say it's important to have each file replicated on more than three computers, but they don't say how many more. At the time, maybe the data was on 50 computers. Or maybe it was three computers in each cluster." And although the GFS may be one important part of the architecture, "there are probably seven layers [of undisclosed technology] between the GFS system and what users are seeing."
One of Google's biggest secrets is exactly how many servers it has deployed. Officially, Google says the last confirmed statistic for the number of servers it operates was 10,000. In his 2005 book The Google Legacy, Infonortics analyst Stephen E. Arnold puts the consensus number at 150,000 to 170,000. He also says Google recently shifted from using about a dozen data centers with 10,000 or more servers to some 60 data centers, each with fewer machines. A New York Times report from June put the best guess at 450,000 servers for Google, as opposed to 200,000 for Microsoft.
The exact number of servers in Google's arsenal is "irrelevant," Garfinkel says. "Anybody can buy a lot of servers. The real point is that they have developed software and management techniques for managing large numbers of commodity systems, as opposed to the fundamentally different route Microsoft and Yahoo went."
Other Web operations like Yahoo that launched earlier built their infrastructure around smaller numbers of relatively high-end servers, according to Garfinkel. In addition to saving money, Google's approach is better because "machines fail, and they fail whether you buy expensive machines or cheap machines."
Of particular interest to CIOs is one widely cited estimate that Google enjoys a 3-to-1 price-performance advantage over its competitors; that is, its competitors spend $3 for every $1 Google spends to deliver a comparable amount of computing power. This comes from a paper Google engineers published in 2003, comparing the cost of an eight-processor server with that of a rack of 88 two-processor servers (176 processors in all) that delivers 22 times more processor power and three times as much memory for a third of the cost.
In this example, the eight-processor server was supposed to represent the traditional approach to delivering high performance, compared with Google's relatively cheap servers, which at the time used twin Intel Xeon processors.
But although Google executives often claim to enjoy a price-performance advantage over their competitors, the company doesn't necessarily claim that it's a 3-to-1 difference. The numbers in the 2003 paper were based on a hypothetical comparison, not actual benchmarks versus competitors, according to Google. Microsoft and Yahoo have also had a few years to react with their own cost-cutting moves.
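The ratios quoted from the 2003 paper can be worked through as simple arithmetic. The sketch below uses only the three figures reported above (22 times the processor power, three times the memory, a third of the cost); the variable names are illustrative, and the result describes the paper's hypothetical comparison, not a measured benchmark against any competitor.

```python
from fractions import Fraction

# Ratios of the rack of commodity servers to the eight-processor server,
# as reported from the 2003 paper's hypothetical comparison.
processor_ratio = 22          # rack has 22 times the processor power
memory_ratio = 3              # rack has three times the memory
cost_ratio = Fraction(1, 3)   # rack costs one third as much

# Capacity per dollar: divide each capacity ratio by the cost ratio.
processors_per_dollar_advantage = processor_ratio / cost_ratio
memory_per_dollar_advantage = memory_ratio / cost_ratio

print(f"Processor power per dollar: {processors_per_dollar_advantage}x")
print(f"Memory per dollar: {memory_per_dollar_advantage}x")
```

On these figures, the rack delivers 66 times the processor power and nine times the memory per dollar spent, so the 3-to-1 headline number appears to track the purchase-price ratio alone. That reading is an inference from the quoted ratios, and it ignores operating costs such as power, cooling and administration.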
"Google is very proud of the cost of its infrastructure, but we've also driven to a very low cost," says Lars Rabbe, who as chief information officer at Yahoo is responsible for all data center operations.
Microsoft provided a written statement disputing the idea that Google enjoys a large price-performance advantage. Microsoft uses clusters of inexpensive computers where it makes sense, but it also uses high-end multi-processor systems where that provides an advantage, according to the statement. And Windows does a fine job of supporting both, according to Microsoft.
Certainly, Yahoo's systems design is different from Google's, Rabbe says. "Google grew up doing one thing only" and wound up with a search-driven architecture "that is very uniform, with a lot of parallelism." Yahoo has increased its focus on the use of parallel computing with smaller servers, he says, but is likely to continue to have a more heterogeneous server infrastructure. For example, Yahoo launched on the FreeBSD version of Unix but has since mixed in Linux and Windows servers to support specific applications. While Yahoo and Google compete as search engines, he says, "We also do 150 other things, and some of those require a very different type of environment."
Still, Gartner analyst Ray Valdes believes Google retains an advantage in price-performance, as well as in overall computing power. "I buy that. Even though I'm usually pretty cynical and skeptical, as far as I know, nobody has gone to that extent and pushed the envelope in the way they have," he says. "Google is still doing stuff that others are not doing."
The advantage will erode over time, and Google will eventually run up against the limits of how much homegrown technology it can manage, Valdes predicts: "The maintenance of their own system software will become more of a drag to them."
But Google doesn't buy this traditional argument for why enterprises should stick to application-level code and leave more fundamental technologies like file systems, operating systems and databases to specialized vendors. Google's leaders clearly believe they are running a systems engineering company to compete with the best of them.