How Google Works: The Start of the Story
By David F. Carr | Posted 2006-07-06
For all the razzle-dazzle surrounding Google, the company must still work through common business problems such as reporting revenue and tracking projects. But it sometimes addresses those needs in unconventional, yet highly efficient, ways.
The Start of the Story
Google started with a research project into the structure of the Web led by two Stanford University Ph.D. candidates, Larry Page and Sergey Brin. After initially offering to sell the search engine they had created to an established firm, such as Yahoo, but failing to find a buyer, they established Google in 1998 to commercialize the technology. For the first few years of the company's existence, the co-founders were determined to avoid making money through advertising, which they thought would corrupt the integrity of search results. But when their initial technology licensing business model fell flat, they compromised. While keeping the Google home page ad-free, they began inserting text ads next to their search results, with ad selection driven by the search keywords. The ads were clearly separated from the search results, but highly effective because of their relevance. At pennies per click, the ad revenue pouring into Google began mounting quickly.
When Google went public in 2004, analysts, competitors and investors were stunned to learn how much money the company with the ad-free home page was raking in—$1.4 billion in the first half of 2004 alone. Last year, revenue topped $6 billion.
Google and its information-technology infrastructure had humble beginnings, as Merrill illustrates early in his talk with a slide-show photo of Google.stanford.edu, the academic research project version of the search engine from about 1997, before the formation of Google Inc., when the server infrastructure consisted of a jumble of PCs scavenged from around campus.
"Would any of you be really proud to have this in your data center?" Merrill asks, pointing to the disorderly stack of servers connected by a tangle of cables.
"But this is the start of the story," he adds, part of an approach that says "don't necessarily do it the way everyone else did. Just find some way of doing it cheap and effectively—so we can learn."
The basic tasks that Google had to perform in 1997 are the same it must perform today. First, it scours the Web for content, "crawling" from one page to another, following links. Copies of pages it finds must be stored in a document repository, along with their Internet addresses, and further analyzed to produce indexes of words and links occurring in each document. When someone types keywords into Google, the search engine compares them against the index to determine the best matches and displays links to them, along with relevant excerpts from the cached Web documents. To make all this work, Google had to store and analyze a sizable fraction of all the content on the Web, which posed both technical and economic challenges.
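The crawl, store, index and search steps described above can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the three-page `TOY_WEB`, the function names, and the simple "page must contain every keyword" matching rule. It does not rank results or show how Google's own software is written.

```python
from collections import defaultdict, deque

# Hypothetical in-memory "Web": URL -> (page text, outgoing links).
TOY_WEB = {
    "a.example/": ("cheap servers fail often", ["b.example/", "c.example/"]),
    "b.example/": ("search engines index the web", ["c.example/"]),
    "c.example/": ("cheap search on cheap servers", ["a.example/"]),
}

def crawl(start_url, web):
    """Follow links breadth-first and store a copy of every page reached."""
    repository = {}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if url in repository or url not in web:
            continue  # already stored, or a link that leads nowhere
        text, links = web[url]
        repository[url] = text
        queue.extend(links)
    return repository

def build_index(repository):
    """Map each word to the set of URLs whose stored copy contains it."""
    index = defaultdict(set)
    for url, text in repository.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, repository, query):
    """Return (url, stored text) for pages containing every query word."""
    words = query.lower().split()
    if not words:
        return []
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return [(url, repository[url]) for url in sorted(matches)]

repository = crawl("a.example/", TOY_WEB)
index = build_index(repository)
results = search(index, repository, "cheap servers")
```

Even at this scale the storage cost is visible: the repository holds a full copy of every page, and the index holds an entry for every distinct word on every page.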
By 1999, the Google.com search engine was running in professionally managed Internet data centers run by companies like Exodus. But the equipment Google was parking there was, if anything, more unconventional, based on hand-built racks with corkboard trays. The hardware design was led by Page, a natural engineer who once built a working ink-jet printer out of Legos. His team assembled racks of bare motherboards, mounted four to a shelf on corkboard, with cheap no-name hard drives purchased at Fry's Electronics. These were packed close together (like "blade servers before there were blade servers," Merrill says). The individual servers on these racks exchanged and replicated data over a snarl of Ethernet cables plugged into Hewlett-Packard network switches. The first Google.com production system ran on about 30 of these racks. You can see one for yourself at the Computer History Museum, just a few blocks away from Google's Mountain View headquarters.
Part of the inspiration behind this tightly packed configuration was that, at the time, data centers were charging by the square foot, and Page wanted to fit the maximum computer power into the smallest amount of space. Frills like server enclosures around the circuit boards would have just gotten in the way.
This picture makes the data center manager in Merrill shudder. "Like, the cable management is really terrible," he says. Why aren't cables carefully color-coded and tied off? Why is the server numbering scheme so incoherent?
The computer components going into those racks were also chosen for rock-bottom cost rather than reliability. Hard drives were of a "poorer quality than you would put into your kid's computer at home," Merrill says, and some of the memory boards were possibly defective parts sold at a fire-sale discount. But that was just part of the strategy of getting a lot of computing power without spending a lot of money. Page, Brin and the other Google developers started with the assumption that components would fail regularly and designed their search engine software to work around that.
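One common way software can work around failing parts is to keep the same data on several machines and try the next one when a request fails. The sketch below shows that general idea only. The names `ReplicaDown`, `make_replica` and `lookup_with_failover`, and the node labels, are hypothetical, and nothing here is taken from Google's actual code.

```python
class ReplicaDown(Exception):
    """Raised by a replica that cannot answer (hypothetical failure signal)."""

def make_replica(name, data, healthy=True):
    """Build a lookup function standing in for one cheap machine."""
    def lookup(key):
        if not healthy:
            raise ReplicaDown(name)
        return data[key]
    return lookup

def lookup_with_failover(replicas, key):
    """Ask each replica in turn and return the first successful answer."""
    failures = []
    for replica in replicas:
        try:
            return replica(key)
        except ReplicaDown as err:
            failures.append(str(err))  # remember which machine failed
    raise RuntimeError(f"all replicas failed: {failures}")

# The same slice of the index is held by two machines.
shard = {"google": ["page1", "page7"]}
replicas = [
    make_replica("rack1-node3", shard, healthy=False),  # simulated dead drive
    make_replica("rack2-node1", shard),
]
answer = lookup_with_failover(replicas, "google")
```

The caller never learns that the first machine was down. The request only fails if every copy is unavailable at once.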
Google's co-founders knew the Web was growing rapidly, so they would have to work very hard and very smart to make sure their index of Web pages could keep up. They pinned their hopes on prices for processors, memory chips and storage, which they believed would continue to fall even faster than the Web was growing. Back at the very beginning, they were trying to build a search engine on scavenged university resources. Even a little later, when they were launching their company on the strength of a $100,000 investment from Sun Microsystems co-founder Andy Bechtolsheim, they had to make their money stretch as far as possible. So, they built their system from the cheapest parts money would buy.
At the time, many dot-com companies, flush with cash, were buying high-end Sun servers with features like RAID hard drives. RAID, for redundant arrays of independent disks, boosts the reliability of a storage device through internal redundancy and automatic error correction. Google decided to do the same thing by different means—it would make entire computers redundant with each other, making many frugally constructed computers work in parallel to deliver high performance at a low cost.
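A rough calculation shows why whole-machine redundancy can make up for unreliable parts. The sketch assumes that machines fail independently of one another, and the 95 percent uptime figure is an illustrative guess, not a number from Google.

```python
def availability(per_machine_uptime, copies):
    """Probability that at least one of `copies` independent machines is up."""
    if not 0.0 <= per_machine_uptime <= 1.0:
        raise ValueError("per_machine_uptime must be between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    # All copies must be down at the same time for the data to be unreachable.
    return 1.0 - (1.0 - per_machine_uptime) ** copies

ASSUMED_UPTIME = 0.95  # illustrative value for one cheap machine

for copies in (1, 2, 3):
    print(copies, round(availability(ASSUMED_UPTIME, copies), 6))
```

Under these assumptions, each extra copy cuts the chance of total outage by a factor of 20, which is how a handful of unreliable machines can compete with one expensive, reliable one.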
By 1999, when Google received $25 million in venture funding, the frugality that had driven the early systems design wasn't as much of a necessity. But by then, it had become a habit.
Later Google data centers tidied up the cabling, and corkboard (which turned out to pose a fire hazard) vanished from the server racks. Google also discovered that there were downsides to its approach, such as the intense cooling and power requirements of densely packed servers.
In a 2003 paper, Google noted that power requirements of a densely packed server rack could range from 400 to 700 watts per square foot, yet most commercial data centers could support no more than 150 watts per square foot. In response, Google was investigating more power-efficient hardware, and reportedly switched from Intel to AMD processors for this reason. Google has not confirmed the choice of AMD, which was reported earlier this year by Morgan Stanley analyst Mark Edelstone.
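The size of that gap is easy to work out from the figures quoted above. The short calculation below uses only those three numbers. The constant and function names are made up for the example.

```python
RACK_DENSITY_W_PER_SQFT = (400, 700)   # range quoted from the 2003 paper
DATA_CENTER_LIMIT_W_PER_SQFT = 150     # typical commercial ceiling quoted above

def overload_factor(rack_density, limit=DATA_CENTER_LIMIT_W_PER_SQFT):
    """How many times the supported power density a rack would demand."""
    return rack_density / limit

low, high = (overload_factor(d) for d in RACK_DENSITY_W_PER_SQFT)
print(f"Dense racks need {low:.1f}x to {high:.1f}x the supported power density")
```

Put another way, a densely packed rack would need the power budget of roughly three to five times its own floor space.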
Although Google's infrastructure has gone through many changes since the company's launch, the basic clustered server rack design continues to this day. "With each iteration, we got better at doing this, and doing it our way," Merrill says.