How Google Works

With his unruly hair dipping across his forehead, Douglas Merrill walks up to the lectern set up in a ballroom of the Arizona Biltmore Resort and Spa, looking like a slightly rumpled university professor about to start a lecture. In fact, he is here on this April morning to talk about his work as director of internal technology for Google to a crowd of chief information officers gathered at a breakfast sponsored by local recruiting firm Phoenix Staffing.

Google, the secretive, extraordinarily successful $6.1 billion global search engine company, is one of the most recognized brands in the world. Yet it selectively discusses its innovative information management infrastructure—which is based on one of the largest distributed computing/grid systems in the world.

Merrill is about to give his audience a rare glimpse into the future according to Google, and explain the workings of the company and the computer systems behind it.

For all the razzle-dazzle surrounding Google—everything from the press it gets for its bring-your-dog-to-work casual workplace, to its stock price, market share, dizzying array of beta product launches and its death-match competition with Microsoft—it must also solve more basic issues like billing, collection, reporting revenue, tracking projects, hiring contractors, recruiting and evaluating employees, and managing videoconferencing systems—in other words, common business problems.

But this does not mean that Google solves these problems in a conventional way, as Merrill is about to explain.

“We’re about not ever accepting that the way something has been done in the past is necessarily the best way to do it today,” he says.

Among other things, that means that Google often doesn’t deploy standard business applications on standard hardware. Instead, it may use the same text parsing technology that drives its search engine to extract application input from an e-mail, rather than a conventional user interface based on data entry forms. Instead of deploying an application to a conventional server, Merrill may deploy it to a proprietary server-clustering infrastructure that runs across its worldwide data centers.
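To make the e-mail idea concrete, here is a minimal sketch of pulling structured application input out of a free-text message instead of a data entry form. It is not Google's parser, which the company has not described; the room-request wording, the two regular expressions, and the `extract_request` function are all hypothetical, chosen only to illustrate the approach.

```python
import re
from email import message_from_string

# Hypothetical patterns for a made-up "room request" message format.
PEOPLE_PATTERN = re.compile(r"for (\d+) people")
DATE_PATTERN = re.compile(r"\b(\d{4}-\d{2}-\d{2})\b")


def extract_request(raw_email):
    """Turn a plain-text e-mail into a dict of application input.

    Returns None when the body does not contain both fields, so a caller
    could fall back to asking the sender for clarification.
    """
    msg = message_from_string(raw_email)
    body = msg.get_payload()  # a plain string for non-multipart messages
    people = PEOPLE_PATTERN.search(body)
    date = DATE_PATTERN.search(body)
    if people is None or date is None:
        return None
    return {
        "requester": msg["From"],
        "people": int(people.group(1)),
        "date": date.group(1),
    }
```

A real system would need far more tolerant parsing than two fixed patterns, but the shape is the same: the message itself is the user interface, and the parser produces the record a form would otherwise have collected.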

Google runs on hundreds of thousands of servers—by one estimate, in excess of 450,000—racked up in thousands of clusters in dozens of data centers around the world. It has data centers in Dublin, Ireland; in Virginia; and in California, where it just acquired the million-square-foot headquarters it had been leasing. It recently opened a new center in Atlanta, and is currently building two football-field-sized centers in The Dalles, Ore.

By having its servers and data centers distributed geographically, Google delivers faster performance to its worldwide audience, because the speed of the connection between any two computers on the Internet is partly a function of the speed of light, as well as delays caused by network switches and routers. And although search is still Google’s big moneymaker, those servers are also running a fast-expanding family of other applications like Gmail, Blogger, and now even Web-based word processors and spreadsheets.
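A rough calculation shows why distance matters. The sketch below estimates round-trip time from propagation delay plus a per-hop allowance for switches and routers. The two-thirds factor for light in optical fiber is a common approximation; the per-hop delay and hop counts are illustrative assumptions, not measurements of Google's network.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in a vacuum
FIBER_FACTOR = 2 / 3               # approximation: light in fiber travels at about 2/3 c


def round_trip_ms(distance_km, hops=0, per_hop_delay_ms=0.0):
    """Estimate round-trip time in milliseconds over a fiber path.

    distance_km is the one-way path length. hops and per_hop_delay_ms are
    assumed values standing in for switch and router delays in each direction.
    """
    propagation_s = 2 * distance_km / (SPEED_OF_LIGHT_KM_S * FIBER_FACTOR)
    return propagation_s * 1000 + 2 * hops * per_hop_delay_ms


# Compare a nearby data center with one on another continent (assumed distances).
nearby = round_trip_ms(500, hops=5, per_hop_delay_ms=0.5)
distant = round_trip_ms(8000, hops=15, per_hop_delay_ms=0.5)
print(f"nearby: {nearby:.1f} ms, distant: {distant:.1f} ms")
```

Even with no router delay at all, every 1,000 kilometers of fiber adds about 10 milliseconds to a round trip, which is the physical floor that placing data centers closer to users is meant to lower.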

That’s why there is so much speculation about Google the Microsoft-killer, the latest firm nominated to drive everything to the Web and make the Windows desktop irrelevant. Whether or not you believe that, it’s certainly true that Google and Microsoft are banging heads. Microsoft expects to make about a $1.5 billion capital investment in server and data structure infrastructure this year. Google is likely to spend at least as much to maintain its lead, following an $838 million investment in 2005.

And at Google, large-scale systems technology is all-important. In 2005, it indexed 8 billion Web pages. Meanwhile, its market share continues to soar. According to a recent ComScore Networks qSearch survey, Google’s market share for search among U.S. Internet users reached 43% in April, compared with 28% for Yahoo and 12.9% for The Microsoft Network (MSN).

And Google’s market share is growing; a year ago, it was 36.5%. The same survey indicates that Americans conducted 6.6 billion searches online in April, up 4% from the previous month. Google sites led the pack with 2.9 billion search queries performed, followed by Yahoo sites (1.9 billion) and MSN-Microsoft (858 million).
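As a quick sanity check, the query counts can be turned back into shares of the 6.6 billion total. The sketch below uses only the rounded figures quoted above, so small differences from the reported percentages are to be expected; the variable names are arbitrary.

```python
# Figures quoted from the ComScore qSearch survey, in billions of queries.
TOTAL_SEARCHES = 6.6
queries = {
    "Google": 2.9,
    "Yahoo": 1.9,
    "MSN-Microsoft": 0.858,
}

# Share of all U.S. searches implied by each query count, in percent.
shares = {site: count / TOTAL_SEARCHES * 100 for site, count in queries.items()}

for site, share in shares.items():
    print(f"{site}: {share:.1f}%")
```

The implied shares land within about a percentage point of the reported 43%, 28% and 12.9%, which is consistent with the query counts having been rounded.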

This growth is driven by an abundance of scalable technology. As Google noted in its most recent annual report filing with the SEC: “Our business relies on our software and hardware infrastructure, which provides substantial computing resources at low cost. We currently use a combination of off-the-shelf and custom software running on clusters of commodity computers. Our considerable investment in developing this infrastructure has produced several key benefits. It simplifies the storage and processing of large amounts of data, eases the deployment and operation of large-scale global products and services, and automates much of the administration of large-scale clusters of computers.”

Google buys, rather than leases, computer equipment for maximum control over its infrastructure. Google chief executive officer Eric Schmidt defended that strategy in a May 31 call with financial analysts. “We believe we get tremendous competitive advantage by essentially building our own infrastructures,” he said.

Google does more than simply buy lots of PC-class servers and stuff them in racks, Schmidt said: “We’re really building what we think of internally as supercomputers.”

Because Google operates at such an extreme scale, it’s a system worth studying, particularly if your organization is pursuing or evaluating a grid computing strategy, in which high-end computing tasks are performed by many low-cost computers working in tandem.
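The core idea of grid computing can be sketched in a few lines: split a large job into independent pieces, hand each piece to a separate worker, and merge the partial results. In the toy example below, threads on one machine stand in for the many low-cost computers of a real grid. The word-counting task and the function names are illustrative assumptions and do not represent Google's software.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_words(chunk):
    """Work done by one worker: count the words in a single chunk of text."""
    return Counter(chunk.lower().split())


def grid_word_count(documents, workers=4):
    """Spread the chunks across workers, then merge their partial counts."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each document is processed independently, so the work can be
        # divided among as many workers as are available.
        for partial in pool.map(count_words, documents):
            total.update(partial)
    return total


if __name__ == "__main__":
    docs = ["the web is big", "the index is bigger", "search the web"]
    print(grid_word_count(docs).most_common(3))
```

Because no chunk depends on any other, adding workers shortens the job without changing the code that does the counting. A production grid also has to cope with machines failing mid-task, which this sketch ignores.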

Despite boasting about this infrastructure, Google turned down requests for interviews with its designers, as well as for a follow-up interview with Merrill. Merrill did answer questions during his presentation in Phoenix, however, and the division of the company that sells the Google Search Appliance helped fill in a few blanks.

In general, Google has a split personality when it comes to questions about its back-end systems. To the media, its answer is, “Sorry, we don’t talk about our infrastructure.” Yet Google engineers crack the door open wider when addressing computer science audiences, such as rooms full of graduate students whom it is interested in recruiting. As a result, sources for this story included technical presentations available from the University of Washington Web site, as well as other technical conference presentations, and papers published by Google’s research arm, Google Labs.
