How Google Works: Exotic but Not Unique

By David F. Carr Print this article Print

For all the razzle-dazzle surrounding Google, the company must still work through common business problems such as reporting revenue and tracking projects. But it sometimes addresses those needs in unconventional—yet highly efficient—ways. Other

Exotic but Not Unique; Would it Work for You?

Google's systems seem to work well for Google. But if you could run your own systems on the Google File System, would you want to? Or is this an architecture only a search engine could love?

Distributed file systems have been around since the 1980s, when the Sun Microsystems Network File System and the Andrew File System, developed at Carnegie Mellon University, first appeared. Software engineer and blogger Jeff Darcy says the system has a lot in common with the HighRoad system he worked on at EMC. However, he notes that Google's decision to "relax" conventional requirements for file system data consistency in the pursuit of performance makes its system "unsuitable for a wide variety of potential applications." And because it doesn't support operating system integration, it's really more of a storage utility than a file system per se, he says. To a large extent, Google's design strikes him as more of a synthesis of many prior efforts in distributed storage.

Despite those caveats, Darcy says he also sees many aspects of the GFS as "cool and useful," and gives Google credit for "bringing things that might have been done mostly as research projects and turning them into a system stable enough and complete enough to be used in commercial infrastructure."

Google software engineers considered and rejected modifying an existing distributed file system because they felt they had different design priorities, revolving around redundant storage of massive amounts of data on cheap and relatively unreliable computers.

Despite having published details on technologies like the Google File System, Google has not released the software as open source and shows little interest in selling it. The only way it is available to another enterprise is in embedded form—if you buy a high-end version of the Google Search Appliance, one that is delivered as a rack of servers, you get Google's technology for managing that cluster as part of the package.

However, the developers working on Nutch, an Apache Software Foundation open-source search engine, have created a distributed software environment called Hadoop that includes a distributed file system and implementation of MapReduce inspired by Google's work.

Also in this Feature:

This article was originally published on 2006-07-06
David F. Carr David F. Carr is the Technology Editor for Baseline Magazine, a Ziff Davis publication focused on information technology and its management, with an emphasis on measurable, bottom-line results. He wrote two of Baseline's cover stories focused on the role of technology in disaster recovery, one focused on the response to the tsunami in Indonesia and another on the City of New Orleans after Hurricane Katrina.David has been the author or co-author of many Baseline Case Dissections on corporate technology successes and failures (such as the role of Kmart's inept supply chain implementation in its decline versus Wal-Mart or the successful use of technology to create new market opportunities for office furniture maker Herman Miller). He has also written about the FAA's halting attempts to modernize air traffic control, and in 2003 he traveled to Sierra Leone and Liberia to report on the role of technology in United Nations peacekeeping.David joined Baseline prior to the launch of the magazine in 2001 and helped define popular elements of the magazine such as Gotcha!, which offers cautionary tales about technology pitfalls and how to avoid them.
eWeek eWeek

Have the latest technology news and resources emailed to you everyday.