Infrastructure - Baseline


How Google Works



By David F. Carr

  Table of Contents:
  1. How Google Works
  2. How Google Works: What CIOs Can Learn
  3. How Google Works: The Start of the Story
  4. How Google Works: A Role Model
  5. How Google Works: The Google File System
  6. How Google Works: Reducing Complexity
  7. How Google Works: Google's Secrets
  8. How Google Works: Exotic but Not Unique
  9. How Google Works: Google, the Enterprise
  10. How Google Works: Google Base Case

For all the razzle-dazzle surrounding Google, the company must still work through common business problems such as reporting revenue and tracking projects. But it sometimes addresses those needs in unconventional—yet highly efficient—ways. Other



( Page 5 of 10 )

The Google File System

In 2003, Google's research arm, Google Labs, published a paper on the Google File System (GFS). The system appears to be a successor to the BigFiles system that Page and Brin wrote about at Stanford, revamped by the systems engineers they hired after forming Google. The paper covered the requirements of Google's distributed file system in more detail, and it also outlined other aspects of the company's systems, such as the scheduling of batch processes and recovery from subsystem failures.

The idea is to "store data reliably even in the presence of unreliable machines," says Google Labs distinguished engineer Jeffrey Dean, who discussed the system in a 2004 presentation available by Webcast from the University of Washington.

For example, the GFS ensures that for every file, at least three copies are stored on different computers in a given server cluster. That means if a program tries to read a file from one of those computers and that computer fails to respond within a few milliseconds, at least two others will be able to fulfill the request. Such redundancy is important because Google's search system regularly experiences "application bugs, operating system bugs, human errors, and the failures of disks, memory, connectors, networking and power supplies," according to the paper.
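The fallback behavior described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Google's code: the `ReplicaTimeout` exception, the `read_with_fallback` function and the two stand-in replicas are all hypothetical names invented for this example.

```python
from typing import Callable, Optional, Sequence


class ReplicaTimeout(Exception):
    """Raised by a (hypothetical) replica that did not answer in time."""


def read_with_fallback(replicas: Sequence[Callable[[], bytes]]) -> bytes:
    """Try each replica in turn and return the first successful read."""
    last_error: Optional[Exception] = None
    for read in replicas:
        try:
            return read()
        except ReplicaTimeout as err:
            last_error = err  # this copy is unavailable; try the next one
    raise RuntimeError("all replicas failed") from last_error


def dead_replica() -> bytes:
    # Stand-in for a computer that never responds.
    raise ReplicaTimeout("no response")


def healthy_replica() -> bytes:
    # Stand-in for a computer that holds a good copy.
    return b"chunk data"


# Three copies, as in the article; the first one is down.
result = read_with_fallback([dead_replica, healthy_replica, healthy_replica])
print(result)
```

With three copies, the read succeeds as long as any one of them answers; it fails only if every replica times out.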

The files managed by the system typically range from 100 megabytes to several gigabytes. So, to manage disk space efficiently, the GFS organizes data into 64-megabyte "chunks," which are roughly analogous to the "blocks" on a conventional file system—the smallest unit of data the system is designed to support. For comparison, a typical Linux block size is 4,096 bytes. It's the difference between making each block big enough to store a few pages of text, versus several fat shelves full of books.

To store a 128-megabyte file, the GFS would use two chunks. On the other hand, a 1-megabyte file would use one 64-megabyte chunk, leaving most of it empty, because such "small" files are so rare in Google's world that they're not worth worrying about (files more commonly consume multiple 64-megabyte chunks).
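The chunk arithmetic is easy to check with a short sketch. It assumes a megabyte means 2**20 bytes (the article does not say), and the function names are made up for this illustration.

```python
CHUNK_SIZE = 64 * 1024 * 1024   # 64 MB chunk, assuming 1 MB = 2**20 bytes
LINUX_BLOCK_SIZE = 4096         # typical Linux block size cited in the text
MB = 1024 * 1024


def chunks_needed(file_size: int) -> int:
    """Number of chunks a file of file_size bytes occupies (ceiling division)."""
    return (file_size + CHUNK_SIZE - 1) // CHUNK_SIZE


def unused_in_last_chunk(file_size: int) -> int:
    """Bytes left empty in the final chunk, following the article's description."""
    return chunks_needed(file_size) * CHUNK_SIZE - file_size


print(chunks_needed(128 * MB))            # 2
print(chunks_needed(1 * MB))              # 1
print(unused_in_last_chunk(1 * MB) // MB) # 63
print(CHUNK_SIZE // LINUX_BLOCK_SIZE)     # 16384
```

The last figure shows the scale gap: one GFS chunk covers as much data as 16,384 blocks of 4,096 bytes.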

A GFS cluster consists of a master server and hundreds or thousands of "chunkservers," the computers that actually store the data. The master server contains all the metadata, including file names, sizes and locations. When an application requests a given file, the master server provides the addresses of the relevant chunkservers. The master also listens for a "heartbeat" from the chunkservers it manages—if the heartbeat stops, the master assigns another server to pick up the slack.
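A toy model can make the master's two jobs concrete: answering "where is this file?" and noticing when a chunkserver goes quiet. Everything here is an assumption made for illustration, including the `Master` class, its method names, the 10-second timeout and the rule for picking a replacement server; the article does not describe how GFS implements any of this.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Master:
    heartbeat_timeout: float = 10.0  # seconds; an arbitrary illustrative value
    # file name -> list of chunks; each chunk is a list of chunkserver addresses
    file_chunks: Dict[str, List[List[str]]] = field(default_factory=dict)
    # chunkserver address -> time of its most recent heartbeat
    last_heartbeat: Dict[str, float] = field(default_factory=dict)

    def record_heartbeat(self, server: str, now: float) -> None:
        self.last_heartbeat[server] = now

    def locate(self, filename: str) -> List[List[str]]:
        """Return the chunkserver addresses for each chunk of a file."""
        return self.file_chunks[filename]

    def failed_servers(self, now: float) -> Set[str]:
        """Servers whose last heartbeat is older than the timeout."""
        return {s for s, t in self.last_heartbeat.items()
                if now - t > self.heartbeat_timeout}

    def reassign(self, now: float) -> None:
        """Swap each failed server for a live one that lacks the chunk."""
        failed = self.failed_servers(now)
        live = [s for s in self.last_heartbeat if s not in failed]
        for chunks in self.file_chunks.values():
            for holders in chunks:
                for i, server in enumerate(holders):
                    if server in failed:
                        spare = next((s for s in live if s not in holders), None)
                        if spare is not None:
                            holders[i] = spare


master = Master()
for name in ["cs1", "cs2", "cs3", "cs4"]:
    master.record_heartbeat(name, now=0.0)
master.file_chunks["crawl.dat"] = [["cs1", "cs2", "cs3"]]

# Later, every server except cs2 reports in again.
for name in ["cs1", "cs3", "cs4"]:
    master.record_heartbeat(name, now=20.0)
master.reassign(now=25.0)
locations = master.locate("crawl.dat")
print(locations)  # [['cs1', 'cs4', 'cs3']]
```

Note that the master in this sketch hands out addresses only; the application would then fetch the data from the chunkservers themselves.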

In technical presentations, Google talks about running more than 50 GFS clusters, with thousands of servers per cluster, managing petabytes of data.

More recently, Google has enhanced its software infrastructure with BigTable, a super-sized database management system it developed, which Dean described in an October presentation at the University of Washington. BigTable stores structured data used by applications such as Google Maps, Google Earth and My Search History. Although Google does use standard relational databases, such as MySQL, the volume and variety of data Google manages drove it to create its own database engine. BigTable database tables are broken into smaller pieces called tablets that can be stored on different computers in a GFS cluster, allowing the system to manage tables that are too big to fit on a single server.
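The idea of breaking one large table into tablets can be sketched as follows. The article does not say how tables are divided, so this example assumes the rows are sorted by key and cut into contiguous ranges; the function names, the round-robin placement and the server names are likewise hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List


def split_into_tablets(rows: Dict[str, int], rows_per_tablet: int) -> List[Dict[str, int]]:
    """Cut a table into tablets, each holding a contiguous range of sorted keys."""
    keys = sorted(rows)
    return [
        {k: rows[k] for k in keys[i:i + rows_per_tablet]}
        for i in range(0, len(keys), rows_per_tablet)
    ]


def find_tablet(start_keys: List[str], key: str) -> int:
    """Index of the tablet whose key range would contain the given key."""
    return max(bisect_right(start_keys, key) - 1, 0)


table = {"apple": 1, "banana": 2, "cherry": 3, "date": 4, "elder": 5}
tablets = split_into_tablets(table, rows_per_tablet=2)

# The first key of each tablet is enough to route a lookup.
start_keys = [min(t) for t in tablets]

# Spread tablets across (made-up) servers in round-robin order.
servers = ["cs1", "cs2"]
placement = {i: servers[i % len(servers)] for i in range(len(tablets))}

index = find_tablet(start_keys, "date")
print(index, placement[index], tablets[index]["date"])
```

Because each tablet can live on a different machine, no single server ever has to hold the whole table.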
