Yahoo Challenge to Google Has Roots in Open Source

By David F. Carr  |  Posted 2007-08-20

Initiative for distributed data processing may give the No. 2 search service some of the "geek cred" it's been lacking.

If you want to get your hands on an open source version of some of Google's core technologies, maybe you should ask Yahoo.

Yahoo has emerged as a major sponsor of Hadoop, an open source project that aims to replicate Google's techniques for storing and processing large amounts of data distributed across hundreds or thousands of commodity PCs (see Baseline's report: How Google Works). Last year, Hadoop project founder Doug Cutting became a Yahoo employee, and at July's Oscon open source conference he and Yahoo's director of grid computing, Eric Baldeschwieler, detailed how they are applying the technology.

Cutting, formerly of Excite and Xerox PARC, has founded or co-founded a series of projects related to creating an open source platform for search under the banner of the Apache Software Foundation. His work on Lucene (a Java software library for Web indexing and search) and Nutch (a search engine application that builds on Lucene) led to Hadoop, which started as a Nutch sub-project aimed at efficiently spreading the workload for compiling a search index across multiple computers. Since he doesn't work in a Yahoo office, Cutting says his employment is really more like being paid a salary to work full-time on his Apache projects and help Yahoo work efficiently with the open source community. On the other hand, he does work with Yahoo to get the most out of the technology.

The basic technique Hadoop uses is part of what has allowed Google to manage the massive data processing challenges associated with indexing the Web—and do it economically. Google has not released source code for its Google File System or the associated distributed computing environment, known as MapReduce. But what Google has done is publish academic papers on the computer science behind both—presumably knowing full well that competitors and open source programmers would be likely to create their own implementations.

In addition to giving a presentation on Hadoop at Oscon, Cutting participated in a panel discussion on new system programming and architecture techniques moderated by O'Reilly Media CEO Tim O'Reilly. While Cutting declined to speculate on Yahoo's motives for backing the project, O'Reilly called it an example of open source being "the natural ally of the number two player" in a market and a way of leveling the playing field.

In a follow-up blog post, O'Reilly wrote that Yahoo evidently wanted to make this a "coming out party" showcasing its backing of the project. "In fact, I even had a call from David Filo to make sure I knew that the support is coming from the top," he wrote. (While his co-founder Jerry Yang is better known as the public face of Yahoo, Filo is the geekier of the two and has always played a strong behind-the-scenes role in the company's technology decisions.) O'Reilly thinks Yahoo is trying to give itself "geek cred" by reaching out to the open source community with projects like Hadoop and its Yahoo Hack Day events.



David F. Carr is the Technology Editor for Baseline Magazine, a Ziff Davis publication focused on information technology and its management, with an emphasis on measurable, bottom-line results. He wrote two of Baseline's cover stories focused on the role of technology in disaster recovery, one focused on the response to the tsunami in Indonesia and another on the City of New Orleans after Hurricane Katrina. David has been the author or co-author of many Baseline Case Dissections on corporate technology successes and failures (such as the role of Kmart's inept supply chain implementation in its decline versus Wal-Mart or the successful use of technology to create new market opportunities for office furniture maker Herman Miller). He has also written about the FAA's halting attempts to modernize air traffic control, and in 2003 he traveled to Sierra Leone and Liberia to report on the role of technology in United Nations peacekeeping. David joined Baseline prior to the launch of the magazine in 2001 and helped define popular elements of the magazine such as Gotcha!, which offers cautionary tales about technology pitfalls and how to avoid them.
 
 
 
 
 
 
