Summary: Answers.com uses blade servers, open-source software and other tech tools to craft a robust Internet infrastructure for dispatching on-demand information to millions of users each month. In the highly competitive Internet search segment, success is measured in microseconds, and slow-loading pages can cost millions in lost advertising revenue. Dan Marriott, director of operations at the New York-based Q&A community site, tells how a “zero single point of failure” environment ensures both speed and reliability.
As director of operations at the world’s leading Q&A community site, I am responsible for providing a lot of answers—literally. Answers.com delivers on-demand information to tens of millions of unique visitors each month and must do so as fast as possible because Internet users do not like waiting, and latency measured in seconds translates into significant losses.
With these demands in mind, we designed the Web infrastructure and content serving systems for Answers.com and other Web properties in our network to be blazingly fast, highly reliable and extremely scalable. As key management at a publicly traded company, we have a fiduciary responsibility to ensure that Answers.com never goes down.
Answers.com constitutes the world’s largest pool of community-supplied Q&A information combined with authoritative reference information. In total, Answers.com sites maintain a database of 9 million community-supplied answers to questions covering topics that run the gamut from travel to technology. Recently, Answers.com reached 5 million registered users.
In the highly competitive Internet Q&A and information search segment, success is measured in microseconds, and slow-loading pages can cost millions in lost advertising revenues. Naturally, we have put serious thought into the Internet infrastructure behind the Answers.com network. Through trial, error and hundreds of hours of testing, we built out a custom application stack designed for screamingly fast content delivery to anywhere in the world at any time, in a manner that accommodates any type of network connection.
But providing fast content delivery is not enough. We also must have a fully redundant architecture that minimizes the chance of catastrophic technology collapses. For that reason, we operate a “zero single point of failure” environment. Every server in our network infrastructure has a redundant counterpart. We run redundant power supplies, redundant switches and load balancers, redundant firewalls and redundant rack chassis clusters.
We evaluated various blade providers before going with Hewlett-Packard as our primary server vendor. A key factor in the decision involved the features HP incorporated into its blades. For example, we use solid-state drives fairly extensively in our database tier, and HP was the first to offer PCI-based SSDs that could be incorporated within the blade form factor. That capability altered our whole approach to our database tier.
Another positive was the ability to add two- and four-port network interface cards, a key functional requirement for us. Based on the range of operational expenses—including power consumption, cooling, space/rack requirements, support and maintenance, and system administration—we found that the HP products offered better than 20 percent cost savings compared with other servers at that time.
For colocation, we use two facilities near the East and West Coasts. Each colocation server cluster provides sufficient CPU and bandwidth capacity to handle at least 120 percent of our peak traffic requirements. This gives us a total burst capacity of well over 200 percent of estimated peak traffic.
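The arithmetic behind those capacity figures can be sketched in a few lines. This is only an illustration of the numbers quoted above (two sites, each sized for 120 percent of peak); the function name and its parameters are hypothetical and not part of any Answers.com tooling.

```python
def capacity_summary(sites, per_site_pct):
    """Return (combined capacity, capacity left if one site is lost),
    both expressed as a percentage of estimated peak traffic."""
    total = sites * per_site_pct
    after_one_failure = (sites - 1) * per_site_pct
    return total, after_one_failure


# Figures from the article: two colocation sites at 120 percent of peak each.
total, surviving = capacity_summary(sites=2, per_site_pct=120)
print(total, surviving)  # 240 120
```

Under these assumptions, either site alone still covers peak traffic with headroom, which is the point of sizing each cluster above 100 percent.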
To speed content on its way, we also deploy Varnish, an open-source HTTP caching application, in front of our LAMP-stack infrastructure. Varnish also supports an essential requirement: purging specific pages from the cache any time they are updated.
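As a rough illustration of page-level purging, the sketch below builds an HTTP `PURGE` request for a single updated page and sends it to a cache node. It is a minimal example under stated assumptions, not a description of how Answers.com actually does it: the cache address `127.0.0.1:6081`, the function names and the example URL are all hypothetical, and Varnish only honors `PURGE` when its VCL configuration has been written to accept that method from trusted clients.

```python
from urllib.parse import urlsplit
from urllib.request import Request, urlopen


def build_purge_request(page_url, cache_host="127.0.0.1:6081"):
    """Build a PURGE request for one page, addressed to a cache node.

    cache_host is a hypothetical address for the caching server; the
    original site's hostname is passed along in the Host header so the
    cache can match the object it stored for that page.
    """
    parts = urlsplit(page_url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    request = Request(f"http://{cache_host}{path}", method="PURGE")
    request.add_header("Host", parts.netloc)
    return request


def purge_page(page_url, cache_host="127.0.0.1:6081", timeout=5):
    """Send the PURGE request and return the HTTP status code.

    Assumes the cache is configured to accept PURGE from this client;
    otherwise urlopen raises an HTTPError (for example, 405).
    """
    with urlopen(build_purge_request(page_url, cache_host), timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # Inspect the request without sending it anywhere.
    req = build_purge_request("http://www.example.com/topic/varnish")
    print(req.get_method(), req.full_url, req.get_header("Host"))
```

A content management hook could call something like `purge_page` whenever an answer is edited, so that the next visitor receives the fresh version while unrelated pages stay cached.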
On the client side, Answers.com uses now-standard practices to reduce load times, including CSS sprites, combined files and image maps. Our engineering team uses Asynchronous JavaScript and XML (AJAX) wherever possible to load content in the background so that it can be displayed quickly.