Using Content Delivery Networks
By Dan Marriott | Posted 2011-04-06
To ensure rapid responses to information requests, Answers.com constructed a lightning-fast, fully redundant and highly scalable Web architecture.
To push the network’s static content further out toward the client side, we’ve made extensive use of content delivery networks (CDNs). We generate HTML pages dynamically on our own servers, but the other page components (GIFs, JPEGs, PNGs, JavaScript and CSS files) are largely static, so we moved them off our servers and onto CDNs, closer to users.
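The split between dynamically generated HTML and static page components can be sketched as a URL-rewriting helper that page templates call when they emit asset references. This is a minimal sketch, not Answers.com’s actual code: the CDN hostname `cdn.example.com` and the function name `asset_url` are hypothetical, and the extension list simply mirrors the file types named above.

```python
import posixpath
from urllib.parse import urljoin, urlsplit

# Hypothetical CDN hostname; a real deployment would use the vendor-assigned host.
CDN_BASE = "https://cdn.example.com/"

# File types treated as static and served from the CDN.
STATIC_EXTENSIONS = {".gif", ".jpg", ".jpeg", ".png", ".js", ".css"}


def asset_url(path: str) -> str:
    """Return the URL a page should reference for a site-relative path.

    Static components are rewritten to the CDN host; anything else
    (e.g. dynamically generated HTML) stays on the origin unchanged.
    """
    # Ignore any query string (e.g. "?v=3") when checking the file extension.
    ext = posixpath.splitext(urlsplit(path).path)[1].lower()
    if ext in STATIC_EXTENSIONS:
        return urljoin(CDN_BASE, path.lstrip("/"))
    return path
```

Rewriting only static paths keeps HTML on the origin servers, so the CDN caches files that rarely change while dynamic pages are still built per request.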
In the past, we used a variety of CDNs, but we switched to Cotendo, which focuses on optimizing its caching software and improving performance while using fewer resources. Since Answers.com began serving traffic through Cotendo, we’ve experienced 99.999 percent uptime, versus 99.9 percent with our previous CDN vendors: about 44 minutes less downtime per month. Cotendo also delivers hourly traffic log dumps via automated FTP, which let us spot traffic trends quickly.
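The downtime figure follows from simple arithmetic on the two availability levels. The helper below (a hypothetical name) converts an availability percentage into the maximum downtime it permits over a month of a given length.

```python
def monthly_downtime_minutes(availability_pct: float, days: int = 30) -> float:
    """Maximum downtime, in minutes, that an availability level permits per month."""
    minutes_in_month = days * 24 * 60
    return minutes_in_month * (1 - availability_pct / 100)


if __name__ == "__main__":
    # Compare the two availability levels for 30- and 31-day months.
    for days in (30, 31):
        old = monthly_downtime_minutes(99.9, days)
        new = monthly_downtime_minutes(99.999, days)
        print(f"{days}-day month: {old:.2f} vs {new:.2f} min, saving {old - new:.1f} min")
```

For a 30-day month, 99.9 percent allows about 43.2 minutes of downtime and 99.999 percent about 0.43 minutes, a gap of roughly 42.8 minutes; for a 31-day month the gap is about 44.2 minutes, which matches the figure above.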
We’re exploring Cotendo’s search engine optimization (SEO) tools, which would give us near-real-time insight into how search engine crawlers experience our sites. We think these tools could flag problems before they grow into serious traffic and site-ranking issues, letting us adjust quickly.
The software core of Answers.com’s data infrastructure is built primarily on the open-source LAMP stack (Linux, Apache, MySQL, PHP) with a heavy dose of virtualization. We have become comfortable running several mission-critical layers of our Web serving in a completely virtualized environment; those layers, including all of our Apache/PHP servers, run on VMware. Virtualization is now part of our operations infrastructure DNA.
As long-time MySQL proponents, we now use this open-source database as our primary data store in development, test, staging and production environments. Other data stores we’ve used include Memcached, a distributed caching system that improves site performance by caching data and objects in memory and on SSDs, reducing the number of calls to external data sources.
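The way a cache like Memcached cuts down on database calls is usually described as the cache-aside pattern: read from the cache first, and only on a miss query the database and store the result. The sketch below assumes that pattern. To stay self-contained it uses a small in-process `InMemoryCache` class as a stand-in for a real Memcached client; `InMemoryCache`, `get_with_cache` and the loader function are hypothetical names.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class InMemoryCache:
    """Stand-in for a Memcached client: get/set with a per-key TTL.

    Only mimics get/set semantics so the sketch runs without a server.
    """

    def __init__(self) -> None:
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # expired entries behave like a miss
            return None
        return value

    def set(self, key: str, value: Any, ttl: float) -> None:
        self._store[key] = (time.monotonic() + ttl, value)


def get_with_cache(cache: InMemoryCache, key: str,
                   load: Callable[[], Any], ttl: float = 300) -> Any:
    """Cache-aside read: try the cache, fall back to the data source on a miss."""
    value = cache.get(key)
    if value is None:
        value = load()  # e.g. a MySQL query (hypothetical loader)
        cache.set(key, value, ttl)
    return value
```

In this sketch a stored value of `None` is indistinguishable from a miss, which mirrors how many Memcached clients report a missing key.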
We also use Cassandra, an open-source NoSQL database that Digg and Facebook use to store and manipulate large volumes of data distributed across multiple commodity servers. For critical search capabilities, Answers.com uses Solr/Lucene, an open-source enterprise search platform from the Apache Lucene project that offers highly scalable full-text search, faceted search, dynamic clustering and simple database integration.
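Faceted search in Solr is driven by request parameters on the `select` handler, such as `facet=true` plus one `facet.field` per field to facet on. The helper below only builds such a request URL; the host, the core name `answers`, the helper name and the field names are assumptions for illustration, not Answers.com’s schema.

```python
from typing import Iterable
from urllib.parse import urlencode

# Hypothetical Solr host and core name; substitute the real deployment's values.
SOLR_SELECT_URL = "http://localhost:8983/solr/answers/select"


def build_faceted_query(text: str, facet_fields: Iterable[str], rows: int = 10) -> str:
    """Build a Solr select URL for a full-text query with field facets."""
    params = [
        ("q", text),            # full-text query
        ("rows", rows),         # number of documents to return
        ("wt", "json"),         # response format
        ("facet", "true"),      # turn faceting on
        ("facet.mincount", 1),  # omit facet values with zero hits
    ]
    # facet.field is repeated once per field to facet on.
    params.extend(("facet.field", field) for field in facet_fields)
    return f"{SOLR_SELECT_URL}?{urlencode(params)}"
```

For example, `build_faceted_query("planet mars", ["category", "source"])` yields a single GET URL that returns both the matching documents and per-category and per-source counts.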
Dealing With Ads
No discussion of maintaining a fast Web architecture would be complete without considering advertisements. Most Web operations directors have a horror story about a poorly coded Flash ad that took 10 seconds or longer to load and drove off thousands of site visitors before anyone identified it as a serious performance hazard.
Answers.com avoids highly disruptive advertising units like interstitials and page takeovers because we don’t like to put our users through that. We select our ad network partners carefully and spend time checking out their technology and talking to other Website operators who have served the network’s ads. In fact, we work only with ad networks that, like Answers.com, run highly reliable, redundant server environments and serve their traffic over dependable networks.
That said, some badly coded or resource-hungry ads do slip through, and it can be incredibly annoying when a single component holds up the page from displaying in the browser. To minimize that problem, Answers.com gives editorial content a higher load priority and loads ads afterward, so the core content of the page always appears first.
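One common way to give editorial content priority is to render it first and inject ad scripts only after the browser’s `load` event fires. The sketch below shows that idea from the server side, emitting an inline JavaScript loader. It is an illustration under stated assumptions, not Answers.com’s implementation: `render_page`, the slot IDs and the ad URLs are hypothetical, and each ad script is assumed to fill its own placeholder `div`.

```python
import html
import json
from typing import List, Tuple


def render_page(title: str, body_html: str, ad_slots: List[Tuple[str, str]]) -> str:
    """Render editorial content first; load ad scripts only after window 'load'.

    ad_slots holds hypothetical (slot_id, ad_script_url) pairs. Each ad script
    is assumed to fill its own placeholder div once it runs.
    """
    # Empty placeholders reserve space for the ads without blocking rendering.
    placeholders = "\n".join(
        f'<div id="{html.escape(slot_id)}" class="ad-slot"></div>'
        for slot_id, _ in ad_slots
    )
    # Serialize ad URLs as a JS array; escape "</" so a URL cannot close the tag.
    ad_urls = json.dumps([url for _, url in ad_slots]).replace("</", "<\\/")
    loader = f"""<script>
window.addEventListener("load", function () {{
  // Runs only after the editorial content, images and CSS have loaded.
  {ad_urls}.forEach(function (src) {{
    var s = document.createElement("script");
    s.src = src;
    s.async = true;
    document.body.appendChild(s);
  }});
}});
</script>"""
    return (
        "<!DOCTYPE html>\n<html><head><title>"
        + html.escape(title)
        + "</title></head><body>\n"
        + body_html  # editorial content comes first in the document
        + "\n" + placeholders
        + "\n" + loader
        + "\n</body></html>"
    )
```

Waiting for `load` rather than `DOMContentLoaded` pushes ads behind images and stylesheets as well; a site that wants ads to appear sooner could switch to the earlier event, trading some content priority for ad visibility.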
Above all, we want our users to have a great experience on our sites, and we’ll do whatever it takes to ensure they get the information they want as quickly as possible and with minimal interruption or wait times. We believe that this attitude, along with our willingness to invest extra time and effort in R&D, has played a key role in growing Answers.com to nearly 70 million monthly unique visitors. Our distribution system’s flexible and global nature has allowed us to grow quickly around the world.
Maintaining our high ranking on the fast-changing global Internet is no simple feat. But I think we have the answer: a well-constructed combination of smart hardware, open-source software, feature-rich management systems, fast and resilient distribution systems, and, most important, lots of smart people on our team.
Dan Marriott is director of operations at New York-based Answers.com. As a technologist with more than 20 years’ professional experience in the private and public sectors, he now specializes in high scalability and performance, pushing MySQL database limits, Internet infrastructure and data security.