Inside MySpace: The Story - Second Milestone: 1-2 Million Accounts
(
Page 4 of 8 )
Second Milestone: 1-2 Million Accounts
As MySpace registration passed 1 million accounts and was closing in on 2 million, the service began knocking up against the input/output (I/O) capacity of the database servers—the speed at which they were capable of reading and writing data. This was still just a few months into the life of the service, in mid-2004. As MySpace user postings backed up, like a thousand groupies trying to squeeze into a nightclub with room for only a few hundred, the Web site began suffering from "major inconsistencies," Benedetto says, meaning that parts of the Web site were forever slightly out of date.
"A comment that someone had posted wouldn't show up for 5 minutes, so users were always complaining that the site was broken," he adds.
The next database architecture was built around the concept of vertical partitioning, with separate databases for parts of the Web site that served different functions such as the log-in screen, user profiles and blogs. Again, the Web site's scalability problems seemed to have been solved—for a while.
The vertical partitioning scheme helped divide up the workload for database reads and writes alike, and when users demanded a new feature, MySpace would put a new database online to support it. At 2 million accounts, MySpace also switched from using storage devices directly attached to its database servers to a storage area network (SAN), in which a pool of disk storage devices are tied together by a high-speed, specialized network, and the databases connect to the SAN. The change to a SAN boosted performance, uptime and reliability, Benedetto says.
{mospagebreak title=Third Milestone: 3 Million Accounts"> 
Third Milestone: 3 Million Accounts
As the Web site's growth continued, hitting 3 million registered users, the vertical partitioning solution couldn't last. Even though the individual applications on sub-sections of the Web site were for the most part independent, there was also information they all had to share. In this architecture, every database had to have its own copy of the users table—the electronic roster of authorized MySpace users. That meant when a new user registered, a record for that account had to be created on nine different database servers. Occasionally, one of those transactions would fail, perhaps because one particular database server was momentarily unavailable, leaving the user with a partially created account where everything but, for example, the blog feature would work for that person.
And there was another problem. Eventually, individual applications like blogs on sub-sections of the Web site would grow too large for a single database server.
By mid-2004, MySpace had arrived at the point where it had to make what Web developers call the "scale up" versus "scale out" decision—whether to scale up to bigger, more powerful and more expensive servers, or spread out the database workload across lots of relatively cheap servers. In general, large Web sites tend to adopt a scale-out approach that allows them to keep adding capacity by adding more servers.
But a successful scale-out architecture requires solving complicated distributed computing problems, and large Web site operators such as Google, Yahoo and Amazon.com have had to invent a lot of their own technology to make it work. For example, Google created its own distributed file system to handle distributed storage of the data it gathers and analyzes to index the Web.
In addition, a scale-out strategy would require an extensive rewrite of the Web site software to make programs designed to run on a single server run across many—which, if it failed, could easily cost the developers their jobs, Benedetto says.
So, MySpace gave serious consideration to a scale-up strategy, spending a month and a half studying the option of upgrading to 32-processor servers that would be able to manage much larger databases, according to Benedetto. "At the time, this looked like it could be the panacea for all our problems," he says, wiping away scalability issues for what appeared then to be the long term. Best of all, it would require little or no change to the Web site software.
Unfortunately, that high-end server hardware was just too expensive—many times the cost of buying the same processor power and memory spread across multiple servers, Benedetto says. Besides, the Web site's architects foresaw that even a super-sized database could ultimately be overloaded, he says: "In other words, if growth continued, we were going to have to scale out anyway."
So, MySpace moved to a distributed computing architecture in which many physically separate computer servers were made to function like one logical computer. At the database level, this meant reversing the decision to segment the Web site into multiple applications supported by separate databases, and instead treat the whole Web site as one application. Now there would only be one user table in that database schema because the data to support blogs, profiles and other core features would be stored together.
Now that all the core data was logically organized into one database, MySpace had to find another way to divide up the workload, which was still too much to be managed by a single database server running on commodity hardware. This time, instead of creating separate databases for Web site functions or applications, MySpace began splitting its user base into chunks of 1 million accounts and putting all the data keyed to those accounts in a separate instance of SQL Server. Today, MySpace actually runs two copies of SQL Server on each server computer, for a total of 2 million accounts per machine, but Benedetto notes that doing so leaves him the option of cutting the workload in half at any time with minimal disruption to the Web site architecture.
There is still a single database that contains the user name and password credentials for all users. As members log in, the Web site directs them to the database server containing the rest of the data for their account. But even though it must support a massive user table, the load on the log-in database is more manageable because it is dedicated to that function alone.