When On-Demand Goes Off

By Baselinemag  |  Posted 2007-01-07

A little more than 12 months ago, Salesforce.com, the Web-based, on-demand customer relationship management (CRM) and business services provider, was hit by a series of service outages over a six-week period that affected as many as 350,000 subscribers. The incidents jeopardized Salesforce.com's relationship with some of its own customers and potentially sidetracked efforts to extend the vendor's reach into large enterprises.

Fast-forward to mid-November 2006, when the software as a service (SaaS) company announced its third-quarter results for fiscal 2007. Revenue came to $130 million, a 57% jump over the same period in fiscal 2006; net paying subscribers were up 61,000, to 556,000; net customers up 2,300, to 27,000. That's the kind of growth that could give SaaS a good name.

Yet the service outages, which have since been resolved, raise an issue for chief information officers to consider if they are looking to use on-demand software to deliver applications, whether it's from Salesforce.com or any other vendor.

Salesforce.com's problems began on Dec. 20, 2005, when its clients could not access the company's servers, and so could not obtain customer records, for more than three hours. The problem: a rare database software bug, said the company's CEO, Marc Benioff. Though Salesforce.com uses Oracle database software (Oracle Database 10g and Oracle Grid Computing) as key technology enablers for its service and internal operations, Benioff declined to assign any blame for the bug. However, Bruce Francis, the company's vice president of corporate strategy, said at the time, "The vendor [Oracle] has one of the largest and most sophisticated development teams on the planet, and they had never encountered this bug before. Salesforce.com is working with [them] … to make sure that the bug doesn't crop up again." The bug hasn't recurred since, he said.

Salesforce.com's vulnerabilities, however, extended beyond the bug. The company experienced two more outages in January and several in early February. These were caused by "system performance issues," the company said. The specific causes varied. On Feb. 9, 2006, for example, a primary hardware server failed and one of the company's four North American servers, NA1, did not automatically recover. "This required a manual restart of the NA1 database," Salesforce.com told customers on its Web site. The outage lasted a little more than an hour.

Earlier, a Jan. 30 performance problem was attributed to a shortcoming in the company's database cluster, a collection of databases. This issue required Salesforce.com to restart each database instance in the cluster, resulting in a pause in service. The system was down for about four hours, and even when the company brought the service back up, the application programming interface (API) remained disabled for several more hours.

Clearly, Salesforce.com had some infrastructure issues, and with them a growing credibility gap. Its challenge was to move quickly and decisively in making fixes and enhancements, beginning with its database and service capabilities.

The outages came at the worst possible time for Salesforce.com, which was growing rapidly: its annual revenue of $309.9 million for the year ended Jan. 31, 2006, represented an increase of 76% over the prior year. It was also seeking to attract more large enterprise customers with new software and services offerings. "They're not just a CRM company any longer, but a business services provider," says Rob Bois, a senior research analyst with AMR Research in Boston. Salesforce.com intended to drive this change with its January 2006 launch of AppExchange, an online marketplace for applications and Web services from other vendors and developers that could be customized and integrated with Salesforce.com's core CRM service.

Salesforce.com was also gearing up to compete more aggressively with traditional enterprise platforms. With enterprise customers concerned about trusting reliability and availability to an outsider under the SaaS model, the outage problems weren't exactly confidence builders for the targeted audience. Moreover, by its own admission, management failed to communicate problems to customers after the initial outages. It learned from its mistake, however, and soon shored up its customer communications processes, a positive move in most businesses and an essential component in the CRM on-demand arena, where unexpected downtime can cost customers millions in lost sales.

Ugly Hours

During last year's outages, Salesforce.com was stung by a chorus of customer complaints. At least one customer bailed out altogether. "We dropped [Salesforce.com] and switched over to an internal system on the same day as the Dec. 20 [2005] outage," says Charles Crystle, CEO of Mission Research, a developer of fund-raising software based in Lancaster, Pa. "It wasn't just the outages that caused us to change. There were performance problems as well." Among the problems, according to Crystle: a clunky interface and cryptic processes, though he declined to elaborate. Salesforce.com has said repeatedly that since the outage, it has improved its application programming interface.

John Johnson, an assistant vice president with ASCAP, a membership association of more than 260,000 U.S. composers, songwriters, lyricists and music publishers, says his organization got hit several times by both outages and API problems. "We were just moving over to [Salesforce.com] at the time," he says. "December was ugly. We were down more than up."

One disgruntled user calling himself CRMGuy went so far as to set up a blog to air concerns about Salesforce.com. "I am sick of all the downtime, tired of the arrogant salespeople (I feel like CS [customer service] only contacts me when they want to sell more licenses), and if I never hear or see another interview with Marc Benioff again, it will be too soon," CRMGuy railed in the blog's first entry in December 2005.

Some customers, such as Pearson Packaging Systems, a Spokane, Wash.-based manufacturer of packaging machinery, weren't affected by the service glitches. "We didn't experience any downtime," says Pearson president and CEO Michael A. Senske. Even so, Senske, a former Microsoft executive, concedes he was concerned about uptime in using Salesforce.com.

The Big Fix

Well before the outages occurred, Salesforce.com announced it was spending $50 million on Mirrorforce, the company's effort to expand its data centers to provide real-time fail-over protection to customers. "We were investing in redesigning our entire infrastructure," says Kendall Collins, vice president of product marketing for Salesforce.com.

This was an extensive undertaking for the company. "We had to rebuild our hardware infrastructure to support [Mirrorforce], and we have rewritten every piece of software, from the application to integration, caching, database search and database management," Benioff said regarding the ambitious initiative, which was rolled out during the outages in late 2005 and early 2006.

Previously, Salesforce.com had been serving its entire customer base from a single, third-party Web hosting facility in Silicon Valley, operated by Equinix, the global provider of network-neutral data centers; the company used SunGard Data Systems, the software and processing solutions provider, for access to a remote disaster recovery site. This system had been providing an uptime rate in excess of 99%, the company maintained. Salesforce.com's accelerated growth and the switch to new data centers, however, overextended the system's capabilities and contributed to the outages, according to company spokespeople.

The key element in Mirrorforce is a mirroring system that creates a duplicate database in a separate location and synchronizes the data instantaneously. In the event that one database is destroyed or disabled, the other takes over. Much of this function was built in-house.
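The mirroring idea described above can be illustrated with a toy sketch. This is not Salesforce.com's code; the `Site` and `MirroredStore` names are hypothetical, the "databases" are in-memory dictionaries, and the sketch assumes every write is applied to both sites before it is acknowledged.

```python
class Site:
    """A toy in-memory database at one location (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.rows = {}
        self.up = True

    def write(self, key, value):
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        self.rows[key] = value

    def read(self, key):
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        return self.rows[key]


class MirroredStore:
    """Keeps two sites in step and fails over on reads."""

    def __init__(self, primary, mirror):
        self.primary = primary
        self.mirror = mirror

    def write(self, key, value):
        # Synchronous mirroring: both copies accept the write
        # before the caller gets control back.
        self.primary.write(key, value)
        self.mirror.write(key, value)

    def read(self, key):
        # If the primary site is unavailable, the mirror takes over.
        try:
            return self.primary.read(key)
        except ConnectionError:
            return self.mirror.read(key)


store = MirroredStore(Site("west"), Site("east"))
store.write("account-1", "Acme Corp.")
store.primary.up = False           # simulate losing the primary site
print(store.read("account-1"))     # served from the mirror
```

A production system would also have to handle a write that reaches one site but not the other, and the resynchronization of a site that comes back online; the sketch leaves both out.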

Specifically, Salesforce.com built new data centers on the East and West coasts, while retaining its Silicon Valley facility. It also built an additional West Coast facility to support new product development. In addition, Bois says, the company has spread out its larger customers to balance its database load.

By March 2006, Salesforce.com seemingly had beefed up its information-technology infrastructure to the point where the outages were no longer occurring. The postings on CRMGuy's blog dried up. "The service drastically improved," says ASCAP's Johnson. "We've virtually had no problems."

"The service has been as close to perfect as possible," adds Wesley Benwick, CEO of Bennett's Business Systems in Jacksonville, Fla., which sells copiers and digital imaging services.

"The vast majority of our customers were not concerned and had faith in us that we were going to power through and fix this and get back to our historic high levels of reliability and availability, which we did in a very short period of time," Francis adds.

Providing Transparency

After the first outage, customers such as ASCAP's Johnson complained that Salesforce.com inadequately communicated what was going wrong. That soon changed, however, and the company updated customers through direct communications and its customer support channels regarding fixes that were underway.

"We have … architected both a 30- and 120-day plan for changes in the service to significantly improve availability," Parker Harris, the company's co-founder and executive vice president, reported to customers at the height of the problem period. "These activities largely include upgrades to software components and installing additional hardware … This week alone, we have increased our database processing capacity by 50%. While this additional capacity is unnecessary for normal operations of the service, we believe it will help under extreme conditions like losing a data instance under peak load."

In late February, Salesforce.com established a Web site providing up-to-date and historical system performance information across all key system components. "We put this out there with the objective to give our entire community (press, analysts, but most importantly, our customers and our partners) perfect transparency into our system performance, our availability, the number of transactions we're delivering [often 60 million-plus transactions a day] as well as our scheduled maintenance," Collins says.

"The idea for [the site] came from a couple of different sources, one of which was an enterprise customer who said, 'We have faith in you as a reliable provider of I.T. services, but we run a big I.T. shop and we have to be accountable to our stakeholders for availability for performance,'" Francis explains. "They wanted to see that level of transparency from us."

The site tracks real-time database and service performance in terms of both API transactions and page views in the U.S., as well as in Europe, the Middle East, Australia, Africa, Southeast Asia and Japan. Green nodes mean all is OK; yellow indicates performance issues; red means a service disruption. It also provides users with a heads-up if there's a systems maintenance scheduled, which might cause minor delays in service. "It's a major benefit for us," says Bennett's Benwick. "It shows the daily performance records as well as the performance history for the month."
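The green, yellow and red scheme amounts to a simple classification rule. The sketch below is one hypothetical way to express it; the function name, its inputs and the one-second threshold are assumptions for illustration, not details the company has published.

```python
def node_status(is_reachable, avg_response_ms, slow_threshold_ms=1000):
    """Map a node's health to the status colors described above.

    The inputs and the default threshold are illustrative assumptions.
    """
    if not is_reachable:
        return "red"      # service disruption
    if avg_response_ms > slow_threshold_ms:
        return "yellow"   # performance issues
    return "green"        # all is OK


# Hypothetical readings for three nodes.
readings = {
    "NA1": (True, 220),
    "NA2": (True, 1800),
    "NA3": (False, 0),
}
for node, (reachable, latency_ms) in readings.items():
    print(node, node_status(reachable, latency_ms))
```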

Though Salesforce.com already boasts a number of large customers including ADP, Merrill Lynch and Cisco, which has 15,000 on-demand CRM subscribers, it is seeking to build traction in that market with both AppExchange and its recently announced AppStore. The latter is a full-service distribution channel for the developers, many of which are startups, that are building applications for AppExchange, which to date is offering about 400 applications.

Still, as the market share leader with 44% of the on-demand CRM market, according to AMR, Salesforce.com needs to keep a sharp focus on its service uptime to continue winning and retaining customers.

CRM Growth

Base Case

U.S. Headquarters: 1 Market St., Suite 300, San Francisco, CA 94105

Phone: (415) 901-7000


Business: Provides on-demand customer relationship management (CRM) products, starting at $995 a year for five users, and also enables sharing of business applications.

Chief Technology Officer: Parker Harris, executive vice president.

Financials in 2005: Revenue of $309.86 million for the fiscal year ended Jan. 31, 2006.

Challenge: Move to more robust database systems while maintaining accepted level of service for CRM on-demand customers.


  • Achieve 99.999% availability and reliability, up from 99% in 2005 and early 2006.
  • Provide processing capability to handle upward of 60 million application programming interface (API) and page-view transactions daily.
  • Boost annual revenue to $710 million for fiscal year ended Jan. 31, 2007, an increase of 129% over the prior year.
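
To put the availability goal in perspective, the percentages can be converted into the downtime they permit. The short calculation below is a back-of-the-envelope sketch that assumes a 365-day year and counts every minute of the year as scheduled service time; the function name is illustrative.

```python
def allowed_downtime_minutes(availability_pct, days=365):
    """Minutes of downtime per period permitted at a given availability."""
    total_minutes = days * 24 * 60
    return (100 - availability_pct) / 100 * total_minutes


for pct in (99.0, 99.999):
    minutes = allowed_downtime_minutes(pct)
    print(f"{pct}% availability allows {minutes:.1f} minutes "
          f"({minutes / 60:.1f} hours) of downtime a year")
```

At 99%, that works out to roughly 87.6 hours of downtime a year; at 99.999%, to a little more than five minutes.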