Long-Term Data Health
What happens today in the research labs of the Roswell Park Cancer Institute (RPCI) could have an immediate, beneficial impact on the field of oncology. Then again, it might take 50 years. That is Thomas Vaughan’s dilemma.
As director of IT infrastructure at the Buffalo, N.Y.-based institution—founded in 1898 as the first cancer center in the
“I want to save our data forever,”
As they have in many organizations, the benefits and pitfalls of data storage pushed RPCI to think more strategically about what information retains value and for how long. Enterprises have always needed to record and store data, but the days of the simple ledger and filing cabinet are long gone, vanquished by three key factors. First is the sheer volume of information companies create. Second, unstructured data has largely eclipsed structured data. Records created in the line of business can range from text documents to presentations, PDFs, e-mail and more. Finally, companies face the burdensome challenge of making information accessible to regulators, auditors and lawyers.
So, what is the best way to tell which data can be deleted, which should be saved and which can be shuttled off to storage with slower retrieval speeds? The discipline of data storage is all about finding the proper balance.
Information lifecycle management (ILM) was conceived as a way for companies to strike that balance and get a handle on the data deluge. Simply put, ILM gives businesses a framework for classifying stored information, identifying the best technology for that storage, crafting guidelines for retention and managing the total cost of storage. ILM correlates the business value of information with the IT infrastructure that surrounds it.
There is no single path to ILM, no one-size-fits-all definition or grab-and-go tech solution. For his part,
What’s more difficult about ILM, he says, is getting folks to focus on the real value of stored data.
“[We had to understand] the change in the nature of data from just being data at rest on a disk for the previous 20 years to suddenly becoming dynamic data moving from place to place, being minable, and how that would empower the business,” he says.
Data Hierarchy Over Time
Beth Cohen, director of operations for data protection and storage consultancy Broadleaf Services, in
“ILM offers a different paradigm for companies wrestling with a tidal wave of data,” says Cohen, whose clients include a company trying to archive more than three million Microsoft PowerPoint slides.
The initial emphasis of ILM was on finding a framework for data classification; that is where the Storage Networking Industry Association’s road map still starts. But classifying data and assigning a life expectancy to its usefulness is hard work—harder still if the tools don’t match the task.
“The amount of unstructured data has been growing by leaps and bounds, and the tools were just not keeping up,” Cohen says.
To keep themselves from being overwhelmed by the data tidal wave, many organizations merely bulked up the beachhead with more storage gear. However, in a report issued earlier this summer, analyst firm Gartner predicted that, by 2010, rising costs for storage media, energy and storage facilities will compel companies “to abandon the axiom that it is easier to add storage than to craft an ILM strategy.”
The Gartner report focused on companies in the health care industry, but its author, Barry Runyon, says its conclusions hold true for other industries as well. Runyon’s recommendations include:
Slow storage growth by improving overall use.
Initiate a project to discover, identify and classify critical enterprise data, both structured and unstructured.
Establish performance and recovery objectives for each data category.
Establish formal data-retention schedules.
Deploy a storage resource management tool.
Implement a tiered-storage infrastructure with at least three tiers.
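To make several of those recommendations concrete, here is a minimal Python sketch of how a classification scheme might tie each data category to a storage tier, a recovery objective and a retention schedule. The category names, tier numbers and retention periods are hypothetical illustrations, not figures from the Gartner report.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DataCategory:
    """One row of a hypothetical data-classification scheme."""
    name: str
    tier: int             # 1 = fastest storage, 3 = slowest/archive
    recovery_hours: int   # recovery objective: maximum hours to restore
    retention_years: int  # formal retention schedule


# Illustrative values only; none of these come from the Gartner report.
CATEGORIES = {
    c.name: c
    for c in (
        DataCategory("patient_record", tier=1, recovery_hours=1, retention_years=50),
        DataCategory("email", tier=2, recovery_hours=24, retention_years=7),
        DataCategory("scratch_file", tier=3, recovery_hours=72, retention_years=1),
    )
}


def disposal_date(category: str, created: date) -> date:
    """Return the first date on which a record may be disposed of."""
    years = CATEGORIES[category].retention_years
    try:
        return created.replace(year=created.year + years)
    except ValueError:
        # The record was created on Feb. 29 and the target year is not a leap year.
        return created.replace(year=created.year + years, day=28)
```

Calling `disposal_date("email", date(2008, 1, 15))` under these assumed schedules returns `date(2015, 1, 15)`. A real scheme would likely draw its categories and periods from the regulations that apply to the business rather than from a hard-coded table.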
For many companies, the genesis of their ILM strategy lies in regulatory compliance. Regulators demand that certain categories of information be kept for set periods in a certain way. The companies they regulate must comply. These organizations have learned the hard way that not having a data-retention policy—or failing to follow an existing policy to the letter—can be highly damaging.
Financial-services companies, for example, were among the first to get a broad set of targeted ILM tools because they faced so many mandates on their data. The finance industry’s e-mail archiving tools are robust enough to collect missives sent from a wide variety of communications devices, and can sock away instant messaging chats, too. These tools are now making their way out to a broader audience that includes both enterprises and their legal counsel.
“Attorney review time is the most expensive part of discovery,” says Brad Harris, director of product management at electronic discovery services provider Fios. “So it’s good to have a tool that makes that more efficient.” Harris’ common-sense tips: Move information off desktops into shared storage, use a content management system and take full advantage of its metadata tagging capabilities, dispose of what you don’t need and, above all, stick with the company’s ILM plan.
“A lot of companies talk about having retention policies, but nobody follows [them],” Harris says.
ILM Gets Personal
Helen of Troy is trying to do ILM right. The
With sales topping $634 million for the fiscal year ended
As it began to craft its ILM strategy, Helen of Troy had one key advantage: Only 35 percent of the data it was looking to manage was unstructured, according to Pedro T. Contreras, vice president of information technology. The bulk of its data was structured by its worldwide enterprise resource planning (ERP) system. “ERP forces a structure on the data,” says Contreras, based in
After vetting several vendors, Helen of Troy selected enterprise data management provider Solix Technologies and its ARCHIVEjinni product. Solix is a relatively young company—it was founded in 2001 in
Helen of Troy worked through each aspect of its operations to define data-classification and information-retention policies. Contreras says the evaluation matrix that evolved covers factors such as ownership of data, how often it needs to be accessed and by whom, as well as how and where the data is stored. “You can’t break the ILM rules,” Contreras says.
You can break the power grid, however, which is why companies are increasingly looking at cutting energy consumption as part of their overall ILM strategies. Ever-rising volumes of data are running into ever-rising utility costs. By controlling power usage, a company can stretch a tight storage budget to fit more data and make itself more environmentally responsible in the process.
A Path to Green Storage
John Halamka, a self-professed Prius-driving vegan, is the kind of guy who would have found a path to green storage even without outside prompting. But the many hats he wears in business have made energy-efficient, long-term storage an absolute necessity.
Halamka is chief information officer for CareGroup Health System, a confederation of four Boston-area hospitals with more than 1,000 beds. He is also the
“Given that oil prices have hit $90 a barrel, telling senior management that we have to reduce our energy usage was pretty straightforward,” Halamka says. And it wasn’t just imported oil pressuring rates. The utility that CareGroup and Harvard Medical both rely on has a fixed capacity for power, heat and cooling, so when the plant gets to maximum capacity in summer months, it raises prices.
Halamka took a methodical approach to cutting storage power consumption at both of his institutions. First, he moved everything to 750-GB serial advanced technology attachment (SATA) drives. He noted in a recent blog post (geekdoctor.blogspot.com) that while many of his constituents were worried about how the slower SATA drives would perform, no one remarked on the change when it actually happened.
Then he took the eminently practical step of cutting back on how much data was stored. His backup systems now deduplicate files; if a document is attached to an e-mail sent to all 5,000 of his institutions’ employees, only one copy of the document is stored. In doing so, he managed to cut the space needed for archiving by 50 percent.
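The idea behind file-level deduplication can be sketched in a few lines of Python. This is an illustration of the general technique, assuming content is identified by its SHA-256 hash; it is not a description of the backup product Halamka uses, and the class and method names are hypothetical.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: identical content is kept only once."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}  # content hash -> stored bytes
        self._refs: dict[str, str] = {}     # file name -> content hash

    def put(self, name: str, content: bytes) -> str:
        """Record a file; store its bytes only if this content is new."""
        digest = hashlib.sha256(content).hexdigest()
        self._blobs.setdefault(digest, content)
        self._refs[name] = digest
        return digest

    def get(self, name: str) -> bytes:
        """Return the content recorded under the given file name."""
        return self._blobs[self._refs[name]]

    def stored_bytes(self) -> int:
        """Bytes physically kept after deduplication."""
        return sum(len(blob) for blob in self._blobs.values())

    def logical_bytes(self) -> int:
        """Bytes that would be kept if every file were stored separately."""
        return sum(len(self._blobs[d]) for d in self._refs.values())
```

If the same attachment is saved under 5,000 different mailbox names, `stored_bytes()` reports the size of a single copy while `logical_bytes()` reports 5,000 times that size. Production systems typically deduplicate at the block level and handle deletion and reference counting, which this sketch omits.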
Halamka also instituted hierarchical storage management, meaning data is prioritized based on its importance and how frequently it is accessed. As files age and access becomes less critical, they migrate from a high-availability, high-speed storage area network to network-attached storage (NAS) or content-addressed storage (
And even though Halamka has been rigorous in his ILM strategy, he concedes that managing the demand for storage is difficult. “We have tried to enforce e-mail and storage quotas, but it is much easier to just increase the supply of storage than limit demand,” he says. “We’re continuing to try to strike a balance.”
Halamka, who uses a variety of
Halamka’s strategy is aggressive, but he has an ambitious goal. “I want to stay under 200 kilowatts for the next five years,” he says, “and I want to accommodate growth.”
The same can be said for ILM. It may not have all the tools it needs and is exposed to human failings, but ILM has a lot of growth to accommodate.