Long-Term Data Health

What happens today in the research labs of the Roswell Park Cancer Institute (RPCI) could have an immediate, beneficial impact on the field of oncology. Then again, it might take 50 years. That is Thomas Vaughan’s dilemma.

As director of IT infrastructure at the Buffalo, N.Y.-based institution, founded in 1898 as the first cancer center in the United States, Vaughan manages information with an exceptionally long lifecycle that must also be immediately available. The data, which includes text documents, spreadsheets and diagnostic X-rays, must be stored in a way that preserves its integrity for today’s health care regulators as well as for future researchers.

“I want to save our data forever,” Vaughan says.

As in many organizations, the benefits and pitfalls of data storage pushed RPCI to think more strategically about what information retains value and for how long. Enterprises have always needed to record and store data, but the days of the simple ledger and filing cabinet are long gone, vanquished by three key factors. First is the sheer volume of information companies create. Second, unstructured data has largely eclipsed structured data: records created in the line of business can range from text documents to presentations, PDFs, e-mail and more. Finally, companies face the burdensome challenge of making information accessible to regulators, auditors and lawyers.

So, what is the best way to tell which data can be deleted, saved or shuttled off to storage with slower retrieval speeds? Indeed, the discipline of data storage is all about finding proper balance.

Information lifecycle management (ILM) was conceived as a way for companies to strike that balance and get a handle on the data deluge. Simply put, ILM gives businesses a framework for classifying stored information, identifying the best technology for that storage, crafting guidelines for retention and managing the total cost of storage. ILM correlates the business value of information with the IT infrastructure that surrounds it.
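
To make the idea of classification concrete, here is a minimal sketch of how a retention-and-tiering rule might look in code. It is an illustration only, not RPCI's or HP's method: the `Record` fields, the `classify` function, the tier names and the 90-day threshold are all assumptions invented for this example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Record:
    """A stored item with hypothetical ILM metadata."""
    name: str
    last_accessed: date
    retention_until: Optional[date]  # None means "keep indefinitely"


def classify(record: Record, today: date, hot_days: int = 90) -> str:
    """Return 'delete', 'archive' or 'primary' for a record.

    The rules are assumed for illustration:
    - past its retention date: eligible for deletion
    - not accessed within `hot_days`: move to slower, cheaper storage
    - otherwise: keep on fast primary storage
    """
    if record.retention_until is not None and today > record.retention_until:
        return "delete"
    if (today - record.last_accessed).days > hot_days:
        return "archive"
    return "primary"
```

Under these assumed rules, a diagnostic image with no retention end date would never be marked for deletion; it would simply move to the archive tier once it had gone untouched for longer than the threshold.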

There is no single path to ILM, no one-size-fits-all definition or grab-and-go tech solution. For his part, Vaughan found a way to improve RPCI’s archiving and storage by working with the same vendor that supplies much of the institution’s infrastructure: Hewlett-Packard. From HP’s StorageWorks line he added enterprise virtual arrays and enterprise file services, a 10-TB Medical Archive Solution, a Reference Information Storage System and clustered HP ProLiant servers. “Understanding the technology,” Vaughan says, “was the easy part.”

What’s more difficult about ILM, he says, is getting folks to focus on the real value of stored data.

“[We had to understand] the change in the nature of data from just being data at rest on a disk for the previous 20 years to suddenly becoming dynamic data moving from place to place, being minable, and how that would empower the business,” he says.