Migrating a Big Data Warehouse to the Cloud

By Adam Haines

In today’s world, big data has become the norm. But moving big data sets from on-premises systems to the cloud is cumbersome and fraught with challenges.

At Federated Sample, our business focuses on helping the market research industry secure quality sample respondents. The studies often have complex quotas and highly defined targeting requirements, which can make finding the right sample an involved process.

Initially, we stored our data in on-premises Microsoft SQL Server systems. However, as our company grew and requests for information increased, running queries against these transactional systems created a bottleneck. To overcome this challenge, we looked for an end-to-end data warehousing solution that would enable us to increase efficiency and more effectively meet the real-time reporting and business intelligence needs of our customers.  

Because Amazon Redshift did not require a significant capital investment and we were already an Amazon Web Services (AWS) shop, it was the logical data warehouse choice for us. But choosing Redshift was only part of the equation; we still needed to migrate data from our Microsoft SQL Server systems to the cloud-based warehouse.

As we made our transition to the cloud, the following lessons became clear:

Before you start migrating to the cloud, investigate third-party solutions.

To keep costs down, our team built a homegrown solution on SQL Server Integration Services (SSIS), pushing data through multiple stages that encoded it and prepared it for a direct copy into Redshift. Unfortunately, the solution was labor-intensive and rife with problems: the file-encoding process was error-prone, only small files could be sent, and there was no way to identify or track errors. Our initial upload took more than a week, and subsequent updates took hours.
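For readers who haven’t seen the pattern: stripped of all those fragile stages, bulk loading Redshift comes down to staging files in Amazon S3 and issuing a COPY command. Here is a minimal sketch in Python; the bucket, schema, IAM role, and connection details are hypothetical placeholders, not our production setup.

    import boto3
    import psycopg2

    # Stage a gzipped CSV extract in S3, then issue a Redshift COPY.
    # Every name below (bucket, schema, role, host) is a placeholder.
    s3 = boto3.client("s3")
    s3.upload_file("extract_0001.csv.gz", "my-staging-bucket",
                   "loads/extract_0001.csv.gz")

    conn = psycopg2.connect(
        host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
        port=5439, dbname="warehouse", user="loader", password="...")
    with conn, conn.cursor() as cur:
        # COPY loads every object under the S3 prefix in parallel.
        cur.execute("""
            COPY analytics.responses
            FROM 's3://my-staging-bucket/loads/'
            IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
            CSV GZIP;
        """)

Simple enough for one file; the pain starts when you multiply it across thousands of files, encodings, and silent failures.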

After a month of this complex and time-consuming trial, we knew that we needed a more scalable and sound approach. What we didn’t realize at first was how much help was available in the AWS ecosystem. For example, we used Attunity CloudBeam to quickly and easily transfer data from our SQL Server systems to our Redshift data warehouse. Our entire data set loaded in hours rather than days, and it’s no longer necessary to babysit the data-transfer process.

Attunity CloudBeam and Jaspersoft, a solution we use for business intelligence, are AWS partners that played key roles in our onboarding to Redshift. Today, with the system up and running, our data is automatically synced, providing real-time information feeds that have minimal impact on our production environment. More importantly, our account managers can run queries in Amazon Redshift and quickly get customers the information they need to close more business.
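Because Redshift speaks the PostgreSQL wire protocol, those ad hoc queries need nothing more exotic than a standard Postgres driver. A hypothetical example, with made-up schema and column names:

    import psycopg2

    # Redshift is PostgreSQL-compatible, so a standard Postgres driver
    # works; the host, schema, and columns here are hypothetical.
    conn = psycopg2.connect(
        host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
        port=5439, dbname="warehouse", user="analyst", password="...")
    with conn.cursor() as cur:
        cur.execute("""
            SELECT survey_id, COUNT(*) AS completes
            FROM analytics.responses
            WHERE completed_at >= DATEADD(day, -1, GETDATE())
            GROUP BY survey_id
            ORDER BY completes DESC
            LIMIT 20;
        """)
        for survey_id, completes in cur.fetchall():
            print(survey_id, completes)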

Automate data migration and syncing.

I can’t stress this enough: There is no need to manually migrate your data. Unless you’re working with very small data sets (under 300GB) and small transaction loads, you will regret not automating.

With our homegrown solution, I was essentially working around the clock to oversee data migration and synchronization. Measured against my salary, the third-party solution paid for itself in less than a month. It was well worth the investment, since I can now focus on higher-value projects.
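To make “automating” concrete, here is a bare-bones sketch of a watermark-based incremental sync, the kind of scheduled job that replaces a human shepherding files. It is purely illustrative, with hypothetical table and connection names, and it is not a description of how Attunity CloudBeam works internally.

    import csv
    import gzip

    import boto3
    import psycopg2
    import pyodbc

    def sync_increment(last_seen_id):
        """Copy rows newer than the watermark from SQL Server to Redshift."""
        # Pull the delta from the source (the DSN name is a placeholder).
        src = pyodbc.connect("DSN=sqlserver_prod")
        rows = src.execute(
            "SELECT id, survey_id, completed_at FROM dbo.responses "
            "WHERE id > ?", last_seen_id).fetchall()
        if not rows:
            return last_seen_id

        # Stage the delta as a gzipped CSV in S3.
        with gzip.open("/tmp/delta.csv.gz", "wt", newline="") as f:
            csv.writer(f).writerows(rows)
        boto3.client("s3").upload_file(
            "/tmp/delta.csv.gz", "my-staging-bucket", "deltas/delta.csv.gz")

        # Append the delta to Redshift with a single COPY.
        rs = psycopg2.connect(
            host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
            port=5439, dbname="warehouse", user="loader", password="...")
        with rs, rs.cursor() as cur:
            cur.execute("""
                COPY analytics.responses
                FROM 's3://my-staging-bucket/deltas/delta.csv.gz'
                IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
                CSV GZIP;
            """)
        return max(row[0] for row in rows)

Run it on a schedule and the watermark advances on its own; nobody has to watch it.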

If possible, do a proof-of-concept (POC) with your cloud provider and other third-party solution providers.

We did a POC with Attunity to be certain that we could load our full database in a reasonable amount of time. That gave our team the confidence to move forward, and it helped us secure the budget for the project.