By Devavrat Shah
The title “data scientist” was coined in 2008, and there are already thousands of professionals working in the field. Still, many organizations—even data scientists themselves—struggle to define what a data scientist is and what he or she does.
Part of the problem is that, even in the short time data scientists have been around, the definition has changed and continues to evolve in our big data world. Being a data scientist in the past required only math and statistics capabilities. Today, the role encompasses a unique combination of skills ranging from data engineering to statistics to business analysis. In other words, a career in data science has become several jobs rolled into one.
Complicating matters, we are about to enter a new era of the Information Age, in which data sets will grow at an exponential rate due to new tracking mechanisms applied to everything from smartphones and televisions to online shopping and social media. In the coming years, big data will become bigger, faster and more complex.
Data scientists will be challenged to convert this increasing volume, velocity and variety of data into meaningful insights on a massive scale—in real time. This will involve more intricate predictions and computations at scale, which in turn will spark the need for next-generation data scientists.
What do future data scientists look like? These men and women will be well-rounded professionals with both technical proficiency and business acumen, along with a mastery of statistics and dashes of programming, engineering and social sciences. They will be capable of tackling all aspects of big data problems, from data collection to analysis, interpretation and decision making.
To be successful, data scientists will need to learn a host of new and different skills. Becoming familiar with tools such as Python and Hadoop will be a priority. Techniques such as machine learning and data mining will be essential as well.
Data scientists will have to do more than manage and analyze data: they must also understand the business implications, communicate results and know how data insights can be applied effectively to drive decisions. This requires not only considerable programming and computing knowledge, but business capabilities as well.
Data Science Is Morphing Into a Multidisciplinary Field
The fact is, data science is morphing into a multidisciplinary field that draws on statistics, information and decision systems, and social sciences, as well as connections to engineering domains. Professionals who remain in the field must further hone their abilities if they want to be successful and address the big data challenges ahead.
This reality is driving a number of universities to design programs to train and develop future data scientists. The most successful programs will combine interdisciplinary efforts in a rigorous way.
But aside from building knowledge and know-how in the various disciplines and subdisciplines, data scientists should also study what is happening in industries outside their areas of expertise. Examining broad trends and recognizing patterns can help them determine which tools and technologies are priorities to learn about.
After all, the key isn’t knowing only one technology, model or practice. Professionals should be well versed in a variety of tools, perspectives and approaches, so they can identify the methods and models most appropriate for a specific case.
Big data, if harnessed correctly, opens new horizons for businesses and is sure to play a critical role in organizations across every industry and around the world. Data scientists are the key to delivering on this promise, but to succeed, they will need to evolve and adapt to changing business demands and new technologies. Ongoing education can help them acquire the skills necessary for their new role, as these skills become more and more central to success in business.
So what is the new definition of a data scientist? In my view, it’s a person who can do it all: a new breed of innovator.
Devavrat Shah, co-director of the “Data Science: Data to Insights” course, is a professor in MIT’s Department of Electrical Engineering and Computer Science, director of the Statistics and Data Science Center, and a core faculty member at MIT’s Institute for Data, Systems and Society. He is also a member of MIT’s Laboratory for Information and Decision Systems and the MIT Operations Research Center.