Data scientists from around the world are concerned about ethics in big data and analytics applications, according to survey results released by Revolution Analytics, a provider of services and software for the open-source R project.
Survey results came from 144 respondents attending the 2014 Joint Statistical Meetings (JSM), an annual North American gathering of statisticians. The majority, 91 percent, felt that consumers need to be concerned about privacy issues related to data collected on them—a small increase from the 88 percent who felt that way in 2013. Another 85 percent said that ethics should play a part in research related to collecting and using data, up from 80 percent in 2013.
“In general, the responsiveness from the data scientists that ethics should play a part was to be expected,” says David Smith, chief community officer at Revolution Analytics. “The big surprise to me was the result from the Facebook manipulation study.”
In that January 2012 Facebook study, scientists changed the newsfeeds of approximately 700,000 users, showing them content that was either upbeat or sadder in nature. Facebook researchers found that by the end of a week, users who were shown upbeat content were more likely to make happier posts, while those viewing sadder items were more likely to make negative posts.
When asked whether the Facebook study was ethical, 47 percent of the data analysts surveyed said that it was not, while 40 percent said they were not sure, and 14 percent said that the study was ethical. “I think the key thing that was missing there was transparency,” Smith says.
News of the study came out earlier in 2014, and the lack of transparency upset many people. The Washington Post is now reporting that two law professors from the University of Maryland are saying the study violated a state law there that extends federal protections to human subjects involved in research in the state.
The Revolution Analytics study also showed that most data scientists agree that ethics should play a part in data research. Some 43 percent said that ethics already plays a role in their research, while 42 percent said that an industry standard should be in place. Another 9 percent responded that a set of standards should be applied on a case-by-case basis, and 3 percent replied that ethics should have no place in data research at all.
Smith says it is easy to see that differences exist between research in the Internet world and the hard sciences. In the hard sciences, different types of standards and review processes are in place, including peer review. This is not the case for data collection and use.
However, in the hard sciences, discussion about voluntary human consent in experimentation did not even occur until the 1970s, with a University of Chicago Law School paper pointing out the existence of many past abuses. These include the Tuskegee syphilis experiment from 1932 to 1972, in which 400 black men were deprived of treatment so researchers could observe the disease taking its natural course.
Smith says that the development of an ethics framework for the collection and use of data would likely be driven by the private sector in the United States. He notes that, in Canada, a standard for privacy protection, called “Privacy by Design,” was actually proposed through the government.
This standard asks organizations that produce and use big data to apply numerous principles related to privacy, including those of visibility and transparency. The resolution, spearheaded by Ontario’s Information and Privacy Commissioner, received unanimous approval in 2013 from the International Data Protection and Privacy Commissioners, which includes lawyers, professors, legal experts and others from around the world.