How Crowdsourcing Feeds Hungry Big Data Apps


Data collection is a bottleneck for many enterprises. To deal with this issue, many firms use crowdsourcing: engaging large groups of people to provide the data.

By Bart De Lathouwer and Ron Exler

P.T. Barnum said, “Every crowd has a silver lining.” In daily life, crowds often seem undesirable: traffic jams, long lines and noise. Yet crowds can bring benefits when they work together for positive change.

Enterprises often face complicated and large-scale decision challenges for which they need to collect many data points quickly. Using big data and predictive analytics tools with cloud platforms, enterprises can now store and analyze massive amounts of data. Big data solutions also accelerate computations so that analyses that once took hours are now done in seconds. Today, data collection is the bottleneck for many enterprise decision-making processes.

To overcome the collection bottleneck, enterprises are beginning to use crowdsourcing: engaging large groups of people to provide the needed data. Crowdsourcing accelerates data collection for enterprise applications and often involves collecting and managing geospatial data.

Sometimes people actively volunteer information; at other times, companies collect it in the background, for example through website tracking. However, because of inaccurate positioning devices, inconsistent place names and varying observer skill, volunteered geographic information (VGI) requires quality validation.

Although human verification of crowd-sourced data improves quality, it is also time-consuming and difficult to manage. Therefore, enterprises benefit from automation to supplement quality assurance efforts.
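
As a rough illustration of what such automation might look like, the following Python sketch flags observations that deserve a human look. It is a minimal sketch, not any project's actual method: the `Observation` fields, the study-area bounding box and the 50-meter accuracy threshold are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    observer_id: str
    lat: float         # latitude in degrees
    lon: float         # longitude in degrees
    accuracy_m: float  # device-reported horizontal accuracy, in meters
    place_name: str


def quality_flags(obs, bbox, max_accuracy_m=50.0):
    """Return the reasons (possibly none) an observation needs human review.

    bbox is (min_lon, min_lat, max_lon, max_lat) for the study area.
    The threshold and the checks themselves are illustrative assumptions.
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    flags = []
    if not (-90 <= obs.lat <= 90 and -180 <= obs.lon <= 180):
        flags.append("invalid coordinates")
    elif not (min_lat <= obs.lat <= max_lat and min_lon <= obs.lon <= max_lon):
        flags.append("outside study area")
    if obs.accuracy_m > max_accuracy_m:
        flags.append("low positional accuracy")
    if not obs.place_name.strip():
        flags.append("missing place name")
    return flags
```

Observations that come back with an empty list could pass straight through, leaving human reviewers to concentrate on the flagged remainder.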

Effective use of standards can simplify the process of conflation: unifying multiple separate sources of data into one integrated, all-encompassing result. Standards also can simplify quality assurance of multiple types and sources of geospatial data when included in the automation processes.

The Open Geospatial Consortium is an international industry consortium of more than 474 companies, government agencies and universities participating in a consensus process to develop publicly available interface standards. OGC standards support interoperable solutions that "geo-enable" the Web, wireless and location-based services and mainstream IT.

Therefore, OGC standards can be central to crowdsourcing initiatives. Several successes illustrate the benefits of using OGC standards with volunteered geographic information.

Getting Citizens Involved

The COBWEB (Citizen OBservatory WEB) project is establishing a testbed environment that will enable citizens living within the Biosphere Reserves in Wales, Germany and Greece to collect environmental data using mobile devices. COBWEB collection apps on mobile phones let observers collect and send new environmental data, such as photos of vegetation, insects and wildlife.

The apps also help address quality through robust observer authentication and metadata collection. For example, one technique useful in collection of volunteered geographic information is "interactive direction of the observer," in which the observer can be challenged with questions during the recording of observations, elevating the quality of the data.

After collection, COBWEB uses analytics to conflate crowd-sourced data with professionally gathered data to produce higher-quality data. Standards are important to data consistency, and OGC standards make it much easier to conflate different types of geospatial data, including point, raster, vector, point clouds and urban 3D models.
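
To make the idea of conflation concrete, here is a minimal Python sketch that pairs each crowd-sourced point with the nearest professionally gathered point, if one lies close enough. This is a simple nearest-neighbor match chosen for illustration, not COBWEB's actual analytics; the dictionary keys and the 100-meter threshold are assumptions.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def conflate(crowd, reference, max_dist_m=100.0):
    """Match crowd points to the nearest reference point within max_dist_m.

    Each point is a dict with "id", "lat" and "lon" keys (assumed layout).
    Returns (matched, unmatched): matched holds (crowd_id, reference_id)
    pairs; unmatched holds crowd ids with no reference point in range.
    """
    matched, unmatched = [], []
    for c in crowd:
        best_id, best_dist = None, float("inf")
        for r in reference:
            d = haversine_m(c["lat"], c["lon"], r["lat"], r["lon"])
            if d < best_dist:
                best_id, best_dist = r["id"], d
        if best_id is not None and best_dist <= max_dist_m:
            matched.append((c["id"], best_id))
        else:
            unmatched.append(c["id"])
    return matched, unmatched
```

Matched pairs can be merged or cross-checked, while unmatched crowd points are either genuinely new observations or candidates for review. A production system would also compare attributes, not just positions.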

In addition, services that implement OGC standards such as Web Feature Service (WFS), Web Processing Service (WPS) and Web Map Service (WMS) can generate geographic information system (GIS) layers from collections of geographic point data. In the GIS, quality assurance analysts, and perhaps even probabilistic algorithms, can compare the newly observed data with existing professionally gathered data.
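
For readers unfamiliar with how such services are called, the sketch below builds a WFS 2.0 GetFeature request URL using the standard key-value parameters (`service`, `version`, `request`, `typeNames`, `bbox`, `count`). The endpoint and the feature type name in the usage line are hypothetical placeholders, and the function only constructs the URL; it does not contact a server.

```python
from urllib.parse import urlencode


def wfs_getfeature_url(base_url, type_name, bbox=None, count=None):
    """Build a WFS 2.0.0 GetFeature URL from key-value parameters.

    bbox is a sequence of four coordinates; the axis order a server
    expects depends on the coordinate reference system, so check the
    service's capabilities document before relying on it.
    """
    params = {
        "service": "WFS",
        "version": "2.0.0",
        "request": "GetFeature",
        "typeNames": type_name,
    }
    if bbox is not None:
        params["bbox"] = ",".join(str(v) for v in bbox)
    if count is not None:
        params["count"] = str(count)
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint and layer name, for illustration only.
url = wfs_getfeature_url(
    "https://example.org/wfs",
    "cobweb:observations",
    bbox=(-4.5, 52.0, -3.5, 53.0),
    count=10,
)
```

The resulting features can then be loaded as a layer alongside professionally gathered data for comparison.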

Crowds Populate Smart Cities

Citizen engagement is critical to improving services and living conditions. One result is that crowdsourced data collection is extending into urban areas. The Netherlands and Berlin use 3D city models encoded in OGC's open CityGML standard as part of their spatial data infrastructures, which are good frameworks on which to collect many kinds of volunteered geographic information.

Similarly, CITI-SENSE encompasses the development of a sensor-based Citizens’ Observatory Community for improving the quality of life in cities. Since the project started in 2012, 28 partner organizations have created community-based environmental monitoring and information systems using state-of-the-art Earth observation applications. Across Europe, Israel, South Korea and Australia, CITI-SENSE contributes to the Global Earth Observation System of Systems (GEOSS), which provides common methodologies and standards for scientific approaches and data management of Earth data.

Citizens participating in CITI-SENSE use mobile sensor stations based on smartphones to help collect outdoor and indoor air-quality data. Collected data moves to a database using the open standard OGC WFS. Data processing services include the OGC WPS, which provides rules for inputs and outputs (requests and responses) for geospatial processing services, such as polygon overlay.
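
As an illustration of the kind of overlay operation a processing service might wrap, the Python sketch below assigns point readings to zone polygons with a ray-casting point-in-polygon test and averages the values per zone. It is not the WPS interface itself nor CITI-SENSE's implementation; the data layout is assumed, and treating longitude and latitude as flat x/y coordinates is a simplification that is only reasonable for small areas.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon.

    polygon is a list of (x, y) vertices in order, without repeating
    the first vertex at the end. Points exactly on an edge may fall
    on either side.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can cross the ray.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def mean_by_zone(readings, zones):
    """Overlay point readings on zone polygons; return the mean value per zone.

    readings: list of dicts with "lon", "lat" and "value" keys (assumed layout).
    zones: dict mapping a zone name to its polygon vertex list.
    Readings outside every zone are ignored.
    """
    totals = {}
    for r in readings:
        for name, polygon in zones.items():
            if point_in_polygon(r["lon"], r["lat"], polygon):
                total, count = totals.get(name, (0.0, 0))
                totals[name] = (total + r["value"], count + 1)
                break  # assign each reading to at most one zone
    return {name: total / count for name, (total, count) in totals.items()}
```

Exposed behind a standard request and response interface, an operation like this could be invoked by any client that understands the service, regardless of how the overlay is implemented.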

This article was originally published on 2015-01-15