Data Centric Workflows for Crowdsourcing Application

Rituraj Singh 1, 2, 3 
2 SUMO - SUpervision of large MOdular and distributed systems
Inria Rennes – Bretagne Atlantique , IRISA-D4 - LANGAGE ET GÉNIE LOGICIEL
Abstract : Crowdsourcing uses human intelligence to solve tasks that are still difficult for machines. Tasks on existing crowdsourcing platforms are batches of relatively simple microtasks. However, real-world problems are often more difficult than microtasks: they require data collection, organization, pre-processing, analysis, and synthesis of results. In this thesis, we study how to specify complex crowdsourcing tasks and realize them with the help of existing crowdsourcing platforms. The first contribution of this thesis is a complex workflow model that provides high-level constructs to describe a complex task through the orchestration of simpler tasks. We provide algorithms to check termination and correctness of a complex workflow for a subset of the language (these questions are undecidable in the general case). A well-known drawback of crowdsourcing is that human answers might be wrong. To mitigate this problem, crowdsourcing platforms replicate tasks and forge a final trusted answer out of the produced results. Replication increases the quality of data, but it is costly. The second contribution of this thesis is a set of aggregation techniques in which answers are merged using Expectation Maximization, and tasks are replicated online according to the confidence estimated for the aggregated data. Experimental results show that these techniques aggregate the returned answers while achieving a good trade-off between cost and data quality, both for batches of microtasks and for complex workflows.
Contributor : Loic Helouet
Submitted on : Wednesday, June 30, 2021 - 2:58:16 PM
Last modification on : Monday, April 4, 2022 - 9:28:31 AM
Long-term archiving on : Friday, October 1, 2021 - 6:37:56 PM


Files produced by the author(s)


  • HAL Id : tel-03274867, version 1


Rituraj Singh. Data Centric Workflows for Crowdsourcing Application. Formal Languages and Automata Theory [cs.FL]. Université de Rennes 1, 2021. English. ⟨tel-03274867⟩


