Cost and Quality Assurance in Crowdsourcing Workflows (Extended Abstract)
Abstract

Despite recent advances in artificial intelligence and machine learning, many tasks still require human contributions. With the growing availability of the Internet, it is now possible to hire workers on crowdsourcing marketplaces. Many crowdsourcing platforms have emerged in the last decade: Amazon Mechanical Turk, Figure Eight, Wirk, etc. A platform allows employers to post tasks, which are then completed by workers hired from the crowd in exchange for some incentive [3, 19]. Common tasks include image annotation, surveys, classification, recommendation, sentiment analysis, etc. [7]. Existing platforms support simple, repetitive, and independent micro-tasks that take from a few minutes to an hour to complete. However, many real-world problems are not simple micro-tasks, but rather complex orchestrations of dependent tasks that process input data and collect human expertise. Existing platforms provide interfaces to post micro-tasks to a crowd, but cannot handle complex tasks. The next stage of crowdsourcing is to build systems to specify and execute complex tasks over existing crowd platforms. A natural solution is to use workflows, i.e., orchestrations of phases that exchange data to achieve a final objective. Figure 1 is an example of a complex workflow depicting the image annotation process of SPIPOLL [5], a platform to survey populations of pollinating insects. Contributors take pictures of insects, which are then classified by crowdworkers. Pictures are grouped into a dataset that is the input to the initial node. In a first phase, this dataset is filtered to eliminate bad pictures (fuzzy, blurred, ...). The remaining pictures are sent to workers, who try to classify them. If classification is too difficult, the image is sent to an expert. Initial classification and expert classification are represented by two successive phases in the workflow. Pictures that were discarded, classified easily, or studied by experts are then assembled into a result dataset in a final phase, in order to compute statistics on insect populations.
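The phases described above can be sketched as a small orchestration. This is a minimal illustration, not the authors' system: the functions `is_usable`, `crowd_classify`, and `expert_classify` are hypothetical stubs standing in for the filtering, crowd, and expert phases of the SPIPOLL-style workflow.

```python
# Sketch of the SPIPOLL-style workflow: filter bad pictures, crowd-classify
# the rest, escalate hard cases to an expert, then assemble all results.

def is_usable(picture):
    # Hypothetical quality check (e.g., blur detection); stubbed for the sketch.
    return picture.get("sharp", True)

def crowd_classify(picture):
    # Hypothetical crowd answer: returns (label, confidence); stubbed here.
    return picture.get("crowd_label"), picture.get("confidence", 0.0)

def expert_classify(picture):
    # Hypothetical expert answer; assumed reliable.
    return picture.get("expert_label")

def run_workflow(dataset, confidence_threshold=0.8):
    results = []
    for pic in dataset:
        if not is_usable(pic):                  # filtering phase
            results.append((pic["id"], "discarded"))
            continue
        label, conf = crowd_classify(pic)       # crowd classification phase
        if conf < confidence_threshold:         # too difficult: escalate
            label = expert_classify(pic)        # expert classification phase
        results.append((pic["id"], label))      # final assembly phase
    return results

dataset = [
    {"id": 1, "sharp": False},
    {"id": 2, "crowd_label": "bee", "confidence": 0.9},
    {"id": 3, "crowd_label": "fly", "confidence": 0.3, "expert_label": "hoverfly"},
]
print(run_workflow(dataset))
# → [(1, 'discarded'), (2, 'bee'), (3, 'hoverfly')]
```

The point of the sketch is the control structure: every picture flows through the same ordered phases, and only low-confidence items reach the expensive expert phase.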
Workflows alone are not sufficient to crowdsource complex tasks. Many data-centric applications come with budget and quality constraints: as human workers are prone to errors, one has to hire several workers and aggregate their answers to obtain a final answer with sufficient confidence. An unlimited budget would allow hiring large pools of workers to assemble reliable answers for each micro-task, but in general a client for a complex task has a limited budget. This forces one to replicate micro-tasks in an optimal way, achieving the best possible quality without exhausting the given budget. The objective is hence to obtain a reliable result, forged through a complex orchestration, at a reasonable cost. Several works consider data-centric models, deployment on crowdsourcing platforms, and aggregation techniques to improve data quality (see [11] for a more complete bibliography). Coordination of tasks has been considered in languages such as BPMN.
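The interplay between replication and budget can be made concrete with two tiny helpers. This is a simplified baseline under assumed names (`aggregate`, `allocate_workers`), not the paper's optimization: real policies would assign more workers to harder tasks rather than splitting the budget evenly.

```python
from collections import Counter

def aggregate(answers):
    """Majority vote over worker answers; confidence is the vote share."""
    label, votes = Counter(answers).most_common(1)[0]
    return label, votes / len(answers)

def allocate_workers(num_tasks, budget, cost_per_answer, min_per_task=1):
    """Spread a fixed budget evenly across micro-tasks (naive baseline:
    every task gets the same number of replicated answers)."""
    per_task = (budget // cost_per_answer) // num_tasks
    return max(min_per_task, int(per_task))

print(aggregate(["bee", "bee", "fly"]))            # majority label with 2/3 share
print(allocate_workers(10, budget=100, cost_per_answer=2))  # answers per task
```

Even this naive scheme shows the trade-off: with 10 tasks, a budget of 100, and a cost of 2 per answer, each micro-task can be replicated only 5 times, which bounds the confidence the majority vote can reach.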

Dates and versions

hal-03482426 , version 1 (15-12-2021)

Cite

Loïc Hélouët, Zoltan Miklos, Rituraj Singh. Cost and Quality Assurance in Crowdsourcing Workflows (Extended Abstract). BDA 2021 - 37ème Conférence sur la Gestion des Données - Principes, Technologies, Applications, Oct 2021, Paris, France. pp.1-2. ⟨hal-03482426⟩