Conference Paper, Year: 2021

Cost and Quality Assurance in Crowdsourcing Workflows (Extended Abstract)

Abstract

Despite recent advances in artificial intelligence and machine learning, many tasks still require human contributions. With the growing availability of the Internet, it is now possible to hire workers on crowdsourcing marketplaces. Many crowdsourcing platforms have emerged in the last decade: Amazon Mechanical Turk, Figure Eight, Wirk, etc. A platform allows employers to post tasks, which are then carried out by workers hired from the crowd in exchange for some incentives [3, 19]. Common tasks include image annotation, surveys, classification, recommendation, sentiment analysis, etc. [7]. The existing platforms support simple, repetitive, and independent micro-tasks that require a few minutes to an hour to complete. However, many real-world problems are not simple micro-tasks, but rather complex orchestrations of dependent tasks that process input data and collect human expertise. Existing platforms provide interfaces to post micro-tasks to a crowd, but cannot handle complex tasks. The next stage of crowdsourcing is to build systems to specify and execute complex tasks over existing crowd platforms. A natural solution is to use workflows, i.e., orchestrations of phases that exchange data to achieve a final objective.

Figure 1 is an example of a complex workflow depicting the image annotation process of SPIPOLL [5], a platform to survey populations of pollinating insects. Contributors take pictures of insects that are then classified by crowdworkers. Pictures are grouped in a dataset that is the input to node 0. This dataset is filtered to eliminate bad pictures (fuzzy, blurred, ...) in phase 0. The remaining pictures are sent to workers who try to classify them. If classification is too difficult, the image is sent to an expert. Initial classification is represented by phase 1 in the workflow, and expert classification by phase 2. Pictures that were discarded, classified easily, or studied by experts are then assembled in a result dataset in a final phase, used to compute statistics on insect populations.

Workflows alone are not sufficient to crowdsource complex tasks. Many data-centric applications come with budget and quality constraints: as human workers are prone to errors, one has to hire several workers to aggregate a final answer with sufficient confidence. An unlimited budget would allow hiring large pools of workers to assemble reliable answers for each micro-task, but in general a client for a complex task has a limited budget. This forces one to replicate micro-tasks in an optimal way, achieving the best possible quality without exhausting the given budget. The objective is hence to obtain a reliable result, forged through a complex orchestration, at a reasonable cost.

Several works consider data-centric models, deployment on crowdsourcing platforms, and aggregation techniques to improve data quality (see [11] for a more complete bibliography). First, coordination of tasks has been considered in languages such as BPMN
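The abstract does not give an algorithm, but the cost/quality trade-off it describes can be illustrated with a small sketch. The following Python snippet is a minimal, hypothetical illustration, not the method of the paper: all names (ask_worker, majority, crowdsource), the simulated worker accuracy, and the confidence threshold are assumptions introduced here. It replicates micro-tasks, aggregates worker answers by majority vote, and greedily spends a fixed budget on the tasks whose current aggregate is least reliable.

```python
import random
from collections import Counter

# Hypothetical sketch (not from the paper): replicate micro-tasks under a
# global budget and aggregate worker answers by majority vote.

BUDGET = 30                              # total paid worker answers available
LABELS = ["bee", "hoverfly", "unknown"]  # illustrative label set

def ask_worker(true_label, accuracy=0.7):
    """Simulate one crowd worker: correct with probability `accuracy`."""
    if random.random() < accuracy:
        return true_label
    return random.choice([l for l in LABELS if l != true_label])

def majority(answers):
    """Aggregate answers; return (label, confidence), where confidence is
    the fraction of workers agreeing with the winning label."""
    label, count = Counter(answers).most_common(1)[0]
    return label, count / len(answers)

def crowdsource(tasks, budget, min_conf=0.8, batch=3):
    """Greedy allocation: give every task a first batch of workers, then
    spend the remaining budget on the least confident task."""
    answers = {t: [ask_worker(lbl) for _ in range(batch)]
               for t, lbl in tasks.items()}
    budget -= batch * len(tasks)
    while budget > 0:
        # pick the task whose current aggregate is least reliable
        t = min(answers, key=lambda t: majority(answers[t])[1])
        if majority(answers[t])[1] >= min_conf:
            break  # every task already meets the confidence target
        answers[t].append(ask_worker(tasks[t]))
        budget -= 1
    return {t: majority(a) for t, a in answers.items()}

# Ground-truth labels stand in for the unknown answers of real tasks.
tasks = {"img1": "bee", "img2": "hoverfly", "img3": "bee"}
for t, (label, conf) in crowdsource(tasks, BUDGET).items():
    print(f"{t}: {label} (confidence {conf:.2f})")
```

Real systems typically use more refined aggregation than plain majority voting (for instance EM-based estimators of worker accuracy such as Dawid-Skene), but the greedy reallocation loop captures the core tension described above: a limited budget must be distributed across replicated micro-tasks to maximize the reliability of the final, orchestrated result.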
Main file

BDAAbstract.pdf (401.26 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03482426, version 1 (15-12-2021)

Identifiers

  • HAL Id: hal-03482426, version 1

Cite

Loïc Hélouët, Zoltan Miklos, Rituraj Singh. Cost and Quality Assurance in Crowdsourcing Workflows (Extended Abstract). BDA 2021 - 37ème Conférence sur la Gestion des Données - Principes, Technologies, Applications, Oct 2021, Paris, France. pp. 1-2. ⟨hal-03482426⟩
