Using hierarchical skills for optimized task selection in crowdsourcing

Panagiotis Mavridis

Abstract

A large number of commercial and academic participative applications rely on a crowd to acquire, disambiguate and clean data. These participative applications are widely known as crowdsourcing platforms, where amateur enthusiasts are involved in real scientific or commercial projects. Requesters outsource tasks by posting them on online commercial crowdsourcing platforms such as Amazon MTurk or CrowdFlower. There, online participants select and perform these tasks, called microtasks, accepting a micropayment in return. These platforms face challenges such as ensuring the quality of the acquired answers, helping participants find relevant and interesting tasks, leveraging expert skills among the crowd, meeting task deadlines and keeping participants satisfied so that they willingly perform more tasks. Related work mainly models skills as keywords to improve quality. In this work, we instead formalize skills with a hierarchical structure, a taxonomy, which inherently provides a natural way to substitute tasks that require similar skills. It also takes advantage of the whole crowd workforce. With extensive experiments on synthetic and real datasets, we show that quality improves significantly when a hierarchical structure of skills is considered instead of pure keywords. In a second part, we extend our work to study the impact of a participant's choice given a list of tasks. While our previous solution focused on improving an overall one-to-one matching between tasks and participants, here we examine how participants can choose from a ranked list of tasks. Selecting from an enormous list of tasks can be challenging and time-consuming, and has been shown to affect the quality of answers on crowdsourcing platforms. Existing related work on crowdsourcing uses neither a taxonomy nor the ranking methods that exist in other similar domains to assist participants. We propose a new model that takes advantage of the diversity of a participant's skills and proposes a smart list of tasks to them, taking task deadlines into account as well. To the best of our knowledge, we are the first to combine the deadlines of tasks into an urgency metric with task proposition for knowledge-intensive crowdsourcing. Our extensive experiments on synthetic and real data show that we can meet deadlines, obtain high-quality answers and keep participants interested while giving them a choice of well-selected tasks.
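
To make the two ideas in the abstract concrete, the following is a minimal sketch, not the method of the thesis. It contrasts exact keyword matching with a distance computed over a skill taxonomy, and then ranks tasks by a score mixing skill closeness with deadline urgency. The toy taxonomy, the function names, the `1 / (1 + x)` normalizations and the weight `alpha` are all assumptions introduced for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy taxonomy (an assumption, not taken from the thesis):
# each skill maps to its parent skill, and the root maps to None.
PARENT: Dict[str, Optional[str]] = {
    "science": None,
    "biology": "science",
    "botany": "biology",
    "zoology": "biology",
    "physics": "science",
}

# A task is (name, required skill, deadline) in arbitrary time units.
Task = Tuple[str, str, float]


def path_to_root(skill: str) -> List[str]:
    """Return the chain of skills from `skill` up to the taxonomy root."""
    path: List[str] = []
    node: Optional[str] = skill
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def taxonomy_distance(a: str, b: str) -> int:
    """Count the edges between two skills via their lowest common ancestor."""
    steps_from_b = {s: i for i, s in enumerate(path_to_root(b))}
    for steps_from_a, s in enumerate(path_to_root(a)):
        if s in steps_from_b:
            return steps_from_a + steps_from_b[s]
    raise ValueError("skills do not share a common ancestor")


def keyword_match(a: str, b: str) -> bool:
    """Keyword baseline: skills either match exactly or not at all."""
    return a == b


def score(worker_skill: str, task: Task, now: float, alpha: float) -> float:
    """Assumed scoring rule: weighted mix of skill closeness and urgency."""
    _, skill, deadline = task
    closeness = 1.0 / (1.0 + taxonomy_distance(worker_skill, skill))
    urgency = 1.0 / (1.0 + max(deadline - now, 0.0))
    return alpha * closeness + (1.0 - alpha) * urgency


def rank_tasks(worker_skill: str, tasks: List[Task],
               now: float = 0.0, alpha: float = 0.5) -> List[str]:
    """Return task names ordered from highest to lowest score."""
    ordered = sorted(tasks, key=lambda t: score(worker_skill, t, now, alpha),
                     reverse=True)
    return [name for name, _, _ in ordered]


tasks: List[Task] = [
    ("measure spectra", "physics", 100.0),
    ("classify birds", "zoology", 1.0),
    ("label plants", "botany", 10.0),
]
ranking = rank_tasks("botany", tasks)
```

With keyword matching, a participant skilled in botany is no better suited to a zoology task than to a physics task. With the taxonomy distance, zoology is closer because both skills fall under biology, which is the kind of substitution the hierarchical model is meant to allow.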

Many participative applications, both commercial and academic, rely on volunteers ("the crowd") to acquire, disambiguate and clean data. These participative applications are widely known as crowdsourcing platforms, where amateurs can take part in real scientific or commercial projects. Requesters outsource tasks by posting them on platforms such as Amazon MTurk or Crowdflower. Online participants then select and perform these tasks, called microtasks, accepting a micropayment in return. These platforms face challenges such as ensuring the quality of the acquired answers, helping participants find relevant and interesting tasks, leveraging expert skills among the crowd, meeting task deadlines and promoting the participants who complete the most tasks. However, most platforms do not model participants' skills explicitly, or rely simply on a keyword-based description. In this work, we propose to formalize participants' skills by means of a hierarchical structure, a taxonomy, which naturally supports reasoning about skills (detecting equivalent skills, substituting participants, ...). We show how to optimize task selection using this taxonomy. Through numerous synthetic and real experiments, we show that quality improves significantly when a hierarchical structure of skills is considered instead of pure keywords. In a second part, we study the problem of task choice by participants. Choosing from an endless list of possible tasks can be difficult and time-consuming, and turns out to affect the quality of answers. We propose a method for reducing the number of proposals. The state of the art uses neither a taxonomy nor ranking methods. We propose a new ranking model that takes into account the diversity of the participant's skills and the urgency of the task. To the best of our knowledge, we are the first to combine task deadlines into an urgency metric with task proposition for crowdsourcing. Synthetic and real experiments show that we can meet deadlines, obtain high-quality answers and keep participants interested while giving them a targeted choice of tasks.


Utilisation d'une hiérarchie de compétences pour l'optimisation de sélection de tâches en crowdsourcing
