Other publications

AIDEme: An active learning based system for interactive exploration of large datasets

Enhui Huang 1 Luciano Palma 1 Laurent Cetinsoy 1 Yanlei Diao 1 Anna Liu 2
1 CEDAR - Rich Data Analytics at Cloud Scale
LIX - Laboratoire d'informatique de l'École polytechnique [Palaiseau], Inria Saclay - Ile de France
Abstract : There is a widening gap between the fast growth of data and the limited human ability to comprehend it. Consequently, there is a growing demand for analytics tools that can bridge this gap and help users retrieve high-value content from data. We introduce AIDEme, a scalable interactive data exploration system for efficiently learning a user interest pattern over a large dataset. The system is cast in a principled active learning (AL) framework, which iteratively presents strategically selected records for user labeling, thereby building an increasingly accurate model of the user interest. A challenge in building such a system is that existing active learning techniques converge slowly when learning the user interest on large datasets. To overcome this problem, AIDEme explores properties of the user labeling process and the class distribution of observed data to design new active learning algorithms. These algorithms come with provable results on model accuracy, convergence, and approximation, and evaluation results show much improved convergence over existing AL methods while maintaining interactive speed. In this demonstration, conference attendees will interact with AIDEme on a variety of exploration tasks over real-world datasets, gaining a better understanding of how the learned model evolves with each labeled example, how factorizing the user decision-making process improves performance, and how the model evolves differently under various AL algorithms.
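The label-and-refine loop mentioned in the abstract can be illustrated with a minimal sketch of generic pool-based active learning. This is not AIDEme's algorithm: it assumes a one-dimensional dataset and a user interest that is a simple threshold, and the names `user_label`, `hidden_threshold`, and `explore` are hypothetical, introduced only for illustration.

```python
import random

def user_label(record, hidden_threshold=0.62):
    # Hypothetical stand-in for the user: True means "this record is interesting".
    return record >= hidden_threshold

def explore(pool, label_fn, budget):
    """Learn a 1-D threshold model from at most `budget` user labels."""
    records = sorted(pool)
    # Invariant: records[:left] are known negative, records[right:] known positive.
    left, right = 0, len(records)
    labeled = []
    while left < right and len(labeled) < budget:
        # Query the record in the middle of the unresolved range,
        # i.e. the one the current model is least certain about.
        mid = (left + right) // 2
        is_positive = label_fn(records[mid])
        labeled.append((records[mid], is_positive))
        if is_positive:
            right = mid
        else:
            left = mid + 1
    # Place the decision boundary inside whatever range is still unresolved.
    cut = (left + right) // 2
    boundary = records[cut] if cut < len(records) else float("inf")
    return (lambda r: r >= boundary), labeled

rng = random.Random(0)
pool = [rng.random() for _ in range(1000)]
model, labeled = explore(pool, user_label, budget=12)
print(len(labeled), "labels requested")
```

In this toy setting each label halves the unresolved range, so the model becomes consistent with the simulated user on the whole pool after only a handful of labels. Real exploration tasks involve many attributes and richer interest patterns, which is where the convergence problem described in the abstract arises.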

Contributor : Enhui Huang
Submitted on : Tuesday, January 7, 2020 - 3:42:32 PM
Last modification on : Thursday, January 20, 2022 - 5:33:05 PM


  • HAL Id : hal-02430750, version 1



Enhui Huang, Luciano Palma, Laurent Cetinsoy, Yanlei Diao, Anna Liu. AIDEme: An active learning based system for interactive exploration of large datasets. 2019. ⟨hal-02430750⟩


