AIDEme: An active learning based system for interactive exploration of large datasets

Enhui Huang (1, 2, 3), Luciano Palma (1, 2, 3), Laurent Cetinsoy, Yanlei Diao (1, 2, 3), Anna Liu (4)
(1) CEDAR - Rich Data Analytics at Cloud Scale
LIX - Laboratoire d'informatique de l'École polytechnique [Palaiseau], Inria Saclay - Ile de France
Abstract: There is a widening gap between the fast growth of data and the limited human ability to comprehend it. Consequently, there is a growing demand for analytics tools that can bridge this gap and help users retrieve high-value content from data. We introduce AIDEme, a scalable interactive data exploration system for efficiently learning a user interest pattern over a large dataset. The system is cast in a principled active learning (AL) framework: it iteratively presents strategically selected records for the user to label, thereby building an increasingly accurate model of the user interest. A key challenge in building such a system is that existing AL techniques converge slowly when learning the user interest on large datasets. To overcome this problem, AIDEme exploits properties of the user labeling process and the class distribution of observed data to design new AL algorithms. These algorithms come with provable results on model accuracy, convergence, and approximation, and evaluation results show much improved convergence over existing AL methods while maintaining interactive speed. In this demonstration, conference attendees will interact with AIDEme on a variety of exploration tasks over real-world datasets, gaining a better understanding of how the learned model evolves with each labeled example, how factorizing the user's decision-making process improves performance, and how the model evolves differently when various AL algorithms are used.
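The interaction loop described above (select a record, ask the user to label it, update the model, repeat) can be illustrated with a generic pool-based active learning sketch. This is a minimal, hypothetical example and not AIDEme's algorithm: it swaps in plain uncertainty sampling with a k-nearest-neighbor model, and the names `user_label`, `knn_prob`, `explore`, and `accuracy`, the rectangular "interest region", and the seeding with one positive and one negative record are all assumptions made for illustration.

```python
# Hypothetical sketch: generic uncertainty sampling with a kNN model,
# NOT the AIDEme algorithms described in the abstract.
import random
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Labeled = List[Tuple[Point, int]]


def user_label(p: Point) -> int:
    """Hypothetical user: interested only in records inside a fixed rectangle."""
    x, y = p
    return int(0.2 <= x <= 0.5 and 0.3 <= y <= 0.7)


def knn_prob(p: Point, labeled: Labeled, k: int = 3) -> float:
    """Estimate P(interesting | p) as the share of positives among the k nearest labeled records."""
    nearest = sorted(
        labeled,
        key=lambda item: (item[0][0] - p[0]) ** 2 + (item[0][1] - p[1]) ** 2,
    )[:k]
    return sum(label for _, label in nearest) / len(nearest)


def explore(pool: List[Point], oracle: Callable[[Point], int],
            budget: int, k: int = 3) -> Labeled:
    """Pool-based active learning loop: query the most uncertain record each round."""
    labeled: Labeled = []
    # Assumption: seed with one positive and one negative record so both classes appear.
    for target in (1, 0):
        seed = next(p for p in pool if oracle(p) == target)
        labeled.append((seed, target))
    seen = {p for p, _ in labeled}
    unlabeled = [p for p in pool if p not in seen]
    for _ in range(min(budget, len(unlabeled))):
        # Uncertainty sampling: pick the record whose estimate is closest to 0.5.
        query = min(unlabeled, key=lambda p: abs(knn_prob(p, labeled, k) - 0.5))
        unlabeled.remove(query)
        # The (simulated) user labels it; kNN is lazy, so no explicit retraining step.
        labeled.append((query, oracle(query)))
    return labeled


def accuracy(pool: List[Point], labeled: Labeled,
             oracle: Callable[[Point], int], k: int = 3) -> float:
    """Fraction of pool records whose predicted label matches the user's true interest."""
    hits = sum(int(knn_prob(p, labeled, k) >= 0.5) == oracle(p) for p in pool)
    return hits / len(pool)


if __name__ == "__main__":
    rng = random.Random(0)
    pool = [(rng.random(), rng.random()) for _ in range(500)]
    labeled = explore(pool, user_label, budget=40)
    print(len(labeled), round(accuracy(pool, labeled, user_label), 3))
```

A kNN model is used here only because it needs no explicit training step, so each new label takes effect immediately. The new AL algorithms mentioned in the abstract, which exploit the labeling process and the class distribution of observed data, are not reproduced by this sketch.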
Document type: Documents associated with scientific events

https://hal.inria.fr/hal-02430750
Contributor: Enhui Huang
Submitted on: Tuesday, January 7, 2020 - 3:42:32 PM
Last modification on: Saturday, February 1, 2020 - 1:51:30 AM

Identifiers

  • HAL Id: hal-02430750, version 1

Citation

Enhui Huang, Luciano Palma, Laurent Cetinsoy, Yanlei Diao, Anna Liu. AIDEme: An active learning based system for interactive exploration of large datasets. NeurIPS 2019, Dec 2019, Vancouver, Canada. ⟨hal-02430750⟩
