Scalable Computation of Fuzzy Joins Over Large Collections of JSON Data - INRIA - Institut National de Recherche en Informatique et en Automatique
Conference paper. Year: 2023

Abstract

Fuzzy joins are widely used in data analysis applications such as data integration, data mining, and master data management. In the context of Big Data, computing fuzzy joins is challenging because of both the high computational cost and the communication cost. Large-scale fuzzy joins over relational data and joins over tree-structured data have each been investigated in the literature, but to the best of our knowledge, combining the two remains an open problem. In this context, we study methods that leverage distributed environments to compute fuzzy joins over large collections of JSON documents. Our algorithms take into account both the textual similarity and the structural similarity of the data being joined.
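The abstract does not specify which similarity measures or distribution strategy the authors use, so the following is only a hypothetical sketch of the general idea, not the paper's algorithm. It assumes that textual similarity is the Jaccard similarity of lowercase word tokens taken from leaf values, that structural similarity is the Jaccard similarity of root-to-leaf key paths, and that the two are combined with a weight `alpha`. It also simulates one map/shuffle/reduce round inside a single process as a self-join. All names (`leaf_items`, `features`, `similarity`, `fuzzy_join`, `SAMPLE_DOCS`) and parameters are illustrative assumptions.

```python
import json
import re
from collections import defaultdict
from itertools import combinations


def leaf_items(doc, prefix=""):
    """Yield (path, value) for every scalar leaf in a parsed JSON value."""
    if isinstance(doc, dict):
        for key, value in doc.items():
            yield from leaf_items(value, f"{prefix}/{key}")
    elif isinstance(doc, list):
        for item in doc:
            # Array positions are ignored so reordered lists share paths.
            yield from leaf_items(item, f"{prefix}[]")
    else:
        yield prefix, doc


def features(doc):
    """Return (structural paths, text tokens) for one document."""
    paths, tokens = set(), set()
    for path, value in leaf_items(doc):
        paths.add(path)
        if value is not None:
            tokens.update(re.findall(r"\w+", str(value).lower()))
    return paths, tokens


def jaccard(a, b):
    """Jaccard similarity; two empty sets are treated as dissimilar."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity(fa, fb, alpha=0.5):
    """Assumed combined score: alpha * text + (1 - alpha) * structure."""
    (pa, ta), (pb, tb) = fa, fb
    return alpha * jaccard(ta, tb) + (1 - alpha) * jaccard(pa, pb)


def fuzzy_join(docs, threshold=0.6, alpha=0.5):
    """Self-join a {doc_id: json_value} dict; threshold must be > 0."""
    feats = {doc_id: features(doc) for doc_id, doc in docs.items()}
    # "Map" + "shuffle": send each document to every path and token key.
    groups = defaultdict(set)
    for doc_id, (paths, tokens) in feats.items():
        for p in paths:
            groups[("path", p)].add(doc_id)
        for t in tokens:
            groups[("tok", t)].add(doc_id)
    # "Reduce": only documents sharing a key become candidate pairs.
    candidates = set()
    for members in groups.values():
        candidates.update(combinations(sorted(members), 2))
    # Verify each distinct candidate pair once with the full score.
    results = []
    for a, b in sorted(candidates):
        score = similarity(feats[a], feats[b], alpha)
        if score >= threshold:
            results.append((a, b, round(score, 3)))
    return results


SAMPLE_DOCS = {
    "a": json.loads('{"name": "Jon Smith", "address": {"city": "Paris"}}'),
    "b": json.loads('{"name": "John Smith", "address": {"city": "Paris"}}'),
    "c": json.loads('{"title": "Fuzzy joins", "year": 2023}'),
}

if __name__ == "__main__":
    print(fuzzy_join(SAMPLE_DOCS, threshold=0.5))  # [('a', 'b', 0.75)]
```

In this sketch, documents `a` and `b` share identical paths (structural score 1.0) and two of four distinct tokens (text score 0.5), giving 0.75, while `c` shares no key with either and is never compared. Keying on both tokens and paths means any pair with a positive combined score meets in at least one group, so no qualifying pair is missed for a threshold above zero. The price is that each document is replicated to every key it contains, which is one concrete form of the communication cost a distributed fuzzy join must control.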
No file deposited

Dates and versions

hal-04354170, version 1 (19-12-2023)

Cite

Remi Uhartegaray, Laurent d'Orazio, Matthew Damigos, Eleftherios Kalogeros. Scalable Computation of Fuzzy Joins Over Large Collections of JSON Data. 2023 IEEE International Conference on Fuzzy Systems (FUZZ), Aug 2023, Incheon, South Korea. ⟨10.1109/fuzz52849.2023.10309759⟩. ⟨hal-04354170⟩