FreeCore: a system for indexing document summaries on a Distributed Hash Table (DHT)

Abstract: This thesis examines the problem of indexing and searching in Distributed Hash Tables (DHTs). It provides a distributed system for storing document summaries based on their content. Concretely, the thesis uses Bloom filters (BFs) to represent document summaries and proposes an efficient method for inserting and retrieving documents represented by BFs in an index distributed on a DHT. Content-based storage has a dual advantage: it groups similar documents together and, by using Bloom filters for keyword searches, it makes it possible to find and retrieve them more quickly. However, processing a keyword query represented by a Bloom filter is a difficult operation and requires a mechanism to locate the Bloom filters that represent documents stored in the DHT. The thesis therefore goes on to propose two Bloom filter index schemes distributed on a DHT. The first index system combines the principles of content-based indexing and inverted lists and addresses the large amount of data stored by content-based indexes. By using long Bloom filters, this solution makes it possible to store documents on a large number of servers and to index them using less space. The thesis then proposes a second index system that efficiently supports the processing of superset queries (keyword queries) using a prefix tree. This solution exploits the distribution of the data and proposes a configurable distribution function that makes it possible to index documents with a balanced binary tree, so that documents are distributed efficiently across indexing servers. In addition, as a third solution, the thesis proposes an efficient method for locating documents containing a set of keywords. Compared with solutions in the same category, this last solution makes it possible to perform subset searches at a lower cost and can be considered a solid foundation for processing superset queries in over-DHT index systems.
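To make the idea of a keyword query as a superset test concrete, the following is a minimal, generic sketch and not the thesis's actual scheme. It assumes a 64-bit filter, three hash functions derived from SHA-256, and a plain binary prefix tree over the filter bits; the names `bloom`, `is_superset`, `prefix_may_match` and `candidate_prefixes` are hypothetical. A document matches a keyword query when its filter contains every bit of the query filter (with possible false positives), and a prefix-tree branch can only hold matches if its prefix has a 1 wherever the query has a 1.

```python
import hashlib

M = 64  # assumed filter length in bits
K = 3   # assumed number of hash functions


def positions(word, m=M, k=K):
    """Return k bit positions for a word, using SHA-256 with a per-hash salt."""
    out = []
    for i in range(k):
        digest = hashlib.sha256(f"{i}:{word}".encode()).digest()
        out.append(int.from_bytes(digest[:8], "big") % m)
    return out


def bloom(words, m=M, k=K):
    """Build a Bloom filter as an m-bit integer summarising a set of keywords."""
    bits = 0
    for w in words:
        for p in positions(w, m, k):
            bits |= 1 << p
    return bits


def is_superset(doc_bf, query_bf):
    """True if every bit set in the query filter is also set in the document filter."""
    return doc_bf & query_bf == query_bf


def to_bits(bf, m=M):
    """Render the filter as an m-character '0'/'1' string, most significant bit first."""
    return format(bf, f"0{m}b")


def prefix_may_match(prefix, query_bf, m=M):
    """A filter starting with `prefix` can be a superset of the query only if the
    prefix has a 1 wherever the query's first len(prefix) bits have a 1."""
    q = to_bits(query_bf, m)[: len(prefix)]
    return all(p == "1" for p, b in zip(prefix, q) if b == "1")


def candidate_prefixes(query_bf, depth, m=M):
    """Enumerate every depth-bit prefix (e.g. a hypothetical index server) that
    may store filters matching the query: query 1-bits are fixed, 0-bits branch."""
    result = [""]
    for b in to_bits(query_bf, m)[:depth]:
        if b == "1":
            result = [p + "1" for p in result]
        else:
            result = [p + c for p in result for c in "01"]
    return result


if __name__ == "__main__":
    doc = bloom(["dht", "bloom", "index", "peer"])
    query = bloom(["dht", "bloom"])
    print(is_superset(doc, query))                    # True
    print(prefix_may_match(to_bits(doc)[:4], query))  # True
    print(len(candidate_prefixes(query, 4)))          # servers to contact at depth 4
```

The pruning in `candidate_prefixes` shows why a query with many 1-bits in the leading positions touches few branches, while a sparse query fans out. Balancing which bits drive the tree, which the abstract attributes to a configurable distribution function, is outside this sketch.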
Finally, the thesis proposes a prototype of a peer-to-peer system for content indexing and keyword search. This prototype, ready to be deployed in a real environment, was evaluated with PeerSim, which made it possible to measure the theoretical performance of the algorithms developed throughout the thesis.

Cited literature: 65 references

https://hal.inria.fr/tel-01921587
Contributor: Bassirou Ngom
Submitted on: Tuesday, November 13, 2018 - 8:22:22 PM
Last modification on: Friday, July 5, 2019 - 3:26:03 PM
Long-term archiving on: Thursday, February 14, 2019 - 4:58:04 PM

File

main.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: tel-01921587, version 1

Citation

Bassirou Ngom. FreeCore : un système d’indexation de résumés de documents sur une Table de Hachage Distribuée (DHT). Recherche d'information [cs.IR]. Pierre and Marie Curie University, 2018. French. ⟨tel-01921587⟩


Metrics

Record views: 164
File downloads: 185