Theses

Frequent Itemset Sampling of High Throughput Streams on FPGA Accelerators

Mael Gueguen 1, 2
1 CAIRN - Energy Efficient Computing ArchItectures with Embedded Reconfigurable Resources
Inria Rennes – Bretagne Atlantique , IRISA-D3 - ARCHITECTURE
2 LACODAM - Large Scale Collaborative Data Mining
Inria Rennes – Bretagne Atlantique , IRISA-D7 - GESTION DES DONNÉES ET DE LA CONNAISSANCE
Abstract: The field of frequent pattern mining aims to discover recurring patterns in a given database. Many pattern mining approaches are available in the scientific literature, yet most of them suffer from the same drawback: they can produce a very large number of results, which contain highly redundant information. This makes such results hard to analyze. For this reason, a technique called output space sampling has recently been used alongside frequent pattern mining. Output space sampling consists of returning a bounded sample of the results, with statistical guarantees that the sample is representative of the complete output. In a field where fast adaptation to trends is prevalent, an imperfect real-time analysis can be preferable to an exhaustive offline analysis. To this end, the thesis focuses on dedicated hardware architectures that are more energy- and time-efficient than commonly used servers. The first contribution of the thesis is a frequent pattern mining accelerator for FPGA architectures. The proposed solution allows for greater architectural flexibility while reducing the cost of on-chip memory, a scarce resource on this architecture. This first contribution proposes algorithmic improvements that regularise the explored search space, making it suited to efficient computation on FPGA. Furthermore, we propose an FPGA accelerator able to manage the heavy load of communication with its external memory. The second contribution extends the first one, which is restricted to static databases, to streaming databases. This requires reconsidering the theoretical basis of the sampling technique, as the sample must be representative of the most recent snapshot of the stream, but also of the important trends in the recent past of the stream.
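As an illustration of what output space sampling means in practice, here is a minimal Python sketch of one well-known scheme from the literature: two-step sampling that draws itemsets with probability proportional to their support. It is not taken from the thesis and is not claimed to be the method implemented on the FPGA accelerator. The names `sample_itemset`, `sample_itemsets` and the toy `transactions` list are hypothetical, chosen only for this example.

```python
import random
from collections import Counter


def sample_itemset(transactions, rng):
    """Draw one itemset with probability proportional to its support.

    Assumption: this is the generic two-step scheme, shown for
    illustration only; it is not the thesis's algorithm.
    """
    # Step 1: pick a transaction t with weight 2^|t|, the number of its subsets.
    weights = [2 ** len(t) for t in transactions]
    t = rng.choices(transactions, weights=weights, k=1)[0]
    # Step 2: pick a subset of t uniformly by keeping each item with prob. 1/2.
    # An itemset X is then drawn with probability proportional to the number
    # of transactions containing X, i.e. its support.
    return frozenset(item for item in sorted(t) if rng.random() < 0.5)


def sample_itemsets(transactions, k, seed=0):
    """Return a bounded sample of k itemsets instead of the full output."""
    rng = random.Random(seed)
    return [sample_itemset(transactions, rng) for _ in range(k)]


# Toy database (hypothetical): four transactions over the items a, b, c.
transactions = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("a")]
sample = sample_itemsets(transactions, k=1000)
counts = Counter(sample)
```

The sample size `k` bounds the output regardless of how many frequent itemsets the database contains. Itemsets with higher support, such as `{a}` in the toy database, are expected to appear more often in `counts` than rarer ones such as `{b, c}`. Note that the empty itemset can also be drawn under this scheme.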

https://tel.archives-ouvertes.fr/tel-03120148
Contributor : Olivier Sentieys
Submitted on : Monday, January 25, 2021 - 12:42:49 PM
Last modification on : Thursday, April 8, 2021 - 9:05:26 AM
Long-term archiving on : Monday, April 26, 2021 - 6:55:32 PM

File

Manuscrit_MAEL_GUEGUEN_Version...
Files produced by the author(s)

Identifiers

  • HAL Id : tel-03120148, version 1

Citation

Mael Gueguen. Frequent Itemset Sampling of High Throughput Streams on FPGA Accelerators. Embedded Systems. Université de Rennes 1 (UR1), 2020. English. ⟨tel-03120148v1⟩

Metrics

Record views : 33

File downloads : 29