Graph sampling with applications to estimating the number of pattern embeddings and the parameters of a statistical relational model

Irma Ravkic 1 Martiň Znidaršič 2 Jan Ramon 3 Jesse Davis 1
3 MAGNET - Machine Learning in Information Networks
Inria Lille - Nord Europe, CRIStAL - Centre de Recherche en Informatique, Signal et Automatique de Lille (CRIStAL) - UMR 9189
Abstract : Counting the number of times a pattern occurs in a database is a fundamental data mining problem. It is a subroutine in a diverse set of tasks ranging from pattern mining to supervised learning and probabilistic model learning. While a pattern and a database can take many forms, this paper focuses on the case where both the pattern and the database are graphs (networks). Unfortunately , in general, the problem of counting graph occurrences is #P-complete. In contrast to earlier work, which focused on exact counting for simple (i.e., very short) patterns, we present a sampling approach for estimating the statistics of larger graph pattern occurrences. We perform an empirical evaluation on synthetic and real-world data that validates the proposed algorithm, illustrates its practical behavior and provides insight into the trade-off between its accuracy of estimation and computational efficiency.
Complete list of metadatas

Cited literature [32 references]  Display  Hide  Download

https://hal.inria.fr/hal-01725971
Contributor : Jan Ramon <>
Submitted on : Wednesday, March 7, 2018 - 4:48:38 PM
Last modification on : Monday, July 8, 2019 - 3:30:08 PM
Long-term archiving on : Friday, June 8, 2018 - 2:54:41 PM

File

ravkic-dami18.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01725971, version 1

Citation

Irma Ravkic, Martiň Znidaršič, Jan Ramon, Jesse Davis. Graph sampling with applications to estimating the number of pattern embeddings and the parameters of a statistical relational model. Data Mining and Knowledge Discovery, Springer, In press, pp.36. ⟨hal-01725971⟩

Share

Metrics

Record views

244

Files downloads

209