Skip to Main content Skip to Navigation
Conference papers

D-JB: An Online Join Method for Skewed and Varied Data Streams

Abstract : Scalable distributed join processing in a parallel environment requires a partitioning policy to transfer data. Online theta-joins over data streams are more computationally expensive and impose higher memory requirement in distributed data stream management systems (DDSMS) than database management systems (DBMS). The complete bipartite graph-based model can support distributed stream joins, and has the characteristics of memory-efficiency, elasticity and scalability. However, due to the instability of data stream rate and the imbalance of attribute value distribution, the online theta-joins over skewed and varied streams lead to the load imbalance of cluster. In this paper, we present a framework D-JB (Dynamic Join Biclique) for handling skewed and varied streams, enhancing the adaptability of the join model and minimizing the system cost based on the varying workloads. Our proposal includes a mixed key-based and tuple-based partitioning scheme to handle skewed data in each side of the bipartite graph-based model, a strategy for redistribution of query nodes in two sides of this model, and a migration algorithm about state consistency to support full-history joins. Experiments show that our method can effectively handle skewed and varied data streams and improve the throughput of DDSMS.
Document type :
Conference papers
Complete list of metadata

Cited literature [12 references]  Display  Hide  Download

https://hal.inria.fr/hal-02118799
Contributor : Hal Ifip <>
Submitted on : Friday, May 3, 2019 - 1:24:25 PM
Last modification on : Friday, May 3, 2019 - 3:50:43 PM
Long-term archiving on: : Thursday, October 3, 2019 - 8:34:50 AM

File

474230_1_En_12_Chapter.pdf
Files produced by the author(s)

Licence


Distributed under a Creative Commons Attribution 4.0 International License

Identifiers

Citation

Chunkai Wang, Jian Feng, Zhongzhi Shi. D-JB: An Online Join Method for Skewed and Varied Data Streams. 2nd International Conference on Intelligence Science (ICIS), Nov 2018, Beijing, China. pp.115-125, ⟨10.1007/978-3-030-01313-4_12⟩. ⟨hal-02118799⟩

Share

Metrics

Record views

51

Files downloads

2