STrieGD: A Sampling Trie Indexed Compression Algorithm for Large-Scale Gene Data

Abstract : The development of next-generation sequencing (NGS) technology presents a considerable challenge for data storage. A number of compression algorithms have been developed to address this challenge, but the algorithms currently in use fail to achieve a high compression ratio and a high compression speed at the same time. We propose STrieGD, an algorithm based on a trie index structure that improves the compression speed of FASTQ files. To reduce the size of the trie index structure, our approach adopts a sampling strategy followed by a filtering step that uses quality scores. Our experiment shows that STrieGD improves the compression ratio by approximately 50% over GZip and nearly matches that of DSRC. Importantly, STrieGD compresses 3 to 6 times faster than GZip and about 55% faster than DSRC. Moreover, as the number of compressors increases, the compression ratio remains stable and the compression speed scales nearly linearly.
Document type : Conference papers

https://hal.inria.fr/hal-02279552
Contributor : Hal Ifip
Submitted on : Thursday, September 5, 2019 - 1:31:10 PM
Last modification on : Thursday, September 5, 2019 - 1:35:34 PM
Long-term archiving on : Thursday, February 6, 2020 - 1:48:53 AM

File

477597_1_En_3_Chapter.pdf
Files produced by the author(s)

Licence


Distributed under a Creative Commons Attribution 4.0 International License

Citation

Yanzhen Gao, Xiaozhen Bao, Jing Xing, Zheng Wei, Jie Ma, et al. STrieGD: A Sampling Trie Indexed Compression Algorithm for Large-Scale Gene Data. 15th IFIP International Conference on Network and Parallel Computing (NPC), Nov 2018, Muroran, Japan. pp.27-38, ⟨10.1007/978-3-030-05677-3_3⟩. ⟨hal-02279552⟩
