Big Data: What It Is and Why You Should Care, IDC, 2011.
Effective Data Leak Prevention Programs: Start by Protecting Data at the Source - Your Databases, IDC, 2011.
Avoiding the Disk Bottleneck in the Data Domain Deduplication File System, Proc. of USENIX FAST, 2008.
The Digital Universe Decade - Are You Ready? White Paper, IDC, 2010.
Experiencing Data De-Duplication: Improving Efficiency and Reducing Capacity Requirements. White Paper. The Enterprise Strategy Group, 2007.
Tradeoffs in Scalable Data Routing for Deduplication Clusters, Proc. of USENIX FAST, 2011.
Content-aware Load Balancing for Distributed Backup, Proc. of USENIX LISA, 2011.
Extreme Binning: Scalable, parallel deduplication for chunk-based file backup, 2009 IEEE International Symposium on Modeling, Analysis & Simulation of Computer and Telecommunication Systems, 2009.
DOI : 10.1109/MASCOT.2009.5366623
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.214.8386
HYDRAstor: a Scalable Secondary Storage, Proc. of USENIX FAST, 2009.
Content-based document routing and index partitioning for scalable similarity-based searches in a large corpus, Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD '07, 2007.
DOI : 10.1145/1281192.1281207
DEBAR: A scalable high-performance de-duplication storage system for backup and archiving, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2010.
DOI : 10.1109/IPDPS.2010.5470468
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.359.5948
Design of an exact data deduplication cluster, 2012 IEEE 28th Symposium on Mass Storage Systems and Technologies (MSST), 2012.
DOI : 10.1109/MSST.2012.6232380
AA-Dedupe: An Application-Aware Source Deduplication Approach for Cloud Backup Services in the Personal Computing Environment, 2011 IEEE International Conference on Cluster Computing, 2011.
DOI : 10.1109/CLUSTER.2011.20
Min-Wise Independent Permutations, Journal of Computer and System Sciences, vol.60, issue.3, pp.630-659, 2000.
DOI : 10.1006/jcss.1999.1690
URL : http://doi.org/10.1006/jcss.1999.1690
A Framework for Analyzing and Improving Content-based Chunking Algorithms, 2005.
SiLo: a Similarity-locality based Near-exact Deduplication Scheme with Low RAM Overhead and High Throughput, Proc. of USENIX ATC, 2011.
DOI : 10.1109/tc.2014.2308181
Cumulus, Proc. of USENIX FAST, 2009.
DOI : 10.1145/1629080.1629084