R. L. Villars, C. W. Olofson, and M. Eastwood, Big Data: What It Is and Why You Should Care, IDC, 2011.

C. J. Kolodgy, Effective Data Leak Prevention Programs: Start by Protecting Data at the Source: Your Databases, IDC, 2011.

B. Zhu, K. Li, and H. Patterson, Avoiding the Disk Bottleneck in the Data Domain Deduplication File System, Proc. of USENIX FAST, 2008.

J. Gantz and D. Reinsel, The Digital Universe Decade: Are You Ready?, White Paper, IDC, 2010.

H. Biggar, Experiencing Data De-Duplication: Improving Efficiency and Reducing Capacity Requirements, White Paper, The Enterprise Strategy Group, 2007.

W. Dong, F. Douglis, K. Li, H. Patterson, S. Reddy et al., Tradeoffs in Scalable Data Routing for Deduplication Clusters, Proc. of USENIX FAST, 2011.

F. Douglis, D. Bhardwaj, H. Qian, and P. Shilane, Content-aware Load Balancing for Distributed Backup, Proc. of USENIX LISA, 2011.

D. Bhagwat, K. Eshghi, D. D. Long, and M. Lillibridge, Extreme Binning: Scalable, parallel deduplication for chunk-based file backup, 2009 IEEE International Symposium on Modeling, Analysis & Simulation of Computer and Telecommunication Systems, 2009.
DOI: 10.1109/MASCOT.2009.5366623
URL: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.214.8386

C. Dubnicki, L. Gryz, L. Heldt, M. Kaczmarczyk, W. Kilian et al., HYDRAstor: a Scalable Secondary Storage, Proc. of USENIX FAST, 2009.

D. Bhagwat, K. Eshghi, and P. Mehra, Content-based document routing and index partitioning for scalable similarity-based searches in a large corpus, Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '07, 2007.
DOI: 10.1145/1281192.1281207

T. Yang, H. Jiang, D. Feng, Z. Niu, K. Zhou et al., DEBAR: A scalable high-performance de-duplication storage system for backup and archiving, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2010.
DOI: 10.1109/IPDPS.2010.5470468
URL: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.359.5948

H. Kaiser, D. Meister, A. Brinkmann, and S. Effert, Design of an exact data deduplication cluster, 2012 IEEE 28th Symposium on Mass Storage Systems and Technologies (MSST), 2012.
DOI: 10.1109/MSST.2012.6232380

Y. Fu, H. Jiang, N. Xiao, L. Tian, and F. Liu, AA-Dedupe: An Application-Aware Source Deduplication Approach for Cloud Backup Services in the Personal Computing Environment, 2011 IEEE International Conference on Cluster Computing, 2011.
DOI: 10.1109/CLUSTER.2011.20

A. Z. Broder, M. Charikar, A. M. Frieze, and M. Mitzenmacher, Min-Wise Independent Permutations, Journal of Computer and System Sciences, vol.60, issue.3, pp.630-659, 2000.
DOI: 10.1006/jcss.1999.1690
URL: http://doi.org/10.1006/jcss.1999.1690

K. Eshghi and H. K. Tang, A Framework for Analyzing and Improving Content-Based Chunking Algorithms, 2005.

W. Xia, H. Jiang, D. Feng, and Y. Hua, SiLo: A Similarity-Locality based Near-Exact Deduplication Scheme with Low RAM Overhead and High Throughput, Proc. of USENIX ATC, 2011.

M. Vrable, S. Savage, and G. M. Voelker, Cumulus: Filesystem Backup to the Cloud, Proc. of USENIX FAST, 2009.
DOI: 10.1145/1629080.1629084