C. L. Abad, N. Roberts, Y. Lu, and R. H. Campbell, A storage-centric analysis of MapReduce workloads: File popularity, temporal locality and arrival patterns, 2012 IEEE International Symposium on Workload Characterization (IISWC), pp.100-109, 2012.
DOI : 10.1109/IISWC.2012.6402909

S. Al-Kiswany, D. Subhraveti, P. Sarkar, and M. Ripeanu, VMFlock, Proceedings of the 20th international symposium on High performance distributed computing, HPDC '11, pp.159-170, 2011.
DOI : 10.1145/1996130.1996153

B. Atikoglu, Y. Xu, E. Frachtenberg, S. Jiang, and M. Paleczny, Workload analysis of a large-scale key-value store, SIGMETRICS '12: Proceedings of the 12th ACM Joint International Conference on Measurement and Modeling of Computer Systems, pp.53-64, 2012.

H. Chen, M. Kim, Z. Zhang, and H. Lei, Empirical study of application runtime performance using on-demand streaming virtual disks in the cloud, Proceedings of the Industrial Track of the 13th ACM/IFIP/USENIX International Middleware Conference, MIDDLEWARE '12, pp.1-5, 2012.
DOI : 10.1145/2405146.2405151

U. Deshpande, X. Wang, and K. Gopalan, Live gang migration of virtual machines, Proceedings of the 20th international symposium on High performance distributed computing, HPDC '11, pp.135-146, 2011.
DOI : 10.1145/1996130.1996151

F. Douglis, J. Lavoie, J. M. Tracey, and P. Kulkarni, Redundancy elimination within large collections of files, USENIX Annual Technical Conference, General Track, pp.59-72, 2004.

C. Dubnicki, L. Gryz, L. Heldt, M. Kaczmarczyk, W. Kilian et al., HYDRAstor: a scalable secondary storage, FAST '09: Proceedings of the 7th conference on File and storage technologies, pp.197-210, 2009.

F. Guo and P. Efstathopoulos, Building a high-performance deduplication system, USENIX ATC '11: Proceedings of the 2011 USENIX Annual Technical Conference, pp.25-39, 2011.

M. Iritani and H. Yokota, Effects on performance and energy reduction by file relocation based on file-access correlations, Proceedings of the 2012 Joint EDBT/ICDT Workshops, EDBT-ICDT '12, pp.79-86, 2012.
DOI : 10.1145/2320765.2320794

K. R. Jayaram, C. Peng, Z. Zhang, M. Kim, H. Chen et al., An empirical analysis of similarity in virtual machine images, Proceedings of the Middleware 2011 Industry Track Workshop, Middleware '11, 2011.
DOI : 10.1145/2090181.2090187

K. Jin and E. L. Miller, The effectiveness of deduplication on virtual machine disk images, Proceedings of SYSTOR 2009: The Israeli Experimental Systems Conference, SYSTOR '09, pp.1-7, 2009.
DOI : 10.1145/1534530.1534540

A. Karve and A. Kochut, Redundancy Aware Virtual Disk Mobility for Cloud Computing, 2013 IEEE Sixth International Conference on Cloud Computing, pp.35-42, 2013.
DOI : 10.1109/CLOUD.2013.114

Y. Kim, R. Gunasekaran, G. M. Shipman, D. Dillow, Z. Zhang et al., Workload characterization of a leadership class storage cluster, PDSW '10: The 5th Petascale Data Storage Workshop, pp.1-5, 2010.

A. Kochut and A. Karve, Leveraging local image redundancy for efficient virtual machine provisioning, 2012 IEEE Network Operations and Management Symposium, 2012.
DOI : 10.1109/NOMS.2012.6211897

R. Koller and R. Rangaswami, I/O Deduplication, FAST '10: Proceedings of the 8th USENIX Conference on File and Storage Technologies, pp.211-224, 2010.
DOI : 10.1145/1837915.1837921

Z. Li, Z. Chen, and Y. Zhou, Mining block correlations to improve storage performance, ACM Transactions on Storage, vol.1, issue.2, pp.213-245, 2005.
DOI : 10.1145/1063786.1063790

B. Mao, H. Jiang, S. Wu, Y. Fu, and L. Tian, Read-performance optimization for deduplication-based storage systems in the cloud, ACM Transactions on Storage, vol.10, issue.2, pp.6:1-6:22, 2014.

M. Marazakis, V. Papaefstathiou, and A. Bilas, Optimization and bottleneck analysis of network block I/O in commodity storage systems, Proceedings of the 21st annual international conference on Supercomputing, ICS '07, pp.33-42, 2007.
DOI : 10.1145/1274971.1274979

G. Memik, M. T. Kandemir, W. Liao, and A. Choudhary, Multicollective I/O, pp.349-369.
DOI : 10.1145/1168910.1168915

A. Muthitacharoen, B. Chen, and D. Mazières, A low-bandwidth network file system, p.174, 2001.

B. Nicolae, Towards Scalable Checkpoint Restart: A Collective Inline Memory Contents Deduplication Proposal, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing, pp.19-28, 2013.
DOI : 10.1109/IPDPS.2013.14

URL : https://hal.archives-ouvertes.fr/hal-00781532

B. Nicolae, J. Bresnahan, K. Keahey, and G. Antoniu, Going back and forth, Proceedings of the 20th international symposium on High performance distributed computing, HPDC '11, pp.147-158, 2011.
DOI : 10.1145/1996130.1996152

URL : https://hal.archives-ouvertes.fr/inria-00570682

B. Nicolae and M. Rafique, Leveraging Collaborative Content Exchange for On-Demand VM Multi-deployments in IaaS Clouds
DOI : 10.1007/978-3-642-40047-6_32

URL : https://hal.archives-ouvertes.fr/hal-00835432

C. Peng, M. Kim, Z. Zhang, and H. Lei, VDN: Virtual machine image distribution network for cloud data centers, 2012 Proceedings IEEE INFOCOM, 2012.
DOI : 10.1109/INFCOM.2012.6195556

M. Rabin, Fingerprinting by random polynomials, 1981.

N. A. Robison and T. J. Hacker, Comparison of VM deployment methods for HPC education, Proceedings of the 1st Annual conference on Research in information technology, RIIT '12, pp.43-48, 2012.
DOI : 10.1145/2380790.2380801

F. Schmuck and R. Haskin, GPFS: A shared-disk file system for large computing clusters, FAST '02: Proceedings of the 1st USENIX Conference on File and Storage Technologies, 2002.

Z. Shen, Z. Zhang, A. Kochut, A. Karve, H. Chen et al., VMAR: Optimizing I/O Performance and Resource Utilization in the Cloud, Middleware '13: Proceedings of the 14th ACM/IFIP/USENIX International Middleware Conference, pp.183-203, 2013.
DOI : 10.1007/978-3-642-45065-5_10

URL : https://hal.archives-ouvertes.fr/hal-01480776

T. Shibata, S. Choi, and K. Taura, File-access patterns of data-intensive workflow applications and their implications to distributed filesystems, Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, HPDC '10, pp.746-755, 2010.
DOI : 10.1145/1851476.1851585

X. Wang and M. Cherniack, Improving query I/O performance by permuting and refining block request sequences, Proceedings of the 15th ACM international conference on Information and knowledge management, CIKM '06, pp.652-661, 2006.
DOI : 10.1145/1183614.1183707

R. Wartel, T. Cass, B. Moreira, E. Roche, M. Guijarro et al., Image Distribution Mechanisms in Large Scale Cloud Providers, 2010 IEEE Second International Conference on Cloud Computing Technology and Science, pp.112-117, 2010.
DOI : 10.1109/CloudCom.2010.73

P. Xia, D. Feng, H. Jiang, L. Tian, and F. Wang, FARMER, Proceedings of the 17th international symposium on High performance distributed computing, HPDC '08, pp.185-196, 2008.
DOI : 10.1145/1383422.1383445

B. Zhu, K. Li, and H. Patterson, Avoiding the disk bottleneck in the Data Domain deduplication file system, FAST '08: Proceedings of the 6th USENIX Conference on File and Storage Technologies, pp.1-18, 2008.