E. Deelman and D. Gannon, Workflows and e-Science: An overview of workflow system features and capabilities, Future Generation Computer Systems, vol.25, issue.5, pp.528-540, 2009.
DOI : 10.1016/j.future.2008.06.012

E. Deelman and G. Singh, Pegasus: A Framework for Mapping Complex Scientific Workflows onto Distributed Systems, Scientific Programming, pp.219-237, 2005.
DOI : 10.1155/2005/128026

K. Wolstencroft and R. Haines, The Taverna workflow suite: designing and executing workflows of Web Services on the desktop, web or in the cloud, Nucleic Acids Research, vol.41, issue.W1, pp.557-561, 2013.
DOI : 10.1093/nar/gkt328

J. Liu and E. Pacitti, Scientific workflow scheduling with provenance support in multisite cloud, High Performance Computing for Computational Science VECPAR, 2016.
URL : https://hal.archives-ouvertes.fr/lirmm-01342190

A. Thomson and D. J. Abadi, CalvinFS: Consistent WAN replication and scalable metadata management for distributed file systems, Proc. of the 13th USENIX Conf. on File and Storage Technologies.

S. R. Alam and H. N. El-Harake, Parallel I/O and the metadata wall, Proc. of the 6th Workshop on Parallel Data Storage, ser. PDSW '11, pp.13-18, 2011.

S. Ghemawat, H. Gobioff, and S. Leung, The Google file system, ACM SIGOPS Operating Systems Review, vol.37, issue.5, pp.29-43, 2003.
DOI : 10.1145/1165389.945450

F. Schmuck and R. Haskin, GPFS: A shared-disk file system for large computing clusters, Proc. of the 1st USENIX Conference on File and Storage Technologies, ser. FAST '02, 2002.

E. Ogasawara and J. Dias, An algebraic approach for data-centric scientific workflows, Proc. of VLDB Endowment, pp.1328-1339, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00640431

G. Juve and A. Chervenak, Characterizing and profiling scientific workflows, Future Generation Computer Systems, vol.29, issue.3, pp.682-692, 2013.
DOI : 10.1016/j.future.2012.08.015

E. Deelman and S. Callaghan, Managing Large-Scale Workflow Execution from Resource Provisioning to Provenance Tracking: The CyberShake Example, 2006 Second IEEE International Conference on e-Science and Grid Computing (e-Science'06), 2006.
DOI : 10.1109/E-SCIENCE.2006.261098

L. Pineda-Morales, A. Costan, and G. Antoniu, Towards Multi-site Metadata Management for Geographically Distributed Cloud Workflows, 2015 IEEE International Conference on Cluster Computing, pp.294-303, 2015.
DOI : 10.1109/CLUSTER.2015.49
URL : https://hal.archives-ouvertes.fr/hal-01239150

J. J. Levandoski, P. Larson, and R. Stoica, Identifying hot and cold data in main-memory databases, 2013 IEEE 29th International Conference on Data Engineering (ICDE), pp.26-37, 2013.
DOI : 10.1109/ICDE.2013.6544811

D. Gibson, Is Your Big Data Hot, Warm, or Cold? [Online] Available: http://www.ibmbigdatahub.com/blog/your-big-data-hot-warm-or-cold, 2012.

S. Bharathi and A. Chervenak, Characterization of scientific workflows, Workshop on Workflows in Support of Large-Scale Science, 2008.

Azure Speed Test.

E. Deelman and G. Singh, The cost of doing science on the cloud: The Montage example, 2008 SC, International Conference for High Performance Computing, Networking, Storage and Analysis, pp.50:1-50:12, 2008.
DOI : 10.1109/SC.2008.5217932

J. Dias and E. Ogasawara, Algebraic dataflows for big data analysis, 2013 IEEE International Conference on Big Data, pp.150-155, 2013.
DOI : 10.1109/BigData.2013.6691567

M. Stonebraker and S. Madden, The end of an architectural era: Time for a complete rewrite, Proc. of the 33rd Intl. Conf. on Very Large Data Bases, ser. VLDB '07, pp.1150-1160, 2007.

M. Stonebraker and U. Cetintemel, "One Size Fits All": An Idea Whose Time Has Come and Gone, 21st International Conference on Data Engineering (ICDE'05), pp.2-11, 2005.
DOI : 10.1109/ICDE.2005.1

J. Wang and S. Wu, Indexing multi-dimensional data in a cloud system, Proceedings of the 2010 international conference on Management of data, SIGMOD '10, pp.591-602, 2010.
DOI : 10.1145/1807167.1807232

S. Wu and D. Jiang, Efficient B-tree based indexing for cloud data processing, Proc. VLDB Endow, pp.1207-1218, 2010.
DOI : 10.14778/1920841.1920991

A. W. Leung and M. Shao, Spyglass: Fast, scalable metadata search for large-scale storage systems, FAST, pp.153-166, 2009.

A. Gehani, M. Kim, and T. Malik, Efficient querying of distributed provenance stores, Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, HPDC '10, pp.613-621, 2010.
DOI : 10.1145/1851476.1851567

T. Malik, L. Nistor, and A. Gehani, Tracking and Sketching Distributed Data Provenance, 2010 IEEE Sixth International Conference on e-Science, pp.190-197, 2010.
DOI : 10.1109/eScience.2010.51

P. F. Corbett and D. G. Feitelson, The Vesta parallel file system, ACM Transactions on Computer Systems, vol.14, issue.3, pp.225-264, 1996.
DOI : 10.1145/233557.233558

E. L. Miller and R. H. Katz, RAMA: An easy-to-use, high-performance parallel file system, Parallel Computing, vol.23, issue.4-5, pp.419-446, 1997.
DOI : 10.1016/S0167-8191(97)00008-2

S. A. Brandt and E. L. Miller, Efficient metadata management in large distributed storage systems, 20th IEEE/11th NASA Goddard Conference on Mass Storage Systems and Technologies (MSST 2003), 2003.
DOI : 10.1109/MASS.2003.1194865

D. Zhao and Z. Zhang, FusionFS: Toward supporting data-intensive scientific applications on extreme-scale high-performance computing systems, 2014 IEEE International Conference on Big Data (Big Data), 2014.
DOI : 10.1109/BigData.2014.7004214

R. Souza and V. Silva, Parallel execution of workflows driven by a distributed database management system, ACM/IEEE Conference on Supercomputing, 2015.

D. Zhao and C. Shou, Distributed data provenance for large-scale data-intensive computing, 2013 IEEE International Conference on Cluster Computing (CLUSTER), pp.1-8, 2013.
DOI : 10.1109/CLUSTER.2013.6702685

M. J. Zaki, SPADE: An efficient algorithm for mining frequent sequences, Machine Learning, pp.31-60, 2001.