G. Fox, T. Hey, and A. Trefethen, Where does all the data come from?, 2011.

L. Barroso, J. Dean, and U. Hölzle, Web search for a planet: the Google cluster architecture, IEEE Micro, vol.23, issue.2, pp.22-28, 2003.
DOI : 10.1109/MM.2003.1196112

J. Dean and S. Ghemawat, MapReduce, Communications of the ACM, vol.51, issue.1, pp.107-113, 2008.
DOI : 10.1145/1327452.1327492

M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly, Dryad: Distributed Data-parallel Programs from Sequential Building Blocks, EuroSys '07: Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems, 2007.

D. Peng and F. Dabek, Large-scale incremental processing using distributed transactions and notifications, Proceedings of the 9th USENIX conference on Operating systems design and implementation, OSDI'10, USENIX Association, pp.1-15, 2010.

R. Meier and V. Cahill, Taxonomy of Distributed Event-Based Programming Systems, The Computer Journal, vol.48, issue.5, pp.602-626, 2005.
DOI : 10.1093/comjnl/bxh120

T. von Eicken, D. E. Culler, S. C. Goldstein, and K. E. Schauser, Active messages, ACM SIGARCH Computer Architecture News, vol.20, issue.2, pp.256-266, 1992.
DOI : 10.1145/146628.140382

R. Bolze et al., Grid'5000: A large scale and highly reconfigurable experimental grid testbed, International Journal of High Performance Computing Applications

P. T. Eugster, P. A. Felber, R. Guerraoui, and A. Kermarrec, The many faces of publish/subscribe, ACM Computing Surveys, vol.35, issue.2, pp.114-131, 2003.
DOI : 10.1145/857076.857078

T. Murata, Petri nets: Properties, analysis and applications, Proceedings of the IEEE, pp.541-580, 1989.
DOI : 10.1109/5.24143

E. Deelman, D. Gannon, M. Shields, and I. Taylor, Workflows and e-Science: An overview of workflow system features and capabilities, Future Generation Computer Systems, vol.25, issue.5, pp.528-540, 2009.
DOI : 10.1016/j.future.2008.06.012

K. Shvachko, H. Kuang, S. Radia, and R. Chansler, The Hadoop Distributed File System, 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), pp.1-10, 2010.
DOI : 10.1109/MSST.2010.5496972

G. Fedak, H. He, and F. Cappello, BitDew: A programmable environment for large-scale data management and distribution, 2008 SC, International Conference for High Performance Computing, Networking, Storage and Analysis, pp.1-12, 2008.
DOI : 10.1109/SC.2008.5213939
URL : https://hal.archives-ouvertes.fr/inria-00216126

R. Love, Kernel korner: intro to inotify, Linux Journal, vol.8, issue.139, 2005.

I. Foster, Globus Online: Accelerating and democratizing science through cloud-based services, IEEE Internet Computing, vol.15, issue.3, pp.70-73, 2011.
DOI : 10.1109/mic.2011.64

B. Tang, M. Moca, S. Chevalier, H. He, and G. Fedak, Towards MapReduce for Desktop Grid Computing, 2010 International Conference on P2P, Parallel, Grid, Cloud and Internet Computing, pp.193-200, 2010.
DOI : 10.1109/3PGCIC.2010.33
URL : https://hal.archives-ouvertes.fr/hal-00687553

K. Muniswamy-Reddy, D. A. Holland, U. Braun, and M. Seltzer, Provenance-aware storage systems, Proceedings of the 2006 USENIX Annual Technical Conference, pp.43-56, 2006.

C. Moretti, J. Bulosan, D. Thain, and P. Flynn, All-pairs: An abstraction for data-intensive cloud computing, International Symposium on Parallel and Distributed Processing, 2008.

C. Olston, B. Reed, U. Srivastava, R. Kumar, and A. Tomkins, Pig Latin: a not-so-foreign language for data processing, Proceedings of the 2008 ACM SIGMOD international conference on Management of data, 2008.

J. Ekanayake, H. Li, B. Zhang, T. Gunarathne, S. Bae et al., Twister, Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, HPDC '10, pp.810-818, 2010.
DOI : 10.1145/1851476.1851593

B. Lohrmann, D. Warneke, and O. Kao, Massively-parallel stream processing under QoS constraints with Nephele, Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing, HPDC '12, 2012.
DOI : 10.1145/2287076.2287117

I. Foster, J. Vöckler, M. Wilde, and Y. Zhao, Chimera: A virtual data system for representing, querying, and automating data derivation, Scientific and Statistical Database Management, 2002.

F. Costa, L. Silva, G. Fedak, and I. Kelley, Optimizing the data distribution layer of BOINC with BitTorrent, 2008 IEEE International Symposium on Parallel and Distributed Processing, pp.1-8, 2008.
DOI : 10.1109/IPDPS.2008.4536446
URL : https://hal.archives-ouvertes.fr/hal-00684422

B. Wei, G. Fedak, and F. Cappello, Towards efficient data distribution on computational desktop grids with BitTorrent, Future Generation Computer Systems, vol.23, issue.8, pp.983-989, 2007.
DOI : 10.1016/j.future.2007.04.006

F. Costa, L. Silva, G. Fedak, and I. Kelley, Optimizing Data Distribution in Desktop Grid Platforms, Parallel Processing Letters, vol.18, issue.03, pp.391-410, 2008.
DOI : 10.1142/S0129626408003466

B. Tang, H. He, and G. Fedak, HybridMR: a new approach for hybrid MapReduce combining desktop grid and cloud infrastructures, Concurrency and Computation: Practice and Experience, vol.20, issue.4
DOI : 10.1002/cpe.3515
URL : https://hal.archives-ouvertes.fr/hal-01239299

A. Simonet, G. Fedak, M. Ripeanu, and S. , Active data, Proceedings of the 8th Parallel Data Storage Workshop, PDSW '13, pp.39-44, 2013.
DOI : 10.1145/2538542.2538566
URL : https://hal.archives-ouvertes.fr/hal-00921080

A. Simonet, K. Chard, G. Fedak, and I. Foster, Using Active Data to Provide Smart Data Surveillance to E-Science Users, 2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, 2015.
DOI : 10.1109/PDP.2015.76
URL : https://hal.archives-ouvertes.fr/hal-01256207

Y. Demchenko, P. Grosso, C. de Laat, and P. Membrey, Addressing big data issues in Scientific Data Infrastructure, 2013 International Conference on Collaboration Technologies and Systems (CTS), pp.48-55, 2013.
DOI : 10.1109/CTS.2013.6567203

T. Ho and D. Abramson, Active Data: Supporting the Grid Data Life Cycle, Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid '07), pp.39-46, 2007.
DOI : 10.1109/CCGRID.2007.16

L. Ramakrishnan, D. Ghoshal, V. Hendrix, E. Feller, P. Mantha et al., Storage and Data Life Cycle Management in Cloud Environments with FRIEDA, Cloud Computing for Data Intensive Applications, 2015.
DOI : 10.1007/978-1-4939-1905-5_15

K. Wolstencroft, R. Haines, D. Fellows, A. Williams, D. Withers et al., The Taverna workflow suite: designing and executing workflows of Web Services on the desktop, web or in the cloud, Nucleic Acids Research, vol.41, issue.W1, pp.557-561, 2013.
DOI : 10.1093/nar/gkt328

J. Goecks, A. Nekrutenko, and J. Taylor, Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences, Genome Biology, vol.11, issue.8, p.86, 2010.
DOI : 10.1186/gb-2010-11-8-r86

D. Rex, J. Ma, and A. Toga, The LONI Pipeline Processing Environment, NeuroImage, vol.19, issue.3, pp.1033-1048, 2003.
DOI : 10.1016/S1053-8119(03)00185-X

M. Wilde, M. Hategan, J. M. Wozniak, B. Clifford, D. S. Katz et al., Swift: A language for distributed parallel scripting, Parallel Computing

E. Deelman, G. Singh, M. Su, J. Blythe, Y. Gil et al., Pegasus: A Framework for Mapping Complex Scientific Workflows onto Distributed Systems, Scientific Programming, vol.13, issue.3, pp.219-237, 2005.
DOI : 10.1155/2005/128026

K. Taura, K. Kaneda, T. Endo, and A. Yonezawa, Phoenix: a parallel programming model for accommodating dynamically joining/leaving resources, pp.216-229, 2003.

C. E. Killian, J. W. Anderson, R. Braud, R. Jhala, and A. M. Vahdat, Mace: Language support for building distributed systems, Proceedings of the 2007 ACM SIGPLAN conference on Programming Language Design and Implementation, PLDI'07, 2007.
DOI : 10.1109/p2p.2009.5284502

D. Mazieres, A toolkit for user-level file systems, Proceedings of the 2001 USENIX Annual Technical Conference, 2001.

S. Yoo, H. Lee, C. Killian, and M. Kulkarni, InContext, Proceedings of the 20th international symposium on High performance distributed computing, HPDC '11, pp.97-108, 2011.
DOI : 10.1145/1996130.1996144

J. Cheney, S. Chong, N. Foster, M. Seltzer, and S. Vansummeren, Provenance, Proceeding of the 24th ACM SIGPLAN conference companion on Object oriented programming systems languages and applications, OOPSLA '09, 2009.
DOI : 10.1145/1639950.1640064

Y. L. Simmhan, B. Plale, and D. Gannon, A survey of data provenance in e-science, ACM SIGMOD Record

L. Moreau, B. Clifford, J. Freire, J. Futrelle, Y. Gil et al., The Open Provenance Model core specification (v1.1), Future Generation Computer Systems, vol.27, issue.6, pp.743-756, 2011.
DOI : 10.1016/j.future.2010.07.005

K. Muniswamy-Reddy, P. Macko, and M. Seltzer, Provenance for the cloud, Proceedings of the 8th Conference on File and Storage Technologies (FAST'10), 2010.

D. Thain, C. Moretti, and J. Hemmes, Chirp: a practical global filesystem for cluster and Grid computing, Journal of Grid Computing, vol.14, issue.1, pp.51-72, 2009.
DOI : 10.1007/s10723-008-9100-5

E. Vairavanathan, S. Al-Kiswany, L. Costa, Z. Zhang, D. Katz et al., A Workflow-Aware Storage System: An Opportunity Study, 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2012), 2012.
DOI : 10.1109/CCGrid.2012.109

H. He, G. Fedak, P. Kacsuk, Z. Farkas, Z. Balaton et al., Extending the EGEE Grid with XtremWeb-HEP Desktop Grids, 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing, pp.685-690, 2010.
DOI : 10.1109/CCGRID.2010.100
URL : https://hal.archives-ouvertes.fr/hal-00687541

S. Ratnasamy, M. Handley, R. Karp, and S. Shenker, Topologically-aware overlay construction and server selection, Proceedings. Twenty-First Annual Joint Conference of the IEEE Computer and Communications Societies, pp.1190-1199, 2002.
DOI : 10.1109/INFCOM.2002.1019369
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.10.1322