A. Arasu, S. Babu, and J. Widom, The CQL continuous query language: semantic foundations and query execution, The International Journal on Very Large Data Bases, vol.15, issue.2, pp.121-142, 2006.

A. W. Appel and L. George, Optimal Spilling for CISC Machines with Few Registers, Proceedings of the 22nd ACM SIGPLAN Conference on Programming Language Design and Implementation, pp.243-253, 2001.

P. Briggs, K. D. Cooper, and L. Torczon, Improvements to graph coloring register allocation, ACM Transactions on Programming Languages and Systems, vol.16, issue.3, pp.428-455, 1994.
DOI : 10.1145/177492.177575

N. Beldiceanu and S. Demassey, Global Constraint Catalog, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00485396

B. T. Bennett and V. J. Kruskal, LRU stack processing, IBM Journal of Research and Development, vol.19, issue.4, pp.353-357, IBM Corp., 1975.

G. Bilsen, M. Engels, R. Lauwereins, and J. Peperstraete, Cycle-static Dataflow, International Conference on Acoustics, Speech, and Signal Processing, pp.3255-3258, 1995.

P. Boulet, J. Dongarra, F. Rastello, and Y. Robert, Algorithmic Issues on Heterogeneous Computing Platforms, Parallel Processing Letters, vol.9, issue.2, pp.197-213, 1999.

A. Brito, C. Fetzer, H. Sturzrehm, and P. Felber, Speculative out-of-order event processing with software transaction memory, Proceedings of the 2nd International Conference on Distributed Event-based Systems, ACM, pp.265-275, 2008.

S. Browne, J. Dongarra, N. Garner, and K. London, A Scalable Cross-Platform Infrastructure for Application Performance Tuning Using Hardware Counters, Proceedings of SuperComputing, 2000.

I. Buck, T. Foley, D. Horn, and J. Sugerman, Brook for GPUs: stream computing on graphics hardware, ACM Transactions on Graphics, ACM, pp.777-786, 2004.

E. G. Coffman and J. L. Bruno, Computer and Job-Shop Scheduling Theory.

Q. Colombet, F. Brandner, and A. Darte, Studying Optimal Spilling in the Light of SSA, Proceedings of the 14th International Conference on Compilers, Architectures and Synthesis for Embedded Systems, pp.25-34, 2011.

D. Callahan, S. Carr, and K. Kennedy, Improving Register Allocation for Subscripted Variables, Proceedings of the ACM SIGPLAN 1990 Conference on Programming Language Design and Implementation, pp.53-65, 1990.

F. C. Chow and J. L. Hennessy, The priority-based coloring approach to register allocation, ACM Transactions on Programming Languages and Systems, vol.12, issue.4, pp.501-536, 1990.
DOI : 10.1145/88616.88621

P. M. Carpenter, A. Ramirez, and E. Ayguadé, Mapping stream programs onto heterogeneous multiprocessor systems, Proceedings of the 2009 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems, CASES '09, 2009.
DOI : 10.1145/1629395.1629406

C.-K. Cheng and Y.-C. A. Wei, An improved two-way partitioning algorithm with stable performance (VLSI), IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol.10, issue.12, pp.1502-1511, 1991.
DOI : 10.1109/43.103500

D. Carney, U. Çetintemel, A. Rasin, and S. Zdonik, Operator scheduling in a data stream manager, Proceedings of the 29th International Conference on Very Large Data Bases, VLDB Endowment, pp.838-849, 2003.

G. J. Chaitin, M. Auslander, A. Chandra, and J. Cocke, Register Allocation via Coloring, Computer Languages, vol.6, issue.1, pp.47-57, 1981.

G. J. Chaitin, Register Allocation & Spilling via Graph Coloring, Proceedings of the 1982 SIGPLAN Symposium on Compiler Construction, pp.98-105, 1982.

E. Cohen, E. Halperin, H. Kaplan, and U. Zwick, Reachability and Distance Queries via 2-hop Labels, Proceedings of the 13th Annual ACM-SIAM Symposium on Discrete Algorithms, pp.937-946, 2002.

C. Consel, H. Hamdi, L. Réveillère, and L. Singaravelu, Spidle: a DSL approach to specifying streaming applications, Generative Programming and Component Engineering, pp.1-17, 2003.

R. L. Cruz, A calculus for network delay. I. Network elements in isolation, IEEE Transactions on Information Theory, vol.37, issue.1, pp.114-131, 1991.
DOI : 10.1109/18.61109

R. Cytron, J. Ferrante, B. K. Rosen, and M. N. Wegman, Efficiently Computing Static Single Assignment Form and the Control Dependence Graph, ACM Transactions on Programming Languages and Systems, vol.13, issue.4, pp.451-490, 1991.

P. J. Denning, The working set model for program behavior, Communications of the ACM, vol.11, issue.5, pp.323-333, 1968.

J. B. Dennis, First version of a data flow procedure language, Programming Symposium, pp.362-376, 1974.

B. Dupont de Dinechin, F. de Ferrière, C. Guillon, and A. Stoutchinin, Code Generator Optimizations for the ST120 DSP-MCU Core, Proceedings of the 2000 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems, pp.93-102, 2000.

P. Dlugosch, D. Brown, P. Glendenning, and M. Leventhal, An Efficient and Scalable Semiconductor Architecture for Parallel Automata Processing, IEEE Transactions on Parallel and Distributed Systems, vol.25, issue.12, pp.3088-3098, 2014.

V. Elango, F. Rastello, L.-N. Pouchet, and J. Ramanujam, On Characterizing the Data Movement Complexity of Computational DAGs for Parallel Execution, Proceedings of the 26th ACM Symposium on Parallelism in Algorithms and Architectures, pp.296-306, 2014.

V. Elango, F. Rastello, L.-N. Pouchet, and J. Ramanujam, On Characterizing the Data Access Complexity of Programs, Proceedings of the 42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 2015.

V. Elango, N. Sedaghati, F. Rastello, and L.-N. Pouchet, On Using the Roofline Model with Lower Bounds on Data Movement, ACM Transactions on Architecture and Code Optimization, vol.11, issue.4, article 67, 2015.

S. M. Farhad, Y. Ko, B. Burgstaller, and B. Scholz, Orchestration by Approximation: Mapping Stream Programs Onto Multicore Architectures, Proceedings of the 16th International Conference on Architectural Support for Programming Languages and Operating Systems, pp.357-368, 2011.

N. Fauzia, V. Elango, M. Ravishankar, and J. Ramanujam, Beyond Reuse Distance Analysis: Dynamic Analysis for Characterization of Data Locality Potential, ACM Transactions on Architecture and Code Optimization, vol.10, issue.4, article 53, pp.53:1-53:29, 2013.

L. George and A. W. Appel, Iterated Register Coalescing, ACM Transactions on Programming Languages and Systems, vol.18, issue.3, pp.300-324, 1996.

J. R. Goodman and W.-C. Hsu, Code scheduling and register allocation in large basic blocks, Proceedings of the 2nd International Conference on Supercomputing, pp.442-452, 1988.

C. J. Glass and L. M. Ni, The Turn Model for Adaptive Routing, Journal of the ACM, vol.41, issue.5, pp.874-902, 1994.

B. Hendrickson and R. Leland, A multilevel algorithm for partitioning graphs, Proceedings of the 1995 ACM/IEEE Conference on Supercomputing (CDROM), Supercomputing '95, p.92, 1995.
DOI : 10.1145/224170.224228

M. T. Heath and P. Raghavan, A Cartesian Parallel Nested Dissection Algorithm, SIAM Journal on Matrix Analysis and Applications, vol.16, issue.1, 1995.
DOI : 10.1137/S0895479892238270

N. Halbwachs, P. Caspi, P. Raymond, and D. Pilaud, The synchronous data flow programming language LUSTRE, Proceedings of the IEEE, pp.1305-1320, 1991.

M. Hirzel, R. Soulé, S. Schneider, and B. Gedik, A Catalog of Stream Processing Optimizations, ACM Computing Surveys, vol.46, issue.4, 2014.

F. Irigoin and R. Triolet, Supernode partitioning, Proceedings of the 15th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL '88, pp.319-329, 1988.
DOI : 10.1145/73560.73588

H. Jia-Wei and H. T. Kung, I/O complexity: The red-blue pebble game, Proceedings of the Thirteenth Annual ACM Symposium on Theory of Computing, STOC '81, pp.326-333, 1981.
DOI : 10.1145/800076.802486

F. Jafari, M. Yaghmaee, M. Talebi, and A. Khonsari, Max-Min-Fair Best Effort Flow Control in Network-on-Chip Architectures, Proceedings of the 8th International Conference on Computational Science, Part I, pp.436-445, 2008.

F. Jafari, Z. Lu, A. Jantsch, and M. Yaghmaee, Optimal Regulation of Traffic Flows in Networks-on-chip, Proceedings of the Conference on Design, Automation and Test in Europe, European Design and Automation Association, pp.1621-1624, 2010.

D. Jagtap, K. Bahulkar, D. Ponomarev, and N. Abu-Ghazaleh, Characterizing and Understanding PDES Behavior on Tilera Architecture, Proceedings of the 2012 ACM/IEEE/SCS 26th Workshop on Principles of Advanced and Distributed Simulation, p.53, 2012.

R. Jin, Y. Xiang, N. Ruan, and D. Fuhry, 3-HOP: A High-compression Indexing Scheme for Reachability Query, Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, pp.813-826, 2009.

R. Jin, N. Ruan, Y. Xiang, and H. Wang, Path-tree: An Efficient Reachability Indexing Scheme for Large Directed Graphs, ACM Transactions on Database Systems, vol.36, issue.1, article 7, 2011.

R. Jin, N. Ruan, S. Dey, and J. X. Yu, SCARAB: Scaling Reachability Computation on Large Graphs, Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, ACM, pp.169-180, 2012.

G. Karypis and V. Kumar, Multi-level graph partitioning schemes, Proceedings of the 1995 International Conference on Parallel Processing, pp.113-122, 1995.

B. W. Kernighan and S. Lin, An Efficient Heuristic Procedure for Partitioning Graphs, Bell System Technical Journal, vol.49, issue.2, pp.291-307, 1970.
DOI : 10.1002/j.1538-7305.1970.tb01770.x

M. Kudlur and S. Mahlke, Orchestrating the Execution of Stream Programs on Multicore Platforms, Proceedings of the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation, pp.114-124, 2008.

F. Kelly, A. Maulloo, and D. Tan, Rate control for communication networks: shadow prices, proportional fairness and stability, Journal of the Operational Research Society, vol.49, issue.3, pp.237-252, 1998.

C. Lattner and V. Adve, LLVM: A compilation framework for lifelong program analysis & transformation, International Symposium on Code Generation and Optimization, CGO 2004, 2004.
DOI : 10.1109/CGO.2004.1281665

J.-Y. Le Boudec and P. Thiran, Network calculus: a theory of deterministic queuing systems for the internet, 2001.

J. Lu and K. D. Cooper, Register Promotion in C Programs, Proceedings of the 18th ACM SIGPLAN Conference on Programming Language Design and Implementation, pp.308-319, 1997.

E. A. Lee and D. G. Messerschmitt, Static Scheduling of Synchronous Data Flow Programs for Digital Signal Processing, IEEE Transactions on Computers, vol.36, issue.1, 1987.
DOI : 10.1109/TC.1987.5009446

R. Lo, F. Chow, R. Kennedy, and S.-M. Liu, Register Promotion by Sparse Partial Redundancy Elimination of Loads and Stores, Proceedings of the 19th ACM SIGPLAN Conference on Programming Language Design and Implementation, pp.26-37, 1998.

A. C. McKellar and E. G. Coffman, Organizing matrices and matrix operations for paged memory systems, Communications of the ACM, vol.12, issue.3, pp.153-165, 1969.
DOI : 10.1145/362875.362879

E. Morel and C. Renvoise, Global optimization by suppression of partial redundancies, Communications of the ACM, vol.22, issue.2, pp.96-103, 1979.
DOI : 10.1145/359060.359069

G. L. Miller, S.-H. Teng, and S. A. Vavasis, A unified geometric approach to graph separators, Proceedings of the 32nd Annual Symposium on Foundations of Computer Science, pp.538-547, 1991.

N. Mansour, R. Ponnusamy, A. Choudhary, and G. Fox, Graph Contraction for Physical Optimization Methods: A Quality-cost Tradeoff for Mapping Data on Parallel Computers, Proceedings of the 7th International Conference on Supercomputing, pp.1-10, 1993.

R. Marculescu, U. Ogras, L.-S. Peh, and N. Jerger, Outstanding Research Problems in NoC Design: System, Microarchitecture, and Circuit Perspectives, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol.28, issue.1, pp.3-21, 2009.

R. Mattson, J. Gecsei, D. Slutz, and I. Traiger, Evaluation techniques for storage hierarchies, IBM Systems Journal, vol.9, issue.2, pp.78-117, IBM Corp., 1970.

S. A. McKee, Reflections on the Memory Wall, Proceedings of the 1st Conference on Computing Frontiers, p.162, 2004.

R. Motwani, K. Palem, V. Sarkar, and S. Reyen, Combining register allocation and instruction scheduling, 1995.

U. Y. Ogras and R. Marculescu, Prediction-based flow control for network-on-chip traffic, Proceedings of the 43rd Annual Conference on Design Automation, DAC '06, pp.839-844, 2006.
DOI : 10.1145/1146909.1147123

A. Pothen, H. D. Simon, and K.-P. Liou, Partitioning Sparse Matrices with Eigenvectors of Graphs, SIAM Journal on Matrix Analysis and Applications, vol.11, issue.3, pp.430-452, SIAM, 1990.
DOI : 10.1137/0611030

J.-K. Peir, Program partitioning and synchronization on multiprocessor systems, 1986.

A. Pietrek, TIREX: A textual target-level intermediate representation for virtual execution environment, compiler information exchange and program analysis, p.72, 2012.

A. Pop, OpenStream, Proceedings of the 16th International Workshop on Software and Compilers for Embedded Systems, M-SCOPES '13, p.2, 2013.
DOI : 10.1145/2463596.2486782

F. Rossi, P. van Beek, and T. Walsh, Handbook of Constraint Programming (Foundations of Artificial Intelligence), 2006.

L. Renganarayanan, U. Ramakrishna, and S. Rajopadhye, Combined ILP and Register Tiling: Analytical Model and Optimization Framework, 18th International Workshop on Languages and Compilers for Parallel Computing, pp.244-258, 2005.

M. Radulovic, D. Zivanovic, D. Ruiz, and B. R. de Supinski, Another Trip to the Wall: How Much Will Stacked DRAM Benefit HPC?, Proceedings of the 2015 International Symposium on Memory Systems, ACM, pp.31-36, 2015.

R. Rangan, N. Vachharajani, M. Vachharajani, and D. I. August, Decoupled Software Pipelining with the Synchronization Array, Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques, pp.177-188, 2004.

J.-C. Régin, A Filtering Algorithm for Constraints of Difference in CSPs, Proceedings of the 1994 National Conference on Artificial Intelligence, American Association for Artificial Intelligence, pp.362-367, 1994.

A. V. S. Sastry and R. D. C. Ju, A New Algorithm for Scalar Register Promotion Based on SSA Form, Proceedings of the 19th ACM SIGPLAN Conference on Programming Language Design and Implementation, pp.15-25, 1998.

R. Szymanek and K. Kuchcinski, JaCoP - Java Constraint Programming solver, 2013.

P. Stanley-Marbell, V. Caparros Cabezas, and R. Luijten, Pinned to the walls: Impact of packaging and application properties on the memory and power walls, IEEE/ACM International Symposium on Low Power Electronics and Design, pp.51-56, 2011.
DOI : 10.1109/ISLPED.2011.5993603

A. Saulsbury, F. Pong, and A. Nowatzyk, Missing the Memory Wall: The Case for Processor/Memory Integration, Proceedings of the 23rd Annual International Symposium on Computer Architecture, pp.90-101, 1996.

C. Seitz, W. Athas, C. Flaig, and A. Martin, The Architecture and Programming of the Ametek Series 2010 Multicomputer, Proceedings of the 3rd Conference on Hypercube Concurrent Computers and Applications: Architecture, Software, Computer Systems, and General Issues, pp.33-37, 1988.

J. Sermulins, W. Thies, R. Rabbah, and S. Amarasinghe, Cache Aware Optimization of Stream Programs, Proceedings of the 2005 ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems, ACM, pp.115-126, 2005.

W. Thies, M. Karczmarek, and S. Amarasinghe, StreamIt: A Language for Streaming Applications, Proceedings of the 11th International Conference on Compiler Construction, pp.179-196, 2002.
DOI : 10.1007/3-540-45937-5_14

R. Veloso, L. Cerf, W. Meira Jr., and M. J. Zaki, Reachability Queries in Very Large Graphs: A Fast Refined Online Search Approach, Proceedings of the 17th Conference on Extending Database Technology, OpenProceedings.org, pp.511-522, 2014.

M. Welsh, D. Culler, and E. Brewer, SEDA: an architecture for well-conditioned, scalable internet services, ACM SIGOPS Operating Systems Review, vol.35, issue.5, pp.230-243, ACM, 2001.

W. A. Wulf and S. A. McKee, Hitting the memory wall: implications of the obvious, ACM SIGARCH Computer Architecture News, vol.23, issue.1, pp.20-24, 1995.
DOI : 10.1145/216585.216588

H. Zhou and J. Xue, Exploiting mixed SIMD parallelism by reducing data reorganization overhead, Proceedings of the 2016 International Symposium on Code Generation and Optimization, CGO 2016, pp.59-69, 2016.
DOI : 10.1145/2854038.2854054

Ü. V. Çatalyürek and C. Aykanat, A hypergraph model for mapping repeated sparse matrix-vector product computations on multicomputers, Proceedings of the International Conference on High Performance Computing, 1995.

Ü. V. Çatalyürek and C. Aykanat, Decomposing irregularly sparse matrices for parallel matrix-vector multiplication, Parallel Algorithms for Irregularly Structured Problems, pp.75-86, 1996.