A. Agarwal, R. Bianchini, D. Chaiken, K. L. Johnson, D. Kranz et al., The MIT Alewife machine: architecture and performance.

R. Alverson, D. Callahan, D. Cummings, B. Koblenz, A. Porterfield et al., The Tera computer system, Proceedings of the 4th international conference on Supercomputing, pp.1-6, 1990.

S. Coleman and K. S. McKinley, Tile size selection using cache organization and data layout, Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation, pp.279-290, 1995.
DOI : 10.1145/207110.207162

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.128.9167

R. Cooksey, S. Jourdan, and D. Grunwald, A stateless, content-directed data prefetching mechanism, Proceedings of the 10th international conference on architectural support for programming languages and operating systems (ASPLOS-X), pp.279-290, 2002.
DOI : 10.1145/635506.605427

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.138.5952

O. Etzioni and D. S. Weld, Intelligent agents on the Internet: Fact, fiction, and forecast, IEEE Expert: Intelligent Systems and Their Applications, pp.44-49, 1995.
DOI : 10.1109/64.403956

S. Ghosh, M. Martonosi, and S. Malik, Cache miss equations, Proceedings of the 11th international conference on Supercomputing, ICS '97, pp.317-324, 1997.
DOI : 10.1145/263580.263657

J. Giavitto and O. Michel, MGS, Electronic Notes in Theoretical Computer Science, vol.59, issue.4, 2001.
DOI : 10.1016/S1571-0661(04)00293-2

URL : https://hal.archives-ouvertes.fr/hal-00769284

F. Gruau and P. Malbos, The Blob: A Basic Topological Concept for "Hardware-Free" Distributed Computation, Unconventional Models of Computation (UMC'02), pp.151-163, 2002.
DOI : 10.1007/3-540-45833-6_13

URL : https://hal.archives-ouvertes.fr/lirmm-00268600

M. S. Lam, E. E. Rothberg, and M. E. Wolf, The cache performance and optimizations of blocked algorithms, Proceedings of the fourth international conference on Architectural support for programming languages and operating systems, pp.63-74, 1991.

J. L. Lo, J. S. Emer, H. M. Levy, R. L. Stamm, D. M. Tullsen et al., Converting thread-level parallelism to instruction-level parallelism via simultaneous multithreading, ACM Transactions on Computer Systems, vol.15, issue.3, pp.322-354, 1997.
DOI : 10.1145/263326.263382

R. Nagarajan, K. Sankaralingam, D. Burger, and S. W. Keckler, A design space evaluation of grid processor architectures, Proceedings. 34th ACM/IEEE International Symposium on Microarchitecture. MICRO-34, pp.40-51, 2001.
DOI : 10.1109/MICRO.2001.991104

K. Olukotun, B. A. Nayfeh, L. Hammond, K. Wilson, and K. Chang, The case for a single-chip multiprocessor, Proceedings of the seventh international conference on Architectural support for programming languages and operating systems, pp.2-11, 1996.

D. Parello, O. Temam, and J. Verdun, On Increasing Architecture Awareness in Program Optimizations to Bridge the Gap between Peak and Sustained Processor Performance - Matrix-Multiply Revisited, ACM/IEEE SC 2002 Conference (SC'02), p.31, 2002.
DOI : 10.1109/SC.2002.10054

A. Roth and G. Sohi, Speculative data-driven multithreading, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture, pp.37-48, 2001.
DOI : 10.1109/HPCA.2001.903250

URL : https://minds.wisconsin.edu/bitstream/handle/1793/60236/TR1414.pdf?sequence=1

A. Roth and G. S. Sohi, Effective jump-pointer prefetching for linked data structures, Proceedings of the 26th annual international symposium on Computer architecture, pp.111-121, 1999.
DOI : 10.1145/307338.300989

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.391.5339

K. Sankaralingam, R. Nagarajan, H. Liu, C. Kim, J. Huh et al., Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture, Proceedings of the 30th annual international symposium on Computer architecture, pp.422-433, 2003.
DOI : 10.1145/871656.859667

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.13.5532

Y. Solihin, J. Lee, and J. Torrellas, Using a user-level memory thread for correlation prefetching, Proceedings of the 29th annual international symposium on Computer architecture, pp.171-182, 2002.
DOI : 10.1145/545214.545235

URL : http://chooyu.cs.uiuc.edu/iacoma-papers/isca02pref.ps

A. A. Stepanov and M. Lee, The Standard Template Library, 1994.

M. Taylor, W. Lee, J. Miller, D. Wentzlaff, B. Greenwald et al., Evaluation of the Raw Microprocessor, Proceedings of the 31st annual international symposium on Computer architecture, 2004.
DOI : 10.1145/1028176.1006733

O. Temam, C. Fricker, and W. Jalby, Cache interference phenomena, Proceedings of the 1994 ACM SIGMETRICS conference on Measurement and modeling of computer systems, pp.261-271, 1994.

D. M. Tullsen, S. J. Eggers, J. S. Emer, H. M. Levy, J. L. Lo et al., Exploiting choice: instruction fetch and issue on an implementable simultaneous multithreading processor, Proceedings of the 23rd annual international symposium on Computer architecture, pp.191-202, 1996.

D. M. Tullsen, S. J. Eggers, and H. M. Levy, Simultaneous multithreading: maximizing on-chip parallelism, Proceedings of the 22nd annual international symposium on Computer architecture, pp.392-403, 1995.

D. M. Tullsen, J. L. Lo, S. J. Eggers, and H. M. Levy, Supporting fine-grained synchronization on a simultaneous multithreading processor, Proceedings of the Fifth International Symposium on High Performance Computer Architecture, p.54, 1999.

S. Umatani, M. Yasugi, T. Komiya, and T. Yuasa, Pursuing Laziness for Efficient Implementation of Modern Multithreaded Languages, Fourth International Symposium on High Performance Computing, pp.174-188, 2003.
DOI : 10.1007/978-3-540-39707-6_13

Y. Zhang, L. Rauchwerger, and J. Torrellas, Hardware for speculative run-time parallelization in distributed shared-memory multiprocessors, Proceedings of the Fourth International Symposium on High-Performance Computer Architecture, p.162, 1998.

Z. A. Ye, A. Moshovos, S. Hauck, and P. Banerjee, Chimaera: a high-performance architecture with a tightly-coupled reconfigurable functional unit, Proceedings of the 27th annual international symposium on Computer architecture, pp.225-235, 2000.

S. Yehia and O. Temam, From Sequences of Dependent Instructions to Functions, Proceedings of the 31st annual international symposium on Computer architecture, 2004.
DOI : 10.1145/1028176.1006721

C. Zilles and G. Sohi, Execution-based prediction using speculative slices, Proceedings of the 28th annual international symposium on Computer architecture, pp.2-13, 2001.
DOI : 10.1145/384285.379246

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.24.6428