H. Al-Sukhni, J. Holt, and D. A. Connors, Improved stride prefetching using extrinsic stream characteristics, IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp.166-176, 2006.
DOI : 10.1109/ispass.2006.1620801
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.130.5993

J. C. Beyler and P. Clauss, Performance driven data cache prefetching in a dynamic software optimization system, ICS '07: Proceedings of the 21st annual international conference on Supercomputing, pp.202-209, 2007.

T. Chen and J. Baer, Reducing memory latency via non-blocking and prefetching caches, Proceedings of the fifth international conference on Architectural support for programming languages and operating systems, 1992.

C. Ding, S. Carr, and P. H. Sweany, Modulo scheduling with cache reuse information, Euro-Par '97: Proceedings of the Third International Euro-Par Conference on Parallel Processing, pp.1079-1083, 1997.
DOI : 10.1007/BFb0002856

K. I. Farkas and N. P. Jouppi, Complexity/performance tradeoffs with non-blocking loads, ACM SIGARCH Computer Architecture News, vol.22, issue.2, pp.211-222, 1994.
DOI : 10.1145/192007.192029
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.37.54

J. Lu, H. Chen, R. Fu, W. Hsu, B. Othmer et al., The Performance of Runtime Data Cache Prefetching in a Dynamic Optimization System, MICRO 36: Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture, p.180, 2003.

J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach, 1996.

K. Öner and M. Dubois, Effects of memory latencies on non-blocking processor/cache architectures, ICS '93: Proceedings of the 7th international conference on Supercomputing, pp.338-347, 1993.

D. Kroft, Lockup-free instruction fetch/prefetch cache organization, 25 years of the international symposia on Computer architecture (selected papers), ISCA '98, pp.81-87, 1981.
DOI : 10.1145/285930.285979

P. Faraboschi, G. Brown, J. A. Fisher, G. Desoli, and F. Homewood, Lx: A Technology Platform for Customizable VLIW Embedded Processing, Proceedings of the 27th International Symposium on Computer Architecture (ISCA), pp.203-213, 2000.

R. Allen and K. Kennedy, Optimizing Compilers for Modern Architectures, 2002.

S. Ammenouche, S. Touati, and W. Jalby, Practical Precise Evaluation of Cache Effects on Low Level Embedded VLIW Computing, HPCS, ECMS proceedings, 2008.
URL : https://hal.archives-ouvertes.fr/inria-00637224

S. G. Abraham, R. A. Sugumar, D. Windheiser, B. R. Rau, and R. Gupta, Predictability of load/store instruction latencies, MICRO 26: Proceedings of the 26th annual international symposium on Microarchitecture, pp.139-152, 1993.

S. Touati, Optimal acyclic fine-grain scheduling with cache effects for embedded and real time systems, CODES '01: Proceedings of the ninth international symposium on Hardware/software codesign, pp.159-164, 2001.
URL : https://hal.archives-ouvertes.fr/inria-00637269

Y. Wu, Efficient discovery of regular stride patterns in irregular programs and its use in compiler prefetching, ACM SIGPLAN Notices, vol.37, issue.5, pp.210-221, 2002.
DOI : 10.1145/543552.512555