A. Mazouz, S. Touati, and D. Barthou, Analysing the Variability of OpenMP Programs Performances on Multicore Architectures, Proc. of Programmability Issues for Heterogeneous Multicores, in conjunction with the HIPEAC conference, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00637957

R. Jain, The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling, 1991.

S. Touati, J. Worms, and S. Briais, The Speedup-Test, University of Versailles Saint-Quentin en Yvelines, Tech. Rep., 2010.

T. Mytkowicz, A. Diwan, P. F. Sweeney, and M. Hauswirth, Producing wrong data without doing anything obviously wrong, Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2009.
DOI : 10.1145/2528521.1508275
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=

G. Karypis and V. Kumar, Parallel multilevel k-way partitioning scheme for irregular graphs, Proceedings of the 1996 ACM/IEEE conference on Supercomputing (CDROM), Supercomputing '96, 1997.
DOI : 10.1145/369028.369103

J. Edmonds, Maximum matching and a polyhedron with 0,1-vertices, Journal of Research of the National Bureau of Standards Section B Mathematics and Mathematical Physics, vol.69, issue.1 and 2, pp.125-130, 1965.
DOI : 10.6028/jres.069B.013

E. Z. Zhang, Y. Jiang, and X. Shen, Does cache sharing on modern CMP matter to the performance of contemporary multithreaded programs?, PPoPP '10: Proc. of the ACM SIGPLAN Symposium on Principles and practice of parallel programming, pp.203-212, 2010.

C. Bienia, S. Kumar, J. P. Singh, and K. Li, The PARSEC benchmark suite, Proceedings of the 17th international conference on Parallel architectures and compilation techniques, PACT '08, 2008.
DOI : 10.1145/1454115.1454128

D. Tam, R. Azimi, and M. Stumm, Thread clustering: sharing-aware scheduling on SMP-CMP-SMT multiprocessors, EuroSys '07: Proc. of the ACM SIGOPS/EuroSys European Conference on Computer Systems, pp.47-58, 2007.

T. Klug, M. Ott, J. Weidendorfer, and C. Trinitis, autopin - Automated Optimization of Thread-to-Core Pinning on Multicore Systems, Transactions on High-Performance Embedded Architectures and Compilers, 2008.
DOI : 10.1007/978-3-540-73940-1_33

V. Kazempour, A. Fedorova, and P. Alagheband, Performance Implications of Cache Affinity on Multicore Processors, Euro-Par '08: the international Euro-Par conference on Parallel Processing, pp.151-161, 2008.
DOI : 10.1007/978-3-540-85451-7_17

C. Terboven, D. Mey, D. Schmidl, H. Jin, and T. Reichstein, Data and thread affinity in OpenMP programs, Proceedings of the 2008 workshop on Memory access on future processors: a solved problem?, MAW '08, pp.377-384, 2008.
DOI : 10.1145/1366219.1366222

F. Song, S. Moore, and J. Dongarra, Analytical modeling and optimization for affinity based thread scheduling on multicore systems, 2009 IEEE International Conference on Cluster Computing and Workshops, 2009.
DOI : 10.1109/CLUSTR.2009.5289173

M. Chu, R. Ravindran, and S. Mahlke, Data Access Partitioning for Fine-grain Parallelism on Multicore Architectures, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007), pp.369-380, 2007.
DOI : 10.1109/MICRO.2007.15

J. Lee, H. Wu, M. Ravichandran, and N. Clark, Thread tailor: dynamically weaving threads together for efficient, adaptive parallel applications, ISCA '10: Proc. of the annual international symposium on Computer architecture, pp.270-279, 2010.

M. Kandemir, T. Yemliha, S. Muralidhara, S. Srikantaiah, M. J. Irwin et al., Cache topology aware computation mapping for multicores, ACM SIGPLAN Notices, vol.45, issue.6, pp.74-85, 2010.
DOI : 10.1145/1809028.1806605