J. L. Abellán, J. Fernández, and M. E. Acacio, GLocks: Efficient Support for Highly-Contended Locks in Many-Core CMPs, Proceedings of the 2011 IEEE International Parallel & Distributed Processing Symposium, 2011.

S. Agathos, N. Kallimanis, and V. Dimakopoulos, Speeding Up OpenMP Tasking, Proceedings of the 18th international conference on Parallel Processing, 2012.
DOI : 10.1007/978-3-642-32820-6_64

T. E. Anderson, The performance of spin lock alternatives for shared-memory multiprocessors, IEEE Transactions on Parallel and Distributed Systems, vol.1, issue.1, pp.6-16, 1990.
DOI : 10.1109/71.80120

A. Baumann, P. Barham, P. Dagand, T. Harris, R. Isaacs et al., The multikernel, Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles, SOSP '09, 2009.
DOI : 10.1145/1629575.1629579

M. Berezecki, E. Frachtenberg, M. Paleczny, and K. Steele, Many-core key-value store, 2011 International Green Computing Conference and Workshops, 2011.
DOI : 10.1109/IGCC.2011.6008565
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.260.715

I. Calciu, J. Gottschlich, and M. Herlihy, Using elimination and delegation to implement a scalable NUMA-friendly stack, 5th USENIX Workshop on Hot Topics in Parallelism, 2013.

J. Cleary, O. Callanan, M. Purcell, and D. Gregg, Fast asymmetric thread synchronization, ACM Transactions on Architecture and Code Optimization, vol.9, issue.4, pp.27:1-27:22, 2013.
DOI : 10.1145/2400682.2400686

P. Fatourou and N. D. Kallimanis, A highly-efficient wait-free universal construction, Proceedings of the 23rd ACM symposium on Parallelism in algorithms and architectures, 2011.

P. Fatourou and N. D. Kallimanis, Revisiting the combining synchronization technique, Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming, 2012.
DOI : 10.1145/2370036.2145849

V. Gramoli, R. Guerraoui, and V. Trigonakis, TM2C: a software transactional memory for many-cores, Proceedings of the 7th ACM European conference on Computer Systems, EuroSys '12, 2012.
DOI : 10.1145/2168836.2168872

D. Hendler, I. Incze, N. Shavit, and M. Tzafrir, Flat combining and the synchronization-parallelism tradeoff, Proceedings of the 22nd ACM symposium on Parallelism in algorithms and architectures, SPAA '10, 2010.
DOI : 10.1145/1810479.1810540
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.186.939

M. Herlihy, B. Lim, and N. Shavit, Scalable concurrent counting, ACM Transactions on Computer Systems, vol.13, issue.4, pp.343-364, 1995.
DOI : 10.1145/210223.210225
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.38.3392

M. P. Herlihy and J. M. Wing, Linearizability: a correctness condition for concurrent objects, ACM Transactions on Programming Languages and Systems, vol.12, issue.3, pp.463-492, 1990.
DOI : 10.1145/78969.78972

J. Howard, S. Dighe, Y. Hoskote, S. Vangal, D. Finan et al., A 48-Core IA-32 message-passing processor with DVFS in 45nm CMOS, 2010 IEEE International Solid-State Circuits Conference (ISSCC), 2010.
DOI : 10.1109/ISSCC.2010.5434077

J. Lozi, F. David, G. Thomas, J. Lawall, and G. Muller, Remote core locking: migrating critical-section execution to improve the performance of multithreaded applications, Proceedings of the 2012 USENIX Annual Technical Conference, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00991709

M. Martin, M. Hill, and D. Sorin, Why on-chip cache coherence is here to stay, Communications of the ACM, vol.55, issue.7, pp.78-89, 2012.
DOI : 10.1145/2209249.2209269

J. M. Mellor-Crummey and M. L. Scott, Algorithms for scalable synchronization on shared-memory multiprocessors, ACM Transactions on Computer Systems, vol.9, issue.1, pp.21-65, 1991.
DOI : 10.1145/103727.103729
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.106.3994

Z. Metreveli, N. Zeldovich, and M. F. Kaashoek, CPHASH: a cache-partitioned hash table, Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming, 2012.

M. M. Michael and M. L. Scott, Simple, fast, and practical non-blocking and blocking concurrent queue algorithms, Proceedings of the fifteenth annual ACM symposium on Principles of distributed computing, PODC '96, 1996.
DOI : 10.1145/248052.248106
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.37.3574

A. Morrison and Y. Afek, Fast concurrent queues for x86 processors, Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming, 2013.
DOI : 10.1145/2442516.2442527
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.420.8930

S. Owens, S. Sarkar, and P. Sewell, A Better x86 Memory Model: x86-TSO, Proceedings of the 22nd International Conference on Theorem Proving in Higher Order Logics, 2009.
DOI : 10.1007/11817963_46

Y. Oyama, K. Taura, and A. Yonezawa, Executing parallel programs with synchronization bottlenecks efficiently, Proceedings of the International Workshop on Parallel and Distributed Computing for Symbolic and Irregular Applications, 1999.

N. Shavit and D. Touitou, Elimination trees and the construction of pools and stacks, Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures, SPAA '95, 1995.
DOI : 10.1145/215399.215419

D. Sorin, M. Hill, and D. Wood, A Primer on Memory Consistency and Cache Coherence, Synthesis Lectures on Computer Architecture, vol.6, issue.3, pp.1-212, 2011.
DOI : 10.2200/S00346ED1V01Y201104CAC016

M. A. Suleman, O. Mutlu, M. Qureshi, and Y. Patt, Accelerating Critical Section Execution with Asymmetric Multicore Architectures, IEEE Micro, vol.30, issue.1, pp.60-70, 2010.
DOI : 10.1109/MM.2010.7
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.152.2653

R. K. Treiber, Systems Programming: Coping with Parallelism, 1986.

D. Wentzlaff and A. Agarwal, Factored operating systems (fos), ACM SIGOPS Operating Systems Review, vol.43, issue.2, pp.76-85, 2009.
DOI : 10.1145/1531793.1531805