GLocks: Efficient Support for Highly-Contended Locks in Many-Core CMPs, Proceedings of the 2011 IEEE International Parallel & Distributed Processing Symposium, 2011.
Speeding Up OpenMP Tasking, Proceedings of the 18th international conference on Parallel Processing, 2012.
DOI : 10.1007/978-3-642-32820-6_64
The performance of spin lock alternatives for shared-memory multiprocessors, IEEE Transactions on Parallel and Distributed Systems, vol.1, issue.1, pp.6-16, 1990.
DOI : 10.1109/71.80120
The multikernel, Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles, SOSP '09, 2009.
DOI : 10.1145/1629575.1629579
Many-core key-value store, 2011 International Green Computing Conference and Workshops, 2011.
DOI : 10.1109/IGCC.2011.6008565
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.260.715
Using elimination and delegation to implement a scalable NUMA-friendly stack, 5th USENIX Workshop on Hot Topics in Parallelism, 2013.
Fast asymmetric thread synchronization, ACM Transactions on Architecture and Code Optimization, vol.9, issue.4, pp.1-2722, 2013.
DOI : 10.1145/2400682.2400686
A highly-efficient wait-free universal construction, Proceedings of the 23rd ACM symposium on Parallelism in algorithms and architectures, 2011.
Revisiting the combining synchronization technique, Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming, 2012.
DOI : 10.1145/2370036.2145849
C, Proceedings of the 7th ACM european conference on Computer Systems, EuroSys '12, 2012.
DOI : 10.1145/2168836.2168872
Flat combining and the synchronization-parallelism tradeoff, Proceedings of the 22nd ACM symposium on Parallelism in algorithms and architectures, SPAA '10, 2010.
DOI : 10.1145/1810479.1810540
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.186.939
Scalable concurrent counting, ACM Transactions on Computer Systems, vol.13, issue.4, pp.343-364, 1995.
DOI : 10.1145/210223.210225
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.38.3392
Linearizability: a correctness condition for concurrent objects, ACM Transactions on Programming Languages and Systems, vol.12, issue.3, pp.463-492, 1990.
DOI : 10.1145/78969.78972
A 48-Core IA-32 message-passing processor with DVFS in 45nm CMOS, 2010 IEEE International Solid-State Circuits Conference (ISSCC), 2010.
DOI : 10.1109/ISSCC.2010.5434077
Remote core locking: migrating critical-section execution to improve the performance of multithreaded applications, Proceedings of the 2012 USENIX Annual Technical Conference, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00991709
Why on-chip cache coherence is here to stay, Communications of the ACM, vol.55, issue.7, pp.78-89, 2012.
DOI : 10.1145/2209249.2209269
Algorithms for scalable synchronization on shared-memory multiprocessors, ACM Transactions on Computer Systems, vol.9, issue.1, pp.21-65, 1991.
DOI : 10.1145/103727.103729
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.106.3994
CPHASH: a cache-partitioned hash table, Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming, 2012.
Simple, fast, and practical non-blocking and blocking concurrent queue algorithms, Proceedings of the fifteenth annual ACM symposium on Principles of distributed computing, PODC '96, 1996.
DOI : 10.1145/248052.248106
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.37.3574
Fast concurrent queues for x86 processors, Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming, 2013.
DOI : 10.1145/2442516.2442527
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.420.8930
A Better x86 Memory Model: x86-TSO, Proceedings of the 22nd International Conference on Theorem Proving in Higher Order Logics, 2009.
DOI : 10.1007/11817963_46
Executing parallel programs with synchronization bottlenecks efficiently, Proceedings of the International Workshop on Parallel and Distributed Computing for Symbolic and Irregular Applications, 1999.
Elimination trees and the construction of pools and stacks, Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures, SPAA '95, 1995.
DOI : 10.1145/215399.215419
A Primer on Memory Consistency and Cache Coherence, Synthesis Lectures on Computer Architecture, vol.6, issue.3, pp.1-212, 2011.
DOI : 10.2200/S00346ED1V01Y201104CAC016
Accelerating Critical Section Execution with Asymmetric Multicore Architectures, IEEE Micro, vol.30, issue.1, pp.60-70, 2010.
DOI : 10.1109/MM.2010.7
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.152.2653
Systems Programming: Coping with Parallelism, 1986.
Factored operating systems (fos), ACM SIGOPS Operating Systems Review, vol.43, issue.2, pp.76-85, 2009.
DOI : 10.1145/1531793.1531805