L. Liu and S. Rus, Perflint: A Context Sensitive Performance Advisor for C++ Programs, 2009 International Symposium on Code Generation and Optimization, 2009.
DOI : 10.1109/CGO.2009.36

N. R. Tallent, J. M. Mellor-Crummey, and M. W. Fagan, Binary analysis for measurement and attribution of program performance, PLDI, 2009.

T. Moseley, D. A. Connors, D. Grunwald, and R. Peri, Identifying potential parallelism via loop-centric profiling, Proceedings of the 2007 International Conference on Computing Frontiers, 2007.
DOI : 10.1145/1242531.1242554

G. D. Price, J. Giacomoni, and M. Vachharajani, Visualizing potential parallelism in sequential programs, Proceedings of the 17th international conference on Parallel architectures and compilation techniques, PACT '08, 2008.
DOI : 10.1145/1454115.1454129

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.142.5696

C. B. Zilles, Benchmark health considered harmful, SIGARCH Computer Architecture News, 2001.
DOI : 10.1145/503205.503206

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.32.6730

T. Moseley, D. Grunwald, and R. V. Peri, OptiScope: Performance Accountability for Optimizing Compilers, 2009 International Symposium on Code Generation and Optimization, 2009.
DOI : 10.1109/CGO.2009.26

T. Mytkowicz, A. Diwan, M. Hauswirth, and P. F. Sweeney, Producing wrong data without doing anything obviously wrong, ASPLOS, 2009.
DOI : 10.1145/1508284.1508275

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.163.8395

T. Moseley, A. Shye, V. J. Reddi, M. Iyer, D. Fay et al., Dynamic run-time architecture techniques for enabling continuous optimization, Proceedings of the 2nd conference on Computing frontiers, CF '05, 2005.
DOI : 10.1145/1062261.1062296

D. Knights, T. Mytkowicz, P. F. Sweeney, M. C. Mozer, and A. Diwan, Blind Optimization for Exploiting Hardware Features, Conference on Compiler Construction, 2009.
DOI : 10.1145/268424.268469

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.160.1442

Z. Pan and R. Eigenmann, Fast, automatic, procedure-level performance tuning, Proceedings of the 15th international conference on Parallel architectures and compilation techniques, PACT '06, pp.173-181, 2006.
DOI : 10.1145/1152154.1152182

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.77.4089

R. C. Whaley and J. J. Dongarra, Automatically Tuned Linear Algebra Software, Proceedings of the IEEE/ACM SC98 Conference, 1998.
DOI : 10.1109/SC.1998.10004

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.108.3487

J. Callister, Confessions of a performance monitor hardware designer, Workshop on Hardware Performance Monitor Design and Functionality colocated with HPCA, 2005.

AMD, Lightweight profiling specification.

J. M. Anderson, L. M. Berc, J. Dean, S. Ghemawat, M. R. Henzinger et al., Continuous profiling: where have all the cycles gone?, SOSP '97: Proceedings of the sixteenth ACM symposium on Operating systems principles, pp.1-14, 1997.
DOI : 10.1145/265924.265925

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.124.5957

J. Dean, J. E. Hicks, C. A. Waldspurger, W. E. Weihl, and G. Z. Chrysos, ProfileMe: Hardware support for instruction-level profiling on out-of-order processors, International Symposium on Microarchitecture, pp.292-302, 1997.

Instruction-based sampling: A new performance analysis technique for AMD family 10h processors.

H. C. Hunter and R. Nair, Refining performance monitor design, Proceedings of the 2004 Workshop on Complexity Effective Design (WCED), 2004.

J. Cavazos, C. Dubach, F. Agakov, E. Bonilla, M. F. O'Boyle et al., Automatic performance model construction for the fast software exploration of new hardware designs, Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems, CASES '06, 2006.
DOI : 10.1145/1176760.1176765

B. Sprunt, Performance monitoring hardware will always be a low priority, second class feature in processor designs until, Workshop on Hardware Performance Monitor Design and Functionality colocated with HPCA, 2005.

T. Moseley, J. L. Kihm, D. A. Connors, and D. Grunwald, Methods for modeling resource contention on simultaneous multithreading processors, 2005 International Conference on Computer Design, 2005.
DOI : 10.1109/ICCD.2005.74

URL : http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.470.4203&rep=rep1&type=pdf

E. Ould-Ahmed-Vall, J. Woodlee, C. Yount, K. A. Doshi, and S. Abraham, Using Model Trees for Computer Architecture Performance Analysis of Software Applications, 2007 IEEE International Symposium on Performance Analysis of Systems & Software, 2007.
DOI : 10.1109/ISPASS.2007.363742

X. Dai, A. Zhai, W. Hsu, and P. Yew, A general compiler framework for speculative optimizations using data speculative code motion, CGO '05: Proceedings of the international symposium on Code generation and optimization, 2005.

M. M. Isci and G. Contreras, Hardware performance counters for detailed runtime power and thermal estimations: Experiences and proposals, Workshop on Hardware Performance Monitor Design and Functionality colocated with HPCA, 2005.

T. Moseley, Adaptive thread scheduling for simultaneous multithreading processors, Boulder, CO, 2006.

A. Shye, M. Iyer, T. Moseley, D. Hodgdon, D. Fay et al., Analysis of Path Profiling Information Generated with Performance Monitoring Hardware, 9th Annual Workshop on Interaction between Compilers and Computer Architectures (INTERACT'05), pp.34-43, 2005.
DOI : 10.1109/INTERACT.2005.3

A. Shye, B. Ozisikyilmaz, A. Mallik, G. Memik, P. A. Dinda et al., Learning and leveraging the relationship between architecture-level measurements and individual user satisfaction, ISCA, 2008.
DOI : 10.1145/1394608.1382158

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.144.2358

M. M. Tikir, B. R. Buck, and J. K. Hollingsworth, What we need to be able to count to tune programs, Workshop on Hardware Performance Monitor Design and Functionality colocated with HPCA, 2005.

I. Tuduce and T. Gross, Efficient collection of information on the locality of accesses, Workshop on Hardware Performance Monitor Design and Functionality colocated with HPCA, 2005.

B. Brantley, The NUMA challenge, Workshop on Hardware Performance Monitor Design and Functionality colocated with HPCA, 2005.

A. Rishi and J. A. Masamitsu, US Patent No. 5953530: Method and apparatus for run-time memory access checking and memory leak detection.

T. M. Conte, B. A. Patel, K. N. Menezes, and J. S. Cox, Hardware-Based Profiling: An Effective Technique for Profile-Driven Optimization, International Journal of Parallel Programming, vol.7, issue.7, 1996.
DOI : 10.4135/9781412985451

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.40.9370

B. A. Fields, R. Bodik, M. D. Hill, and C. J. Newburn, Interaction cost and shotgun profiling, ACM Transactions on Architecture and Code Optimization, vol.1, issue.3, 2004.
DOI : 10.1145/1022969.1022971

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.70.1364

C. B. Zilles and G. S. Sohi, A programmable co-processor for profiling, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture, 2001.
DOI : 10.1109/HPCA.2001.903267

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.26.786

V. M. Weaver and S. A. McKee, Can hardware performance counters be trusted?, IISWC, 2008.
DOI : 10.1109/iiswc.2008.4636099

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.141.1880

P. Mucci, N. Smeds, and P. Ekman, Performance monitoring with PAPI using the performance application programming interface, Dr. Dobb's, 2005.

P. Mucci, Towards a flexible and realistic hardware performance monitor infrastructure, Workshop on Hardware Performance Monitor Design and Functionality colocated with HPCA, 2005.

B. Sprunt, Managing The Complexity Of Performance Monitoring Hardware: The Brink And Abyss Approach, The International Journal of High Performance Computing Applications, vol.22, issue.4, 2006.
DOI : 10.1109/MM.2002.1028477

D. Molka, D. Hackenberg, R. Schöne, and M. S. Müller, Memory performance and cache coherency effects on an Intel Nehalem multiprocessor system.

R. Fowler, Performance hardware if I ran the world, Workshop on Hardware Performance Monitor Design and Functionality colocated with HPCA, 2005.

R. Levin, I. Newman, and G. Haber, Complementing Missing and Inaccurate Profiling Using a Minimum Cost Circulation Algorithm, HiPEAC, 2008.
DOI : 10.1007/978-3-540-77560-7_20

T. Mytkowicz, D. Coughlin, and A. Diwan, Inferred call path profiling, OOPSLA, 2009.

AMD CodeAnalyst.

C.-K. Luk, R. Cohn, R. Muth, H. Patil et al., Pin: building customized program analysis tools with dynamic instrumentation, PLDI '05: Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation, pp.190-200, 2005.

N. Nethercote and J. Seward, Valgrind: A framework for heavyweight dynamic binary instrumentation, Proceedings of ACM SIGPLAN 2007 Conference on Programming Language Design and Implementation (PLDI 2007), 2007.

D. L. Bruening, Efficient, transparent, and comprehensive runtime code manipulation, 2004.

P. S. Magnusson, M. Christensson, J. Eskilson, D. Forsgren, G. Hållberg et al., Simics: A full system simulation platform, Computer, vol.35, issue.2, 2002.
DOI : 10.1109/2.982916

K. Hoste and L. Eeckhout, Microarchitecture-Independent Workload Characterization, IEEE Micro, vol.27, issue.3, pp.63-72, 2007.
DOI : 10.1109/MM.2007.56

X. Zhang, Z. Wang, N. C. Gloy, J. B. Chen, and M. D. Smith, System support for automated profiling and optimization, SOSP, 1997.
DOI : 10.1145/269005.266640

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.137.2900

M. Hirzel and T. Chilimbi, Bursty tracing: A framework for low-overhead temporal profiling, 4th ACM Workshop on Feedback-Directed and Dynamic Optimization, 2001.

M. Arnold and B. G. Ryder, A framework for reducing the cost of instrumented code, SIGPLAN Conference on Programming Language Design and Implementation, pp.168-179, 2001.

T. Moseley, A. Shye, V. J. Reddi, D. Grunwald, and R. V. Peri, Shadow Profiling: Hiding Instrumentation Costs with Parallelism, International Symposium on Code Generation and Optimization (CGO'07), 2007.
DOI : 10.1109/CGO.2007.35

K. Hoste, A. Phansalkar, L. Eeckhout, A. Georges, L. K. John et al., Performance prediction based on inherent program similarity, Proceedings of the 15th international conference on Parallel architectures and compilation techniques, PACT '06, 2006.
DOI : 10.1145/1152154.1152174

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.109.2528

R. Shaham, E. K. Kolodner, and M. Sagiv, Heap profiling for space-efficient Java, PLDI '01: Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation, 2001.
DOI : 10.1145/381694.378820

L. Djoudi, D. Barthou, P. Carribault, C. Lemuet, J. Acquaviva et al., Exploring application performance: a new tool for a static/dynamic approach, Los Alamos Computer Science Institute Symp, 2005.
URL : https://hal.archives-ouvertes.fr/hal-00141071

M. Iyer, C. Ashok, J. Stone, N. Vachharajani, D. A. Connors et al., Finding parallelism for future EPIC machines, Proceedings of the Fourth Workshop on Explicitly Parallel Instruction Computer Architectures and Compiler Technology (EPIC), 2005.

G. Fursin, M. O'Boyle, O. Temam, and G. Watts, A fast and accurate method for determining a lower bound on execution time, Concurrency: Practice and Experience, pp.271-292, 2004.
DOI : 10.1002/cpe.774