An adaptive performance modeling tool for GPU architectures, ACM SIGPLAN Notices, vol.45, issue.5, pp.105-114, 2010.
DOI : 10.1145/1837853.1693470
Analyzing CUDA workloads using a detailed GPU simulator, 2009 IEEE International Symposium on Performance Analysis of Systems and Software, pp.163-174, 2009.
DOI : 10.1109/ISPASS.2009.4919648
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.507.8371
Barra: A Parallel Functional Simulator for GPGPU, 2010 IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, pp.351-360, 2010.
DOI : 10.1109/MASCOTS.2010.43
An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness, ACM SIGARCH Computer Architecture News, vol.37, issue.3, pp.152-163, 2009.
DOI : 10.1145/1555815.1555775
Efficient SIMDization and Data Management of the Lattice QCD Computation on the Cell Broadband Engine, Scientific Programming, vol.17, issue.1-2, pp.153-172, 2009.
DOI : 10.1155/2009/634756
CuMAPz, Proceedings of the 48th Design Automation Conference, DAC '11, pp.128-133, 2011.
DOI : 10.1145/2024724.2024754
Optimizing matrix transpose in CUDA, 2009.
Program optimization space pruning for a multithreaded GPU, Proceedings of the sixth annual IEEE/ACM international symposium on Code generation and optimization, CGO '08, pp.195-204, 2008.
DOI : 10.1145/1356058.1356084
Demystifying GPU microarchitecture through microbenchmarking, 2010 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS), pp.235-246, 2010.
DOI : 10.1109/ISPASS.2010.5452013
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.189.5309
A quantitative performance analysis model for GPU architectures, 2011 IEEE 17th International Symposium on High Performance Computer Architecture, pp.382-393, 2011.
DOI : 10.1109/HPCA.2011.5749745