J. Sim, . Dasgupta, . Aniruddha, . Kim, . Hyesoon et al., A Performance Analysis Framework for Identifying Potential Benefits in GPGPU Applications, 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '12), pp.11-22, 2012.

S. Khondker, J. K. Hasan, S. Antonio, and . Radhakrishnan, A New Composite CPU/Memory Model for Predicting Efficiency of Multi-core Processing, The 20th IEEE International Symposium on High Performance Computer Architecture (HPCA-2014) workshop, 2014.

K. John, D. L. Holmen, and . Foster, Accelerating Single Iteration Performance of CUDA-Based 3D ReactionDiffusion Simulations, International Journal of Parallel Programming, vol.42, issue.2, pp.343-363, 2014.

A. Chatterjee, . Radhakrishnan, A. Sridhar, and J. K. , Data Structures and Algorithms for Counting Problems on Graphs using GPU, International Journal of Networking and Computing, vol.3, issue.2, pp.264-288, 2013.
DOI : 10.15803/ijnc.3.2_264

S. Khondker, S. Hasan, J. K. Radhakrishnan, and . Antonio, Composite Prediction Model and Task Distribution on a Cloud of Multi-core Processors, IEEE International Conference on High Performance Computing (HiPC-14) workshop, 2013.

Y. Zhang and J. D. Owens, A quantitative performance analysis model for GPU architectures, 2011 IEEE 17th International Symposium on High Performance Computer Architecture, pp.382-393, 2011.
DOI : 10.1109/HPCA.2011.5749745

C. Warps and . Occupancy, GPU Computing Webinar Available from: http://on-demand.gputechconf.com/gtc-express, 2011.

L. Siu-kwan, CUDA Performance: Maximizing Instruction-Level Parallelism Available from: http://continuum.io/blog/cudapy ilp opt, 2013.