J. T. Adriaens, K. Compton, N. S. Kim, and M. J. Schulte, The case for GPGPU spatial multitasking, IEEE International Symposium on High-Performance Computer Architecture, pp.1-12, 2012.
DOI : 10.1109/HPCA.2012.6168946

M. Boyer, J. Meng, and K. Kumaran, Improving GPU Performance Prediction with Data Transfer Modeling, 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum, pp.1097-1106, 2013.
DOI : 10.1109/IPDPSW.2013.236
URL : http://www.cs.virginia.edu/~mwb7w/publications/ASHES_13_data_transfer_modeling.pdf

S. Che, M. Boyer, J. Meng, D. Tarjan, J. W. Sheaffer et al., Rodinia: A benchmark suite for heterogeneous computing, 2009 IEEE International Symposium on Workload Characterization (IISWC), pp.44-54, 2009.
DOI : 10.1109/IISWC.2009.5306797
URL : http://www.cs.virginia.edu/~skadron/Papers/rodinia_iiswc09.pdf

J. Gómez-Luna, J. M. González-Linares, J. I. Benavides, and N. Guil, Performance models for asynchronous data transfers on consumer Graphics Processing Units, Journal of Parallel and Distributed Computing, vol.72, issue.9, pp.1117-1126, 2012.
DOI : 10.1016/j.jpdc.2011.07.011

C. Gregg and K. Hazelwood, Where is the data? Why you cannot debate CPU vs. GPU performance without the answer, IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp.134-144, 2011.
DOI : 10.1109/ISPASS.2011.5762730

J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach, 2006.

F. Ino, S. Nakagawa, and K. Hagihara, GPU-Chariot: A Programming Framework for Stream Applications Running on Multi-GPU Systems, IEICE Transactions on Information and Systems, vol.E96-D, issue.12, p.2604, 2013.
DOI : 10.1587/transinf.E96.D.2604

B. Liu, W. Qiu, L. Jiang, and Z. Gong, Software pipelining for graphic processing unit acceleration: Partition, scheduling and granularity, The International Journal of High Performance Computing Applications, 2015.
DOI : 10.1177/1094342015585845

M. R. Meswani, L. Carrington, D. Unat, A. Snavely, S. Baden et al., Modeling and predicting performance of high performance computing applications on hardware accelerators, International Journal of High Performance Computing Applications, vol.27, issue.2, pp.89-108, 2013.

S. Mittal and J. S. Vetter, A Survey of CPU-GPU Heterogeneous Computing Techniques, ACM Computing Surveys, vol.47, issue.4, 2015.

NVIDIA Inc., CUDA C Best Practices Guide, Version 7, 2015.

J. A. Pienaar, A. Raghunathan, and S. Chakradhar, MDR: Performance model driven runtime for heterogeneous parallel platforms, Proceedings of the international conference on Supercomputing, ICS '11, pp.225-234, 2011.
DOI : 10.1145/1995896.1995933

H. Takizawa, K. Sato, and H. Kobayashi, SPRAT: Runtime processor selection for energy-aware computing, 2008 IEEE International Conference on Cluster Computing, pp.386-393, 2008.
DOI : 10.1109/CLUSTR.2008.4663799
URL : http://www.sc.isc.tohoku.ac.jp/~tacky/papers/htakizawa_iwapt2008.pdf

B. van Werkhoven, J. Maassen, F. J. Seinstra, and H. E. Bal, Performance Models for CPU-GPU Data Transfers, 2014 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pp.11-20, 2014.
DOI : 10.1109/CCGrid.2014.16

F. Wende, T. Steinke, and F. Cordes, Concurrent Kernel Execution on Xeon Phi within Parallel Heterogeneous Workloads, Euro-Par 2014 Parallel Processing, pp.788-799, 2014.
DOI : 10.1007/978-3-319-09873-9_66

F. Wende, T. Steinke, and F. Cordes, Multi-threaded kernel offloading to GPGPU using Hyper-Q on Kepler architecture, 2014.

C. Yang, F. Wang, Y. Du, J. Chen, J. Liu et al., Adaptive Optimization for Petascale Heterogeneous CPU/GPU Computing, 2010 IEEE International Conference on Cluster Computing, pp.19-28, 2010.
DOI : 10.1109/CLUSTER.2010.12

C. Yang, W. Xue, H. Fu, L. Gan, L. Li et al., A peta-scalable CPU-GPU algorithm for global atmospheric simulations, Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '13, pp.1-12, 2013.