The case for GPGPU spatial multitasking, IEEE International Symposium on High-Performance Computer Architecture, pp.1-12, 2012.
DOI : 10.1109/HPCA.2012.6168946
Improving GPU Performance Prediction with Data Transfer Modeling, 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum, pp.1097-1106, 2013.
DOI : 10.1109/IPDPSW.2013.236
URL : http://www.cs.virginia.edu/~mwb7w/publications/ASHES_13_data_transfer_modeling.pdf
Rodinia: A benchmark suite for heterogeneous computing, 2009 IEEE International Symposium on Workload Characterization (IISWC), pp.44-54, 2009.
DOI : 10.1109/IISWC.2009.5306797
URL : http://www.cs.virginia.edu/~skadron/Papers/rodinia_iiswc09.pdf
Performance models for asynchronous data transfers on consumer Graphics Processing Units, Journal of Parallel and Distributed Computing, vol.72, issue.9, pp.1117-1126, 2012.
DOI : 10.1016/j.jpdc.2011.07.011
Where is the data? Why you cannot debate CPU vs. GPU performance without the answer, IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp.134-144, 2011.
DOI : 10.1109/ISPASS.2011.5762730
Computer Architecture: A Quantitative Approach, 2006.
GPU-Chariot: A Programming Framework for Stream Applications Running on Multi-GPU Systems, IEICE Transactions on Information and Systems, vol.E96-D, issue.12, p.2604, 2013.
DOI : 10.1587/transinf.E96.D.2604
Software pipelining for graphic processing unit acceleration: Partition, scheduling and granularity, The International Journal of High Performance Computing Applications, vol.23, issue.12, p.1094342015585845, 2015.
DOI : 10.1007/978-3-642-03869-3_82
Modeling and predicting performance of high performance computing applications on hardware accelerators, International Journal of High Performance Computing Applications, vol.27, issue.2, pp.89-108, 2013.
A Survey of CPU-GPU Heterogeneous Computing Techniques, ACM Computing Surveys, vol.47, issue.4, 2015.
DOI : 10.1109/CLUSTER.2012.34
CUDA C Best Practices Guide Version 7, 2015.
MDR, Proceedings of the international conference on Supercomputing, ICS '11, pp.225-234, 2011.
DOI : 10.1145/1995896.1995933
SPRAT: Runtime processor selection for energy-aware computing, 2008 IEEE International Conference on Cluster Computing, pp.386-393, 2008.
DOI : 10.1109/CLUSTR.2008.4663799
URL : http://www.sc.isc.tohoku.ac.jp/~tacky/papers/htakizawa_iwapt2008.pdf
Performance Models for CPU-GPU Data Transfers, 2014 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pp.11-20, 2014.
DOI : 10.1109/CCGrid.2014.16
Concurrent Kernel Execution on Xeon Phi within Parallel Heterogeneous Workloads, Euro-Par 2014 Parallel Processing, pp.788-799, 2014.
DOI : 10.1007/978-3-319-09873-9_66
Multi-threaded kernel offloading to GPGPU using Hyper-Q on Kepler architecture, 2014.
Adaptive Optimization for Petascale Heterogeneous CPU/GPU Computing, 2010 IEEE International Conference on Cluster Computing, pp.19-28, 2010.
DOI : 10.1109/CLUSTER.2010.12
Jiachang Sun, Guangwen Yang, and Weimin Zheng. A peta-scalable CPU-GPU algorithm for global atmospheric simulations, Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '13, pp.1-12, 2013.