M. Harris, Unified memory in CUDA 6, GTC On-Demand, 2013.

E. Lindholm, J. Nickolls, S. Oberman, and J. Montrym, NVIDIA Tesla: a unified graphics and computing architecture, Proceedings of IEEE Micro, p.28, 2008.

R. Landaverde, T. Zhang, A. Coskun, and M. Herbordt, An investigation of unified memory access performance in CUDA, Proceedings of IEEE High Performance Extreme Computing Conference, pp.1–6, 2014.

T. Zheng, D. Nellans, A. Zulfiqar, M. Stephenson, and S. W. Keckler, Towards high performance paged memory for GPUs, Proceedings of IEEE International Symposium on High Performance Computer Architecture, pp.345–357, 2016.

D. Lustig and M. Martonosi, Reducing GPU offload latency via fine-grained CPU-GPU synchronization, Proceedings of IEEE International Symposium on High Performance Computer Architecture, pp.354–365, 2013.

N. Agarwal, D. Nellans, M. Stephenson, M. O'Connor, and S. W. Keckler, Page placement strategies for GPUs within heterogeneous memory systems, ACM SIGPLAN Notices, vol.50, pp.607–618, 2015.

J. Vesely, A. Basu, M. Oskin, G. Loh, and A. Bhattacharjee, Observations and opportunities in architecting shared virtual memory for heterogeneous systems, Proceedings of IEEE International Symposium on Performance Analysis of Systems and Software, pp.161–171, 2016.

R. Ausavarungnirun, J. Landgraf, V. Miller, S. Ghose, J. Gandhi et al., Mosaic: A GPU Memory Manager with Application-Transparent Support for Multiple Page Sizes, 2017.

A. Bakhoda, G. Yuan, W. Fung, H. Wong, and T. M. Aamodt, Analyzing CUDA workloads using a detailed GPU simulator, Proceedings of IEEE International Symposium on Performance Analysis of Systems and Software, pp.163–174, 2009.

T. Aamodt, W. Fung, I. Singh, A. El-Shafiey, J. Kwa et al., GPGPU-Sim 3.x manual, 2012.

J. Ajanovic, PCI Express 3.0 overview, Proceedings of Hot Chips: A Symposium on High Performance Chips, 2009.