S. Acer, O. Selvitopi, and C. Aykanat, Improving performance of sparse matrix dense matrix multiplication on large-scale parallel systems, Parallel Computing, vol.59, pp.71-96, 2016.

S. Alford, R. Robinett, L. Milechin, and J. Kepner, Pruned and structurally sparse neural networks. CoRR, 2018.

T. Ben-nun and T. Hoefler, Demystifying parallel and distributed deep learning: An in-depth concurrency analysis, ACM Computing Surveys (CSUR), vol.52, issue.4, pp.1-43, 2019.

M. Bisson and M. Fatica, A GPU implementation of the sparse deep neural network graph challenge, IEEE High Performance Extreme Computing Conference (HPEC), 2019.

A. Buluc, T. Mattson, S. Mcmillan, J. Moreira, and C. Yang, The graphBLAS C API specification, GraphBLAS. org, Tech. Rep, 2017.

Ü. V. Çatalyürek and C. Aykanat, PaToH: A Multilevel Hypergraph Partitioning Tool, 1999.

Ü. V. Çatalyürek and C. Aykanat, A fine-grain hypergraph model for 2d decomposition of sparse matrices, IPDPS, vol.1, p.118, 2001.

T. Davis, M. Aznaveh, and S. Kolodziej, Write quick, run fast: Sparse deep neural network in 20 minutes of development time in SuiteSparse:GraphBLAS, IEEE High Performance Extreme Computing Conference (HPEC), 2019.

J. A. Ellis and S. Rajamanickam, Scalable inference for sparse deep neural networks using Kokkos kernels, IEEE High Performance Extreme Computing Conference (HPEC), 2019.

J. Kepner, S. Alford, V. Gadepally, M. Jones, L. Milechin et al., Sparse Deep Neural Network Graph Challenge. arXiv e-prints, art, 2019.

J. Kepner, S. Alford, V. Gadepally, M. Jones, L. Milechin et al., Graphchallenge.org sparse deep neural network performance, 2020.

Y. Lecun and C. Cortes, MNIST handwritten digit database, 2010.

T. Lengauer, Combinatorial Algorithms for Integrated Circuit Layout, 1990.

M. H. Mofrad, R. Melhem, Y. Ahmad, and M. Hammoud, Multithreaded layer-wise training of sparse deep neural networks using compressed sparse column, IEEE High Performance Extreme Computing Conference (HPEC), 2019.

Y. Nagasaka, S. Matsuoka, A. Azad, and A. Buluç, High-performance sparse matrix-matrix products on Intel KNL and multicore architectures, Proceedings of the 47th International Conference on Parallel Processing Companion, ICPP '18, 2018.

B. Uçar and C. Aykanat, Partitioning sparse matrices for parallel preconditioned iterative methods, SIAM Journal on Scientific Computing, vol.29, issue.4, pp.1683-1709, 2007.

B. Van-der-lugt and R. H. Bisseling, Banded Sparse Neural Networks and their parallel computation, 2018.

J. Wang, Z. Huang, L. Kong, J. Xiao, P. Wang et al., Performance of training sparse deep neural networks on GPUs, IEEE High Performance Extreme Computing Conference (HPEC), 2019.

X. Wang, Z. Lin, C. Yang, and J. D. Owens, Accelerating DNN inference with GraphBLAS and the GPU, IEEE High Performance Extreme Computing Conference (HPEC), 2019.

A. N. Yzelman and D. Roose, High-level strategies for parallel shared-memory sparse matrixvector multiplication, IEEE Transactions on Parallel and Distributed Systems, vol.25, issue.1, pp.116-125, 2013.