, 2 The critical path length and the elimination tree height, p.134

. .. Permutebbt, 138 7.3.4 Edge-separator-based method

.. .. Experiments,

. .. Summary, , p.145

, Sparse tensor ordering 147

. .. , Three sparse tensor storage formats, p.147

.. .. Experiments,

. .. Summary, , p.160

I. Ilog and . Optimizer, , 2010.

E. Acar, D. M. Dunlavy, and T. G. Kolda, A scalable optimization approach for fitting canonical tensor decompositions, Journal of Chemometrics, vol.25, issue.2, pp.67-86, 2011.

S. Acer, T. Torun, and C. Aykanat, Improving mediumgrain partitioning for scalable sparse tensor decomposition, IEEE Transactions on Parallel and Distributed Systems, vol.29, issue.12, pp.2814-2825, 2018.

K. Agrawal, J. T. Fineman, J. Krage, C. E. Leiserson, and S. Toledo, Cache-conscious scheduling of streaming applications, Proc. Twenty-fourth Annual ACM Symposium on Parallelism in Algorithms and Architectures, pp.236-245, 2012.

E. Agullo, A. Buttari, A. Guermouche, and F. Lopez, Multifrontal QR factorization for multicore architectures over runtime systems, Euro-Par 2013 Parallel Processing, pp.521-532, 2013.
URL : https://hal.archives-ouvertes.fr/hal-01220611

R. Aharoni and P. Haxell, Hall's theorem for hypergraphs, Journal of Graph Theory, vol.35, issue.2, pp.83-88, 2000.

K. Akbudak and C. Aykanat, Exploiting locality in sparse matrix-matrix multiplication on many-core architectures, IEEE Transactions on Parallel and Distributed Systems, vol.28, issue.8, pp.2258-2271, 2017.

P. R. Amestoy, I. S. Duff, and J. Excellent, Multifrontal parallel distributed symmetric and unsymmetric solvers, Computer Methods in Applied Mechanics and Engineering, vol.184, issue.2-4, pp.501-520, 2000.
URL : https://hal.archives-ouvertes.fr/hal-00856651

P. R. Amestoy, I. S. Duff, D. Ruiz, B. Uçar, ;. P. Amestoy et al., A parallel matrix scaling algorithm, 8th International Conference on High Performance Computing for Computational Science (VECPAR):, volume, vol.5336, pp.301-313, 2008.
URL : https://hal.archives-ouvertes.fr/hal-00803489

M. Anastos and A. Frieze, Finding perfect matchings in random cubic graphs in linear time, 2018.

C. A. Andersson and R. Bro, The N-way toolbox for MAT-LAB, vol.52, pp.1-4, 2000.

H. Anzt, E. Chow, and J. Dongarra, Iterative sparse triangular solves for preconditioning, Euro-Par 2015 Parallel Processing, pp.650-661, 2015.

J. Aronson, M. Dyer, A. Frieze, and S. Suen, Randomized greedy matching II, Random Structures & Algorithms, vol.6, issue.1, pp.55-73, 1995.

J. Aronson, A. Frieze, and B. G. Pittel, Maximum matchings in sparse random graphs: Karp-Sipser revisited, Random Structures & Algorithms, vol.12, issue.2, pp.111-177, 1998.

C. Aykanat, B. Berkant, B. Cambazoglu, and . Uçar, Multi-level direct K-way hypergraph partitioning with multiple constraints and fixed vertices, Journal of Parallel and Distributed Computing, vol.68, pp.609-625, 2008.
URL : https://hal.archives-ouvertes.fr/hal-00803479

A. Azad, M. Halappanavar, S. Rajamanickam, E. G. Boman, A. Khan et al., Multithreaded algorithms for maximum matching in bipartite graphs, IEEE 26th International Parallel Distributed Processing Symposium (IPDPS), pp.860-872, 2012.

W. Brett, T. G. Bader, and . Kolda, Efficient MATLAB computations with sparse and factored tensors, SIAM Journal on Scientific Computing, vol.30, issue.1, pp.205-231, 2007.

W. Brett, T. G. Bader, and . Kolda, Matlab tensor toolbox version 3, 2017.

D. A. Bader, H. Meyerhenke, P. Sanders, and D. Wagner, Graph Partitioning and Graph Clustering -10th DIMACS Implementation Challenge Workshop, vol.588, 2012.

M. Baskaran, T. Henretty, M. Benoit-pradelle, D. Harper-langston, J. Bruns-smith et al., Memory-efficient parallel tensor decompositions, 2017 IEEE High Performance Extreme Computing Conference (HPEC), pp.1-7, 2017.

J. Bennett and S. Lanning, The Netflix Prize, Proceedings of KDD cup and workshop, p.35, 2007.

A. Benoit, Y. Robert, and F. Vivien, A Guide to Algorithm Design: Paradigms, Methods, and Complexity Analysis, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00908448

M. Benzi and B. Uçar, Preconditioning techniques based on the Birkhoff-von Neumann decomposition, Computational Methods in Applied Mathematics, vol.17, pp.201-215, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01318486

P. Berman and M. Karpinski, Improved approximation lower bounds on small occurence optimization, 2003.

G. Birkhoff, Tres observaciones sobre el algebra lineal, Nacional de Tucumán, Series A, vol.5, pp.147-154, 1946.

M. Birn, V. Osipov, P. Sanders, C. Schulz, and N. Sitchinava, Efficient parallel and external matching, 2013.

, Parallel Processing, pp.659-670, 2013.

H. Rob, W. Bisseling, and . Meesen, Communication balancing in parallel sparse matrix-vector multiplication, Electronic Transactions on Numerical Analysis, vol.21, pp.47-65, 2005.

Z. Blanco, B. Liu, and M. M. Dehnavi, CSTF: Large-scale sparse tensor factorizations on distributed platforms, Proceedings of the 47th International Conference on Parallel Processing, vol.21, pp.1-21, 2018.

G. E. Blelloch, J. T. Fineman, and J. Shun, Greedy sequential maximal independent set and matching are parallel on average, Proceedings of the Twenty-fourth Annual ACM Symposium on Parallelism in Algorithms and Architectures, pp.308-317, 2012.

N. Blum, A new approach to maximum matching in general graphs, Proceedings of the 17th International Colloquium on Automata, Languages and Programming, pp.586-597, 1990.

R. A. Brualdi, The diagonal hypergraph of a matrix (bipartite graph), Discrete Mathematics, vol.27, issue.2, pp.127-147, 1979.

R. A. Brualdi, Notes on the Birkhoff algorithm for doubly stochastic matrices, Canadian Mathematical Bulletin, vol.25, issue.2, pp.191-199, 1982.

A. Richard, P. M. Brualdi, and . Gibson, Convex polyhedra of doubly stochastic matrices: I. Applications of the permanent function, Journal of Combinatorial Theory, Series A, vol.22, issue.2, pp.194-230, 1977.

A. Richard, H. J. Brualdi, and . Ryser, Combinatorial Matrix Theory, of Encyclopedia of Mathematics and its Applications, vol.39, 1991.

C. Thang-nguyen-bui and . Jones, A heuristic for reducing fill-in in sparse matrix factorization, Proc. 6th SIAM Conf. Parallel Processing for Scientific Computing, pp.445-452, 1993.

J. Peter and . Cameron, Notes on combinatorics, 2013.

A. Carlson, J. Betteridge, B. Kisiel, B. Settles, E. R. Hruschka et al., Toward an architecture for never-ending language learning, AAAI, vol.5, p.3, 2010.

J. , D. Carroll, and J. Chang, Analysis of individual differences in multidimensional scaling via an N-way generalization of "Eckart-Young" decomposition, Psychometrika, vol.35, issue.3, pp.283-319, 1970.

V. Ümit and C. Aykanat, Hypergraph-partitioningbased decomposition for parallel sparse-matrix vector multiplication, IEEE Transactions on Parallel and Distributed Systems, vol.10, issue.7, pp.673-693, 1999.

V. Ümit and C. Aykanat, PaToH: A Multilevel Hypergraph Partitioning Tool, Version 3.0, 1999.

V. Ümit, K. Kaya, and B. Uçar, On shared-memory parallelization of a sparse matrix scaling algorithm, 2012 41st International Conference on Parallel Processing, pp.68-77, 2012.

V. Ümit, M. Deveci, K. Kaya, and B. Uçar, Multithreaded clustering for multi-level hypergraph partitioning, 26th IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp.848-859, 2012.

C. Chang, W. Chen, and H. Huang, On service guarantees for input-buffered crossbar switches: A capacity decomposition approach by Birkhoff and von Neumann, Quality of Service, 1999. IWQoS '99, pp.79-86, 1999.

J. Chen, Y. Liu, S. Lu, O. Barry, I. Sullivan et al., A fixed-parameter algorithm for the directed feedback vertex set problem, Journal of the ACM, vol.55, issue.5, pp.1-21, 2008.

J. Cheriyan, RandomizedÕ(M (|V |)) algorithms for problems in matching theory, SIAM Journal on Computing, vol.26, issue.6, pp.1635-1655, 1997.

J. Choi, X. Liu, S. Smith, and T. Simon, Blocking optimization techniques for sparse tensor computation, IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp.568-577, 2018.

J. Choi and S. V. Vishwanathan, DFacTo: Distributed factorization of tensors, 27th Advances in Neural Information Processing Systems, pp.1296-1304

E. Chow and A. Patel, Fine-grained parallel incomplete LU factorization, SIAM Journal on Scientific Computing, vol.37, issue.2, pp.169-193, 2015.

A. Cichocki, D. P. Mandic, L. De-lathauwer, G. Zhou, Q. Zhao et al., Tensor decompositions for signal processing applications: From two-way to multiway component analysis, IEEE Signal Processing Magazine, vol.32, issue.2, pp.145-163, 2015.

E. Cohen, Structure prediction and computation of sparse matrix products, Journal of Combinatorial Optimization, vol.2, issue.4, pp.307-332, 1998.

F. Thomas, W. Coleman, and . Xu, Parallelism in structured Newton computations, Parallel Computing: Architectures, Algorithms and Applications, pp.295-302, 2007.

F. Thomas, W. Coleman, and . Xu, Fast (structured) Newton computations, SIAM Journal on Scientific Computing, vol.31, issue.2, pp.1175-1191, 2009.

F. Thomas, W. Coleman, and . Xu, Automatic Differentiation in MAT-LAB using ADMAT with Applications. SIAM, 2016.

J. Cong, Z. Li, and R. Bagrodia, Acyclic multi-way partitioning of Boolean networks, Proceedings of the 31st Annual Design Automation Conference, DAC'94, pp.670-675, 1994.

H. Elizabeth, J. Cuthill, and . Mckee, Reducing the bandwidth of sparse symmetric matrices, Proceedings of the 24th national conference, pp.157-172, 1969.

M. Cygan, Improved approximation for 3-dimensional matching via bounded pathwidth local search, Foundations of Computer Science (FOCS), 2013 IEEE 54th Annual Symposium on, pp.509-518, 2013.

M. Cygan, F. Grandoni, and M. Mastrolilli, How to sell hyperedges: The hypermatching assignment problem, Proc. of the twenty-fourth annual ACM-SIAM symposium on Discrete algorithms, pp.342-351, 2013.

J. Czyzyk, M. P. Mesnier, and J. J. Moré, The NEOS Server, IEEE Journal on Computational Science and Engineering, vol.5, issue.3, pp.68-75, 1998.

T. A. Davis and Y. Hu, The University of Florida sparse matrix collection, ACM Transactions on Mathematical Software, vol.38, issue.1, 2011.

P. Lieven-de-lathauwer and . Comon, Bart De Moor, and Joos Vandewalle. Higher-order power method-Application in independent component analysis, Proceedings NOLTA'95, pp.91-96, 1995.

B. D. Lieven-de-lathauwer, J. Moor, and . Vandewalle, R N ) approximation of higherorder tensors, SIAM Journal on Matrix Analysis and Applications, vol.21, issue.2, pp.1324-1342, 2000.

D. De and W. , Variations on the Theorem of Birkhoff-von Neumann and extensions, Graphs and Combinatorics, vol.19, issue.2, pp.263-278, 2003.

J. W. Demmel, S. C. Eisenstat, J. R. Gilbert, X. S. Li, and J. W. Liu, A supernodal approach to sparse partial pivoting, SIAM Journal on Matrix Analysis and Applications, vol.20, issue.3, pp.720-755, 1999.

J. W. Demmel, J. R. Gilbert, and X. S. Li, An asynchronous parallel supernodal algorithm for sparse gaussian elimination, SIAM Journal on Matrix Analysis and Applications, vol.20, issue.4, pp.915-952, 1999.

M. Deveci, K. Kaya, V. Ümit, and B. Uçar, A push-relabel-based maximum cardinality matching algorithm on GPUs, 42nd International Conference on Parallel Processing, pp.21-29, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00923464

M. Deveci, K. Kaya, B. Uçar, and V. , GPU accelerated maximum cardinality matching algorithms for bipartite graphs, Euro-Par 2013 Parallel Processing, vol.8097, pp.850-861, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00923449

P. Devlin and J. Kahn, Perfect fractional matchings in k-out hypergraphs, 2017.

E. D. Dolan, The NEOS Server 4.0 administrative guide, 2001.

E. D. Dolan and J. J. Moré, Benchmarking optimization software with performance profiles, Mathematical Programming, vol.91, issue.2, pp.201-213, 2002.

J. J. Dongarra, F. G. Gustavson, and A. H. Karp, Implementing linear algebra algorithms for dense matrices on a vector pipeline machine, SIAM Review, vol.26, issue.1, pp.91-112, 1984.

I. S. Duff, K. Kaya, and B. Uçar, Design, implementation, and analysis of maximum transversal algorithms, ACM Transactions on Mathematical Software, vol.38, issue.2, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00786548

I. S. Duff and J. Koster, On algorithms for permuting large entries to the diagonal of a sparse matrix, SIAM Journal on Matrix Analysis and Applications, vol.22, pp.973-996, 2001.

F. Dufossé, K. Kaya, I. Panagiotas, and B. Uçar, Approximation algorithms for maximum matchings in undirected graphs, Proceedings of the Seventh SIAM Workshop on Combinatorial Scientific Computing, pp.56-65, 2018.

F. Dufossé, K. Kaya, I. Panagiotas, and B. Uçar, Further notes on Birkhoff-von Neumann decomposition of doubly stochastic matrices, Linear Algebra and its Applications, vol.554, pp.68-78, 2018.

F. Dufossé, K. Kaya, I. Panagiotas, and B. Uçar, Effective heuristics for matchings in hypergraphs, 2018.

F. Dufossé, K. Kaya, and B. Uçar, Two approximation algorithms for bipartite matching on multicore architectures, IPDPS 2014 Selected Papers on Numerical and Combinatorial Algorithms, vol.85, pp.62-78, 2015.

A. L. Dulmage and N. Saul-mendelsohn, Coverings of bipartite graphs, Canadian Journal of Mathematics, vol.10, pp.517-534, 1958.

M. E. Dyer and A. M. Frieze, Randomized greedy matching, Random Structures & Algorithms, vol.2, issue.1, pp.29-45, 1991.

, Paths, trees, and flowers, Canadian Journal of Mathematics, vol.17, pp.449-467, 1965.

L. C. Eggan, Transition graphs and the star-height of regular events, The Michigan Mathematical Journal, vol.10, issue.4, pp.385-397, 1963.

C. Stanley, J. W. Eisenstat, and . Liu, The theory of elimination trees for sparse unsymmetric matrices, SIAM Journal on Matrix Analysis and Applications, vol.26, issue.3, pp.686-705, 2005.

C. Stanley, J. W. Eisenstat, and . Liu, A tree-based dataflow model for the unsymmetric multifrontal method, Electronic Transactions on Numerical Analysis, vol.21, pp.1-19, 2005.

C. Stanley, J. W. Eisenstat, and . Liu, Algorithmic aspects of elimination trees for sparse unsymmetric matrices, SIAM Journal on Matrix Analysis and Applications, vol.29, issue.4, pp.1363-1381, 2008.

V. Elango, F. Rastello, L. Pouchet, J. Ramanujam, and P. Sadayappan, On characterizing the data access complexity of programs, ACM SIGPLAN Notices -POPL, vol.15, issue.1, pp.567-580, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01104556

P. Erdös and A. Rényi, On random matrices, Publications of the Mathematical Institute of the Hungarian Academy of Sciences, vol.8, issue.3, pp.455-461, 1964.

F. Bastiaan-onne, R. H. Auer, and . Bisseling, A GPU algorithm for greedy graph matching, Facing the Multicore Challenge, vol.II, pp.108-119, 2012.

N. Fauzia, V. Elango, M. Ravishankar, J. Ramanujam, F. Rastello et al., Beyond reuse distance analysis: Dynamic analysis for characterization of data locality potential, ACM Transactions on Architecture and Code Optimization, vol.10, issue.4, p.29, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00920031

M. Charles, R. M. Fiduccia, and . Mattheyses, A linear-time heuristic for improving network partitions, 19th Design Automation Conference, pp.175-181, 1982.

J. Franklin and J. Lorenz, On the scaling of multidimensional matrices, Linear Algebra and its Applications, vol.114, pp.717-735, 1989.

A. M. Frieze, Maximum matchings in a class of random graphs, Journal of Combinatorial Theory, Series B, vol.40, issue.2, pp.196-212, 1986.

A. Froger, O. Guyon, and E. Pinson, A set packing approach for scheduling passenger train drivers: the French experience, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01138067

N. Harold, R. E. Gabow, and . Tarjan, Faster scaling algorithms for network problems, SIAM Journal on Computing, vol.18, issue.5, pp.1013-1036, 1989.

R. Michael, D. S. Garey, and . Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, 1979.

A. George, Nested dissection of a regular finite element mesh, SIAM Journal on Numerical Analysis, vol.10, issue.2, pp.345-363, 1973.

A. J. George, Computer implementation of the finite element method, 1971.

V. Andrew, R. Goldberg, and . Kennedy, Global price updates help, SIAM Journal on Discrete Mathematics, vol.10, issue.4, pp.551-572, 1997.

O. Görlitz, S. Sizov, and S. Staab, Pints: Peer-to-peer infrastructure for tagging systems, Proceedings of the 7th International Conference on Peer-to-peer Systems, IPTPS'08, vol.19, 2008.

G. Gottlob and G. Greco, Decomposing combinatorial auctions and set packing problems, Journal of the ACM, vol.60, issue.4, 2013.

L. Grasedyck, Hierarchical singular value decomposition of tensors, SIAM Journal on Matrix Analysis and Applications, vol.31, issue.4, pp.2029-2054, 2010.

W. Gropp and J. J. Moré, Optimization environments and the NEOS Server, Approximation Theory and Optimization, pp.167-182, 1997.

H. Gruber, Digraph complexity measures and applications in formal language theory, Discrete Mathematics & Theoretical Computer Science, vol.14, issue.2, pp.189-204, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00990597

G. Guennebaud and B. Jacob, Eigen v3, 2010.

A. Guermouche, J. Excellent, and G. Utard, Impact of reordering on the memory of a multifrontal solver, Parallel Computing, vol.29, issue.9, pp.1191-1218, 2003.
URL : https://hal.archives-ouvertes.fr/hal-00807378

A. Gupta, Improved symbolic and numerical factorization algorithms for unsymmetric sparse matrices, SIAM Journal on Matrix Analysis and Applications, vol.24, issue.2, pp.529-552, 2002.

M. Halappanavar, J. Feo, O. Villa, A. Tumeo, and A. Pothen, Approximate weighted matching on emerging manycore and multithreaded architectures, International Journal of High Performance Computing Applications, vol.26, issue.4, pp.413-430, 2012.

M. Magnús and . Halldórsson, Approximating discrete collections via local improvements, The Sixth annual ACM-SIAM symposium on Discrete Algorithms (SODA), vol.95, pp.160-169

R. A. Harshman, Foundations of the PARAFAC procedure: Models and conditions for an "explanatory" multi-modal factor analysis. UCLA Working Papers in Phonetics, vol.16, pp.1-84, 1970.

J. A. Nicholas and . Harvey, Algebraic algorithms for matching and matroid problems, SIAM Journal on Computing, vol.39, issue.2, pp.679-702, 2009.

E. Hazan, S. Safra, and O. Schwartz, On the complexity of approximating k-dimensional matching, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, pp.83-97, 2003.

E. Hazan, S. Safra, and O. Schwartz, On the complexity of approximating k-set packing, Computational Complexity, vol.15, issue.1, pp.20-39, 2006.

B. Hendrickson, Load balancing fictions, falsehoods and fallacies, Applied Mathematical Modelling, vol.25, pp.99-108, 2000.

B. Hendrickson and T. G. Kolda, Graph partitioning models for parallel computing, Parallel Computing, vol.26, issue.12, pp.1519-1534, 2000.

B. Hendrickson and R. Leland, The Chaco user's guide, version 1.0, 1993.

J. Herrmann, J. Kho, B. Uçar, K. Kaya, and U. V. , Acyclic partitioning of large directed acyclic graphs, Proceedings of the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGRID, pp.371-380, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01672010

J. Herrmann, M. Yusufözkaya, B. Uçar, K. Kaya, and U. V. , Multilevel algorithms for acyclic partitioning of directed acyclic graphs, SIAM Journal on Scientific Computing, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02306566

J. D. Hogg, J. K. Reid, and J. A. Scott, Design of a multicore sparse Cholesky factorization using DAGs, SIAM Journal on Scientific Computing, vol.32, issue.6, pp.3627-3649, 2010.

J. E. Hopcroft and R. M. Karp, An n 5/2 algorithm for maximum matchings in bipartite graphs, SIAM Journal on Computing, vol.2, issue.4, pp.225-231, 1973.

A. Roger, C. R. Horn, and . Johnson, Matrix Analysis, 1985.

A. J. Cor, A. Hurkens, and . Schrijver, On the size of systems of sets every t of which have an SDR, with an application to the worst-case ratio of heuristics for packing problems, SIAM Journal on Discrete Mathematics, vol.2, issue.1, pp.68-72, 1989.

H. Oscar, S. Ibarra, and . Moran, Deterministic and probabilistic algorithms for maximum bipartite matching via fast matrix multiplication, Information Processing Letters, pp.12-15, 1981.

I. Jeon, E. E. Papalexakis, C. Kang, and . Faloutsos, Haten2: Billion-scale tensor decompositions, IEEE 31st International Conference on Data Engineering (ICDE), pp.1047-1058, 2015.

A. G. Jochen, Jess and Hendrikus Gerardus Maria Kees. A data structure for parallel L/U decomposition, IEEE Transactions on Computers, vol.31, issue.3, pp.231-239, 1982.

U. Kang, E. Papalexakis, A. Harpale, and C. F. Gigatensor, Scaling tensor analysis up by 100 times -Algorithms and discoveries, Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '12, pp.316-324, 2012.

E. Micha-l-karo?ski, B. Overman, and . Pittel, On a perfect matching in a random bipartite digraph with average out-degree below two, 2019.

B. Micha-l-karo?ski and . Pittel, Existence of a perfect matching in a random (1 + e ?1 )-out bipartite graph, Journal of Combinatorial Theory, Series B, vol.88, pp.1-16, 2003.

M. Richard, E. Karp-;-raymond, J. W. Miller, J. D. Thatcher, and . Bohlinger, Reducibility among combinatorial problems, Complexity of Computer Computations, The IBM Research Symposiax, pp.85-103, 1972.

R. M. Karp, A. H. Rinnooy-kan, and R. V. Vohra, Average case analysis of a heuristic for the assignment problem, Mathematics of Operations Research, vol.19, pp.513-522, 1994.

M. Richard, M. Karp, and . Sipser, Maximum matching in sparse random graphs, 22nd Annual IEEE Symposium on Foundations of Computer Science (FOCS), pp.364-375, 1981.

R. M. Karp, U. V. Vazirani, and V. V. Vazirani, An optimal algorithm for on-line bipartite matching, 22nd annual ACM symposium on Theory of computing (STOC), pp.352-358, 1990.

G. Karypis and V. Kumar, MeTiS: A Software Package for Partitioning Unstructured Graphs, Partitioning Meshes, and Computing Fill-Reducing Orderings of Sparse Matrices Version 4.0. University of Minnesota, Department of Comp. Sci. and Eng, 1998.

G. Karypis and V. Kumar, Multilevel algorithms for multiconstraint hypergraph partitioning, 1998.

K. Kaya, J. Langguth, F. Manne, and B. Uçar, Push-relabel based algorithms for the maximum transversal problem, Computers and Operations Research, vol.40, issue.5, pp.1266-1275, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00763920

K. Kaya and B. Uçar, Constructing elimination trees for sparse unsymmetric matrices, SIAM Journal on Matrix Analysis and Applications, vol.34, issue.2, pp.345-354, 2013.
URL : https://hal.archives-ouvertes.fr/inria-00567970

O. Kaya and B. Uçar, Scalable sparse tensor decompositions in distributed memory systems, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC '15, vol.77, pp.1-77, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01148202

O. Kaya and B. Uçar, Parallel Candecomp/Parafac decomposition of sparse tensors using dimension trees, SIAM Journal on Scientific Computing, vol.40, issue.1, pp.99-130, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01397464

E. Kayaaslan and B. Uçar, Reducing elimination tree height for parallel LU factorization of sparse unsymmetric matrices, 21st International Conference on High Performance Computing (HiPC 2014), pp.1-10, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01114413

B. W. Kernighan, Optimal sequential partitions of graphs, Journal of the ACM, vol.18, issue.1, pp.34-40, 1971.

S. Kintali, N. Kothari, and A. Kumar, Approximation algorithms for directed width parameters. CoRR, abs/1107, vol.4824, 2011.

P. A. Knight, The Sinkhorn-Knopp algorithm: Convergence and applications, SIAM Journal on Matrix Analysis and Applications, vol.30, issue.1, pp.261-275, 2008.

A. Philip, D. Knight, and . Ruiz, A fast algorithm for matrix balancing, IMA Journal of Numerical Analysis, vol.33, issue.3, pp.1029-1047, 2013.

P. A. Knight, D. Ruiz, and B. Uçar, A symmetry preserving algorithm for matrix scaling, SIAM Journal on Matrix Analysis and Applications, vol.35, issue.3, pp.931-955, 2014.
URL : https://hal.archives-ouvertes.fr/inria-00569250

T. G. Kolda and B. Bader, Tensor decompositions and applications, SIAM Review, vol.51, issue.3, pp.455-500, 2009.

C. Tjalling, M. Koopmans, and . Beckmann, Assignment problems and the location of economic activities, Econometrica, vol.25, issue.1, pp.53-76, 1957.

R. B. Mads, . Kristensen, A. F. Simon, T. Lund, J. Blum et al., Fusion of parallel array operations, Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, pp.71-85, 2016.

R. B. Mads, . Kristensen, A. F. Simon, T. Lund, K. Blum et al., Bohrium: A virtual machine approach to portable parallelism, Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, pp.312-321, 2014.

J. Kulkarni, E. Lee, and M. Singh, Minimum Birkhoff-von Neumann decomposition, Proc. 19th International Conference Integer Programming and Combinatorial Optimization (IPCO 2017), pp.343-354, 2017.

S. Lamm, P. Sanders, C. Schulz, D. Strash, and R. F. Werneck, Finding Near-Optimal Independent Sets at Scale, Proceedings of the 18th Workshop on Algorithm Engineering and Experiments (ALENEX'16), pp.138-150

J. Langguth, F. Manne, and P. Sanders, Heuristic initialization for bipartite matching problems, Journal of Experimental Algorithmics, vol.15, pp.1-1, 2010.

(. Yanzhe, S. Murray)-lei, J. Jasin, A. Uichanco, and . Vakhutinsky, Randomized product display (ranking), pricing, and order fulfillment for e-commerce retailers, SSRN Electronic Journal, 2018.

T. Lengauer, Combinatorial Algorithms for Integrated Circuit Layout, 1990.

R. Lepère and D. Trystram, A new clustering algorithm for large communication delays, 16th International Parallel and Distributed Processing Symposium (IPDPS), page (CDROM), 2002.

H. Levy and D. W. Low, A contraction algorithm for finding small cycle cutsets, Journal of Algorithms, vol.9, issue.4, pp.470-493, 1988.

J. Li, J. Choi, I. Perros, J. Sun, and R. Vuduc, Model-driven sparse CP decomposition for higher-order tensors, 31th IEEE International Symposium on Parallel and Distributed Processing, pp.1048-1057, 2017.

J. Li, Y. Ma, and R. Vuduc, ParTI!: A Parallel Tensor Infrastructure for Multicore CPU and GPUs (Version 0.1.0), 2016.

J. Li, J. Sun, and R. Vuduc, HiCOO: Hierarchical storage of sparse tensors, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC'18, 2018.

J. Li, B. Uçar, V. Ümit, J. Sun, K. Barker et al., Efficient and effective sparse tensor reordering, Proceedings of the ACM International Conference on Supercomputing, ICS '19, pp.227-237, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02306569

S. Xiaoye, J. W. Li, and . Demmel, Making sparse Gaussian elimination scalable by static pivoting, Proc. of the 1998 ACM/IEEE Conference on Supercomputing, SC '98, pp.1-17, 1998.

P. Athanasios, N. D. Liavas, and . Sidiropoulos, Parallel algorithms for constrained tensor factorization via alternating direction method of multipliers, IEEE Transactions on Signal Processing, vol.63, issue.20, pp.5450-5463, 2015.

B. Liu, C. Wen, A. D. Sarwate, and M. M. Dehnavi, A unified optimization approach for sparse tensor operations on GPUs, 2017 IEEE International Conference on Cluster Computing (CLUSTER), pp.47-57, 2017.

W. H. Joseph and . Liu, Computational models and task scheduling for parallel sparse Cholesky factorization, Parallel Computing, vol.3, issue.4, pp.327-342, 1986.

W. H. Joseph and . Liu, A graph partitioning algorithm by node separators, ACM Transactions on Mathematical Software, vol.15, issue.3, pp.198-219, 1989.

W. H. Joseph and . Liu, Reordering sparse matrices for parallel elimination, Parallel Computing, vol.11, issue.1, pp.73-91, 1989.

L. Liu, (. Jun, ). Jim, L. Xu, and . Fortnow, Quantized BvND: A better solution for optical and hybrid switching in data center networks, 2018 IEEE/ACM 11th International Conference on Utility and Cloud Computing, pp.237-246, 2018.

Z. Lotker, B. Patt-shamir, and S. Pettie, Improved distributed approximate matching, Proceedings of the Twentieth Annual Symposium on Parallelism in Algorithms and Architectures (SPAA), pp.129-136, 2008.

L. Lovász, On determinants, matchings, and random algorithms, FCT, pp.565-574, 1979.

L. Lovász and M. D. Plummer, Matching Theory, 1986.

A. Lubiw, Doubly lexical orderings of matrices, SIAM Journal on Computing, vol.16, issue.5, pp.854-879, 1987.

Y. Ma, J. Li, X. Wu, C. Yan, J. Sun et al., Optimizing sparse tensor times matrix on GPUs, Journal of Parallel and Distributed Computing, vol.129, pp.99-109, 2019.

J. Magun, Greedy matching algorithms, an experimental study, Journal of Experimental Algorithmics, vol.3, issue.6, 1998.

M. Marcus and R. Ree, Diagonals of doubly stochastic matrices, The Quarterly Journal of Mathematics, vol.10, issue.1, pp.296-302, 1959.

N. Mckeown, The iSLIP scheduling algorithm for input-queued switches, EEE/ACM Transactions on Networking, vol.7, issue.2, pp.188-201, 1999.

A. Meir and J. W. Moon, The expected node-independence number of random trees, Indagationes Mathematicae, vol.76, pp.335-341, 1973.

J. Mellor-crummey, D. Whaley, and K. Kennedy, Improving memory hierarchy performance for irregular applications using data and computation reorderings, International Journal of Parallel Programming, vol.29, issue.3, pp.217-247, 2001.

S. Micali and V. V. Vazirani, An O( |V ||E|) algorithm for finding maximum matching in general graphs, 21st Annual Symposium on Foundations of Computer Science, pp.17-27, 1980.

O. Moreira, M. Popp, and C. Schulz, Graph partitioning with acyclicity constraints, 16th International Symposium on Experimental Algorithms, SEA, 2017.

O. Moreira, M. Popp, and C. Schulz, Evolutionary multi-level acyclic graph partitioning, Proceedings of the Genetic and Evolutionary Computation Conference, GECCO, pp.332-339, 2018.

M. Mucha and P. Sankowski, Maximum matchings in planar graphs via Gaussian elimination, pp.532-543, 2004.

M. Mucha and P. Sankowski, Maximum matchings via Gaussian elimination, FOCS '04, pp.248-255, 2004.

H. Nassar, G. Kollias, A. Grama, and D. F. Gleich, Low rank methods for multiple network alignment, 2018.

J. Ne?et?il and P. Mendez, Tree-depth, subgraph coloring and homomorphism bounds, European Journal of Combinatorics, vol.27, issue.6, pp.1022-1041, 2006.

M. Yusufözkaya, A. Benoit, B. Uçar, J. Herrmann, and U. V. , A scalable clustering-based task scheduler for homogeneous processors using DAG partitioning, 33rd IEEE International Parallel and Distributed Processing Symposium, pp.155-165, 2019.

R. Paige and R. E. Tarjan, Three partition refinement algorithms, SIAM Journal on Computing, vol.16, issue.6, pp.973-989, 1987.

H. Chiristos, K. Papadimitriou, and . Steiglitz, Combinatorial Optimization: Algorithms and Complexity, Corrected, unabridged reprint of Combinatorial Optimization: Algorithms and Complexity, 1982.

M. Panos, T. Pardalos, M. G. Qian, and . Resende, A greedy randomized adaptive search procedure for the feedback vertex set problem, Journal of Combinatorial Optimization, vol.2, issue.4, pp.399-412, 1998.

M. Patwary, R. H. Bisseling, and F. Manne, Parallel greedy graph matching using an edge partitioning approach, Proceedings of the Fourth International Workshop on High-level Parallel Programming and Applications, HLPP '10, pp.45-54, 2010.

F. Pellegrini, SCOTCH 5.1 User's Guide. Laboratoire Bordelais de Recherche en Informatique (LaBRI), 2008.
URL : https://hal.archives-ouvertes.fr/hal-00410332

M. Daniel, R. H. Pelt, and . Bisseling, A medium-grain method for fast 2D bipartitioning of sparse matrices, IPDPS2014, IEEE 28th International Parallel and Distributed Processing Symposium, pp.529-539, 2014.

I. Perros, E. E. Papalexakis, F. Wang, R. Vuduc, E. Searles et al., SPARTan: Scalable PARAFAC2 for large and sparse data, Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '17, pp.375-384, 2017.

A. Phan, P. Tichavský, and A. Cichocki, Fast alternating LS algorithms for high order CANDECOMP/PARAFAC tensor factorizations, IEEE Transactions on Signal Processing, vol.61, issue.19, pp.4834-4846, 2013.

J. C. Pichel, F. F. Rivera, M. Fernández, and A. Rodríguez, Optimization of sparse matrix-vector multiplication using reordering techniques on GPUs, Microprocessors and Microsystems, vol.36, issue.2, pp.65-77, 2012.

M. Poloczek and M. Szegedy, Randomized greedy algorithms for the maximum matching problem with new analysis, IEEE 53rd Annual Sym. on Foundations of Computer Science (FOCS), pp.708-717, 2012.

A. Pothen, Sparse null bases and marriage theorems, 1984.

A. Pothen, The complexity of optimal elimination trees, 1988.

A. Pothen and C. Fan, Computing the block triangular form of a sparse matrix, ACM Transactions on Mathematical Software, vol.16, issue.4, pp.303-324, 1990.

A. Pothen, S. M. Ferdous, and F. Manne, Approximation algorithms in combinatorial scientific computing, Acta Numerica, vol.28, pp.541-633, 2019.

. Louis-noël-pouchet and . Polybench, The polyhedral benchmark suite, 2012.

O. Michael, V. V. Rabin, and . Vazirani, Maximum matchings in general graphs through randomization, Journal of Algorithms, vol.10, issue.4, pp.557-567, 1989.

D. Ruiz, A scaling algorithm to equilibrate both row and column norms in matrices, 2001.

P. Sanders and C. Schulz, Engineering multilevel graph partitioning algorithms, Camil Demetrescu and Magnús M

. Halldórsson, Algorithms -ESA 2011: 19th Annual European Symposium, pp.469-480, 2011.

R. Schreiber, A new implementation of sparse Gaussian elimination, ACM Transactions on Mathematical Software, vol.8, issue.3, pp.256-276, 1982.

R. Oguz-selvitopi, M. Mustafa-ozdal, and C. Aykanat, A novel method for scaling iterative solvers: Avoiding latency overhead of parallel sparse-matrix vector multiplies, IEEE Transactions on Parallel and Distributed Systems, vol.26, issue.3, pp.632-645, 2015.

R. Sinkhorn and P. Knopp, Concerning nonnegative matrices and doubly stochastic matrices, Pacific Journal of Mathematics, vol.21, pp.343-348, 1967.

S. Smith, W. Jee, J. Choi, R. Li, J. Vuduc et al., FROSTT: The formidable repository of open sparse tensors and tools, 2017.

S. Smith and G. Karypis, A medium-grained algorithm for sparse tensor factorization, 2016 IEEE International Parallel and Distributed Processing Symposium, IPDPS 2016, pp.902-911, 2016.

S. Smith and G. Karypis, SPLATT: The Surprisingly Par-alleL spArse Tensor Toolkit (Version 1.1.1), 2016.

S. Smith, N. Ravindran, N. D. Sidiropoulos, and G. Karypis, SPLATT: Efficient and parallel sparse tensor-matrix multiplication, 29th IEEE International Parallel & Distributed Processing Symposium, pp.61-70, 2015.

R. E. Tarjan and M. Yannakakis, Simple linear-time algorithms to test chordality of graphs, test acyclicity of hypergraphs, and selectively reduce acyclic hypergraphs, SIAM Journal on Computing, vol.13, issue.3, pp.566-579, 1984.

B. Uçar and C. Aykanat, Encapsulating multiple communication-cost metrics in partitioning sparse rectangular matrices for parallel matrix-vector multiplies, SIAM Journal on Scientific Computing, vol.25, issue.6, pp.1837-1859, 2004.

B. Uçar and C. Aykanat, Revisiting hypergraph models for sparse matrix partitioning, SIAM Review, vol.49, issue.4, pp.595-603, 2007.

V. V. Vazirani, An improved definition of blossoms and a simpler proof of the MV matching algorithm. CoRR, abs/1210, vol.4594, 2012.

D. W. Walkup, Matchings in random regular bipartite digraphs. Discrete Mathematics, vol.31, pp.59-64, 1980.

C. Walshaw, Multilevel refinement for combinatorial optimisation problems, Annals of Operations Research, vol.131, issue.1, pp.325-372, 2004.

S. H. Eric, E. F. Wong, W. Young, and . Mak, Clustering based acyclic multi-way partitioning, Proceedings of the 13th ACM Great Lakes Symposium on VLSI, GLSVLSI '03, pp.203-206, 2003.

T. Yang and A. Gerasoulis, DSC: scheduling parallel tasks on an unbounded number of processors, IEEE Transactions on Parallel and Distributed Systems, vol.5, issue.9, pp.951-967, 1994.

A. Nicholas-yzelman and D. Roose, High-level strategies for parallel shared-memory sparse matrix-vector multiplication, IEEE Transactions on Parallel and Distributed Systems, vol.25, issue.1, pp.116-125, 2014.