Fast Nondeterministic Matrix Multiplication via Derandomization of Freivalds’ Algorithm

We design two nondeterministic algorithms for matrix multiplication. Both are based on derandomization of Freivalds' algorithm for verification of matrix products. The first algorithm works with real numbers; its time complexity on Real RAMs is O(n^2 log n). The second one, of the same complexity, works with integer matrices on a unit-cost RAM with numbers whose size is proportional to the size of the largest entry in the underlying matrices. Our algorithms bring new ideas into the design of matrix multiplication algorithms and open new avenues for their further development. The results pose exciting questions concerning the relation between the complexity of deterministic and nondeterministic algorithms for matrix multiplication, and between the complexity of integer and real matrix multiplication.


Introduction
Matrix multiplication is probably one of the most closely watched areas of algorithm design. Great attention is paid even to minor advancements in this field. Undoubtedly, this is because of the immense practical importance of the problem, backed by its deep algorithmic beauty. Matrix multiplication has attracted considerable attention for more than four decades, and the standing challenge is whether or not matrix multiplication can be done in quadratic time. The origin of this conjecture lies back in the late nineteen sixties, when Volker Strassen discovered an algorithm for matrix multiplication of complexity O(n^{2.807}) [23] (in terms of the number of arithmetical operations). This was an unexpected and significant improvement over the classical algorithm of complexity O(n^3). Since then the search for matrix multiplication algorithms of complexity O(n^{2+ω}), for 0 ≤ ω < 1, has been under way, and over the years it has proceeded through several incremental improvements of the exponent ω. The best current upper bound is achieved by the algorithm of Vassilevska Williams from 2011, with ω = 0.3729.
The increased interest in matrix multiplication algorithms came about in the same time as the increased interest in randomized algorithms. Randomization has become a subject of systematic investigation as a "cheap" tool for solution of problems for which no efficient deterministic algorithms have been known. Could randomization also help in matrix multiplication?
Perhaps this was the motivation of Rusins Freivalds, who in 1977 designed a randomized algorithm with a bounded error probability for verifying matrix multiplication in quadratic randomized time [10], [11]. This has become one of the first algorithms showing the advantage of randomized algorithms over deterministic ones in a domain related to matrix multiplication, and no wonder that Freivalds' algorithm has become a standard textbook example illustrating the power of randomized computations over deterministic ones (cf. [7], [17]). At the same time this result indicated that for a task related to matrix multiplication, viz. its verification, a quadratic time bound can be achieved.
The idea of Freivalds' algorithm is as follows. In order to verify whether AB = C for any three matrices A, B and C of real numbers of size n × n, Freivalds' algorithm chooses a random (column) vector x of length n and compares the product ABx with the product Cx. Both products can be computed using O(n^2) arithmetical operations (the former product thanks to the associativity of the matrix product: (AB)x = A(Bx)). The entries of vector x are chosen uniformly from the set {0, 1}. It can be shown that if AB ≠ C, then Freivalds' algorithm returns a wrong answer with probability at most 1/2 (cf. [7]). The probability of error can be reduced to 1/2^k by performing k independent iterations of the algorithm.
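For illustration, the verification scheme just described can be sketched in Python as follows (a minimal sketch; the function name and the choice of k = 20 rounds are ours):

```python
import random

def freivalds(A, B, C, k=20):
    """Probabilistic check whether A @ B == C for n x n matrices.

    Each round picks a random 0/1 vector x and compares A(Bx) with Cx,
    which costs only O(n^2) operations.  If AB != C, a single round
    wrongly reports "equal" with probability at most 1/2, so k rounds
    err with probability at most 2**(-k)."""
    n = len(A)
    for _ in range(k):
        x = [random.randint(0, 1) for _ in range(n)]
        bx = [sum(B[i][j] * x[j] for j in range(n)) for i in range(n)]
        abx = [sum(A[i][j] * bx[j] for j in range(n)) for i in range(n)]
        cx = [sum(C[i][j] * x[j] for j in range(n)) for i in range(n)]
        if abx != cx:
            return False      # a witness x was found: certainly AB != C
    return True               # AB == C with probability >= 1 - 2**(-k)
```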
Similarly as in the case of matrix multiplication algorithms, research effort aiming at the performance improvement of Freivalds' algorithm has followed. The goal was to diminish the amount of randomness used by this algorithm (cf. [18], [14]). This has culminated in the recent result of Korec and Wiedermann [15] showing that integer matrix product verification can be entirely derandomized, while for real matrices only one random real number is needed in order to verify the correctness of a matrix product with probability one. Both verification algorithms run in quadratic time w.r.t. the number of arithmetic operations. The deterministic algorithm for integer matrix verification has straightforwardly led to the design of a nondeterministic algorithm for computing the product of two integer matrices in quadratic time, thus answering positively the conjecture concerning the complexity of matrix product, albeit only for integer matrices and for non-deterministic algorithms.
In order to completely understand the development sketched above one also has to pay attention to the underlying models of computation. The standard computational model for matrix product is that of arithmetic circuits (aka straight-line programs) over some field F. The inputs to the circuit are the entries of the two matrices and the outputs are the entries of the resulting matrix product. All entries are elements of F. In the circuit's gates arithmetic operations with the field elements are allowed.
For verification of matrix products a stronger computational model is used. It is a probabilistic model allowing random moves and, in addition to the standard arithmetic operations, also comparisons between the elements of the field. For the case of integer matrices an appropriate underlying model is the probabilistic unit-cost RAM (cf. [1]). When one wants to take into account the size of the integers manipulated during a computation, the log-cost RAM model should be used. For the case of real matrices, the so-called Real RAM (aka BSS machine [4]) model is used. Essentially, a Real RAM is a Random Access Machine with registers that can store arbitrary real numbers and that can compute rational functions over the reals at unit cost.
Finally, in [15] a nondeterministic model has been used for computing the product of integer matrices in quadratic time. Bringing nondeterminism into matrix multiplication means a further shift along the scale of the computational power of the underlying computational devices. As we shall see later, the departure from the standard model of arithmetic circuits towards more powerful models of computation is not a purposeless move serving to circumvent the existing obstacles by using more powerful tools. Rather, such an approach allows attacking the problem from a different end, and it might help in the development of a theoretical background for more practical methods.
The present paper fits into the line of research efforts sketched above. Namely, it investigates the benefits that can be gained from considering nondeterministic algorithms in the context of matrix multiplication. It is based on the previous paper [15], in which the framework for the present approach was established, and solves some problems that remained open there. Progress along these lines has been enabled by a new fast deterministic criterion suitable for verification of matrix products over infinite and finite fields. Specifically, the paper brings an efficient algorithm for nondeterministic multiplication (or deterministic verification of products) of matrices with real entries. On a Real RAM this task can be done deterministically using O(n^2 log n) arithmetical operations. When compared to the probabilistic algorithm for the latter task from [15], the factor of ∼ log n seems to be the price for derandomization of Freivalds' algorithm. Next, an algorithm of similar complexity is designed for multiplication of integer matrices using modular arithmetic. Its advantage over the quadratic-time algorithm for the same task designed in [15] is that the new algorithm does not compute with large numbers; the size of all numbers in this algorithm is bounded by the size of the largest entry in the underlying matrices.
The contribution of the present paper does not merely lie in the design of the previous algorithms, which establish new upper bounds on the complexity of matrix multiplication on nondeterministic models of computation. What is perhaps more important is that these algorithms make use of algebraic techniques that so far have not often been considered in matrix multiplication research. Among other things these techniques include elements of the theory of Vandermonde matrices, properties of polynomial roots, multipoint polynomial evaluation, primality certificates, fast Fourier transform and the number-theoretic transform. It is hoped that the insight gained by the use of such techniques will help to better understand the nature of matrix multiplication algorithms and eventually bring further progress to the underlying field.
Our results raise a number of interesting questions to which we presently do not know the answers. For instance, is multiplication of integer matrices easier than multiplication of real matrices under the unit-cost model? So far we do not know an algorithm for nondeterministic multiplication of real matrices of quadratic complexity, albeit such an algorithm for integer matrices is known. By the way, the existence of the latter algorithm implies that any proof of a super-quadratic lower bound for integer matrix multiplication must be constructed so as not to hold for nondeterministic computations, or must take into account the size of integers stored in the RAM registers. Nevertheless, perhaps the most important question arising from our research is whether nondeterminism helps in matrix multiplication. A positive answer would be a surprise, since it would put matrix multiplication among the problems that separate determinism from nondeterminism. In any case, our results seem to strengthen the hope that there exist substantially better deterministic algorithms for matrix multiplication than those known today.
The structure of the paper is as follows. In Section 2 those results from the predecessor paper [15] are briefly reviewed that will be used in the elaboration of the new results presented here. In particular, Lemma 1 is presented, which forms the starting point of all subsequent considerations, including a nondeterministic Algorithm 1 for multiplication of integer matrices in quadratic time on a unit-cost RAM (with no bounds on the size of RAM registers). In Section 3 it is shown that the ideas from Algorithm 1 cannot be straightforwardly used for multiplication of real matrices. To that end a corollary of Lemma 1 is proved, serving as a new criterion enabling efficient verification of the product of real matrices. This results in the design of a nondeterministic Algorithm 2 for multiplication of real matrices on a Real RAM in time O(n^2 log n). In Section 4 we return to the problem of multiplication of integer matrices using only registers of bounded size. This is achieved by using modular arithmetic over a properly selected finite field. The resulting Algorithm 3 is of complexity O(n^2 log n) on a unit-cost RAM and computes with registers whose size is proportional to the size of the largest entry in the underlying integer matrices. In order to achieve their complexity as stated here, both Algorithm 2 and Algorithm 3 make use of an appropriate version of the fast Fourier transform tuned to the underlying number field. Section 5 contains an overview of the results presented in this paper, open problems and conclusions.

Preliminaries
The key to all results in this paper is the following lemma from [15], which is related to the theory of Vandermonde matrices. Here we recall this lemma, including its proof, because we will refer to it in our further considerations.

Lemma 1. Let D ≠ 0 be a real matrix of size n × n and let x = (1, r, r^2, . . . , r^{n−1})^T for a real number r. Then there are at most n − 1 values of r for which Dx = 0.

Proof: Since D ≠ 0, matrix D contains a non-zero row. In the resulting matrix-vector product there is one algebraic equation in indeterminate r of degree less than n corresponding to this non-zero row. If j is the index of this row then the respective equation is of the form

d_{j,1} + d_{j,2} r + · · · + d_{j,n} r^{n−1} = 0.

This equation has at most n − 1 real roots.
□

Based on the previous lemma a simple probabilistic algorithm of quadratic time complexity for verifying the product of two matrices has been designed in [15]. We sketch the respective algorithm since it will be the basis of our further improvements: pick a random real number r, form the vector x = (1, r, r^2, . . . , r^{n−1})^T and compute Y = A(Bx) − Cx.
If Y = 0 then AB = C with probability 1 (because among all reals there are at most n − 1 "bad" numbers r causing (AB − C)x = 0 even if AB ≠ C).
If Y ≠ 0 then AB ≠ C "for sure". Obviously, Y can be computed in O(n^2) operations if matrix-vector products are computed first.
The probabilistic step in this algorithm could be eliminated if we could deterministically choose r in such a way that (AB − C)x = 0 if and only if AB = C. That is, if D = AB − C = {d_{i,j}}, such an r cannot be a root of any of the polynomials P_i(r) = Σ_{j=1}^{n} d_{i,j} r^{j−1} for i = 1, 2, . . . , n. Such an r can be found using the so-called Cauchy's bound ([9], p. 122, or the textbook [13], p. 82):

Theorem 1 (Cauchy's bound). Let P(x) = Σ_{j=1}^{k} a_j x^{j−1} be a polynomial with real coefficients and a_k ≠ 0, and let A = max{|a_1|, |a_2|, . . . , |a_{k−1}|}. Then every root x of P satisfies |x| < 1 + A/|a_k|.

One can see that in order to upper-bound the magnitude of the roots we have to know the leading coefficient a_k and the maximum A of the absolute values of all remaining coefficients of the polynomial. In our case, k = n in the previous theorem, d_{i,j} = Σ_{k=1}^{n} a_{i,k} b_{k,j} − c_{i,j}, and P_i(r) = Σ_{j=1}^{n} d_{i,j} r^{j−1} for i = 1, 2, . . . , n.
Let c_max = max{|a_{i,j}|, |b_{i,j}|, |c_{i,j}|}. Then the maximal coefficient in any of the polynomials P_i, i.e., the value of A, can be upper-bounded by n·c_max^2 + c_max. Further, for integer matrices, and only for integer matrices, for any i the absolute value of the leading coefficient in front of the highest power of r in P_i(r) can be lower-bounded by 1, since a non-zero leading coefficient must be an integer.
Then, from Cauchy's bound it follows that for any i the absolute values of the roots of the polynomial P_i(r) with integer coefficients are upper-bounded by α = n·c_max^2 + c_max + 1. Thus, choosing r ≥ α will guarantee that P_i(r) ≠ 0 whenever the i-th row of D is non-zero, and hence Dx = 0 can only hold for D = 0. From these considerations a deterministic algorithm of quadratic complexity for integer matrix multiplication verification and a nondeterministic algorithm for integer matrix multiplication follow easily (cf. [15]). Here we will only give the algorithm for nondeterministic matrix multiplication that will be the subject of our further improvements (cf. Algorithm 1). In [15] it is shown that Algorithm 1 can be generalized to the case of matrices of rational numbers given as numerator and denominator pairs.
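A minimal Python sketch of this deterministic criterion (the function name is ours; r is set to α = n·c_max^2 + c_max + 1 as derived above):

```python
def verify_integer_product(A, B, C):
    """Deterministic quadratic-time check of AB == C for integer matrices.

    Sketch of the Cauchy-bound argument: with r = n*c_max**2 + c_max + 1,
    no non-zero row polynomial of D = AB - C can vanish at r, so
    A(Bx) == Cx with x = (1, r, ..., r**(n-1)) holds iff AB == C."""
    n = len(A)
    c_max = max(abs(v) for M in (A, B, C) for row in M for v in row)
    r = n * c_max * c_max + c_max + 1
    x = [r ** j for j in range(n)]          # x = (1, r, ..., r^(n-1))
    bx = [sum(B[i][j] * x[j] for j in range(n)) for i in range(n)]
    abx = [sum(A[i][j] * bx[j] for j in range(n)) for i in range(n)]
    cx = [sum(C[i][j] * x[j] for j in range(n)) for i in range(n)]
    return abx == cx
```

Note that the entries of x grow exponentially in n, so the O(n^2) operation count hides arithmetic on very large numbers; this is exactly the drawback addressed later by modular arithmetic.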
Note that the seemingly similar idea of guessing C and its subsequent probabilistic verification à la Freivalds cannot work: through the error margin of the respective algorithm, however small, matrices C with AB ≠ C could slip through. Thus, getting rid of the probabilistic features of the verification algorithm turns out to be a crucial ingredient for the success of our nondeterministic algorithm.

Non-deterministic multiplication of real matrices
Can we adjust Algorithm 1 to also work with real matrices? Unfortunately, no. What does not go through for reals is the upper bound on the magnitude of the maximal root for a class of polynomials with real coefficients depending on matrices A, B and C. Namely, although we can estimate the value of A in Cauchy's bound, we cannot lower-bound |a_k|, which can be arbitrarily close to 0. This makes the value A/|a_k| arbitrarily large and subsequently we cannot get a bound on |x|. Thus, it appears that we cannot easily find a number r such that AB = C if and only if ABx = Cx for x = (1, r, r^2, . . . , r^{n−1})^T.
Nevertheless, from Lemma 1 it follows that no matter how we pick n distinct real numbers r_1, r_2, . . . , r_n, among them there will be at least one number r_j that is not a root of any of the respective polynomials. Such an r_j will certify whether AB = C: we have AB = C if and only if ABx_j = Cx_j for x_j = (1, r_j, r_j^2, . . . , r_j^{n−1})^T. This suggests a possible strategy for a verification algorithm: for j = 1, 2, . . . , n we test whether ABx_j = Cx_j. If the equality is confirmed for all j then AB = C; otherwise AB ≠ C.
Thanks to the special form of the vectors x_j there is hope that the products Bx_j for all j = 1, 2, . . . , n can be computed faster than in cubic time. Namely, we can reorganize the computation of Bx_j for j = 1, 2, . . . , n in such a way that we compute the value of each polynomial defined by a row of B at the points r_1, r_2, . . . , r_n.
Let P(i, x) = Σ_{j=1}^{n} b_{i,j} x^{j−1} be the polynomial corresponding to the product of the i-th row of matrix B with the vector (1, x, x^2, . . . , x^{n−1})^T. Then for each i we have to evaluate P(i, x) at n different points x = r_1, r_2, . . . , r_n. This is a typical task of multipoint polynomial evaluation that for any set of points can be solved by a divide-and-conquer method. This leads to a complexity of O(n log^2 n) operations per row of B (i.e., per one i), cf. [1], [5], [6]. When the points are selected to be roots of unity, fast Fourier transform can be used (cf. [1]); this method saves one factor of log n in the resulting complexity estimate. In a similar way we can compute the products Cx_i.
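To make the connection with the Fourier transform concrete, the following sketch (Python with numpy; the setup is ours) evaluates every row polynomial of B at n points with one FFT per row. Note that numpy's FFT convention evaluates at the complex points ω_k = e^{−2πik/n}, i.e., roots of unity rather than the real points discussed above:

```python
import numpy as np

n = 8
rng = np.random.default_rng(1)
B = rng.integers(-5, 6, size=(n, n)).astype(float)

# One FFT per row evaluates P(i, x) = sum_j B[i, j] * x**j at the n points
# w_k = exp(-2j*pi*k/n): O(n log n) per row instead of O(n^2).
fast = np.fft.fft(B, axis=1)

# Naive evaluation at the same points, for comparison: O(n^2) per row.
points = np.exp(-2j * np.pi * np.arange(n) / n)
V = points[np.newaxis, :] ** np.arange(n)[:, np.newaxis]   # V[j, k] = w_k**j
naive = B @ V

assert np.allclose(fast, naive)
```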
Unfortunately, for computing the subsequent product A(Bx_i) the advantage of the specific form of the vector Bx_i is lost, and this seems to lead again to an algorithm of cubic complexity. Fortunately, a slightly more complicated verification criterion based on the following corollary of Lemma 1 solves the problem.

Corollary 1. Let D be a real matrix of size n × n, let r_1, r_2, . . . , r_{2n−1} be 2n − 1 distinct real numbers and let x_i = (1, r_i, r_i^2, . . . , r_i^{n−1})^T. Then D = 0 if and only if x_i^T D x_i = 0 for all i = 1, 2, . . . , 2n − 1.

Proof: The left-to-right implication is obvious. For the proof of the reverse direction suppose that under the assumptions of the corollary D ≠ 0. Then at least one element of D is non-zero. In the vector-matrix-vector product x_i^T D x_i this element gives rise to a polynomial in indeterminate r_i of degree at most 2n − 2. This polynomial has at most 2n − 2 real roots and therefore it cannot vanish at 2n − 1 distinct real numbers. This is a contradiction and therefore D = 0.
□

Translating the previous corollary back to our context of matrix multiplication, it implies a strategy for a verification algorithm. For i = 1, 2, . . . , 2n − 1 we test whether (x_i^T A)(Bx_i) = x_i^T(Cx_i). If this test fails for some i then AB ≠ C; otherwise AB = C (cf. Algorithm 2). The advantage of the above-mentioned criterion is that for computing x_i^T A, Bx_i and Cx_i, respectively, fast multipoint polynomial evaluation algorithms can again be used. This time, however, we must evaluate the respective polynomials at 2n − 1 different points, but this does not change the asymptotic complexity of the evaluation task. Therefore, all the necessary vector-matrix and matrix-vector products can be computed using O(n^2 log^2 n) arithmetic operations, or even O(n^2 log n) using fast Fourier transform. Hence the entire verification can be done within the same arithmetic complexity.
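The resulting verification strategy can be sketched as follows (Python; a naive O(n^2) computation per test point instead of fast multipoint evaluation, exact small-integer inputs to sidestep floating-point issues, and the test points r = 1, . . . , 2n − 1 are an arbitrary illustrative choice):

```python
import numpy as np

def verify_real_product(A, B, C):
    """Test the bilinear criterion x^T A (B x) == x^T (C x) at 2n-1
    distinct points r, with x = (1, r, ..., r**(n-1)).

    Each test needs only O(n^2) operations, since only vector-matrix and
    matrix-vector products occur.  With exact (Real RAM) arithmetic a
    single failing test proves AB != C; in floating point the powers
    r**(n-1) grow quickly, so a practical implementation would have to
    scale the points or work with higher precision."""
    n = A.shape[0]
    for r in range(1, 2 * n):                 # 2n - 1 distinct points
        x = float(r) ** np.arange(n)          # x = (1, r, ..., r^(n-1))
        if (x @ A) @ (B @ x) != x @ (C @ x):
            return False                      # certainly AB != C
    return True
```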
Theorem 2. For a Real RAM there exists a non-deterministic algorithm for computing the matrix product of real matrices using O(n^2 log n) arithmetical operations.
For bounded-coefficient arithmetic circuits over the real or complex numbers Raz [22] proved a lower bound of order Ω(n^2 log n). This bound does not apply to our case since we are using a different computational model than Raz, namely a model in which nondeterministic computations with arbitrarily large reals and equality tests are allowed.
Non-deterministic multiplication of integer matrices using modular arithmetic

Algorithm 1 from Section 2 has one serious drawback: it computes with very large integers, since the largest entry in vector x is as high as (2n − 1)^{n−1}. In matrix D = AB − C this entry can subsequently be multiplied by α, where α, defined in Section 2, denotes the bound on the maximal absolute value of the entries of matrices A, B, and C. That is, the bit representation of such an entry requires O(n log n + log α) = O(n log n) bits for a sufficiently large n. Hence, for such n the size of this entry is dominated by the term O(n log n) rather than by the size of the maximal possible entry of matrix D. Fortunately, this situation can be remedied by using modular arithmetic. However, one must be careful when trying to adjust the basic Algorithm 1 from Section 2 to modular arithmetic. The first problem arises because, as it appears, no analogue of Cauchy's bound is known in finite fields. This is probably due to the fact that finite fields cannot be ordered. A number that is not a root of a polynomial in the ring Z of integers may become a root of the same polynomial in a finite field Z_p, for some prime p. For instance, the polynomial x^2 + 1 has no roots in Z, while the same polynomial has two roots in Z_5: 2 and 3. Thus, we must use a different criterion than that from Lemma 1 for identifying a zero matrix.
It appears that we can design a criterion similar to that used for the case of real numbers (cf. Corollary 1). This is possible since a polynomial of degree n has at most n roots in any finite field. The criterion from Corollary 1 can be adjusted to the case of finite fields thanks to the following lemma.

Lemma 2. Let D be an integer matrix of size n × n with max{|d_{i,j}|} ≤ δ. Let Z be the ring of integers and let Z_p be the finite field of integers mod p, for some prime p > max{δ, 2n − 1}. Let further r_i, for i = 1, 2, . . . , 2n − 1, with 0 ≤ r_i < p, be 2n − 1 different integers and let x_i = (1, r_i, r_i^2, . . . , r_i^{n−1})^T. Then D = 0 (in Z) if and only if for all i = 1, 2, . . . , 2n − 1, x_i^T D x_i ≡ 0 (mod p) (in Z_p).

Proof:
The left-to-right implication is obvious. For the proof of the reverse direction suppose that under the assumptions of the lemma D ≠ 0. Thanks to this and to the choice of p, at least one element d_{i,j} of D must be non-zero mod p: d_{i,j} ≢ 0 (mod p). This means that D ≢ 0 (mod p). In the vector-matrix-vector product (x_i^T D x_i) mod p this element gives rise to a non-zero polynomial in indeterminate r_i of degree at most 2n − 2. This polynomial has at most 2n − 2 roots in Z_p and therefore it cannot vanish at 2n − 1 distinct numbers r_i mod p = r_i. This, however, means that x_i^T D x_i ≡ 0 (mod p) cannot hold for all i = 1, 2, . . . , 2n − 1. This is a contradiction.
□

This lemma opens the way for designing an algorithm for integer matrix multiplication using modular arithmetic, similar to Algorithm 2 from Section 3.
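A sketch of the resulting modular check (Python; the function name, the points r = 1, . . . , 2n − 1 and the prime p supplied by the caller, which must satisfy p > max{δ, 2n − 1}, are our illustrative choices):

```python
def verify_mod(A, B, C, p):
    """Check AB == C for n x n integer matrices via Lemma 2.

    For each of the 2n-1 points r = 1, ..., 2n-1 (all below p) test
    x^T A (B x) == x^T (C x) (mod p) with x = (1, r, ..., r**(n-1)) mod p.
    Every inner product is reduced mod p, so intermediate values stay
    bounded by roughly n*p**2 before each reduction."""
    n = len(A)
    for r in range(1, 2 * n):
        x = [pow(r, j, p) for j in range(n)]
        bx = [sum(B[k][j] * x[j] for j in range(n)) % p for k in range(n)]
        cx = [sum(C[k][j] * x[j] for j in range(n)) % p for k in range(n)]
        left = sum(x[k] * sum(A[k][j] * bx[j] for j in range(n))
                   for k in range(n)) % p
        right = sum(x[k] * cx[k] for k in range(n)) % p
        if left != right:
            return False   # AB != C for sure
    return True            # D = AB - C is zero by Lemma 2
```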
In order to do so we have to choose a field of a suitable characteristic p. We want to keep p relatively small, yet with p ≥ α. Thanks to the definition of α, such a choice of p avoids computations with numbers above p while preserving the values of the entries of matrices A, B, and C when these are considered in Z. However, one must be careful with this idea, since in arithmetic (mod p) each non-zero element has two representations, one positive and one negative: for any a with 0 < a < p, a ≡ a − p (mod p). Therefore, when guessing the values of matrix C, the correct values (with correct signs), i.e., those from Z, must be guessed, because these are the values that will eventually be output. This is because, as we shall see, the correctness check is performed in arithmetic (mod p) (as suggested by Lemma 2), through which both the positive and the negative representation of the same non-zero number could pass. This could be prevented if we knew the sign of each entry in C = {c_{i,j}}. In general it appears that for arbitrary matrices A and B the sign of each c_{i,j} cannot be determined without actually computing it. Nevertheless, for special matrices this can be done; e.g., when all entries of A and B are non-negative, so are the entries of C. This motivates the following definition.
A matrix A with all entries non-negative (non-positive) will be called a non-negative (non-positive) matrix. A matrix A that is either non-negative or non-positive will be called an equisigned matrix. Observe that, thanks to the definition of the matrix product in which rows are multiplied by columns, the product of two equisigned matrices is again an equisigned matrix.
For any real matrix A = {a_{i,j}}, let A^+ = {a^+_{i,j}} be the matrix with a^+_{i,j} = a_{i,j} if a_{i,j} ≥ 0 and a^+_{i,j} = 0 otherwise, and let A^− = A − A^+. Both A^+ and A^− are equisigned and AB = (A^+ + A^−)(B^+ + B^−) = A^+B^+ + A^+B^− + A^−B^+ + A^−B^−. It follows that instead of multiplying two general matrices A and B it is enough to add four products of equisigned matrices. Note that by this we have solved the problem of a possible dual representation of entries in matrix C in modular arithmetic: in an equisigned matrix all elements are either non-negative or non-positive. Therefore, in the sequel, w.l.o.g. we will only consider multiplication of equisigned matrices. Now let us return to the problem of computing the product of integer matrices using arithmetic (mod p).
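The decomposition into equisigned parts can be sketched as follows (numpy; the names A_plus and A_minus, and the random test matrices, are ours):

```python
import numpy as np

def split(A):
    """Split A into its non-negative part A_plus and non-positive part
    A_minus, so that A = A_plus + A_minus and both parts are equisigned."""
    A_plus = np.where(A >= 0, A, 0)
    return A_plus, A - A_plus

rng = np.random.default_rng(2)
A = rng.integers(-9, 10, size=(4, 4))
B = rng.integers(-9, 10, size=(4, 4))
Ap, Am = split(A)
Bp, Bm = split(B)

# The product of two general matrices is the sum of four products of
# equisigned matrices, each of which has entries of a known sign.
assert np.array_equal(A, Ap + Am)
assert np.array_equal(A @ B, Ap @ Bp + Ap @ Bm + Am @ Bp + Am @ Bm)
```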
The question remains how to find a suitable p efficiently, i.e., in a way that would not deteriorate the (nearly) quadratic algebraic complexity of the verification algorithm. To that end we use Bertrand's postulate, stating that for any integer k > 1 there is a prime number p such that k < p < 2k (cf. [21]). Since we are aiming at a nondeterministic algorithm, a suitable prime p will be guessed and subsequently tested for primality. An appropriate test for our purpose is the certificate devised by Pratt in 1975 [19], which can be nondeterministically verified with the help of O(log^2 p) modular multiplications on a unit-cost RAM.
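To illustrate, a Pratt certificate for a prime p consists of a witness a (a primitive root mod p) together with the prime factors of p − 1, each of which carries a certificate of its own. A sketch of the verifier (the dictionary encoding of certificates is our own illustrative format, not one fixed by [19]):

```python
def check_pratt(p, cert):
    """Verify a Pratt primality certificate.

    `cert` maps each odd prime appearing in the recursion to a pair
    (witness, prime_factors_of_p_minus_1).  p is prime iff the witness
    has multiplicative order exactly p - 1 mod p; fast modular
    exponentiation keeps the work polylogarithmic in p."""
    if p == 2:
        return True                       # base case of the recursion
    a, factors = cert[p]
    m = p - 1
    for q in factors:                     # p - 1 must factor completely
        while m % q == 0:
            m //= q
    if m != 1:
        return False
    if pow(a, p - 1, p) != 1:             # Fermat-type condition
        return False
    for q in set(factors):
        if pow(a, (p - 1) // q, p) == 1:  # order must not divide (p-1)/q
            return False
        if not check_pratt(q, cert):      # each factor needs a certificate
            return False
    return True
```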
The resulting Algorithm 3, mirroring Algorithm 2, is given below. This algorithm is based on Lemma 2 in which we put D = AB − C. In order to determine p we need to estimate δ from Lemma 2, i.e., to bound the absolute values of the entries of D.
Clearly, the maximal absolute values in C are bounded by β = n(max{|a_{i,j}|, |b_{i,j}|})^2 and hence the maximal absolute value in matrix D is at most δ = 2β. Thus, choosing p > max{δ, 2n − 1} will do.
As in the case of the previous algorithm, the number of operations (mod p) of this algorithm is O(n^2 log^2 n). Its correctness follows from several facts. After Step 2 there exist many computational paths in the computation (one for each guess of C), but the subsequent verification, based on Lemma 2, succeeds only on the path along which C = AB has been guessed. Obviously, this algorithm can be implemented on the standard non-deterministic model of Turing machines with a similar efficiency, too. This is to be compared to the bit complexity of the best known deterministic algorithms, which is of order O(n^ω M(log p)), where M(b) denotes the complexity of multiplying b-bit numbers, for some ω with 2 < ω < 3. For any ω > 2 the latter expression is asymptotically worse than our estimate from the previous theorem.

Conclusion
We have presented two nondeterministic algorithms for matrix multiplication. Both algorithms make use of a derandomized version of Freivalds' algorithm for verification of matrix products. The first algorithm deals with real numbers and on a Real RAM is of time complexity O(n^2 log n). The second one computes with integers and on a unit-cost RAM is also of complexity O(n^2 log n). Moreover, the latter algorithm computes with numbers whose size is bounded by the maximal number occurring within the matrices. The respective complexity bounds hold when variants of the fast Fourier transform are used for the multipoint polynomial evaluation employed by both algorithms. With respect to their complexity, both algorithms are the fastest known sequential algorithms for computing the product of real and integer matrices, respectively.
Of course, due to their nondeterministic nature our algorithms are of more theoretical than practical value. Nevertheless, both of them can be used for deterministic verification of matrix products.
But there are more lessons to be taken from our results. First, for the case of integer matrices, they show that any lower bound for matrix multiplication greater than our quadratic upper bound must avoid proof techniques that work for both deterministic and nondeterministic computations. For integer matrices such a lower bound must probably depend on the size of the matrix entries, since Algorithm 1 is of quadratic complexity (i.e., it is optimal). Second, our results pose interesting questions that we cannot answer at present: is there an intrinsic difference between the complexity of integer and real matrix multiplication (cf. Algorithm 1 vs. Algorithm 2)? Does matrix multiplication belong among the problems that separate determinism from nondeterminism, i.e., does the use of nondeterminism in matrix multiplication help? Last but not least, it is interesting to observe that in order to prove our results we have used, in addition to nondeterminism, more powerful tools from computational algebra than those usually used in the design of fast "classical" matrix multiplication algorithms: comparisons, the theory of Vandermonde matrices, properties of polynomial roots, primality certificates, modular arithmetic, and fast Fourier transform. Could it be the case that the use of such techniques is the key to the design of fast deterministic matrix multiplication algorithms?
We believe that our results have brought a further insight into the nature of matrix multiplication. They strengthen the hope that perhaps deterministic algorithms for matrix product of a similar complexity will be found.