For certain applications in linear algebra, it is useful to know properties of the probability distribution of the largest eigenvalue of a finite sum of random matrices. Suppose is a finite sequence of random matrices. Analogous to the well-known Chernoff bound for sums of scalars, a bound on the following is sought for a given parameter t:

The following theorems answer this general question under various assumptions; these assumptions are named below by analogy to their classical, scalar counterparts. All of these theorems can be found in (Tropp 2010), as the specific application of a general result which is derived below. A summary of related works is given.

Matrix Gaussian and Rademacher series

edit

Self-adjoint matrices case

edit

Consider a finite sequence   of fixed, self-adjoint matrices with dimension  , and let   be a finite sequence of independent standard normal or independent Rademacher random variables.

Then, for all  ,

 

where

 

Rectangular case

edit

Consider a finite sequence   of fixed matrices with dimension  , and let   be a finite sequence of independent standard normal or independent Rademacher random variables. Define the variance parameter

 

Then, for all  ,

 

Matrix Chernoff inequalities

edit

The classical Chernoff bounds concern the sum of independent, nonnegative, and uniformly bounded random variables. In the matrix setting, the analogous theorem concerns a sum of positive-semidefinite random matrices subjected to a uniform eigenvalue bound.

Matrix Chernoff I

edit

Consider a finite sequence   of independent, random, self-adjoint matrices with dimension  . Assume that each random matrix satisfies

 

almost surely.

Define

 

Then

 
 

Matrix Chernoff II

edit

Consider a sequence   of independent, random, self-adjoint matrices that satisfy

 

almost surely.

Compute the minimum and maximum eigenvalues of the average expectation,

 

Then

 
 

The binary information divergence is defined as

 

for  .

Matrix Bennett and Bernstein inequalities

edit

In the scalar setting, Bennett and Bernstein inequalities describe the upper tail of a sum of independent, zero-mean random variables that are either bounded or subexponential. In the matrix case, the analogous results concern a sum of zero-mean random matrices.

Bounded case

edit

Consider a finite sequence   of independent, random, self-adjoint matrices with dimension  . Assume that each random matrix satisfies

 

almost surely.

Compute the norm of the total variance,

 

Then, the following chain of inequalities holds for all  :

 

The function   is defined as   for  .

Subexponential case

edit

Consider a finite sequence   of independent, random, self-adjoint matrices with dimension  . Assume that

 

for  .

Compute the variance parameter,

 

Then, the following chain of inequalities holds for all  :

 

Rectangular case

edit

Consider a finite sequence   of independent, random, matrices with dimension  . Assume that each random matrix satisfies

 

almost surely. Define the variance parameter

 

Then, for all  

 

holds.[1]

Matrix Azuma, Hoeffding, and McDiarmid inequalities

edit

Matrix Azuma

edit

The scalar version of Azuma's inequality states that a scalar martingale exhibits normal concentration about its mean value, and the scale for deviations is controlled by the total maximum squared range of the difference sequence. The following is the extension in matrix setting.

Consider a finite adapted sequence   of self-adjoint matrices with dimension  , and a fixed sequence   of self-adjoint matrices that satisfy

 

almost surely.

Compute the variance parameter

 

Then, for all  

 

The constant 1/8 can be improved to 1/2 when there is additional information available. One case occurs when each summand   is conditionally symmetric. Another example requires the assumption that   commutes almost surely with  .

Matrix Hoeffding

edit

Placing addition assumption that the summands in Matrix Azuma are independent gives a matrix extension of Hoeffding's inequalities.

Consider a finite sequence   of independent, random, self-adjoint matrices with dimension  , and let   be a sequence of fixed self-adjoint matrices. Assume that each random matrix satisfies

 

almost surely.

Then, for all  

 

where

 

An improvement of this result was established in (Mackey et al. 2012): for all  

 

where

 

Matrix bounded difference (McDiarmid)

edit

In scalar setting, McDiarmid's inequality provides one common way of bounding the differences by applying Azuma's inequality to a Doob martingale. A version of the bounded differences inequality holds in the matrix setting.

Let   be an independent, family of random variables, and let   be a function that maps   variables to a self-adjoint matrix of dimension  . Consider a sequence   of fixed self-adjoint matrices that satisfy

 

where   and   range over all possible values of   for each index  . Compute the variance parameter

 

Then, for all  

 

where  .

An improvement of this result was established in (Paulin, Mackey & Tropp 2013) (see also (Paulin, Mackey & Tropp 2016)): for all  

 

where   and  

edit

The first bounds of this type were derived by (Ahlswede & Winter 2003). Recall the theorem above for self-adjoint matrix Gaussian and Rademacher bounds: For a finite sequence   of fixed, self-adjoint matrices with dimension   and for   a finite sequence of independent standard normal or independent Rademacher random variables, then

 

where

 

Ahlswede and Winter would give the same result, except with

 .

By comparison, the   in the theorem above commutes   and  ; that is, it is the largest eigenvalue of the sum rather than the sum of the largest eigenvalues. It is never larger than the Ahlswede–Winter value (by the norm triangle inequality), but can be much smaller. Therefore, the theorem above gives a tighter bound than the Ahlswede–Winter result.

The chief contribution of (Ahlswede & Winter 2003) was the extension of the Laplace-transform method used to prove the scalar Chernoff bound (see Chernoff bound#Additive form (absolute error)) to the case of self-adjoint matrices. The procedure given in the derivation below. All of the recent works on this topic follow this same procedure, and the chief differences follow from subsequent steps. Ahlswede & Winter use the Golden–Thompson inequality to proceed, whereas Tropp (Tropp 2010) uses Lieb's Theorem.

Suppose one wished to vary the length of the series (n) and the dimensions of the matrices (d) while keeping the right-hand side approximately constant. Then n must vary approximately as the log of d. Several papers have attempted to establish a bound without a dependence on dimensions. Rudelson and Vershynin (Rudelson & Vershynin 2007) give a result for matrices which are the outer product of two vectors. (Magen & Zouzias 2010) provide a result without the dimensional dependence for low rank matrices. The original result was derived independently from the Ahlswede–Winter approach, but (Oliveira 2010b) proves a similar result using the Ahlswede–Winter approach.

Finally, Oliveira (Oliveira 2010a) proves a result for matrix martingales independently from the Ahlswede–Winter framework. Tropp (Tropp 2011) slightly improves on the result using the Ahlswede–Winter framework. Neither result is presented in this article.

Derivation and proof

edit

Ahlswede and Winter

edit

The Laplace transform argument found in (Ahlswede & Winter 2003) is a significant result in its own right: Let   be a random self-adjoint matrix. Then

 

To prove this, fix  . Then

 

The second-to-last inequality is Markov's inequality. The last inequality holds since  . Since the left-most quantity is independent of  , the infimum over   remains an upper bound for it.

Thus, our task is to understand   Nevertheless, since trace and expectation are both linear, we can commute them, so it is sufficient to consider  , which we call the matrix generating function. This is where the methods of (Ahlswede & Winter 2003) and (Tropp 2010) diverge. The immediately following presentation follows (Ahlswede & Winter 2003).

The Golden–Thompson inequality implies that

 , where we used the linearity of expectation several times.

Suppose  . We can find an upper bound for   by iterating this result. Noting that  , then

 

Iterating this, we get

 

So far we have found a bound with an infimum over  . In turn, this can be bounded. At any rate, one can see how the Ahlswede–Winter bound arises as the sum of largest eigenvalues.

Tropp

edit

The major contribution of (Tropp 2010) is the application of Lieb's theorem where (Ahlswede & Winter 2003) had applied the Golden–Thompson inequality. Tropp's corollary is the following: If   is a fixed self-adjoint matrix and   is a random self-adjoint matrix, then

 

Proof: Let  . Then Lieb's theorem tells us that

 

is concave. The final step is to use Jensen's inequality to move the expectation inside the function:

 

This gives us the major result of the paper: the subadditivity of the log of the matrix generating function.

Subadditivity of log mgf

edit

Let   be a finite sequence of independent, random self-adjoint matrices. Then for all  ,

 

Proof: It is sufficient to let  . Expanding the definitions, we need to show that

 

To complete the proof, we use the law of total expectation. Let   be the expectation conditioned on  . Since we assume all the   are independent,

 

Define  .

Finally, we have

 

where at every step m we use Tropp's corollary with

 

Master tail bound

edit

The following is immediate from the previous result:

 

All of the theorems given above are derived from this bound; the theorems consist in various ways to bound the infimum. These steps are significantly simpler than the proofs given.

References

edit
  • Ahlswede, R.; Winter, A. (2003). "Strong Converse for Identification via Quantum Channels". IEEE Transactions on Information Theory. 48 (3): 569–579. arXiv:quant-ph/0012127. doi:10.1109/18.985947. S2CID 523176.
  • Mackey, L.; Jordan, M. I.; Chen, R. Y.; Farrell, B.; Tropp, J. A. (2012). "Matrix Concentration Inequalities via the Method of Exchangeable Pairs". The Annals of Probability. 42 (3): 906–945. arXiv:1201.6002. doi:10.1214/13-AOP892. S2CID 9635314.
  • Magen, A.; Zouzias, A. (2010). "Low-Rank Matrix-valued Chernoff Bounds and Approximate Matrix Multiplication". arXiv:1005.2724 [cs.DS].
  • Oliveira, R.I. (2010a). "Concentration of the adjacency matrix and of the Laplacian in random graphs with independent edges". arXiv:0911.0600 [math.CO].
  • Oliveira, R.I. (2010b). "Sums of random Hermitian matrices and an inequality by Rudelson". arXiv:1004.3821 [math.PR].
  • Paulin, D.; Mackey, L.; Tropp, J. A. (2013). "Deriving Matrix Concentration Inequalities from Kernel Couplings". arXiv:1305.0612 [math.PR].
  • Paulin, D.; Mackey, L.; Tropp, J. A. (2016). "Efron–Stein inequalities for random matrices". The Annals of Probability. 44 (5): 3431–3473. arXiv:1408.3470. doi:10.1214/15-AOP1054. S2CID 16263460.
  • Rudelson, M.; Vershynin, R. (2007). "Sampling from large matrices: an approach through geometric functional analysis". Journal of the Association for Computing Machinery. 54 (4 ed.). arXiv:math/9608208. Bibcode:1996math......8208R. doi:10.1145/1255443.1255449. S2CID 6054789.
  • Tropp, J. (2011). "Freedman's inequality for matrix martingales". arXiv:1101.3039 [math.PR].
  • Tropp, J. (2010). "User-friendly tail bounds for sums of random matrices". Foundations of Computational Mathematics. 12 (4): 389–434. arXiv:1004.4389. doi:10.1007/s10208-011-9099-z. S2CID 17735965.