Comment on proposal to merge with article on sample covariance


Here are some arguments in favour of keeping the scatter matrix as a topic independent of other topics:

1. I feel the scatter matrix is a more fundamental and more clearly defined statistic than sample covariance. With the former there is no ambiguity about the normalization. With the latter, you always have to ask whether it has been normalized by N or by N-1, or perhaps by some other number.

If someone looks in Wikipedia to find out what a scatter matrix is, this article gives a succinct definition and then links to applications such as sample covariance, covariance estimation and the Wishart distribution.

2. The scatter matrix is used in other Wikipedia articles (and could be used in still more), not just in the sample covariance article.
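To illustrate point 1, here is a minimal Python sketch. The helper names and the `ddof` parameter (named after NumPy's convention) are hypothetical, chosen only for illustration. The scatter matrix is computed with no divisor at all, while the sample covariance is that same matrix divided by either N - 1 or N, depending on convention.

```python
# Minimal sketch (hypothetical helper names): the scatter matrix needs no
# divisor, whereas the sample covariance divides it by N - ddof.

def mean_vector(samples):
    """Component-wise mean of a list of equal-length sample vectors."""
    n, m = len(samples), len(samples[0])
    return [sum(x[i] for x in samples) / n for i in range(m)]


def scatter_matrix(samples):
    """S = sum over j of (x_j - mean)(x_j - mean)^T, an m-by-m matrix."""
    mu = mean_vector(samples)
    m = len(mu)
    S = [[0.0] * m for _ in range(m)]
    for x in samples:
        d = [x[i] - mu[i] for i in range(m)]   # deviation from the mean
        for r in range(m):
            for c in range(m):
                S[r][c] += d[r] * d[c]         # accumulate the outer product
    return S


def covariance(samples, ddof=1):
    """Scatter matrix divided by N - ddof: ddof=1 gives N - 1, ddof=0 gives N."""
    n = len(samples)
    return [[v / (n - ddof) for v in row] for row in scatter_matrix(samples)]


data = [[1.0, 2.0], [3.0, 6.0], [5.0, 7.0]]   # N = 3 samples, m = 2 variables
print(scatter_matrix(data))        # [[8.0, 10.0], [10.0, 14.0]]
print(covariance(data))            # [[4.0, 5.0], [5.0, 7.0]]
print(covariance(data, ddof=0))    # same scatter matrix, divided by N = 3
```

Both calls to `covariance` start from the same scatter matrix; only the divisor differs, which is exactly the ambiguity point 1 describes.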

Entropeneur 07:23, 30 May 2007 (UTC)

Technically correct, yet lacking in instructional aid?


In this article we read:

 The scatter matrix is the m-by-m positive semi-definite matrix

but nowhere in the subsequent mathematical definition does m appear. One understands that an implicit Einstein summation convention may be in force here, but that is of no benefit to the reader.
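For comparison, a version of the definition with the dimensions written out, requiring no summation convention, might read as follows. The notation is assumed here rather than taken from the article: n observations x_1, ..., x_n, each a column vector in R^m.

```latex
\bar{\mathbf{x}} = \frac{1}{n}\sum_{j=1}^{n}\mathbf{x}_j,
\qquad \mathbf{x}_j \in \mathbb{R}^{m}

S = \sum_{j=1}^{n} (\mathbf{x}_j - \bar{\mathbf{x}})(\mathbf{x}_j - \bar{\mathbf{x}})^{\mathsf{T}} \in \mathbb{R}^{m \times m},
\qquad
S_{ik} = \sum_{j=1}^{n} (x_{ij} - \bar{x}_i)(x_{kj} - \bar{x}_k), \quad i, k = 1, \dots, m
```

Written this way, m is the length of each observation, and the sum over j runs over the n observations.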

The article doesn't explain how the scatter matrix differs from the closely related covariance matrix. In fact, if one flips back and forth between the Wikipedia pages for the two topics, one could easily become convinced that they are one and the same, even though the standard deviation is the critical difference. Perhaps one could credibly state: a covariance matrix normalizes for variation within a variable, whereas a correlation matrix normalizes both for variation within a variable and (by 'standardizing' the deviations) between variables. That is, the covariance matrix is '1d normalized', the correlation matrix is '2d normalized', and a scatter matrix isn't normalized at all.

And/or it might be worth stating that all three matrices (scatter, covariance, correlation) are the result of A'A (the mean-centered data matrix multiplied by itself, with the transpose on the left), with increasing levels of data-intrinsic scaling.
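The A'A observation can be checked with a short Python sketch. It assumes that A is the mean-centered n-by-m data matrix with one observation per row, and the helper names are hypothetical. The scatter matrix is A'A itself; the covariance matrix is A'A after every entry of A is scaled by 1/sqrt(n - 1); the correlation matrix is A'A after each column of A is scaled to unit length.

```python
import math


def centered(data):
    """Subtract the column means from an n-by-m data matrix (list of rows)."""
    n, m = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(m)]
    return [[row[j] - means[j] for j in range(m)] for row in data]


def gram(A):
    """Return A'A (transpose on the left) for an n-by-m matrix A."""
    m = len(A[0])
    return [[sum(row[r] * row[c] for row in A) for c in range(m)]
            for r in range(m)]


def scale_columns(A, factors):
    """Multiply column j of A by factors[j]."""
    return [[v * f for v, f in zip(row, factors)] for row in A]


def scatter_covariance_correlation(data):
    """All three matrices as A'A with progressively more data-dependent scaling."""
    A = centered(data)
    n, m = len(A), len(A[0])
    scatter = gram(A)                                        # no scaling
    cov = gram(scale_columns(A, [1 / math.sqrt(n - 1)] * m))  # overall 1/(n-1)
    norms = [math.sqrt(scatter[j][j]) for j in range(m)]      # column lengths
    corr = gram(scale_columns(A, [1 / s for s in norms]))     # unit-length columns
    return scatter, cov, corr


scatter, cov, corr = scatter_covariance_correlation(
    [[1.0, 2.0], [3.0, 6.0], [5.0, 7.0]])
print(scatter)  # [[8.0, 10.0], [10.0, 14.0]]
print(cov)      # scatter / (n - 1), up to floating-point rounding
print(corr)     # ones on the diagonal, up to floating-point rounding
```

Because the correlation step rescales each column by its own length, any common factor such as 1/N or 1/(N - 1) cancels, so the correlation matrix does not depend on the normalization convention.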

Discussions of linear algebra sometimes seem to pursue obscurity and to treat reinventing terminology as a merit; so it's understood that untangling the endless details is a Herculean task, and (these days) Wikipedia may end up being the last arbiter of clear meaning. — Preceding unsigned comment added by 73.19.41.124 (talk) 18:38, 31 March 2015 (UTC)