Talk:Distance matrix
This article is rated Start-class on Wikipedia's content assessment scale.
Untitled
I disagree with the merger. Adjacency matrices are simply a way to tell whether two vertices are connected or not, and by how many paths. Distance matrices give the distances between two vertices, not whether they are connected or not.
I disagree with the proposed merger with adjacency matrix. These are different concepts. Charles Matthews 13:58, 21 February 2006 (UTC)
I also disagree with the merger. However, these topics are somewhat related and there should be a sentence or two describing their relationship. -- BAxelrod 13:54, 23 February 2006 (UTC)
- Too bad none of you added any information about adjacency matrices to the article - at least that should have been done before removing the tags... --Abdull 18:03, 30 April 2006 (UTC)
I also disagree with the merger. This would leave out the important Mahalanobis distance, which see. Pmanleycooke (talk) 08:13, 12 August 2008 (UTC)
Merging with Distance matrix methods
These topics are closely related. Could we merge them in some way?
Distance matrix should be symmetric.
Unlike a Euclidean distance matrix, the matrix does not need to be symmetric; that is, the values x_{i,j} do not necessarily equal x_{j,i}.
From a mathematical perspective, if it's not symmetric, it's not really a distance matrix. Intuitively, distance implies the distance from a to b should be equivalent to the distance from b to a. If this doesn't hold, I don't think it should be called a distance matrix. This property is not exclusive to Euclidean space either. Djh901 (talk) 17:59, 13 October 2015 (UTC)
- This article is a bit of a mess. The opening sentence clearly talks about distance as a metric, but the applications in bioinformatics and related fields use distance with a looser non-metric meaning (and while they even talk about metrics, these are not the mathematically defined terms). In these applications, distances can be negative and don't have to be symmetric. I could help fix up the mathematical side of this topic, but the other applications are beyond my ken. Bill Cherowitzo (talk) 05:30, 14 October 2015 (UTC)
- Can you think of applications that use the term loosely? In my experience, distance implies some structure in the matrix that can then be used by algorithms for clustering or tree building. If a matrix doesn't actually have any of this structure, I don't think the term distance should be used. Maybe in the math definition non-negativity is too restrictive because transformations can usually resolve this, but I think symmetry should at least hold. Djh901 (talk) 16:36, 17 October 2015 (UTC)
- While searching for something more explicit to give you I stumbled upon the shortest path problem in networks which is a clear example. I've amended the article accordingly. Bill Cherowitzo (talk) 02:46, 21 October 2015 (UTC)
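To make the shortest-path example concrete, here is a minimal sketch in Python. It assumes a made-up three-node directed graph (the edge weights and the helper name `shortest_path_matrix` are illustrative, not taken from the article) and computes all-pairs shortest-path lengths with the Floyd-Warshall algorithm. The resulting matrix is hollow but not symmetric.

```python
import math

def shortest_path_matrix(n, edges):
    """All-pairs shortest-path lengths via Floyd-Warshall.

    n     -- number of nodes, labelled 0..n-1
    edges -- iterable of directed edges (u, v, weight)
    """
    # Start with 0 on the diagonal and infinity elsewhere.
    d = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for u, v, w in edges:
        d[u][v] = min(d[u][v], w)
    # Allow each node k in turn as an intermediate stop.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

# Hypothetical directed cycle 0 -> 1 -> 2 -> 0 (weights chosen for illustration).
edges = [(0, 1, 1.0), (1, 2, 2.0), (2, 0, 4.0)]
D = shortest_path_matrix(3, edges)
for row in D:
    print(row)
```

Here going from node 0 to node 1 costs 1, while going back from 1 to 0 requires the detour through node 2, so the two entries differ.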
References to check out
- Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
- Mardia, K. V., Kent, J. T. and Bibby, J. M. (1979) Multivariate Analysis. Academic Press.
- Borg, I. and Groenen, P. (1997) Modern Multidimensional Scaling. Theory and Applications. Springer.
The section "Comparison with Euclidean distance matrix" contradicts the "Formalization" section completely
It makes the claim that the matrix need not be symmetric and that it need not be hollow. Rules 2 and 3 under "Formalization" directly contradict this. They state in no uncertain terms that the matrix must be both symmetric and hollow. The section also seems to want to allow for complex-valued metrics, which is not in line with the standard definition of a metric. As it stands it is a complete mess. The article should at least be self-consistent even if it is inaccurate.--2003:69:CD3F:B01:2876:80EC:DC6E:7C68 (talk) 12:19, 20 October 2015 (UTC)
- I believe that you made the wrong choice here. The "Formalization" section was only added recently, and, as you point out, contradicted material that was already on the page for a long while. The application areas seem to use this concept with a non-metric distance in mind (look at the See also topics). Given such a dichotomy of approaches to the topic, we should not be making a choice here but rather attempting to describe both views. I reverted your recent edits so as not to lose this other viewpoint, but the article certainly needs work. Finding some reliable sources for each viewpoint would be a great start. Bill Cherowitzo (talk) 18:42, 20 October 2015 (UTC)
- To be fair, I removed it, but after reading your comment, I agree that covering both points of view is the best way forward, at least until we uncover more sources. Djh901 (talk) 14:56, 22 October 2015 (UTC)
Restructure "applications"
We currently have the applications of a "distance matrix" spread across the sections "Bioinformatics", "Data Mining and Machine Learning", "Information retrieval", "Chemistry", and "Other Applications". Some of these are about specific definitions of distances, some talk about specific algorithms run on a distance matrix, and some talk about both.
I believe we should keep field-specific definitions of distances inside the "field" sections, but move the algorithms out to a more general section. Right now we are basically mentioning clustering algorithms twice, once as phylogeny and once as data mining.
My currently preferred outline for the rewrite would look like:
- Algorithms on a distance matrix
  - Clustering
    - Hierarchical clustering and phylogeny
  - K-nearest neighbors
  - Isomap
  - Neighborhood Retrieval Visualizer
  - Dynamic time warping
- Definition of distance functions
  - Bioinformatics
    - Among sequences: alignment and alignment-free
    - Among 3D volumes
  - Information retrieval
    - Gaussian mixture distance, defined over documents
    - Comparison with cosine similarity
  - Chemistry
    - Basically everything here is a definition of a distance function.