Talk:Farthest-first traversal
Farthest-first traversal has been listed as one of the Engineering and technology good articles under the good article criteria. If you can improve it further, please do so. If it no longer meets these criteria, you can reassess it. Review: November 7, 2021. (Reviewed version). |
This article is rated GA-class on Wikipedia's content assessment scale. It is of interest to the following WikiProjects: | ||||||||||||||||||
|
GA Review
editGA toolbox |
---|
Reviewing |
- This review is transcluded from Talk:Farthest-first traversal/GA1. The edit link for this section can be used to add comments to the review.
Reviewer: RoySmith (talk · contribs) 19:59, 5 November 2021 (UTC)
- "Every prefix of a farthest-first..." links to the wrong prefix. You probably want Substring#Prefix.
- link "traversal" to Graph traversal the first place it's used.
- But that's an article about graphs. Here there is no graph (until we get to the end of the Approximations section, and even then it's dubious to call it a graph traversal because those are about walks through the edges of a graph and here we're doing something else).
- The caption for the first diagram says "The first five steps", but there's only 4 steps shown. I guess that depends on whether a vertex or an edge constitutes a "step", but I assume it's an edge.
- No, it's a point. There is no graph. It's a sequence of points. Changed caption to "the first five points" to be clearer. —David Eppstein (talk) 19:47, 6 November 2021 (UTC)
- "no other set of equally many points can be spaced more than twice as widely" This needs some more explanation. Why can't they? Likewise for the following "less than half as far".
- Oh, reading further, I see "By the pigeonhole principle, some two points of the optimal solution (whatever it is) must both be within distance r of the same point among these first k − 1 chosen points, and (by the triangle inequality) within distance 2r of each other." Is this the explanation I was seeking?
- Yes, the sentence you quoted from the lead was intended as a summary (not as a claim needing explanation) of the max-min facility dispersion paragraph, containing the explanation you quote. —David Eppstein (talk) 19:47, 6 November 2021 (UTC)
- Oh, reading further, I see "By the pigeonhole principle, some two points of the optimal solution (whatever it is) must both be within distance r of the same point among these first k − 1 chosen points, and (by the triangle inequality) within distance 2r of each other." Is this the explanation I was seeking?
- "All points of the whole metric space..." Is "whole metric space" a specific kind of metric space, or does "whole" just mean "the entire thing" in this context? If so, then I think this could be simplified to just "All points of the metric space...".
- Ok. It was intended as a contrast with the selected subset of the previous bullet point, but since you found it confusing I removed it. —David Eppstein (talk) 19:47, 6 November 2021 (UTC)
- I'm not 100% sure about this, but in " seeks to minimize the maximum diameter of a cluster...", I think "diameter" should link to Diameter (graph theory), which in turn redirects to Distance (graph theory).
- Again, there is no graph. The first usage of "diameter" is linked to the geometric meaning of diameter. —David Eppstein (talk) 19:47, 6 November 2021 (UTC)
- The paragraph starting "Later, the same sequence of points was popularized by Gonzalez (1985)..." is kind of big. Is there some logical place to break this into smaller paragraphs?
- Split into two smaller paragraphs. —David Eppstein (talk) 00:13, 7 November 2021 (UTC)
- "are often incorrectly attributed to a different paper from the same time". The assertion of incorrect attribution needs to be backed up. Is that your evaluation (in which case it's WP:OR), or is there a WP:RS which says so?
- It is easy to source the fact that the Hochbaum-Shmoys paper is a different algorithm, not the farthest-first traversal, and it is also easy to (separately) find many prominent sources attributing the farthest-first traversal to Hochbaum-Shmoys, including the current version of a notable software package. I was trying to avoid pointing fingers, but in response to this query I have added both to the article. Are you really suggesting that to support the claim that these sources are making an incorrect attribution (certainly an encyclopedic thing to be claiming since we want to help people avoid repeating the same mistake) I need to find separate sources that point fingers at them and call them incorrect? In the process I also found a reference by Dyer and Frieze, from the same time as Gonzalez, describing the same algorithm (and saying that it is different from Hochbaum and Shmoys, with some detail on how it differs), which I chose as the source for the algorithms being different. —David Eppstein (talk) 00:58, 7 November 2021 (UTC)
@David Eppstein: That's all I can see. Overall, this is a nice article that does a good job of explaining a complicated topic. -- RoySmith (talk) 21:02, 5 November 2021 (UTC)
- @RoySmith: all comments responded to; please take another look. —David Eppstein (talk) 00:58, 7 November 2021 (UTC)
- Looks good. Just to tick all the boxes, I see no problems with the 6 GA criteria (Well written, Verifiable, Broad in coverage, Neutral, Stable, and Illustrated). Earwig didn't pick up any copy issues. -- RoySmith (talk) 04:08, 7 November 2021 (UTC)
Different interpretation of farthest insertion algorithm
editI see three different logic descriptions for the algorithm and challenge hereby the "correctness" of the description. I quote "correctness" as I am pretty sure there are two different ways to calculate the farthest insertion algorithm. I would regard them as two different algorithms which for the similarity were named identical.
I hereby question the GA category of this article.
Logic differences
- selection of node: maximum distance to the set, taking into account all nodes of the set for distance calculation
- selection of node: maximum distance to any one point of the set
- insertion of node: usually replacing that edge which will lead to minimum increase of total length/distance. Here described in Greedy Algorithm as: "Remove p from the not-yet-selected points and add it to the end of the sequence of selected points." Adding to the end is just wrong.
This article states the calculation logic as: "Each point p after the first must have the maximum possible distance to the set of points earlier than p in the sequence, where the distance from a point to a set is defined as the minimum of the pairwise distances to points in the set."
I read this as maximum of the sum of the (minimum) pairwise distances. While I think this is the better algorithm, most of the times the maximum distance to any one point is evaluated. If the latter is the case, then the animated graphics is incorrect. The topmost point is closer to the first point then the other ones on the top right corner.
This article refers in this respect to Rosenkrantz, D. J.; Stearns, R. E.; Lewis, P. M., II (1977), "An analysis of several heuristics for the traveling salesman problem"
That article is widely referenced for the farthest insertion heuristic. As such it should be reflected in this article how they select the nodes.
The selection of the node is describes as follows:
"We say that a tour is constructed by farthest insertion if each ai, 1 <= i < n, in the definition of an insertion method satisfies
(5.1) d(Ti, ai) = max {d(Ti, x) for x in N-Ti}.
Contrasting (5.1) with (4.2), we observe that farthest insertion has a max where nearest insertion has a min. "
The authors just search for the max distance to any one of the points and use that node.
They then write "Inserting nearby points late in the approximation is appealing because the short edges used late in the procedure are less likely to be accidentally deleted by some still later insertion."
If this is the goal, then the logic may not be ideal. Having a graph were one point is very far from the others may insert first all nodes nearest the two farthest nodes while the sum of distances would insert further away nodes first.
The Wikipedia article lacks a description of the goal of the algorithm. It focuses on the WHAT the algorithm does, not WHY.
When one searches the internet for "farthest insertion algorithm" a number of results come up. I could find only one implementation using the sum. Examples:
- https://www2.isye.gatech.edu/~mgoetsch/cali/VEHICLE/TSP/TSP015__.HTM, referring to Rosenkrantz et al
- MIT OpenCourseWare: https://ocw.mit.edu/courses/civil-and-environmental-engineering/1-203j-logistical-and-transportation-planning-methods-fall-2006/lecture-notes/lec16.pdf
- a typical GitHub implementation: https://flothesof.github.io/farthest-neighbors.html
- Princeton: https://www.cs.princeton.edu/courses/archive/fall16/cos126/checklist/tspextra.html
So, if the majority has a different understanding I am suggesting to recognize this in the article and state the differences.
Coming back to the Greedy Algorithm:
In the article the authors describe the insertion of the node for all insertion logic as:
"if T passes through more than one point, then a) find an edge (x, y) in T which minimizes: d(x,k)+d(k,y)-d(x,y) b) delete edge (x,y) and add edges (x,k) and (k,y) to obtain TOUR(T, k); if T passes through a single node i, then make TOUR(T, k) the two node tour consisting of edges (i, k) and (k, i). In either case, we say that TOUR(T, k) is obtained by inserting k into T."
I suggest to change that part of the description of the Greedy Algorithm. — Preceding unsigned comment added by GunterS (talk • contribs) 09:28, 15 March 2022 (UTC)
- Where on earth did you get SUM of pairwise distances, when clearly it states MIN of pairwise distances? ——David Eppstein (talk) 16:16, 15 March 2022 (UTC)
- I was looking at the chart. If the MIN of pairwise distances were used, then the third node would be the right one of the two points below the second one. It is clearly farther away from point 1 then the topmost one. So I assumed by Min of pairwise distances is meant MIN of (all aggregated) pairwise distances in the set. Using the plural in an array can mean all. GunterS (talk) 17:10, 15 March 2022 (UTC)
- Also, what "animated graphic" are you talking about? There is only a single image, not an animation. In that single image, the first point (with no arrow into it) is chosen arbitrarily. The second point is farthest from the first one. The third one is farthest from the first two (its minimum distance to the set of the first two points is larger than the minimum distances of any other point). Outside of your confused misinterpretations, there is only one algorithm to be described here. The "greedy algorithm" for TSP that you are describing is not farthest-insertion, it is cheapest-insertion, a different heuristic unrelated to the topic of this article. —David Eppstein (talk) 19:09, 18 March 2022 (UTC)
Citation for Greedy Exact Algorithm
editHello, apologies if this is the incorrect way to raise this issue. The greedy exact algorithm currently lists the article "Clustering to minimize the maximum intercluster distance" by Gonzalez as the source for this algorithm. However, in the copy of the article I retrieved from ScienceDirect, there appears to only be a description of a 2*OPT approximation algorithm, and no description of a greedy exact algorithm, even in the process of describing prior work. Would it be possible for the author to update the citation to reflect the original source of this algorithm? Wbrackenbury (talk) 13:56, 12 September 2022 (UTC)
- It is an exact algorithm for farthest-first traversal. It is a 2-approximation for other problems. Gonzales focuses on the other problems but the algorithm he describes is an algorithm for farthest-first traversal. —David Eppstein (talk) 16:34, 12 September 2022 (UTC)
- Thanks for the clarification! — Preceding unsigned comment added by 2601:249:8C00:6800:F8B1:3974:C9C4:F7D0 (talk) 16:43, 12 September 2022 (UTC)