Talk:Cartesian tree/GA1

Latest comment: 1 year ago by David Eppstein in topic GA Review

GA Review

edit

Article (edit | visual edit | history) · Article talk (edit | history) · Watch

Reviewer: Ganesha811 (talk · contribs) 17:27, 4 August 2023 (UTC)Reply


Hi! I'll be reviewing this article, using the template below. Sorry for the long wait! If you have any questions, feel free to ask them here. —Ganesha811 (talk) 17:27, 4 August 2023 (UTC)Reply

As this is a subject (actually, a whole field) I'm unfamiliar with, I'm diving into a little bit of research to make sure I can adequately assess comprehensiveness and detail. Hope to complete the review in the next few days. —Ganesha811 (talk) 16:49, 5 August 2023 (UTC)Reply
Ok, no hurry. —David Eppstein (talk) 20:59, 5 August 2023 (UTC)Reply
Rate Attribute Review Comment
1. Well-written:
  1a. the prose is clear, concise, and understandable to an appropriately broad audience; spelling and grammar are correct.
  • In general, some of the language is too technical and difficult for a novice reader to understand. I will make specific comments/suggestions on this as part of the prose review.
  1b. it complies with the Manual of Style guidelines for lead sections, layout, words to watch, fiction, and list incorporation.
  • Pass, no issues.
2. Verifiable with no original research:
  2a. it contains a list of all references (sources of information), presented in accordance with the layout style guideline.
  2b. reliable sources are cited inline. All content that could reasonably be challenged, except for plot summaries and that which summarizes cited content elsewhere in the article, must be cited no later than the end of the paragraph (or line if the content is not in prose).
  • All reliable academic sources, no issues here. Pass.
  2c. it contains no original research.
  2d. it contains no copyright violations or plagiarism.
3. Broad in its coverage:
  3a. it addresses the main aspects of the topic.
  • A sentence or two should be added to the lead, and a couple more to the body, on a very high-level, simple explanation of what a tree data structure is and how it works. This article is really only comprehensible now by going and reading a couple of other Wikipedia articles first, but it should be able to stand on its own.
  3b. it stays focused on the topic without going into unnecessary detail (see summary style).
  4. Neutral: it represents viewpoints fairly and without editorial bias, giving due weight to each.
  • No issues, pass.
  5. Stable: it does not change significantly from day to day because of an ongoing edit war or content dispute.
  • No issues. Pass.
6. Illustrated, if possible, by media such as images, video, or audio:
  6a. media are tagged with their copyright statuses, and valid non-free use rationales are provided for non-free content.
  • Pass, no issues, all are original works by nominator uploaded and released.
  6b. media are relevant to the topic, and have suitable captions.
  • Some tweaking possible to captions - will handle under prose review. Provisional pass.
  7. Overall assessment.
Re #3: I added some sentences to the start of the "Definitions" section providing glosses of the basic tree terminology used. It definitely does not belong in the lead. The phrasing of your comment "what a tree data structure is" already exhibits a misunderstanding: this is a mathematical structure, not a computer data structure. It can be represented by a data structure in a computer, and is used to define certain data structures, but it is not really a data structure on its own. Turning it into a data structure would require specifying additional information about how each node is represented in memory, what operations are to be performed on it and how, etc. —David Eppstein (talk) 06:14, 8 August 2023 (UTC)Reply
Well, I suppose that goes to show how this topic can lead to confusion - I wrote "tree data structure" because our article on it is called "tree (data structure)", and the article on binary trees (linked in the first sentence of this article) states "a binary tree is a k-ary k=2 tree data structure". The key thing to remember is that Wikipedia is for a generalist reader, not a specialist one. Try to imagine coming to this page as someone with little to no background knowledge, who is seeing it for the first time. Jargon should be defined parenthetically, wikilinked, or both; and generally it is, but there are exceptions. For example, "node" could use a short parenthetical explanation the first time it is used.
Another example; in the lead, the sentence "The smallest number in the sequence is at the root of the tree; its left and right subtrees are constructed recursively from the subsequences to the left and right of this number" could be rewritten as follows:
The smallest number in the sequence is picked to be the starting node, or "root" of the tree. Each node in a Cartesian tree can have up to two "child" nodes which it is linked to. From the subsequence of numbers to the left of the chosen "root", the smallest number is chosen as a child node of the root node; the same process also occurs on the subsequence of numbers to the right of the root, resulting in two new nodes below the root. This process is then repeated until all numbers in the sequence have been added to the tree.
This is wordier, and a little awkwardly phrased (I'm sure you can improve on it!), but I think would be clearer for a novice reader. The lead, especially, should pay close attention to the issue of making sure a generalist reader can parse it correctly and easily. As the article goes on, it's appropriate to introduce more detail. —Ganesha811 (talk) 12:32, 8 August 2023 (UTC)Reply
We (the second person plural pronoun) cannot (are unable to or maybe rather should not) explain (provide a more detailed explanation) for every (each one) of the words (units of meaning in the English language) that we use (incorporate into our sentences). That way lies madness. See WP:TECHNICAL, which states that an article should be written "as far as possible" for a wide audience, but not that all articles should be readable by all readers. This is an advanced computer science topic, suitable for third- or fourth-year computer science university students. Per WP:ONEDOWN, it should be readable by beginning computer science students, but not necessarily by, say, high schoolers or students of comparative literature. All beginning computer science students have seen binary trees. The more you elaborate the definitions of basic words, the more you hide the real content of the article and make it more difficult to read for people who already understand those words, the exact audience that we would hope is ready to read the article.
To this point, I'm a little surprised that you didn't expect me to provide a short parenthetical gloss of "number", the first time it is used. See Principia Mathematica, a famous work in mathematical logic that took the point of view that everything should be reduced to first principles. The consequence of applying this principle strictly was that it took hundreds of pages of dense logical formalism to prove that 1+1=2. —David Eppstein (talk) 17:32, 8 August 2023 (UTC)Reply
There's no need to be sarcastic or rude. I'm trying to improve the article, as you are. I think a straightforward reading of WP:TECHNICAL says that while it is impossible to explain some subjects simply or without jargon, we should always make our best efforts to be clear to a general reader and guide them through a topic, especially in the lead. I think it's clear we will not see eye to eye on this, so I'm going to request a second reviewer come take a look. —Ganesha811 (talk) 17:53, 8 August 2023 (UTC)Reply
I provided a gloss of the basic concepts needed for the article as you requested. You are now requesting a gloss of the terms used to define those basic concepts. That goes too far. A reader who needs that gloss-on-a-gloss is unready to read this article, and no amount of gloss will help. All it will do will make the article so cluttered as to be unreadable by the readers who are ready for it. —David Eppstein (talk) 19:02, 8 August 2023 (UTC)Reply
We have a pretty fundamental difference that would have a big impact on the article. As I said, I don't think we're likely to see eye to eye on this and I think you will be more successful working with a different reviewer. Good luck and happy editing! —Ganesha811 (talk) 19:12, 8 August 2023 (UTC)Reply

Second opinion

edit

When I picked this up I wasn't intending to count it towards the backlog but given the length of what I've managed to write, I'll query if it can be counted. The reviewer has tagged this as a second opinion, rather than abandoning the review, but in covering the criteria they had questioned/blank in the table at the top, I've managed to assess all the criteria.

The key issue is whether the topic balances readability, focus and explanation of technical terms in the manner appropriate for mathematical articles. I agree with David Eppstein that the minimum required knowledge (at least for the early part of the article) is introductory computer science: such topics as trees, traversals, recursion and heaps are assumed knowledge. (Even so, readers may need a moment to process, be unfamiliar with a term or two, or need a refresher—I'll confess to forgetting what a heap is.)

Some of these suggestions may contradict the previous reviewer:

Lead

edit
  • The case of repeated numbers in the sequence is a bit confused. The first mention of the issue is with When all numbers are distinct, the Cartesian tree is uniquely defined ... and then "distinct" is specified in both definitions in the body. Indeed, "the smallest" doesn't make sense if the sequence isn't distinct. Perhaps the first sentence could say "... a sequence of distinct numbers". When it gets to the last "Definition" paragraph, which seems to be the only place the non-distinct case is considered, this restriction can be explicitly relaxed. (FWIW: it seems in the non-distinct case there can be one Cartesian tree or many—I wonder if there's some interesting classification or counting problem here.)
    • Ok, added "distinct" to the lead. For the non-distinct case you would have a choice of orderings whenever you have a set of equal values separated by larger values, so the number of choices would be the product of the factorials of the sizes of these ambiguous sets, but I don't know of any sources saying anything about this. —David Eppstein (talk) 23:24, 10 August 2023 (UTC)Reply
  • For the first definition, I think it's easier to follow if it's phrased as a set of instructions. Like To construct the tree, set the root equal to the smallest number in the sequence and recursively construct the left and right subtrees from the subsequences before and after this number.
  • For the second definition, I think you need to say it's a min-heap (same in the body), and it couldn't hurt to add that this means any path from a leaf to the root is (strictly) decreasing. You could reuse the first note, about some authors using max-heaps.
  • Unlink data structure to fix the sea of blue (if you don't understand data structure I'm not sure what you can make of the rest of the sentence).
  • I think the Introduced by Vuillemin (1980) ... sentence is long enough to be split into two: the first on the introduction; the second on other applications.
  • may be constructed: "can be constructed" sounds more natural to me (same in the body).
    • I went through the article and either reworded or converted to "can" many but not all instances of "may". The ones I left in had more the sense of being optional than the ones I converted to "can", which had more the sense of being possible. —David Eppstein (talk) 23:24, 10 August 2023 (UTC)Reply

Definition

edit
  • The exposition These are mathematical structures rather than computer data structures, although Cartesian trees have been used as part of the definition of certain data structures ... As well, each node may have some associated data (a number, in the case of a Cartesian tree). can be cut as expected background knowledge; a reader who doesn't understand should follow the links (and confusion in binary tree or tree (data structure) about a tree being a mathematical structure are out of scope for this GA review).
  • I think some detail on the recursive sequence to tree algorithm is justified, but it's a bit wordy at the moment and hard to avoid this. Maybe it could be given as pseudocode or as instructions in a numbered list. I notice Demaine, Landau & Weimann (2009) introduce it the same way I started to write it down: using list indices.
  • Each node is associated with a single sequence value. – I don't think this is new information, so can be dropped. (And in fact, I feel this weak notion of "associated" drops the rigour of "uniquely defined by...")
  • That is, the left subtree ... and a similar ordering constraint holds ... – Looking to drop this last clause if at all possible, would it be sufficient to say "for each node, numbers in its left subtree occur before it in the sequence [and the same for the right]"?
  • Not at all a requirement for GA, but it'd be fantastic to have a gif of the recursive construction (sequence to tree) or an in-order traversal (tree to sequence).

Efficient construction

edit
  • Link "path" on first mention (and not later in the article).
  • I don't like "simply" in One method is to simply process.
  • Divide-and-conquer algorithm could be linked.
  • I think "spine" needs explanation before ... right spine of the left tree .... If the parenthetical is supposed to be this definition you might need multiple sentences for the algorithm's description.
    • Ok, rewrote to add an explanation for the spine and to separate the parenthetical description of its ordering (which is not really a definition) from the description of what to do in the merge. —David Eppstein (talk) 22:29, 13 August 2023 (UTC)Reply
  • ... up to a constant fraction of the elements in the current tree may be ... – I think "up to" is redundant to "may be".
    • It is definitely not required that there be many marked-as-deleted elements; after a rebuild there will be none. Rewrote to separate how the deletions are performed (just mark an element deleted) from the threshold to trigger a rebuild operation (the constant fraction part). —David Eppstein (talk) 22:29, 13 August 2023 (UTC)Reply

Applications

edit
  • Again, data structure can be unlinked.
  • The first sentence is probably best split in two. The paragraph could start as a standalone one-sentence definition of range minimum query.
  • in the first illustration requires quite a bit of scrolling and is going to be particularly confusing to anyone on mobile, who sees the image in this section immediately above this text. But I'm not sure the image is actually needed to make the point. Could the example be the (sub)sequence (12, 10, 20, 15, 18), where you just assert that 10 is the lowest common ancestor of 12 and 18 in the Cartesian tree? (I've chosen something where the ancestor isn't just the parent of the two numbers.)
  • I think the last sentence of the first paragraph flows better with the structure Because lowest common ancestors ... the same bounds can be achieved for the range minimization problem using a data structure that ...
    • The "using a data structure" is part of the description of the bounds and behavior of the lowest common ancestor data structure. "The same bounds" refers not just to the constant time bound, but also to the linear space and linear construction time. That's why "bounds" is plural. —David Eppstein (talk) 00:51, 14 August 2023 (UTC)Reply
  • differ by ±1 – just "differ by 1" is fine ("difference" generally means the absolute value).
  • Because a Cartesian tree is a binary tree, it is natural to use it ... – This bit feels more like an essay than a Wikipedia article. More matter-of-fact-ly, how about: The Cartesian tree of a sorted sequence is a path where each node other than the leaf has a right child. As such, a binary search tree algorithm degenerates to sequential search. However, the Cartesian tree can be adapted into a more-balanced search tree by ...
  • A link to Self-balancing binary search tree might fit.
  • Is there some rigour to this idea of "more-balanced"? Some worst-case, average-case or probabilistic quantification of treaps? I wonder what (they have logarithmic depth with high probability) means exactly.
    • The phrase "more-balanced" is no longer present after the previous edits. As for "logarithmic depth with high probability", it means that there exists a constant   such that the depth (maximum root-to-leaf distance) is   with high probability (probability tending to one in the limit as  ). Added an explanation. —David Eppstein (talk) 18:05, 16 August 2023 (UTC)Reply
  • Random binary search trees had been studied for much longer – Strange tense. Maybe Random binary search trees were an object of study before Cartesian trees.
  • the same good behavior carries over to treaps – I think this is already clear in context.
  • I think a definition of "bracket" is needed (when the given value is between two consecutive sequence values?).

History

edit
  • If I'm reading the original paper correctly, the tree was created in the field of geometric combinatorics and a natural counting problem on Cartesian trees gives the Stirling numbers of the first kind. This would need mention in this section, I think.
    • I think the connection to Stirling numbers is just a lemma. What Vuillemin was counting was the labeled Cartesian trees generated from permutations, of which there are obviously a factorial number. The Stirling numbers count how many of these there are with some number of nodes on the left or right spine. The actual point of this part of Vuillemin's paper appears to be to use the geometric view of these trees (with x=value and y=priority) to analyze the average case complexity of cutting and joining operations on binary search trees. I added a sentence about that to the history section. —David Eppstein (talk) 22:44, 16 August 2023 (UTC)Reply
  • I think this section should be directly above "Applications". It seems strange to introduce later applications first and then the original context in which the concept arose. Some rewording may then be necessary for this section to be the first introduction to the idea that Cartesian trees are in any way geometric.

References and overall comments

edit
  • Certainly I can say the article is broad and focused (with notes above on where background knowledge is/isn't needed). The inline citation style and academic references used are appropriate.
  • My spotchecks are limited to the open access sources, which thankfully includes the original paper. Any issues found are covered in the notes above.

Bilorv (talk) 22:34, 8 August 2023 (UTC)Reply

Thanks for picking this up so quickly. I think it's fair to say you can count it toward the backlog drive, and I'll remove it from my own list. —Ganesha811 (talk) 23:04, 8 August 2023 (UTC)Reply
@Bilorv: ok, I think I've responded to all your points; please take another look. —David Eppstein (talk) 22:45, 16 August 2023 (UTC)Reply
Thanks, I'll aim to get to this in the next 48 hours. — Bilorv (talk) 22:49, 16 August 2023 (UTC)Reply
Alright, one tweak but this all looks excellent—it's a   pass for GA from me. — Bilorv (talk) 21:08, 17 August 2023 (UTC)Reply
I'm glad this worked out. Thank you again for jumping in, @Bilorv. @David Eppstein, beg pardon for abruptly backing out. I felt that rather than us butting heads over the article point-by-point, it was more productive to step aside and let someone with a more-aligned view on WP:TECHNICAL collaborate with you. Congrats on another GA! —Ganesha811 (talk) 04:32, 18 August 2023 (UTC)Reply
Thanks, and thanks also for helping this go through by so graciously stepping back. —David Eppstein (talk) 06:10, 18 August 2023 (UTC)Reply