Talk:Radix tree
This is the talk page for discussing improvements to the Radix tree article. This is not a forum for general discussion of the article's subject. |
Article policies
|
Find sources: Google (books · news · scholar · free images · WP refs) · FENS · JSTOR · TWL |
This article is rated C-class on Wikipedia's content assessment scale. It is of interest to the following WikiProjects: | ||||||||||||||||||||||||||||
|
tree
editI'm deleting a couple of claims that predecessor/successor are O(1). Consider a radix tree with contents {0, 00, 000, ..., 0...0, 1}, where we have strings of 0's of length 1 through n. Then the straightforward predecessor algorithm would take O(n) time to find the predecessor of "1". (You could do it in O(1) if you also threaded a doubly-linked list through all your leaves, but the same could be said for any binary search tree.) If there's some clever algorithm I'm not thinking of, could you cite it (or at least describe the algorithm, possibly here on the Talk page) before reinstating these claims? Thanks, Cwitty 21:31, 7 October 2006 (UTC)
- This counterexample applies to all dictionary data structures. For example, suppose you add n keys of length n to a hash table, and then do a lookup. Even after you've found the right hash bucket, you still need n operations to compare your key to the existing key. So it's O(n). Except it's not: hash table lookup is O(1). The mistake is to confound the key length and the number of elements. In typical practice key lengths are effectively constant; they don't dependent on the number of elements in your dictionary.
- In a radix tree the obvious predecessor and successor algorithms take time proportional to the key length, but this is effectively constant, so they are O(1). Reilly 14:16, 14 November 2006 (UTC)
- The claim that predecessor and successor algorithms run in constant time is probably based on a similar claim for trees in general. These algorithms actually run in *amortized* O(1) time. For radix trees, the number of nodes is linear in the number of entries, so it is correct to say that predecessor and successor can be computed in *amortized* O(1) time and not some runtime dependant on the length of the entries. For prefix trees this would not be true. 68.191.211.204 (talk) 13:44, 15 December 2009 (UTC)
- Not really. Hash algorithm takes a key and gives you the address, no intermediate lookups required, it's just one lookup operation (unless there are clashes). In a radix tree you do active lookups, without intermediate lookups you are going nowhere. Compare this with slow disk lookups: hash would compute in active memory (or even CPU registers) and give you the address immediately (just go and fetch it), while with radix tree you will have to access the disk (slowly!) many times because index may not fit into active memory (certainly not CPU registers). And please remember that O() notation is used when n is big (hence assumption that index does not fit in memory). Note that famous O(1) scheduler in Linux is a misnomer, it is really of O(log(n)) complexity (it's just process id space is limited to 16bit space -- the same argument you tried here).
- Actually, the argument is valid: actually computing the hash function is O(k), where k is the key length, same as the radix tree lookup — this is just glossed over when people usually think about hash tables, because dictionary key lengths are usually small(ish); but they're definitely a lot more expensive than doing bitwise operations and integer addition which are the basic operations in radix tree algorithms. Wtrmute (talk) 23:11, 23 September 2008 (UTC)
- Not really. Hash algorithm takes a key and gives you the address, no intermediate lookups required, it's just one lookup operation (unless there are clashes). In a radix tree you do active lookups, without intermediate lookups you are going nowhere. Compare this with slow disk lookups: hash would compute in active memory (or even CPU registers) and give you the address immediately (just go and fetch it), while with radix tree you will have to access the disk (slowly!) many times because index may not fit into active memory (certainly not CPU registers). And please remember that O() notation is used when n is big (hence assumption that index does not fit in memory). Note that famous O(1) scheduler in Linux is a misnomer, it is really of O(log(n)) complexity (it's just process id space is limited to 16bit space -- the same argument you tried here).
Could anyone please explain the meaning of zeros and ones in the picture of Overview? There's is also "song" word which is not in the tree at the top of the article which messes everything. —Preceding unsigned comment added by 130.225.195.72 (talk) 13:23, 17 July 2008 (UTC)
This page have many errors. One of the most blatant is that PATRICIA-trie is not a RADIX tree and it does not look the way it is described here. PATRICIA-tree uses lossy path-compression. See, e.g. the original paper of Morrison, the work Gwehenberger, where he describes a similar data-structure and the 3d Book of the Arts of Computer Programming. Moreover, the RADIX tree that is the main name of this entry is not widely-accepted computer term. Instead, it is a colloquialism that occurred in the mid-eighties.
- I agree - patricia trees are definitely something else than regular tries with strings labeling the edges instead of characters. After I got my hands on the original paper, this article started looking downright misleading. This article either needs a complete rewrite, or a new article that describes the real PATRICIA index structure is needed.
--Cynebeald (talk) 18:30, 16 March 2009 (UTC)
- I also agree, this page is very misleading in that it implies that a PATRICIA trie stores strings verbatim, it does not so does not allow you to enumerate stored strings, or to be sure you have an exact match (see here for an example: www.cs.umd.edu/~meesh/420/Notes/MountNotes/lecture25-tries.pdf). Also the pseudo-code is so vague in how 'is a prefix of' is determined, which is the critical part of the algorithm, to be useless. You would need some secondary index for the outbound edges of a node for this to be efficient, and it is is completely unclear what that is. In fact I have been unable to find anything that looks very much like this in the literature; a burst trie stores multi-character nodes in containers. I charged ahead and implemented something very odd on the basis of reading this article. A complete rewrite would be good, at the moment this article's presence does more harm than good. Silasdavis (talk) 12:30, 5 June 2014 (UTC)
Diagram
editI have added a simple diagram (and I removed reqdiagram) . User:rocchini 2007-05-17
The nice diagram given at the front gives the misleading impression that each node always has two children. That would be true if you branch on bits; but if branching on characters there can be as many children as there are letters in your character set. Bhyde (talk) 15:52, 13 March 2010 (UTC)
Agreed. The diagram at the top of the article has a poor choice of sample strings, suggesting there must always be two children. -- Ralph Corderoy (talk) 11:34, 8 August 2010 (UTC)
How would the strings that don't end at a leaf eg: "roman" be indicated? 210.48.82.11 (talk) 00:38, 14 December 2010 (UTC)
Second Diagram
editThe second diagram depicts a structure that is not a tree. It is also not a trie, prefix tree, or radix tree. The article provides no explanation for the image. 68.191.211.204 (talk) 13:44, 15 December 2009 (UTC)
Nomenclature
editThis article should be named "Radix trie" not "Radix tree", because Patricia tries and radix tries are tries, not (search) trees. This article never once mentions that Patricia tries are also called Radix-2 tries. Indeed, the article never once mentions why radix tries are called radix tries: you can choose binary, quaternary, octary, and so forth through the higher-order powers of two radixes. At each node you can diverge to two children via the 2 values of the next 1 bit (i.e., Patricia trie = Radix-2 trie) or you can diverge to four children via 4 values of the next 2 bits (i.e., Radix-4 trie) or you can diverge to eight children via 8 values of the next 4 bits (i.e., Radix-8 trie). This article does not clearly present the fact that the entire concept of the Internet's routing is based on radix tries, especially inter-autonomous-system protocols such as BGP. —optikos (talk) 06:38, 5 May 2010 (UTC)
I am not sure why an invented name (eg. crit-bit trees) for these tries has been injected into this article. It is not a common name, nor particularly important among other re-inventions of this structure over the years. Oyigit (talk) 14:50, 30 July 2010 (UTC)
- Rather, one should ask why a mostly *unrelated* tree structure has been injected into this article. Crit-bit trees are **not** tries, but an optimized flavour of binary search trees. A crit-bit tree is a binary search tree which only branches at every bit position where two subsets of previously inserted keys differed, according to the bit's value in the newly inserted key. Unlike a trie, it therefore does not contain the full key in its structure (it is perfectly possible, and valid, for a key to differ in one or several bits with any previously seen key, before the "critical" bit) and needs to store the key in leaf nodes for an unambiguous lookup. — Preceding unsigned comment added by 92.192.19.64 (talk) 14:32, 24 July 2012 (UTC)
Invalid association of PATRICIA
editAccording to the original paper defining PATRICIA, published by DR Morrison in October 1968s "Journal of the ACM", PATRICIA is a type of radix tree, but not all radix trees are PATRICIA trees. Two highly reputable academic resources implement a very specific algorithm which doesn't stray at all from the ACM article. It seems incorrect to suggest that radix trees are also known as PATRICIA trees, so I'm removing that reference. Sebivor (talk) 07:34, 15 July 2015 (UTC)
- There still is a picture that describes Patricia tries. It would be great if the relationship to this kind of tries would be explained so that that name doesn't come from nowhere. --2001:16B8:6849:9A00:259E:59E7:BFFF:1D38 (talk) —Preceding undated comment added 08:42, 8 January 2020 (UTC)
Don't radix tries have at MOST radix children per node?
editThe article currently says they have at LEAST radix children per node, but I really don't see how can they have more than radix, and the diagram certainly shows less than radix children per node.79.183.100.150 (talk) 09:55, 20 February 2016 (UTC)
- I agree. The graph captioned "Insert 'slower' while keeping 'slow'" breaks this. I also find the talk of "the radix" of a radix tree quite confusing. What is the "radix" of a radix tree? It's not defined anywhere. Lookup radix, you find "base", look for the equivalent in a tree, you find binary tree, you find the K-ary tree article, which means "at most K chilren per node", but you never see the word "base" or "radix". And this sentence makes no sense: "where the quantity of bits in that chunk at that node is the radix r of the radix trie." ..., huh? I see no examples of quantity of bits in a node having anything to do with radix. None of this seems to belong in the summary either, if you look at non-wikipedia sources of radix trees, I didn't run into a single one which even mentions the "radix." 73.53.70.213 (talk) 04:49, 16 December 2016 (UTC)
please tie the text better to the example
edit"When r is an integer power of 2 greater or equal to 4, then the radix trie is an r-ary trie, which lessens the depth of the radix trie at the expense of potential sparseness."
It would be great to close the loop here with the example tree in the first illustration. Is it a 5-ary trie since the largest key is 5 chars? The article uses 'bits' somewhat loosely, as presumably the keys here are characters, which are 7 bits in ASCII encoding, not one. I find this article a bit puzzling, and wish an expert would clarify!
-- Alison Chaiken Alison Chaiken (talk) 17:11, 11 March 2017 (UTC)
Operation/Insertion Diagram
editThe text and the operation in diagram meant to indicate that word "test" is already in the tree and the word "team" is being inserted afterward. However, it is mistakenly written into Diagram that word "test" is being inserted. Please edit the diagram and change it to "team". Bossudenotredame (talk) 12:59, 28 September 2018 (UTC)
OK, The diagram does not indicate the word test is being inserted rather it is showing the before and after status of the tree. I was confused. Maybe the diagram should indicate that the tree on the left is the "before" state and the one on the right is the "after" state. Bossudenotredame (talk) 08:18, 13 October 2018 (UTC)
Description of PATRICIA added.
editI came here looking for a precise and concise description of PATRICIA, and finding nothing really, I decided to write it myself and add it to the "Variants" section. I first implemented PATRICIA back in the 1980s, and I remember having access to a wonderfully clear description back then, complete with diagrams. Unfortunately I can no longer remember what that book was - I can't find it in Knuth or Wirth. It might have been Sedgewick. In the meantime I've referenced Morrison's original article in ACM.
- The Description of PATRICIA is now reads as PATRICIA is a special variant of radix - 2 trie which is radix - 2 trie :-/ the definition given for PATRICIA is equivalent to the definition of radix - 2 trie. If we only include nodes we remove lonely middle nodes then we end up with a trie with only branch nodes which PATRICIA definition is trying to explain. Bossudenotredame (talk) 17:33, 1 May 2024 (UTC)
Summary references
editThe summary starts with this sentence: "In computer science, a radix tree (also radix trie or compact prefix tree) is a data structure that represents a space-optimized trie (prefix tree) in which each node that is the only child is merged with its parent."
I have been unable to track down any references for this definition, and it is IMHO inaccurate. The term "radix" references radix sort, and I fail to see how path compaction has anything to do with the number of elements in the alphabet or the arity of the tree. Is it possible for the original author to provide references for this section? Wppds (talk) 07:29, 22 March 2019 (UTC)
Does not follow.
editThe sentence "[t]he result is that the number of children of every internal node is at most the radix r of the radix tree" appears to make the claim that its stated fact follows from the preceding claim "each node that is the only child is merged with its parent".
This is not so; if we "decompress" the tree such that the merged parents are split into only children and grandchildren, the tree still has the at-most-r property.
Furthermore, radix trees are not all compressed this way.
There are radix trees which can be navigated by breaking a (typically numeric) key into pieces (typically bit fields). Each piece directly selects a child. An example of this is the page directory structure of a virtual memory system. KazKylheku (talk) 16:01, 1 February 2022 (UTC)