Talk:Ternary search tree

Latest comment: 1 year ago by PrimeBOT in topic India Education Program course assignment

Untitled

edit

I would like to dispute the claim that a ternary search tree is faster than hashing "for many typical search problems". I believe this claim is based on the paper located at:

http://www.cs.princeton.edu/~rs/strings

..which the Dr. Dobbs article is also based on. This paper compares to a very slow hashtable implementation. In particular, they use a hashtable size that is not prime (thus increasing the chance of collisions) and a load factor of 1.0, when most performant implementations use around 0.75 (see Java hashtable for instance). A higher load increases collisions and decreases performance. They also use chaining where fast hashtables (for example google's freely available code) includes part of the chain in the table itself, thus almost always eliminating an extra cache miss. There may be say 2-4 total cache misses for a typical hashtable lookup so this is a huge difference.

Finally, the test setup is biased towards tst performance. They do lookups one after another in a tight loop and in sorted order, meaning that most of the ~9 nodes the TST has to traverse for a successful lookup are in the cpu cache already. For unsuccessful lookups they change the first letter of a string, thus ensuring the best possible results from the TST (which always compares parts of the string in order). For an isolated lookup, as in the program does a bunch of code then does a few query/add operation, a hash table will have far fewer cache misses than a TST.

Given the above, perhaps original article creator or somebody with in-depth knowledge can elaborate in article about which search problems are faster.

I don't know how relevant this is (only found this article by googling 'digital trie'), but people looking to compare hashing to a ternary search tree may find this interesting: http://unthought.net/c++/c_vs_c++.html --MaXiMiUS 22:47, 17 August 2007 (UTC)Reply

-- When storing a ternary tree on disk, I/O time has a large impact and your argument holds true.

I am familiar with the article by mr. Sedgewick and Bentley and believe the reason they claim that the ternary tree is "faster than hashing" is because

1. they assume the ternary tree will be stored in RAM and 2. the overhead of calculating a hash can be a fairly heavy (depending on the demands placed on the hash), whereas using bytes of the key itself as hash is trivial.

Apparently the performance lost by the overhead of a hash calculation outweighs the performance win of not having any collisions.

Also, because all read/write operations in a ternary tree start at its root and tend to follow the same paths, it is highly suitable for caching even if relatively slow storage is used.

It is also worth to keep in mind that a ternary tree allows traversing its contents in sorted order and performing partial-match key searches, which a hash table (typically) will not allow. Although this seems more an issue of functionality than one of performance, 'many typical search problems' does include partial matches, case insensitivity, and traversing the data set in sorted order. Using a hash table, such operations would be (much) less efficient, even with O(1) performance for full key matches. —Preceding unsigned comment added by 130.89.175.240 (talk) 14:25, 19 March 2010 (UTC)Reply

"consists of a series of binary searches, one for each character in the string" - there can be more than one binary search for each character. To see how, look at the u in "us" in the example.

Rafke (talk) 03:45, 28 November 2011 (UTC)Reply

Description is unclear

edit

Especially with how the "equal kid" relates to the parent? The example given makes no sense. Slashdottir (talk) 18:20, 23 April 2015 (UTC)Reply

First the key is split into parts then the one part is taken for the key in an binary tree. The binary tree is than handled as normal but with only the part of the key as binary tree key. In matching node there are two fields one for the data which is used to store the actual data when no more key parts are left and one field for another binary tree which will be used recursive by the same function with the rest of the key parts. This is my understanding on ternary tree. Or to make it short they are just nested tree binary trees. However the concept could also be used with other nested trees. To make it short the equal kid is the matching node in an sub tree.MasterLee (talk) 09:53, 10 September 2021 (UTC)Reply

Pseudocode clarification

edit

I'd like a simple comment or explanation of the call to "last_valid_index(query)", which I believe is just checking idx == len(query) - 1, ie, that the idx is the last valid index of the string and thus we have found the full word. I think the use of the function, unexplained, could cause confusion to readers. Makingtheconnection (talk) 00:34, 1 April 2018 (UTC)Reply

India Education Program course assignment

edit

  This article was the subject of an educational assignment at College Of Engineering Pune supported by Wikipedia Ambassadors through the India Education Program during the 2011 Q3 term. Further details are available on the course page.

The above message was substituted from {{IEP assignment}} by PrimeBOT (talk) on 20:03, 1 February 2023 (UTC)Reply