Talk:Similarity search


LSH


The addition of LSH here is rather gratuitous and, to be polite, technically very weak. I would be inclined to remove it.

It would be nice instead to include a better overview of the problems related to, e.g., intrinsic dimensionality, along with an overall classification (e.g. exact and approximate, Euclidean and non-Euclidean); LSH could then be slotted in, with a link, where it belongs in this context, as a very minor aspect.

Assuming nobody objects (I wrote most of this text years ago and nobody seems very interested!), I'll do this soon if time allows. RichardThePict (talk) 16:40, 5 June 2017 (UTC)

Untitled


I don't think it is appropriate to redirect "similarity search" to "nearest neighbor search". For one thing, a very common type of similarity search is based on a threshold. For example, to find near-duplicate Web pages, Google uses the following similarity query: find all vectors whose Hamming distance from a query vector is no more than 3 (Gurmeet Singh Manku, Arvind Jain, Anish Das Sarma: Detecting near-duplicates for web crawling. WWW 2007: 141-150).
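To make the distinction concrete, here is a minimal sketch of a threshold (range) query under Hamming distance. It is a brute-force linear scan for illustration only, not the indexing scheme of the cited paper; the function names, the 8-bit fingerprints, and the corpus values are all hypothetical.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def range_query(fingerprints, query, threshold=3):
    """Return every fingerprint within `threshold` bits of `query` (linear scan)."""
    return [f for f in fingerprints if hamming_distance(f, query) <= threshold]


# Hypothetical 8-bit fingerprints, for illustration only.
corpus = [0b10110010, 0b10110011, 0b01001101, 0b10100110]
query = 0b10110000

# Every fingerprint at Hamming distance <= 3 from the query.
near_duplicates = range_query(corpus, query)
```

Unlike a nearest-neighbour query, which returns a fixed number of results, this query returns however many items fall within the threshold, which may be none at all.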

I agree, it's absolutely not a synonym: similarity search is a much more general topic, and it is becoming increasingly important in domains other than nearest-neighbour search. Although this comment is now years old, I will take it as good reason to create a new page to act as a stub. Pavel Zezula made a plea at the last SISAP (Similarity Search) conference for the community to create good pages on this important topic so, to start with, let's have our own page! RichardThePict (talk) 09:14, 9 October 2013 (UTC)