Talk:Similarity search

Computing Low‑importance

This article is within the scope of WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Low	This article has been rated as Low-importance on the project's importance scale.

Internet Low‑importance

This article is within the scope of WikiProject Internet, a collaborative effort to improve the coverage of the Internet on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Low	This article has been rated as Low-importance on the project's importance scale.

LSH


The addition of LSH here is rather gratuitous and is technically very weak - to be polite. I would incline to remove it.

It would be nice instead to include a better overview of related problems, e.g. intrinsic dimensionality, along with an overall classification, e.g. exact and approximate, Euclidean and non-Euclidean. LSH could then be slotted in, with a link, where it belongs in this context, which is a very minor aspect.

Assuming nobody objects (I wrote most of this text years ago and nobody seems very interested!), I'll do this soon if time allows. RichardThePict (talk) 16:40, 5 June 2017 (UTC)

Untitled


I don't think it is appropriate to redirect "similarity search" to "nearest neighbor search". For one thing, a very common type of similarity search is based on a threshold. E.g., to find near-duplicate Web pages, Google uses the following similarity query: find all vectors whose Hamming distance from a query vector is no more than 3. (Gurmeet Singh Manku, Arvind Jain, Anish Das Sarma: Detecting near-duplicates for web crawling. WWW 2007: 141-150).

I agree, it's absolutely not a synonym; similarity search is a much more general topic, which is becoming increasingly important in domains other than nearest-neighbour. Although this comment is now years old, I will take it as reason enough to create a new page to act as a stub. Pavel Zezula made a plea at the last SISAP (Similarity Search) conference for the community to create good pages on this important topic, so to start with let's have our own page! RichardThePict (talk) 09:14, 9 October 2013 (UTC)
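
For anyone following this thread, the difference between the threshold query in the first comment (all vectors within Hamming distance 3 of a query) and a nearest-neighbour query can be illustrated with a rough sketch. This is only a naive linear scan over made-up 8-bit fingerprints, not the method from the cited paper; the function names and sample values are hypothetical.

```python
def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two integer fingerprints differ."""
    return bin(a ^ b).count("1")


def range_query(fingerprints, query, threshold=3):
    """Threshold (range) query: every fingerprint within `threshold` bits of `query`.

    The result may be empty or contain many items.
    """
    return [f for f in fingerprints if hamming_distance(f, query) <= threshold]


def nearest_neighbour(fingerprints, query):
    """Nearest-neighbour query: the single closest fingerprint, however far away it is."""
    return min(fingerprints, key=lambda f: hamming_distance(f, query))


# Hypothetical 8-bit fingerprints, chosen only for illustration.
fingerprints = [0b10110010, 0b10110011, 0b01001101, 0b10100000]
query = 0b10110010

within_three = range_query(fingerprints, query, threshold=3)
closest = nearest_neighbour(fingerprints, query)
```

The range query returns a set whose size depends on the data and the threshold, whereas the nearest-neighbour query always returns exactly one item, which is one way to see why the two are not synonyms.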
