Talk:Distributed web crawling
This article is rated Start-class on Wikipedia's content assessment scale.
From Amillar, May 30, 2004:
The following is a proposed solution, but does Grub (or others) actually use this algorithm? In reference to:
- One solution to this problem is to use every computer connected to the Internet to crawl some Internet addresses (URLs) in the background. After downloading the pages, the client compresses the new pages and sends them back, together with a status flag (changed, new, down, redirected), to powerful central servers. The servers manage a large database and send out new URLs to be tested to all clients.
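Whether Grub implements this is exactly the open question above, but as a rough illustration of the scheme the quoted paragraph describes, here is a minimal Python sketch of the client-side loop. All names here (crawl_one, client_loop, the report format) and the choice of zlib for compression are assumptions for illustration, not any real Grub interface.

# Hypothetical sketch of the client-side loop described above: fetch
# assigned URLs, compress the pages, and attach a status flag for the
# report sent back to a central server. Not Grub's actual protocol.
import zlib
import urllib.request
import urllib.error

def crawl_one(url: str) -> tuple[str, bytes]:
    """Fetch one URL and classify the outcome with a status flag."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            body = resp.read()
            # urlopen follows redirects; a different final URL means
            # the server should record this address as "redirected".
            if resp.geturl() != url:
                return "redirected", zlib.compress(body)
            # "changed" is a placeholder: distinguishing "changed"
            # from "new" requires the central server's database.
            return "changed", zlib.compress(body)
    except (urllib.error.URLError, OSError):
        return "down", b""

def client_loop(assigned_urls: list[str]) -> list[dict]:
    """Crawl every assigned URL and build the report for the server."""
    reports = []
    for url in assigned_urls:
        status, compressed_page = crawl_one(url)
        reports.append({"url": url, "status": status, "page": compressed_page})
    return reports

Note that a real client could not decide "new" versus "changed" on its own: that distinction depends on what the central server's database already holds, so the server would have to supply it alongside each assigned URL.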
Unite both sections into one!
I agree that the subsection Parallelization policy from the Web crawler article should be merged into this Distributed web crawling article.
"It has been suggested that the section Parallelization policy from the article Web crawler be merged into this article or section."
Zoe, please do this for ease of reading and coherence.
Relation to "Distributed Search Engine"
Distributed search redirects to this page, but it's often not what people need; they may well be looking for Distributed search engine. Should there be cross-references, or a disambiguation page? --Avirr (talk) 16:49, 2 February 2011 (UTC)
Is Grub dead?
The implementation section talks about Grub and Looksmart in the present tense. However, the relation to Looksmart is described in the past tense. Additionally, I think Grub may even be a dead project. Docmphd (talk) 21:22, 26 January 2012 (UTC)