Wikipedia:WikiProject External links/Webcitebot2
The WebCiteBOT Replacement Task Force is a sub-group of Wikipedia:WikiProject External links. It is a coordinated effort to address dead external links in citation templates. During 2010, the number of articles with links tagged as dead rose from under 10,000 to over 105,000.
Overview
Wikipedia relies on verifiable information from reliable sources to ensure that the information it carries is accurate and presented from a neutral point of view. These are very basic needs of the project, and Wikipedia uses external links to reliable information to meet them. Unfortunately, the Internet is not stable, and links come and go. This issue needs to be addressed if Wikipedia is to have a successful future.
WebCiteBOT was an incredible resource that monitored the external link feed of the IRC channel #wikipedia-en-spam. It searched for these links in citation templates, submitted them to WebCitation.org, and updated the templates. Unfortunately, the bot stopped running in November 2009, and dead links have been increasing ever since (see graph above).
Goals
The goals are yet to be defined. At a minimum, some type of tool should be developed to assist in archiving external links in citation templates.
- The bot or tool should be operated by at least 2 users so that when an operator leaves, we don't have this issue again.
- A functional equivalent of WebCiteBOT is probably a worthwhile goal. This means an automated bot that monitors the external link feed of the IRC channel #wikipedia-en-spam, submits the new links to WebCitation.org, and updates the citation templates.
- A related issue discussed over the last year is whether the Wikimedia Foundation should take over the role of WebCitation.org (for citation links), or collaborate more formally with and support webcitation.org, to guarantee the sustainability of the service.
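The template-update step described in the goals above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the code of WebCiteBOT or any other bot listed on this page: the function name `add_archive_params` is hypothetical, the parameter names `archiveurl` and `archivedate` are assumed and should be checked against the current citation template documentation, and the sketch only handles templates that contain no nested templates or piped wikilinks.

```python
import re

# Matches a citation template that has no nested templates inside it.
CITE_TEMPLATE = re.compile(r"\{\{\s*cite[^{}]*\}\}", re.IGNORECASE)


def add_archive_params(wikitext, url, archive_url, archive_date):
    """Add archive parameters to cite templates whose url= matches `url`.

    Assumption: the templates accept `archiveurl` and `archivedate`.
    """

    def patch(match):
        template = match.group(0)
        # Drop the braces and the template name, then read name=value pairs.
        # Splitting on "|" assumes there are no piped wikilinks in the values.
        params = {}
        for part in template[2:-2].split("|")[1:]:
            name, sep, value = part.partition("=")
            if sep:
                params[name.strip().lower()] = value.strip()
        if params.get("url") != url:
            return template  # cites a different link
        if "archiveurl" in params:
            return template  # already archived, leave untouched
        return (
            template[:-2].rstrip()
            + " |archiveurl=" + archive_url
            + " |archivedate=" + archive_date
            + "}}"
        )

    return CITE_TEMPLATE.sub(patch, wikitext)
```

A bot would call this once per archived link, passing the archive address returned by the archiving service. Templates that already carry an archive link are left unchanged, so running the function twice on the same page is harmless.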
Previous attempts
Several Wikipedians have brought this issue before the wider Wikipedia community in order to have the problem addressed. One user has even offered a cash reward to get it fixed. Below is a list of the attempts in chronological order.
- 13 June 2010 - Wikipedia:Bot requests/Archive 37#We need another User:WebCiteBOT
- 24 August 2010 - WP:Village pump (policy)/Archive 78#WebCiteBOT is down - Dead links at record high
- 24 August 2010 - Wikipedia:Bot requests/Archive 37#WebCiteBOT still down, replacement growing more urgent
- 23 September 2010 - Wikipedia:Administrators' noticeboard/Archive216#Can we assemble a crack team to make a bot like User:WebCiteBOT operational?
- 27 September 2010 - Wikipedia:Bot requests/Archive 38#User:WebCiteBOT (loads slowly)
- 11 November 2010 - Wikipedia:Village pump (miscellaneous)/Archive 29#Using WP:Bounty Board to get a working WebCiteBOT
- 11 November 2010 - Wikipedia:Bounty board#A working WebCiteBOT
- 20 November 2010 - strategy:village pump#Wikimedia_Foundation_adoption_of_WebCiteBot_7046
Bots possibly in development
A number of our software engineers have been working on the WebCiteBOT issue and may have bots in the development stage. The current status of most of these bots is unknown.
Bot | Operator | Status | Source code | Purpose |
---|---|---|---|---|
DASHBot 11 (maybe) | Tim1357 | Monitoring Process running | Unknown | External link archiving/tagging |
H3llBot (maybe) | H3llkn0wz | Inactive | No, relies on outdated framework | External link archiving/tagging |
NNBot II (maybe) | nn123645 | Unknown | Source code (zip) (PHP) unrelated BRFA | External link archiving |
Unknown | Δ | script running | Unknown | External link archiving |
Related bots
Several bots that deal with external link issues have been developed over the years. Most of them are no longer operating, but they may provide examples that software engineers can use to help develop a new bot. Please note that much of the source code below is not free to use. It can therefore only be examined for ideas on how to approach different tasks.
Bot | Operator | Status | Source code | Purpose |
---|---|---|---|---|
MerlLinkBot | Merlissimo | Inactive | Unknown (Java) BRFA | Repairing dead weblinks |
DASHBot 11 | Tim1357 | Inactive | Unknown (Python) BRFA | Dead link archiving |
DASHBot WebCite | Tim1357 | Inactive | Unknown (Python) Wikipedia:Bots/Requests for approval/DASHBot WebCite | Check newly added URLs, archive URL at WebCite and add archive link to article |
DeadLinkBOT | ThaddeusB | Inactive | Source code (Perl) BRFA | Update specific dead links |
EchoBot | Wikihermit | Inactive | Unknown (Pywikipedia) BRFA | Finds and reports dead links |
H3llBot | H3llkn0wz | Inactive | Not available (C# .NET) BRFA | Dead link archiving |
JabbaTheBot | I already forgot | Inactive | Unknown (PHP) BRFA | Update stale links and remove linkspam |
Ocobot | Ocolon | Inactive | Unknown (PHP, MySQL) BRFA | Finds dead links |
PhuzBot | Phuzion | Inactive | Unknown (Python) BRFA | Find and report dead links (BRFA expired) |
Polbot | Quadell | Inactive | Unknown (Perl) BRFA 8 | Repairs external links and references |
ShakingBot | ShakingSpirit | Inactive | Unknown (Python) BRFA | Find and report dead links |
VixDaemon | KyraVixen | Inactive | Unknown (Python) BRFA 4 | Finds external links |
WaybackBot | Tim1357 | Inactive | Source code (Python) BRFA | Check Wayback Machine for pages (BRFA withdrawn) |
WebCiteBOT | ThaddeusB | Inactive | Source code v1 Addons (Perl) BRFA 2 | Storing links in WebCite service |
ru:WebCite Archiver | ru:Vlsergey | Active | Source code (Java, JWPF fork) | Reporting dead links, storing alive links in WebCite service, only for {{cite web}} template |
Bot resources
Below are some resources for users wanting to build bots that work with external link issues.
- AutoWikiBrowser semi-automated editor for Wikipedia
- Pywikipediabot a basic bot (written in Python) that can be expanded
- weblinkchecker.py script for Pywikipediabot that finds broken external links
WebCitation.org's position on bots
- WebCiteBOT's operator, ThaddeusB, was in contact with Gunther Eysenbach of WebCitation.org and said they were supportive of the project.
- I confirm that WebCite is very supportive. Ideally WebCite wants to have the metadata associated with a reference handed over as well, so that statistics like "most cited author on Wikipedia" can be calculated and displayed on a dashboard on the WebCite site. Please also hand over the "citing" article URL, e.g. in the refdoi field. WebCite is open source, so changes to the WebCite backend code (including a WebCitebot dashboard on the webcitation.org site) are also possible. To communicate with WebCite, please use email only (I do not check my talk page on Wikipedia), see http://www.webcitation.org/faq --Eysen (talk) 18:20, 9 February 2011 (UTC)
- Regarding the submission rate, nn123645 contacted WebCitation.org, which asked for an initial limit of 1 submission every 5 seconds, though that limit would likely be removed later.
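A bot could honor the requested rate of 1 submission every 5 seconds with a small throttle like the Python sketch below. The class name `SubmissionThrottle` and its parameters are hypothetical and chosen for illustration. The clock and sleep functions are injectable only so the behavior can be checked without real waiting. The submission call itself is not shown, because the WebCite submission interface is not described on this page.

```python
import time


class SubmissionThrottle:
    """Keep consecutive submissions at least `min_interval` seconds apart."""

    def __init__(self, min_interval=5.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time at which the previous submission was allowed

    def wait(self):
        """Block until the next submission is allowed; return seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            # Time still remaining before the interval has fully elapsed.
            delay = max(0.0, self._last + self.min_interval - now)
        if delay > 0:
            self._sleep(delay)
        self._last = now + delay
        return delay
```

Calling `wait()` immediately before each submission is enough. Keeping the interval as a constructor argument means the limit can be relaxed later without changing the bot's main loop.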
See also
- User:WebCiteBOT has detailed information about its operation
- Wikipedia:Checklinks includes an archiving bot and a web-based tool for repairing links using both WebCite and the Wayback Machine
- Category:Articles with dead external links
- Wikipedia:Link rot
- Wikipedia:External links