Wikipedia:Bots/Requests for approval/Cleanbot
- The following discussion is an archived debate. Please do not modify it. Subsequent comments should be made in a new section. The result of the discussion was Denied.
Operator: Lightmouse (talk)
Automatic or Manually Assisted: Automatic
Programming Language(s): AWB
Function Summary: Delink dates except for solitary years.
Edit period(s) (e.g. Continuous, daily, one time run): Continuous
Already has a bot flag (Y/N): No.
Function Details:
- Its primary target is 'autoformatted dates', for example "[[19 October]] [[2008]]" will become "19 October 2008".
- Secondary targets include errors such as "[[19 October|October 19]]" and date elements/combinations that are not solitary years.
- It will not delink solitary years. For example "happened in [[2008]] and ... " will be unchanged.
- It will not delink anything that contains a non-date term that is visible to the reader. For example "[[April 2005 lunar eclipse]]" will be unchanged.
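The four rules above can be illustrated with a small sketch. It is written in Python with regular expressions purely for illustration; it is not the operator's AWB code, and the pattern names and the `delink_dates` function are assumptions introduced here. It only recognises the simple "day month" and "month day" forms shown in the examples.

```python
import re

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")
# "19 October" or "October 19"
DAY_MONTH = rf"(?:\d{{1,2}} (?:{MONTHS})|(?:{MONTHS}) \d{{1,2}})"

# Primary target: [[19 October]] [[2008]], optionally with a comma before the year
FULL_DATE = re.compile(rf"\[\[({DAY_MONTH})\]\](,? )\[\[(\d{{1,4}})\]\]")
# Secondary target: piped day-month links such as [[19 October|October 19]]
PIPED_DAY_MONTH = re.compile(rf"\[\[{DAY_MONTH}\|({DAY_MONTH})\]\]")
# Secondary target: a day-month link on its own, e.g. [[19 October]]
BARE_DAY_MONTH = re.compile(rf"\[\[({DAY_MONTH})\]\]")


def delink_dates(wikitext: str) -> str:
    """Remove date links but leave solitary years and non-date titles alone."""
    wikitext = FULL_DATE.sub(r"\1\2\3", wikitext)
    wikitext = PIPED_DAY_MONTH.sub(r"\1", wikitext)
    wikitext = BARE_DAY_MONTH.sub(r"\1", wikitext)
    return wikitext
```

A solitary year such as "[[2008]]" never matches because every pattern requires a day-month element, and "[[April 2005 lunar eclipse]]" never matches because the link must close immediately after the day number.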
Guidance at wp:mosnum says: The linking of dates purely for the purpose of autoformatting is now deprecated.
Guidance at wp:context also deprecates such links.
Featured Articles, Featured lists, Good Articles, Peer review, and wikiprojects are implementing the guidelines. I suspect that the bots that were adding links to date elements have stopped and many editors are removing date links. Some people are not yet comfortable with a bot delinking solitary years, therefore solitary years are specifically excluded from this bot request.
The code already exists and has been well tested on manual edits.
Discussion
- Is linking to a specific day of the year like in your second example always an error? --Carnildo (talk) 02:02, 20 October 2008 (UTC)[reply]
You may be aware that delinking is already taking place on quite a large scale by multiple editors using manual methods. As far as I am aware, the suggestion that you make has not been an issue. There was a discussion about a day+month link being valid because it is a significant annual event but the response seemed to be that annual events should link to the relevant article (e.g. Guy Fawkes Night) rather than the date. Lightmouse (talk) 10:51, 20 October 2008 (UTC)[reply]
- In the first line of Guy Fawkes Night, there's a link to November 5. Do you consider this an error? --Carnildo (talk) 21:58, 20 October 2008 (UTC)[reply]
- I am not sure. What do you think? Lightmouse (talk) 22:01, 20 October 2008 (UTC)[reply]
- I think it's fine, and that your bot shouldn't automatically remove standalone day+month links. --Carnildo (talk) 23:29, 20 October 2008 (UTC)[reply]
- the Guy Fawkes Night article does a fine job of clarifying what date it's observed on and why, and a link to a list of other things that have happened on November 5th throughout history contributes nothing (0) to anyone's understanding of the article. i strongly support a bot unlinking this kind of thing along with full dates. Sssoul (talk) 15:25, 21 October 2008 (UTC)[reply]
- Only a tiny fraction of articles describe annual anniversaries and it is relatively easy to avoid those. I note you used the phrase "standalone day+month links". Are you implying that you would support a bot that removes day+month+year links? Lightmouse (talk) 08:49, 21 October 2008 (UTC)[reply]
- I don't care either way on full day+month+year links. --Carnildo (talk) 20:01, 21 October 2008 (UTC)[reply]
- I'm all for this task, but why aren't you delinking solitary year links as well? I don't see any added value with them either. And can you make the source code for this bot available? Parsing out date syntax is a bit trickier than you might realize. There are all sorts of edge cases that are valid syntax that require special handling. For instance, here is only a partial implementation in JavaScript. The regular expression approach proved to be too limiting, so you'll really need to use a full-on grammar parser. In particular, the linked script does not properly handle delinking dates that are followed by a word beginning with the same letter as the name of any month, because it only uses single-character look-ahead. --Cyde Weys 15:07, 20 October 2008 (UTC)[reply]
Thanks for your support. I agree with you that solitary years are still a problem. I am not delinking solitary years because Shereth said that he/she would block me if I did. See http://en.wikipedia.org/w/index.php?title=User_talk:Lightmouse&oldid=244095391#Bot_stopped
I am aware of the problems of date syntax. In fact, I have improved on User:Cyde/monobook.js/dates.js, you might want to replace that with the vastly superior (I think) User:Lightmouse/monobook.js/script.js. I also have several variants of AWB code that can be made available to you. Feel free to contact me at my talk page. Lightmouse (talk) 21:17, 20 October 2008 (UTC)[reply]
- Don't be so dramatic, Lightmouse. I never said I would block you. What I did say is that I would block the bot if it resumed delinking them without consensus. MOSNUM does not mention solitary year links. I understand that, in your point of view, they are low-value links (and I am inclined to agree), however it is a perennial issue where editors are complaining on various notice boards about bots unlinking these years without any kind of mandate to do so. Let me make it clear - I do not oppose de-linking years using a bot, but I will take action to prevent its operation until such a mandate has been established. Given the scope and recurring nature of the complaints, the consensus of a small group of editors (such as those watching MOSNUM or the BAG) is insufficient to demonstrate any kind of mandate. For what it is worth, I support the above proposed bot as-is. Shereth 13:38, 21 October 2008 (UTC)[reply]
Thanks. Lightmouse (talk) 13:43, 21 October 2008 (UTC)[reply]
- Most solitary year links are low-value, or no-value. Some are valuable, and there is disagreement about which ones are valuable. (The same applies to solitary date links, with Carnildo's example above being typical of the rare exceptions.) Under such circumstances, bots should not delink either of these; a bot-assisted human, who used the bots to construct edits, and then looked at them and tweaked as necessary, would be very serviceable.
- If this is approved for day-month-year dates, it should not run continuously; it should run once, establishing a baseline of no linked full dates, and then stop. Otherwise human editors cannot override its judgment; if they try to, it will edit-war with them every time they fix it. Septentrionalis PMAnderson 15:28, 21 October 2008 (UTC)[reply]
- Suggest that the bot should not delink dates of birth and dates of death. Recent (still open) RFC shows no consensus not to link such dates. Jheald (talk) 14:09, 21 October 2008 (UTC)[reply]
That rfc sought consensus for linking birth/death dates. The rfc failed by 18 oppose and 17 support. That doesn't look to me like consensus for special treatment. However, to respond at the purely technical level, I can't see how a bot can distinguish the purpose of a date link. If birth/death dates are always in a particular format, it might be technically possible (e.g. check if the 5 preceding characters constitute the word 'born' followed by a space); the desirability of such a solution is another matter. Note that it will not delink any birth/death date that is a solitary year. Lightmouse (talk) 14:57, 21 October 2008 (UTC)[reply]
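The five-character check mentioned above could look roughly like the following. This is a minimal Python sketch for illustration only; the function name `preceded_by_born` is an assumption, and nothing here is taken from the bot's actual code.

```python
def preceded_by_born(wikitext: str, link_start: int) -> bool:
    """Return True if the five characters before link_start are 'born '."""
    return link_start >= 5 and wikitext[link_start - 5:link_start] == "born "


# Usage: find where the first link starts, then test the text just before it.
sentence = "He was born [[19 October]] [[1950]]."
is_birth_date = preceded_by_born(sentence, sentence.index("[["))
```

A check this literal recognises only the exact wording 'born ' directly before the link, which is part of why the comment treats it as a technical possibility rather than a recommendation.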
- This also means that there is no consensus for massively removing such links. Septentrionalis PMAnderson 15:28, 21 October 2008 (UTC)[reply]
- Strong Oppose I strongly oppose this bot's actions. The manual of style should never have been amended in the first place to deprecate the wikilinking of full dates. For starters, this deprecation will only inspire more edit wars between the British versus American date formatting standards, because editors will no longer be able to see their own preferences. Additionally, I actually like having dates wikilinked for other reasons. For example, with particular attention to articles on historical subjects, having the full date wikilinked provided additional emphasis to it and made it stand out, which greatly aided the reading of the article. Having seen some of the articles with dates unwikilinked, the dates just seem to fade away into the article text and it's more difficult to pull them out. Dr. Cash (talk) 22:30, 21 October 2008 (UTC)[reply]
- Ah, can you point to this rash of edit wars between British and American editors? WP has matured enough that this is no longer the issue it was in 2003, when the date autoformatting was foolishly adopted. DA has been removed from many many thousands of articles (weighted towards the prominent and much visited ones) over the past months, and those who complain are restricted to a tiny, vocal group of WPians. The reactions of normal editors range from highly enthusiastic ("about time", etc) to not caring or having thought about it. It's an opportune time for us all to concentrate on the readers, who gain no benefit from the autoformatting whatsoever (they're not registered, logged in and preferenced), yet experience the significant dilution-through-bright-blue-speckling of the high-value links in the vicinity of dates. The sooner this cancer is cleaned out of WP, the better for the project. It was a mistake, and we should all admit it now. Tony (talk) 03:11, 22 October 2008 (UTC)[reply]
How would the bot deal with links such as [[Independence Day (United States)|July 4th]]? Does it have an algorithm to detect that the article linked to is not a date article, or does it have a list of known anniversaries that should be left alone? --Gerry Ashton (talk) 05:30, 22 October 2008 (UTC)[reply]
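One way to approach the question above, sketched in Python purely as an illustration: split the link into target and label, and treat it as a date link only when the target itself is a day-month title. The names `split_link` and `points_at_date_article` are assumptions made for this sketch, and it says nothing about how the proposed bot actually behaves.

```python
import re

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")
# A link target counts as a date article only if the whole title is a day-month pair.
DATE_TITLE = re.compile(rf"^(?:\d{{1,2}} (?:{MONTHS})|(?:{MONTHS}) \d{{1,2}})$")


def split_link(link: str) -> tuple[str, str]:
    """Split '[[target|label]]' into (target, label); the label defaults to the target."""
    inner = link[2:-2]
    target, _, label = inner.partition("|")
    return target, label or target


def points_at_date_article(link: str) -> bool:
    """True when the link target, not the visible label, is a day-month title."""
    target, _label = split_link(link)
    return DATE_TITLE.match(target) is not None
```

Under this test the Independence Day link would be left alone because its target is not a date title, even though its visible label looks like a date.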
Proposal modification
A template or HTML comment shall be created, perhaps called {{NoDateBots}}. Every bot that processes dates shall be required, upon pain of blocking, to recognize this template and not automatically process any page containing this template. Editors may place this template in pages that have unusual date syntax that tends to be misprocessed by bots. --Gerry Ashton (talk) 16:16, 21 October 2008 (UTC)[reply]
- This seems reasonable to me and a good way to reduce tensions over this issue. This proposal doesn't come across as too "pro-linking" to me; it just requires us to stop and talk about it if there's something special going on in a particular page; we can always remove the "no datebots" tag on a page if it doesn't seem warranted. The "upon pain of blocking" is a bit dramatic for my taste, but I get the idea. - Dan Dank55 (send/receive) 16:24, 21 October 2008 (UTC)[reply]
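The opt-out check proposed above is simple to express. The following Python sketch is illustrative only: "NoDateBots" is the name floated in the proposal rather than a template known to exist, and the `may_process` function and its pattern are assumptions.

```python
import re

# Matches {{NoDateBots}} (optionally with parameters) or <!-- NoDateBots -->.
# The first letter is matched in either case because template names are
# case-insensitive in their first character.
OPT_OUT = re.compile(
    r"\{\{\s*[Nn]oDateBots\s*(?:\|[^{}]*)?\}\}|<!--\s*NoDateBots\s*-->"
)


def may_process(wikitext: str) -> bool:
    """Return False if the page carries the hypothetical opt-out marker."""
    return OPT_OUT.search(wikitext) is None
```

A bot would call `may_process` on the page text before attempting any date edits and skip the page entirely when it returns False.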
This does not sound like a 'proposal modification', it sounds like a 'proposal for a template'. I have no objection to people making other proposals but please can we confine this page to discussion about the bot that will:
- Delink dates except for solitary years.
Lightmouse (talk) 16:32, 21 October 2008 (UTC)[reply]
OPPOSE. All bots encounter situations they are too stupid to deal properly with. There is nothing stated in the proposal to prevent this bot from engaging in an edit war where the bot makes a mistake, an editor fixes it, the bot reprocesses the article and makes the same mistake again, and so on forever. --Gerry Ashton (talk) 16:44, 21 October 2008 (UTC)[reply]
- Can you give an example page where it would be a mistake? Lightmouse (talk) 16:49, 21 October 2008 (UTC)[reply]
- Of course I can't provide an example, since the Cleanbot does not yet exist. However, a look at the history of Lightmouse's talk page (and archives) shows numerous examples of LightBot and Lightmouse's use of AutoWikiBot having numerous unintended consequences, and I don't expect the new bot to be any different. Lightmouse's approach to date has been to ask editors to report problems, and try to fix the bot to not make that mistake again. I believe this entire approach is wrong. Once a bot makes a mistake in an article, it should NEVER get another chance to mess up that article.--Gerry Ashton (talk) 17:01, 21 October 2008 (UTC)[reply]
- Although the example of 5 November in Guy Fawkes' Night should serve; unless Lightmouse is amending the request. If so, he should say so. Septentrionalis PMAnderson 17:20, 21 October 2008 (UTC)[reply]
- Well, let's all be clear, all bots make errors. If there are 100 events and the error rate is 1 in 1,000, you might never see the error. Unfortunately, Wikipedia has tens of millions of dates to be delinked so even a 1 in 100,000 error rate will result in errors becoming visible. It just depends whether people want 99,999 good things in exchange for 1 bad thing. If you know of any bot author claiming a zero error rate, let us know. An approach that involves updating bot code after first use is a good thing to be proud of, not a bad thing to be ashamed of.
- It is much easier if the rules are simple, many of the problems arise when people ask for extra constraints/exceptions. Some of the simpler ways of avoiding false positives involve pre-filtration of articles. For example, we could avoid:
- articles that contain a date-related word in the title (e.g. 'Day', 'Night', 'Week', 'Month', '2008 in ...', etc.), which would include 'Guy Fawkes Night'
- articles that contain the word 'calendar'
- articles that are in the categories 'Anniversaries' and 'Observances' e.g. includes 'Guy Fawkes Night'
- articles that are on a whitelist (to be defined but could include 'Guy Fawkes Night')
- If I understand your comments so far, that seems to address them. N'est-ce pas? Lightmouse (talk) 17:31, 21 October 2008 (UTC)[reply]
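The pre-filtration rules listed above could be combined into a single skip test along these lines. This is a minimal Python sketch under stated assumptions: the function name `should_skip`, its parameters, and the exact word list are introduced here for illustration, and the "whitelist" keeps the comment's own term for the list of articles to leave alone.

```python
import re

# Date-related words in a title, or a title of the form "2008 in ..."
TITLE_WORDS = re.compile(r"\b(?:Day|Night|Week|Month)\b|^\d{4} in ")
SKIP_CATEGORIES = {"Anniversaries", "Observances"}


def should_skip(title, wikitext, categories, whitelist=frozenset()):
    """Return True if the article should be left untouched by the date pass."""
    if title in whitelist:
        return True
    if TITLE_WORDS.search(title):
        return True
    # The word 'calendar' anywhere in the title or the text
    if "calendar" in f"{title} {wikitext}".lower():
        return True
    # Membership of any excluded category
    return not SKIP_CATEGORIES.isdisjoint(categories)
```

With these rules 'Guy Fawkes Night' is skipped on its title alone, before its categories are even consulted.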
- Absolutely not. Allowing the bot to trample pages and then require editors to revert the bot and whitelist the article is in direct violation of "is harmless". BJTalk 17:35, 21 October 2008 (UTC)[reply]
Are you suggesting that the other bots have a zero error rate or are you using a different definition of 'is harmless' for this bot? Lightmouse (talk) 17:57, 21 October 2008 (UTC)[reply]
- My attitude is that all bots should obey one of two rules:
- The bot is only allowed one pass through the articles OR
- There is a mechanism to make the bot skip any article forever if an editor notices the bot make an error on its first attempt to process the article.
- If this is a new requirement, so be it. --Gerry Ashton (talk) 18:06, 21 October 2008 (UTC)[reply]
- No, if the bot makes an error it should be fixed. BJTalk 18:13, 21 October 2008 (UTC)[reply]
- Bots cannot always be fixed. If bots were smart enough to always get things right, they could write the articles for us. Of course, any errors found should be evaluated to determine whether they can be fixed, and if not, whether the error rate is low enough. --Gerry Ashton (talk) 19:34, 21 October 2008 (UTC)[reply]
- Most bot proposals have a hypothetical error rate of zero, yes. BJTalk 18:13, 21 October 2008 (UTC)[reply]
This bot has a hypothetical error rate of zero. This bot can pass through articles only once. Lightmouse (talk) 18:20, 21 October 2008 (UTC)[reply]
- The proposal says:
- Edit period(s) (e.g. Continuous, daily, one time run): Continuous
- If in fact this bot will have a mechanism to prevent it from visiting an article more than once, that mechanism should be explained. I interpret Lightmouse's statement that "This bot can pass through articles only once" to mean that if it processes, let's say, Gregorian calendar on 5 December 2008, it will never again process the "Gregorian calendar" article, not even on 5 December 2012. Is this interpretation correct? --Gerry Ashton (talk) 18:36, 21 October 2008 (UTC)[reply]
Simple question but the answer isn't so simple. For a start, it will examine 'Gregorian calendar' but it will discover the term 'calendar' and abandon any further processing without making any edits. But that is a minor point.
- Technically, the way to reduce second processing is to create a list of articles and then cross-check it against the list of previous contributions.
- Most bots only have a few hundred or a few thousand in a list but the massive scope of date links (all of Wikipedia) means we would have to check one list of up to 2.5 million against another list of up to 2.5 million. The biggest list that I can handle is 25,000. So that is not a practical option.
- Therefore, the only method that I am aware of is to use the alphabetical list of Wikipedia articles. We could start at the letter A and take the first group of 25,000 articles and process them. We would then take the next group of 25,000 articles and so on for 100 groups (i.e. 2.5 million articles) until we got to the letter Z. So the rate of once-only processing would approach 100% but it would never achieve 100% because the alphabetical list includes redirects. For example, the article 'Prime Minister Tony Blair' redirects to 'Tony Blair'. When it gets to the end of the alphabetical list, its job would be over; perhaps its 'Edit period' is not 'continuous' and should be described as 'one-time run'. I hope that clarifies things a bit more. Lightmouse (talk) 19:04, 21 October 2008 (UTC)[reply]
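The batch-by-batch walk described above, and the redirect problem that stops it reaching 100%, can be sketched as follows. This is illustrative Python, not AWB; `batches`, `one_pass`, and the `resolve` callback (which maps a redirect title to its destination) are assumptions made for the sketch. Keeping a set of visited titles in memory sidesteps, rather than reproduces, the 25,000-entry list limit mentioned in the comment.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator


def batches(titles: Iterable[str], size: int = 25_000) -> Iterator[list[str]]:
    """Yield successive groups of at most `size` titles from an ordered listing."""
    it = iter(titles)
    while chunk := list(islice(it, size)):
        yield chunk


def one_pass(titles: Iterable[str],
             resolve: Callable[[str], str],
             process: Callable[[str], None],
             size: int = 25_000) -> None:
    """Process each destination article at most once, then stop."""
    seen: set[str] = set()
    for chunk in batches(titles, size):
        for title in chunk:
            target = resolve(title)  # hypothetical redirect lookup
            if target not in seen:
                seen.add(target)
                process(target)
```

Because the function returns when the listing is exhausted, it behaves as a one-time run rather than a continuous one.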
- I would urge Lightmouse to investigate whether there is any mechanism to determine that an article title is actually a redirect. I think it would be a good idea to avoid processing redirects. My intuition is that the kind of articles that have many redirects are just the sort of articles that are apt to have tricky date syntax. --Gerry Ashton (talk) 19:30, 21 October 2008 (UTC)[reply]
I have just been told how to avoid being redirected. Frankly, I don't share your pessimism about redirects resulting in a bad edit (although that might depend on the definition of 'bad'). It would help if we could discuss a specific example but now that I know the method, I can do it to gain your support. Unfortunately, positive response from swing voters such as yourself will not be sufficient to get this bot elected to approved status in the face of the other negative responses above. Lightmouse (talk) 19:57, 21 October 2008 (UTC)[reply]
- There is a better workaround. I can easily write a custom list provider for AWB, so it won't load any redirects into the list at the time of making. Obviously, this wouldn't account for any that changed in the meantime, but seeing as it can skip them if it finds them, that reduces duplication further.
- Snippet from API:
* list=allpages (ap) *
  Enumerate all pages sequentially in a given namespace
Parameters:
  <snip>
  apfilterredir  - Which pages to list
                   One value: all, redirects, nonredirects
                   Default: all
- Reedy, thanks. I will take you up on that after approval. Lightmouse (talk) 12:19, 22 October 2008 (UTC)[reply]
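For reference, a request using the `apfilterredir` parameter quoted above might be built like this. The sketch is in Python rather than as an AWB list provider; the `allpages_url` helper is an assumption, as are the endpoint URL and the `apnamespace`, `aplimit`, `apfrom`, and `format` parameters, which fall in the snipped part of the API help and are not shown in the discussion. It only constructs the URL and makes no network call.

```python
from urllib.parse import urlencode

# Assumed endpoint for the English Wikipedia API
API = "https://en.wikipedia.org/w/api.php"


def allpages_url(start_from: str = "", limit: int = 500) -> str:
    """Build a list=allpages query that excludes redirects."""
    params = {
        "action": "query",
        "list": "allpages",
        "apfilterredir": "nonredirects",  # the option quoted in the snippet
        "apnamespace": 0,                 # article namespace
        "aplimit": limit,
        "format": "json",
    }
    if start_from:
        params["apfrom"] = start_from     # title at which to start the listing
    return f"{API}?{urlencode(params)}"
```

As noted in the comment, a list built this way reflects redirect status only at the moment of the query, so a page turned into a redirect afterwards would still need to be skipped at edit time.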
I oppose this proposal for, among other things, the reasons expressed by Gerry Ashton. Tennis expert (talk) 21:43, 21 October 2008 (UTC)[reply]
Since when does "deprecated" mean "removed ASAP?" This is something that should be put into AWB general fixes, not made into a bot to make God-knows-how-many edits just to do this. Mr.Z-man 23:00, 21 October 2008 (UTC)[reply]
- Gerry, you made an assertion on the basis of what CleanBot is designed to do. The bot doesn't have to exist for you to find examples to support your assertion. I'm keen to see them. Tony (talk) 04:30, 22 October 2008 (UTC)[reply]
- I didn't exactly object on the basis of what CleanBot is designed to do; I am concerned both that there may be linked items out there that look like dates, but are not, and also that the coding of the bot may not exactly carry out the design. In either case, it should only get one crack at an article; editors should not have to keep fixing the same article over and over again while any kind of error is fixed in the bot.
- As for a specific example, it would be interesting to see what it makes of [[March 25]], [[1 BC]] in the Sextus Julius Africanus article. --Gerry Ashton (talk) 05:19, 22 October 2008 (UTC)[reply]
- BRFA is not the place for debating if the links should be removed. If the removal of date formatting is still contested, consensus needs to be formed before a bot approval can be started. BJTalk 08:13, 22 October 2008 (UTC)[reply]
- Comment: Why won't solitary years be delinked? Matthewedwards (talk • contribs • email) 07:03, 22 October 2008 (UTC)[reply]
Also, what's the difference between this bot and Lightbot? Matthewedwards (talk • contribs • email) 07:09, 22 October 2008 (UTC)[reply]
- Lightbot is not approved to remove date formatting. BJTalk 08:13, 22 October 2008 (UTC)[reply]
- Um ... that's news to me, unless you count Tennis expert, who's ruffled quite a few of his tennis-project colleagues over the issue. The removal of date autoformatting is widely and enthusiastically supported by the community. Tony (talk) 12:52, 22 October 2008 (UTC)[reply]
- I haven't heard anything about it, could you point me to this wide support (anything on a subpage of MOS doesn't count)? BJTalk 13:00, 22 October 2008 (UTC)[reply]
- Is this your own rule—that wide support at our central pages for style and formatting suddenly counts for nought ... While not accepting this premise, we might start with the initial immediate responses by editors at article talk pages, in very early days before the change had been promulgated. It is not a fair representation because there seemed little benefit in continuing to register after the trial period (the comments were gathered during only a few weeks in August). There are many many more positive comments, and the circumstantial evidence that editors out there are accepting of the removal of date autoformatting where they do not actively voice their approval lies in their silence, despite the removal of DA from many thousands of articles (many of them key ones). Tony (talk) 14:13, 22 October 2008 (UTC)[reply]
- You said it had wide community support but I'm still not seeing any evidence of that. For example, when bot matters are discussed on Wikipedia:BOTS a proposal may gain support. Only when brought to the attention of the entire community can it have wide community support, as was done with the adminbots RfC. Local consensus and favorable talk page comments are not sufficient for a bot that is going to make hundreds of thousands of edits. BJTalk 14:38, 22 October 2008 (UTC)[reply]
- Again, you seem to be creating your own rules for what is "community support". You asked for evidence of wide support for the removal of date autoformatting, not of anything to do with bots. That's what I linked you to as an example. This application process is presumably the place where support for the running of a bot is assessed. We seem to be confusing things here. If you don't like what the style guides say about it, please raise the matter at the talk pages (MOSNUM, CONTEXT, MOSLINK, et al.). Tony (talk) 14:47, 22 October 2008 (UTC)[reply]
- How could it have support when the community was never asked to comment? Where is the village pump proposal to remove all auto-formatting? There has been nothing on {{cent}}, the noticeboards, or watchlist notices. The last thing I can find on {{cent}} is "Proposal to discourage the auto-formatting of dates". There needs to be a widely advertised VP proposal for the removal of auto-formatting in all articles and the exceptions. If the community shows support then a bot request should be filed. BJTalk 15:08, 22 October 2008 (UTC)[reply]
- Perhaps it's frustrating to have missed out on the loooong debate that occurred over some two years (intermittently, but more intensively and continuously during 2008 until the change in August). I'm sorry that you were not aware of it (it was certainly promulgated at VP and elsewhere at the time), but the fact that you missed it does not mean that there was not thorough and extensive debate in many locations. I must ask you to resist the temptation to seek to place retrospective caveats on the community's decision because you were not there at the time. Don't worry, there certainly were naysayers to represent your views, but they were very much in the minority, and still are. This is not the place to discuss that decision; this is to discuss a bot application. I do not want to spend more time debating this at the wrong location at the wrong time. Sorry. Tony (talk) 15:18, 22 October 2008 (UTC)[reply]
- You seem to be confusing consensus to change the MoS with consensus to then apply that change to every article. It seems the community was informed for the former but not the latter. BJTalk 15:36, 22 October 2008 (UTC)[reply]
- Just look at Featured Articles, Good Articles, Peer Review etc. and other popular representations of the best that Wikipedia has to offer. A statistical analysis of the ratio of (date links)/(dates that could be linked) would be interesting. I bet it would be a tiny fraction of a percent. Lightmouse (talk) 13:16, 22 October 2008 (UTC)[reply]
- Oh yeah. My first question still stands. I see nowhere in MOSNUM or MOS that stand-alone years should be linked. Matthewedwards (talk • contribs • email) 09:17, 22 October 2008 (UTC)[reply]
Matthewedwards, I agree with you that there is massive overlinking of solitary years. Unfortunately, a few editors oppose delinking of solitary years. This proposal is an attempt to work within their constraints by delinking all dates except solitary years. Lightmouse (talk) 10:13, 22 October 2008 (UTC)[reply]
- Gerry Ashton mentions two things:
- "one crack at an article" - yes. See extensive discussions above.
- what it makes of [[March 25]], [[1 BC]] - it will delink it. That is exactly what a human would do. The bot proposal is Delink dates except for solitary years.
- If you think that particular date needs an exemption, can you be more specific? Trying to help. Lightmouse (talk) 11:13, 22 October 2008 (UTC)[reply]
Such a minor and utterly inconsequential change should not be given approval. It will result in what can only be called "pointless edits". A much better solution is to gather consensus for, and write a patch for AWB that adds this change to its general fixes, so that other bot tasks which are actually doing something useful can fix it. (note: Lightmouse's AWB access has been revoked, before I saw this thread, for the pointless edits. This is the same way as I'd revoke the access of someone who was going through doing solely general fixes, and I believe that this bot request should be treated in the same way as any other which aims to do just gen. fixes). Martinp23 15:22, 22 October 2008 (UTC)[reply]
- (quite frankly, looking at the debate on this page, I can't honestly believe that the consensus with the MOS/appropriate RfCs is mature enough for a bot to be acting on it at all, through general fixes or otherwise). Martinp23 15:31, 22 October 2008 (UTC)[reply]
- Agreed, they are insignificant edits. I don't think a bot can be written that is intelligent enough to ignore dates that are appropriately linked. –xeno (talk) 15:42, 22 October 2008 (UTC)[reply]
- Denied. As evidenced by this discussion, this is still far too controversial a task for a bot, there is no consensus outside of the MoS talk pages that date autoformatting should actively be removed rather than simply discouraged, and these edits are too minor for a bot to be worth the effort. Mr.Z-man 16:10, 22 October 2008 (UTC)[reply]
- The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made in a new section.