Wikipedia:Bots/Requests for approval/Null edit bot
- The following discussion is an archived debate. Please do not modify it. Subsequent comments should be made in a new section. The result of the discussion was Withdrawn by operator.
Operator: — Martin (MSGJ · talk)
Automatic or Manually Assisted: Automatic and unsupervised
Programming Language(s): AutoWikiBrowser
Function Overview: To perform null edits on a list of pages (usually all pages in a particular category) in order to force their recategorisation when a populating template is changed, in cases when it is not desirable to wait up to one month (sometimes even more) for the job queue to do it.
Edit period(s): I intend to run it whenever I receive an appropriate request from an editor.
Already has a bot flag (Y/N): No
Function Details: I do a fair amount of template coding, and quite often templates are placed on pages in order to categorise them. When the template is edited, it can take a long time for the pages to change category: depending on the size of the job queue it has been known to take up to two months for every article to move across. This can be undesirable for several reasons. It results in a long period of flux in which two categories are partially populated, which can cause confusion. It often means that by the time the category is empty, the editor who changed the template has long forgotten about it and does not go back and redirect or delete the now empty category. In other words, the operation of cleaning up after a change is less likely to be as thorough.
I do not intend to run this bot on very large categories (say, categories with more than a few thousand members). Waiting for the job queue to complete at off-peak times is more appropriate for these major changes. However forcing the recategorisation will be appropriate in smaller categories, when a job and tidy-up operation can be completed in a relatively short time.
I believe I can successfully use AWB to perform this task. Previously I have used the tool in a semi-automated manner by appending a line space to the end of the page. The software automatically strips the trailing line space on save, so the page text is unchanged and a null edit is performed. I would expect that Special:Contributions/Null edit bot would stay empty.
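The mechanism described above can be illustrated with a toy model. This is only a sketch, not MediaWiki or AWB code: `presave_normalize` and `save_page` are hypothetical names, and the only behaviour assumed is the one stated above, namely that trailing whitespace is stripped before the text is compared with the stored revision.

```python
from typing import Tuple


def presave_normalize(wikitext: str) -> str:
    """Toy stand-in for the server-side cleanup: strip trailing whitespace."""
    return wikitext.rstrip()


def save_page(stored: str, submitted: str) -> Tuple[str, bool]:
    """Return (text now stored, whether a new revision was recorded)."""
    normalized = presave_normalize(submitted)
    if normalized == stored:
        # Null edit: the text is identical after cleanup, so nothing is
        # added to the history or to the editor's contributions.
        return stored, False
    return normalized, True


stored = "{{Some template}}\nArticle text."
# Appending a line space and saving leaves the stored text untouched.
print(save_page(stored, stored + "\n"))
```

In this model the second element of the result is `False` for the appended line space, which is why the bot's contributions list would stay empty.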
I will set up a task list, invite any editors to place requests and prioritise them according to my judgement.
Discussion
I'm guessing that it's not intentional for such changes to take a month. The job queue length is currently only about 20,000. For it to take a month would mean that each job would have to take 2 minutes to run. Over the weekend I plan to do some investigating into the job queue to make sure that it's still populating the queue correctly and that the jobs are run correctly. Mr.Z-man 04:41, 16 April 2009 (UTC)
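The "2 minutes" figure can be checked with a few lines of arithmetic. This is a rough sketch that assumes a 30-day month, jobs run one at a time, and a queue that does not grow in the meantime; `minutes_per_job` is a name made up for the illustration.

```python
MINUTES_PER_DAY = 24 * 60


def minutes_per_job(queue_length: int, days: float) -> float:
    """Average time per job if the whole queue takes `days` to drain."""
    return days * MINUTES_PER_DAY / queue_length


# About 20,000 queued jobs taking a full (30-day) month to clear.
print(round(minutes_per_job(20_000, 30), 2))  # 2.16
```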
- Hmm, so far I haven't been able to reproduce any issues. There's still a couple more things I want to test though. Mr.Z-man 16:40, 17 April 2009 (UTC)
- I did some testing with my own wiki, and haven't been able to replicate any of the issues described here. When editing a template, the job queue was properly populated and the script used by Wikimedia to run the jobs as well as the jobs themselves seemed to be working correctly. The only 2 possible issues I can see are:
- The backlink cache is never forcibly cleared. However, it has an expiry time of an hour, so this should only affect pages where the template is added within an hour of an edit to the template, so it shouldn't affect the majority of pages, and should be fixed if the template is edited again after the cache expires.
- If the category is dependent on some parser function that doesn't return true when the job is run, it won't be updated.
- -- Mr.Z-man 04:30, 19 April 2009 (UTC)
- My thought is that the job queue exists for a reason, and before we approve a bot to hack around the job queue, it might be nice to ask the developers if they object or if this can be implemented in the queue timer more efficiently. MBisanz talk 05:09, 19 April 2009 (UTC)
- Yes, the job queue exists for a reason. What I have tried to assert above however, is that in some circumstances it would be beneficial to allow the template programmer to decide to bypass the queue if (s)he thinks there is a significant advantage of finishing the job more quickly. A few hundred null edits is not going to crash the server, and it would allow the editor to empty a category, redirect it and move on to their next task rather than having to wait a month and then remember to come back and finish the job. I could check with a developer if it would make you happier about the performance issue. — Martin (MSGJ · talk) 14:51, 20 April 2009 (UTC)
- I agree that, if this is considered desirable, it should be done at the MediaWiki level. I am sure this is not the most efficient way of dealing with this issue. Note also that there has been significant development on the job queue recently -- I believe it is significantly more effective than it was a few months ago. [[Sam Korn]] (smoddy) 15:13, 20 April 2009 (UTC)
- I don't really understand the concerns here. Have you read WP:PERFORMANCE? If something makes a job easier for editors then that takes priority over server performance concerns. — Martin (MSGJ · talk) 15:57, 20 April 2009 (UTC)
- <Duesentrieb> OverlordQ: there may be times when this is desired, but endorsing it kind of obsoletes the job queue. if this is done a lot, it would pose massive load on the servers. Have you read the bots policy? It requires that a bot "does not consume resources unnecessarily". Q T C 16:14, 20 April 2009 (UTC)
- A quote from that page:
- As a technical matter, it's our responsibility to keep the system running well enough for what the sites require (Brion Vibber)
- I.e. if there is a problem with the job queue taking too long to complete, it is the developers'/sysadmins' jobs to fix it. This is an ugly hack that has the potential to cause significant server load. If it is desirable, as I say, it should be done properly, by the MediaWiki developers. [[Sam Korn]] (smoddy) 16:18, 20 April 2009 (UTC)
- Reply to Z-man. Thanks for looking into this in detail. The two months I quoted was an extreme maximum, and indeed a lot of the articles make the transit quite quickly. However, please believe me that it can, and quite often does, take several weeks for all the articles to have moved categories. — Martin (MSGJ · talk) 14:47, 20 April 2009 (UTC)
- As I said, if it takes a month or more, that's more likely a sign that something is broken than an intentional design, in which case we should be trying to fix it, not hacking around it. Note that the job queue was broken up until a few months ago; have there been any issues since the beginning of March? (According to the server admin log, it looks like it was fixed around mid-February.) Looking at Special:Statistics now, it's down to <700, so in 4 days since my first comment here, it's worked through all 20,000 that I mentioned previously. Mr.Z-man 16:30, 20 April 2009 (UTC)
- Well, funnily enough, yesterday I made a stupid mistake with a template and it emptied about 10 thousand articles in a matter of hours :) So maybe you are right ... but this is far from typical in my experience. — Martin (MSGJ · talk) 16:33, 20 April 2009 (UTC)
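Mr.Z-man's two readings of the queue (about 20,000 jobs at 04:41 on 16 April, under 700 at 16:30 on 20 April) give a rough lower bound on throughput. The sketch below uses only those two figures and comment timestamps. It ignores jobs added in between, so the real processing rate would be higher. `jobs_per_day` is a hypothetical helper.

```python
from datetime import datetime


def jobs_per_day(start_len: int, end_len: int,
                 start: datetime, end: datetime) -> float:
    """Net number of jobs cleared per day between two queue readings."""
    elapsed_days = (end - start).total_seconds() / 86_400
    return (start_len - end_len) / elapsed_days


first_reading = datetime(2009, 4, 16, 4, 41)   # "about 20,000"
second_reading = datetime(2009, 4, 20, 16, 30)  # "<700", taken as 700
rate = jobs_per_day(20_000, 700, first_reading, second_reading)
print(f"{rate:.0f} jobs per day (net)")  # roughly 4,300
```

At that net rate a backlog of 20,000 jobs clears in under five days, which is the basis for reading a month-long delay as a fault, not as normal queue behaviour.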
Ah. Quite surprising to see such requests here.
FYI, since the Job Queue got implemented, the null edit script got deleted from the pywikipedia framework. There is a reason behind this. When edits are queued to be done, it's simply useless to do them a few hours before.
A few cases:
- The page you're about to null edit has few views over time (as in, there are no visits during the time it takes the job queue to process the page). Taking a few hours to be updated is not going to change anything: no one is going to see it in the meantime anyway.
- The page you're going to null edit is frequently viewed: during the time it takes for the Job Queue to process the page, some readers or editors are actually going to see a slightly or completely broken page. Splitting again:
- Editors see the issue and try to fix it. They edit the section/purge the article. Problem's gone.
- The borked thing is seen but people don't know how to fix it, or are not willing to fix it. Problem stays.
For me it's quite simple. The more the article is viewed, the better the chances that it gets fixed in the meantime. If it's not viewed a lot, chances that people are going to see the broken thing are low.
There's really no hurry in fixing a mistake when you know that it is going to be fixed for sure. Wikipedia is an encyclopedia project, yada, yada, no hurry. No need to fix what's not broken.
But you have the right to disagree with the previous statement. If so, please think twice about the technical manipulations that you're doing:
- The job Queue is crowded
- As a consequence it takes a significant time to process articles. But I want articles to be processed faster.
- Solution: in the meantime, I do null edits to invalidate the cache myself to "fix" the articles beforehand
- Two direct consequences:
- The articles were updated beforehand, meaning that part of what's queued in the JobQueue is useless: some jobs will just wait to be processed, and when it's finally their turn... well, too bad, a bot did the job itself. It means that some articles are queued for nothing.
- More important, because I null-edit some articles myself, I recursively queue some more "jobs". Not directly on the JobQueue though: I don't think that you can do this by null-editing alone. But because I null-edit articles, I invalidate the Parser cache. Meaning that all pages transcluding the page I null-edit will be invalidated and will have to be updated again on the next inclusion. I virtually add some load on the system, meaning... that it will delay treatment of items of the JobQueue. (Think about it, the JobQueue is meant to work when system load is low: "I queue complex queries for later, when I'll have time to do it". If the Parser has to update pages again, load goes up, jobs are treated later.) Now, I see you coming, you're going to say that even without null-edits, the system itself is going to invalidate those pages, and ask the Parser to work. YES, you're absolutely right. But order does matter.
- And as a result JobQueue gets longer, and it takes longer to process each item. Oh, wait. I thought my aim was to reduce processing time?
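The first consequence in the list above (jobs that wait in the queue for nothing) can be made concrete with a toy model. This is a deliberately simplified sketch, not a description of how MediaWiki's job queue is implemented: it assumes one refresh job per page, processed in order, and `run_queue` is a name invented for the illustration.

```python
from collections import deque


def run_queue(queued_pages, null_edited):
    """Process refresh jobs in FIFO order.

    Returns (useful, redundant): jobs that actually refreshed a page
    versus jobs whose page a bot had already null-edited.
    """
    queue = deque(queued_pages)
    refreshed = set(null_edited)  # pages already forced through the parser
    useful = redundant = 0
    while queue:
        page = queue.popleft()
        if page in refreshed:
            redundant += 1  # the job waited its turn for nothing
        else:
            refreshed.add(page)
            useful += 1
    return useful, redundant


pages = [f"Page {i}" for i in range(10)]
print(run_queue(pages, null_edited=pages[:4]))  # (6, 4)
```

In this model the ten queued jobs still all run, on top of the four null edits, so fourteen page refreshes are spent where ten would have sufficed.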
I think that there is no need to try to bypass the system to make things go "faster": you're working from a higher level, and should not, by design, attempt to do this. If you have issues with the current JobQueue, open bug requests, complain noisily to developers and sysadmins. But don't try to "fix it" from a high level. If you have some coding skills, please have a look at the MediaWiki code, try to come up with improvements, but such bots do no good.
I would strongly recommend not to accept such practices.
NicDumZ ~ 17:18, 20 April 2009 (UTC)
- Thanks for the comments everyone. I am surprised that null edits could be so controversial :) But I accept the reasoning here and will withdraw my request. — Martin (MSGJ · talk) 17:34, 20 April 2009 (UTC)
Withdrawn by operator. – Quadell (talk) 17:51, 20 April 2009 (UTC)
- The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made in a new section.