Wikipedia:Bots/Requests for approval/NKbot 2
- The following discussion is an archived debate. Please do not modify it. Subsequent comments should be made in a new section. The result of the discussion was Denied.
Operator: Nakon
Automatic or Manually assisted: Automatic, supervised.
Programming language(s): PHP
Function overview: Deletes pages in CAT:TEMP that have not been edited in over 30 days.
Edit period(s): On an as-needed basis.
Estimated number of pages affected: The category currently contains over 25k pages.
Already has a bot flag (Y/N): N
Function details: Duplicates the functionality of the inactive bot User:CAT:TEMP deletion bot. This bot will require an +admin flag. The page list is manually generated through AWB's list comparer. I generate the list by pulling the pages in CAT:TEMP and I then remove any pages that are also in Category:Suspected Wikipedia sockpuppets. The bot goes down the final list and deletes pages that meet the date criteria.
Source is available at User:NKbot/source and the initial delete list is at User:NKbot/delete.
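The list-building step described in the function details amounts to a set difference: take the titles in CAT:TEMP and drop any that also appear in Category:Suspected Wikipedia sockpuppets. The sketch below is illustrative only; it is written in Python rather than the bot's PHP, and the function name and sample titles are hypothetical, not taken from the posted source.

```python
def build_delete_list(temp_pages, sock_pages):
    """Return titles from temp_pages that are not in sock_pages, keeping order."""
    excluded = set(sock_pages)  # set membership keeps the filter linear-time
    return [title for title in temp_pages if title not in excluded]


if __name__ == "__main__":
    # Hypothetical sample data standing in for the two category listings.
    temp = ["User talk:Example1", "User talk:Example2", "User:Example3"]
    socks = ["User talk:Example2"]
    print(build_delete_list(temp, socks))
```

The resulting list is only the candidate set; each page would still need the 30-day date check before deletion.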
Discussion
I took a look at the source you posted, and I have a few comments:
- Any particular reason it shells out to wget rather than using PHP's curl module?
- There is no error handling in checkValidDate. So if it gets an HTTP error, or the API returns an error, or the response is truncated, or for any other reason the result isn't as expected then the test will incorrectly pass.
- I ignored the similar lack of error checking elsewhere, because in those cases the failure results in the bot being unable to complete the deletion.
- If the list contains a run of pages that do not pass checkValidDate, the bot will hammer the server as fast as possible (without even being logged in, since you skip passing the cookies for that query). Consider using maxlag=5, or delaying after each read.
- Is this really urgent enough that it needs to perform a deletion every 4 seconds?
- I note that the "Deletion Reason" must be urlencoded before being passed to the program. Not a problem per se, but could cause trouble if you ever forget to do that. Same for the config settings, but that's even less of a potential issue because that shouldn't have to be changed often.
- Any particular reason it doesn't use the API's action=delete to do the actual deletion?
- Any particular reason it doesn't use the API's action=login to do the login?
Hope this helps. Also, I note Wikipedia:Bots/Requests for approval/CatempBot requesting to do this same task was recently withdrawn for unspecified reasons. Anomie⚔ 19:29, 13 June 2009 (UTC)
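The error-handling and maxlag points above can be illustrated with a date check that fails closed: any error, truncated response, or unexpected shape means "do not delete". This is a sketch in Python rather than the bot's PHP, and it is not the code from User:NKbot/source. The names `build_timestamp_query` and `is_stale` and the endpoint constant are assumptions; the response shape assumed is the default JSON output of `action=query&prop=revisions&rvprop=timestamp`.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"  # assumed endpoint


def build_timestamp_query(title):
    """Build a read query for the latest revision timestamp, with maxlag=5."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "timestamp",
        "format": "json",
        "maxlag": 5,  # ask the server to refuse the request when replication lag is high
    }
    return API_URL + "?" + urlencode(params)


def is_stale(response, now, max_age_days=30):
    """Return True only if the response proves the last edit is older than max_age_days.

    Anything unexpected (API error, missing keys, bad timestamp) returns False,
    so a failed check can never lead to a deletion.
    """
    if not isinstance(response, dict) or "error" in response:
        return False
    try:
        pages = response["query"]["pages"]
        if len(pages) != 1:
            return False
        page = next(iter(pages.values()))
        if "missing" in page:
            return False
        stamp = page["revisions"][0]["timestamp"]
        last_edit = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")
        last_edit = last_edit.replace(tzinfo=timezone.utc)
    except (KeyError, IndexError, TypeError, ValueError, AttributeError):
        return False
    return now - last_edit > timedelta(days=max_age_days)
```

A caller would fetch the URL from `build_timestamp_query`, decode the JSON, and pass the result to `is_stale` together with the current UTC time, pausing between reads as suggested above.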
- Any particular reason it shells out to wget rather than using PHP's curl module?
- No reason in particular. I'm just more familiar with wget. Nakon 21:53, 13 June 2009 (UTC)
- There is no error handling in checkValidDate. So if it gets an HTTP error, or the API returns an error, or the response is truncated, or for any other reason the result isn't as expected then the test will incorrectly pass.
- I ignored the similar lack of error checking elsewhere, because in those cases the failure results in the bot being unable to complete the deletion.
- I've added a check to see if either variable is incorrect. Nakon 21:53, 13 June 2009 (UTC)
- If the list contains a run of pages that do not pass checkValidDate, the bot will hammer the server as fast as possible (without even being logged in, since you skip passing the cookies for that query). Consider using maxlag=5, or delaying after each read.
- I've moved the sleep timer to a better location. Nakon 21:53, 13 June 2009 (UTC)
- Is this really urgent enough that it needs to perform a deletion every 4 seconds?
- The 4-second delay is just a number I used during testing and can be increased as needed. Nakon 21:53, 13 June 2009 (UTC)
- I note that the "Deletion Reason" must be urlencoded before being passed to the program. Not a problem per se, but could cause trouble if you ever forget to do that. Same for the config settings, but that's even less of a potential issue because that shouldn't have to be changed often.
- I've hardcoded the deletion reason into the script. Nakon 21:53, 13 June 2009 (UTC)
- Any particular reason it doesn't use the API's action=delete to do the actual deletion?
- When I first wrote the code a few years ago, it was not possible to do this with the API. I've updated the script accordingly. Nakon 21:53, 13 June 2009 (UTC)
- Any particular reason it doesn't use the API's action=login to do the login?
- Per above. Nakon 21:53, 13 June 2009 (UTC)
- Looks good, except that the $reason, $article, and probably $delt2 need urlencoding in the deletion query. Anomie⚔ 22:06, 13 June 2009 (UTC)
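To illustrate why the encoding matters: page titles contain spaces and colons, deletion reasons can contain `&` or brackets, and MediaWiki edit tokens end in `+\`, all of which corrupt a hand-concatenated query string. The sketch below is in Python rather than the bot's PHP (where `urlencode()` would play the same role). The function name and sample values are hypothetical; the parameter names follow the API's `action=delete` module.

```python
from urllib.parse import urlencode, parse_qs


def build_delete_body(title, reason, token):
    """Return a URL-encoded POST body for an action=delete request."""
    return urlencode({
        "action": "delete",
        "title": title,
        "reason": reason,
        "token": token,
        "format": "json",
    })


if __name__ == "__main__":
    body = build_delete_body(
        "User talk:Example",              # hypothetical page title
        "Old page in [[CAT:TEMP]] & stale",  # hypothetical deletion reason
        "abc123+\\",                      # placeholder token, not a real one
    )
    print(body)
    # Decoding the body recovers the original values unchanged.
    print(parse_qs(body)["token"])
```

Encoding every value through one helper also removes the need to remember which variables were pre-encoded by hand.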
- I disabled the CAT:TEMP bot because too many people started to complain. I think this needs more discussion before it's fully automated again. Note that checking for that one category is not adequate to filter out pages that should not be deleted. You should look for {{do not delete}} instead, and you should remove pages that should never be deleted from the category; otherwise you'll just be wasting tons of time on each run. My CAT:TEMP bot worked like this:
- Remove pages from the category that aren't in user/user_talk namespace
- Remove IPs from the category, and recategorize indef-blocked ones into Category:Indefinitely blocked IP addresses
- Remove pages that contain {{do not delete}}
- Remove user talk pages where the userpage contains {{do not delete}}
- Remove the pages of users who aren't blocked for > 3 years or who aren't blocked at all
- Remove pages in various spam-related categories
- Delete any remaining page last edited more than 30 days ago.
- There was also a check to remove pages with >= 100 edits, but that was added later, and I'm not sure if I ever ran it after adding that. The source for my CAT:TEMP bot is here. Mr.Z-man 07:02, 14 June 2009 (UTC)
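The sequence of checks listed above can be sketched as a single predicate over per-page facts. This is not Mr.Z-man's source; it is a Python illustration with a hypothetical `Page` record whose fields would have to be filled from API queries. It models only the keep-or-delete decision, not the removal or recategorization of pages. It also assumes "blocked for > 3 years" refers to block length, with an indefinite block represented as infinity.

```python
from dataclasses import dataclass
from typing import Optional

USER_NS, USER_TALK_NS = 2, 3  # MediaWiki namespace numbers for User and User talk


@dataclass
class Page:
    """Hypothetical per-page facts; every field is an assumption for illustration."""
    title: str
    namespace: int
    is_ip: bool = False
    has_do_not_delete: bool = False
    userpage_has_do_not_delete: bool = False
    block_years: Optional[float] = None  # None = not blocked; float("inf") = indefinite
    in_spam_category: bool = False
    days_since_last_edit: int = 0
    edit_count: int = 0


def should_delete(page: Page) -> bool:
    """Apply the listed exclusions in order; delete only if none of them match."""
    if page.namespace not in (USER_NS, USER_TALK_NS):
        return False
    if page.is_ip:
        return False
    if page.has_do_not_delete:
        return False
    if page.namespace == USER_TALK_NS and page.userpage_has_do_not_delete:
        return False
    if page.block_years is None or page.block_years <= 3:
        return False
    if page.in_spam_category:
        return False
    if page.edit_count >= 100:  # the later-added check mentioned above
        return False
    return page.days_since_last_edit > 30
```

Ordering the cheap exclusions first means most pages are rejected before the date comparison is ever reached.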
I think there needs to be more discussion on whether there is consensus for these pages to be deleted by a bot. IIRC, the last discussion about these deletions didn't turn out too well; people still seem pretty split on the delete/don't delete question. --Chris 12:39, 15 June 2009 (UTC)
- I can't think of any other way to delete over ten thousand pages. I will be incorporating some of the checks from CAT:TEMP bot but really don't see what the issue is with removing these useless pages. Nakon 21:27, 15 June 2009 (UTC)
- The problem is probably more whether people think it's a good idea to clean CAT:TEMP at all, not which way. Some further discussion is probably needed. Regards SoWhy 08:03, 16 June 2009 (UTC)
- I think it's a great idea, but perhaps you should start a discussion on the Village Pump or an RFC or something to gauge community response. – Quadell (talk) 12:55, 16 June 2009 (UTC)
Is there discussion anywhere besides here about whether CAT:TEMP files should be deleted? – Quadell (talk) 13:05, 23 June 2009 (UTC)
- I left a note at VPP, but it did not draw much attention. The category itself states that the userpages in the category "only exist temporarily, usually to provide information to the users or allow them a suitable period of time to contest blocking." The scenario that Fram brought up does not really apply to this request as I have no intention of removing pages of users that have been banned from the site. Nakon 07:51, 24 June 2009 (UTC)
- Oppose, there are too many templates that inappropriately dump pages into CAT:TEMP and there is no consensus on the deletion of such articles. Stifle (talk) 11:10, 24 June 2009 (UTC)
- So let's get the templates fixed rather than just ignore the problem. Nakon 14:28, 24 June 2009 (UTC)
Denied. No consensus, sorry. – Quadell (talk) 16:49, 2 July 2009 (UTC)
- The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made in a new section.