Wikipedia:Bots/Requests for approval/Theo's Little Bot 22

The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.

Theo's Little Bot 22

Operator: Theopolisme

Time filed: 22:16, Sunday June 30, 2013 (UTC)

Automatic, Supervised, or Manual: Automatic

Programming language(s): Python

Source code available: on GitHub

Function overview: Automates inclusion of Rotten Tomatoes statistics in articles, using the new {{Rotten Tomatoes score}}

Links to relevant discussions (where appropriate): request and extended discussion on my talk

Edit period(s): Continuous

Estimated number of pages affected: Unknown

Exclusion compliant (Yes/No): No need, only edits subpages of template

Already has a bot flag (Yes/No): Yes

Function details: Extensive documentation can be found at {{Rotten Tomatoes score}}. Basically, the bot just uses the Rotten Tomatoes API to update subpages of Template:Rotten Tomatoes score, using IMDB ids as unique identifiers for films.

Discussion

Creating a massive number of sub-pages for a template, which will only get worse as it becomes more and more popular? Sounds a bit like the status bots to me (although to be fair, at least this serves a useful purpose). The time taken to update will also increase dramatically as the use of the template increases. How many pages do you expect this will run on? --Chris 14:45, 2 July 2013 (UTC)

One improvement would be to store the Rotten Tomatoes ratings offline as well (some kind of database, or hell, even a text file will do). That way, when you're updating the scores, you can use the offline database instead of hitting Wikipedia for each subpage (you'll still have to hit Rotten Tomatoes for each one, but hey, they can afford it :P ) It will be much faster, and save the servers a bit of unnecessary load. A further optimisation would be to only check the rating after x amount of time (e.g. a week) --Chris 14:51, 2 July 2013 (UTC)

Good points, Chris. *ponders* I don't really know how many pages this would run on; there are a ton of movies on the 'pedia. However, movies' scores tend not to change after they've been out for a while, so in all likelihood the bot would really only have to update recently released films' scores. Your suggestion about an offline database is a really good idea -- this would most definitely eliminate load. I'll look into all of this in the next few days. If you or anyone else comes up with any other suggestions I'd love to hear them! By the way, I do resist the "status bots" connection...what about {{cite doi}}, then? Theopolisme (talk) 16:11, 2 July 2013 (UTC)

The status bots were a couple of bots long ago, back when there was a big fad about having "This user is online" templates on people's userpages. The bots would update the templates based on the time of the user's last edit. But they were a waste of resources and the devs had them shut down. See User:StatusBot and here. Anyway, I think with the changes we've made this bot should be fine. --Chris 09:59, 9 July 2013 (UTC)

I've implemented part of the offline functionality. Right now, the bot will not edit a page more than once in a 5 day period. However, I haven't done the offline-database-for-scores yet. Theopolisme (talk) 03:22, 3 July 2013 (UTC)
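For illustration only, the five-day throttle described above might look roughly like the following sketch. It is not taken from the bot's source: the JSON file layout and the names `load_last_edits`, `is_due` and `record_edit` are assumptions made for this example.

```python
import json
import time
from pathlib import Path

FIVE_DAYS = 5 * 24 * 60 * 60  # minimum gap between edits, in seconds


def load_last_edits(path):
    """Return the {imdb_id: unix_timestamp} map, or an empty one if absent."""
    path = Path(path)
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def is_due(last_edits, imdb_id, now=None, min_gap=FIVE_DAYS):
    """True if the ID was never edited or its last edit is old enough."""
    now = time.time() if now is None else now
    last = last_edits.get(imdb_id)
    return last is None or now - last >= min_gap


def record_edit(last_edits, imdb_id, path, now=None):
    """Remember when the ID was last edited and persist the map to disk."""
    now = time.time() if now is None else now
    last_edits[imdb_id] = now
    Path(path).write_text(json.dumps(last_edits), encoding="utf-8")
```

Because the timestamps live in a local file, the check needs no request to Wikipedia at all. It also means that losing the file resets every ID to "due".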

...And now I've completed the offline score storage as well (I used MD5 hashing to keep filesize down). Chris, your thoughts? Theopolisme (talk) 22:10, 3 July 2013 (UTC)
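One plausible reading of the MD5 approach is that the bot stores a fixed-size digest of each film's score data instead of the data itself, and compares digests to detect a change. The sketch below works under that assumption; the field names and the helpers `score_digest` and `has_changed` are hypothetical and not taken from the bot's source.

```python
import hashlib


def score_digest(score):
    """MD5 hex digest of a canonical rendering of the score fields."""
    # Sorting the keys makes the digest independent of dict ordering.
    canonical = "|".join(f"{key}={score[key]}" for key in sorted(score))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


def has_changed(stored, imdb_id, score):
    """Compare against the stored digest, updating it when it differs."""
    digest = score_digest(score)
    if stored.get(imdb_id) == digest:
        return False
    stored[imdb_id] = digest
    return True
```

Each entry costs 32 hex characters regardless of how many fields the score has. MD5 is used here only as a compact change detector, not for security.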

As one of the editors that brought the idea for this bot to Theopolisme's attention, I would be happy to help in any way I can to get this moving. Chris, do you have any further thoughts on this? I am hoping to see a trial run for this. Thanks, Technical 13 (talk) 17:30, 8 July 2013 (UTC)

Sorry about the delayed response. Unless I am mistaken, the bot still retrieves the page contents for every single page? See line #133 where it loads the page content before doing any of the other checks. --Chris 09:59, 9 July 2013 (UTC)

The bot updates on a per-ID level, not a per page level. Line 133 is needed to get a list of articles that transclude {{Rotten Tomatoes score}}--the articles are then parsed by process_page() for all of the IDs that they contain. Otherwise, the bot would have no idea which IDs it would need to update. I'm averse to caching/offline storage on a per-page level, since the association is not page→score, but IMDB_id→score. Theopolisme (talk) 14:13, 9 July 2013 (UTC)

Yes, but then you still have the exact problem that we're trying to avoid -> requesting n pages every time the bot runs. Can't you store both the page, and the IMDB id so you don't have to request the page to get the id? --Chris 14:20, 9 July 2013 (UTC)

Okay, I've updated the script to fetch article contents only if the article includes new IMDB ids (as determined by a maintenance category). For other IDs, the script just works off its database. Theopolisme (talk) 14:54, 9 July 2013 (UTC)
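As a rough sketch of that change: only articles listed in the maintenance category are fetched and parsed, and every other ID comes from the local database. The regular expression assumes the IMDB id is the first positional parameter of the template, which this discussion does not confirm. `fetch_text` stands in for whatever call retrieves an article's wikitext, and all function names here are hypothetical.

```python
import re

# Assumption: the IMDB id is the template's first positional parameter.
TEMPLATE_RE = re.compile(
    r"\{\{\s*Rotten Tomatoes score\s*\|\s*(\d+)", re.IGNORECASE
)


def extract_ids(wikitext):
    """Return the set of IMDB ids used by the template in this wikitext."""
    return set(TEMPLATE_RE.findall(wikitext))


def ids_to_update(known_ids, category_titles, fetch_text):
    """Merge already-known ids with ids found on newly categorised pages.

    Only the pages in category_titles are fetched; ids already in the
    database require no request to Wikipedia.
    """
    ids = set(known_ids)
    for title in category_titles:
        ids |= extract_ids(fetch_text(title))
    return sorted(ids)
```

With this shape, the number of page requests per run scales with the number of newly tagged articles rather than with the total number of transclusions.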

Awesome!

Approved for trial (7 days). Please provide a link to the relevant contributions and/or diffs when the trial is complete. --Chris 14:57, 9 July 2013 (UTC)

Cool, thanks! :) Trials for these kinds of scripts are always a bit tricky, if only for the fact that they require addition of the template to article space. Are you okay with me/Technical 13 just adding {{Rotten Tomatoes score}} to articles en masse, or would you like me to solicit users from the Village Pump, or something else completely? Theopolisme (talk) 15:29, 9 July 2013 (UTC)

You can add them, but for this trial, let's just limit it to adding the template to 50 articles. Chris 02:34, 10 July 2013 (UTC)

Trial complete. Okay, it's been sixteen days and Technical 13 and I have added the template to some articles (see what links here). I'd like for either a, the task to be approved so I can publicize it, or b, permission to publicize the template so as to gain a larger number of samples for the trial. Thanks, Theopolisme (talk) 21:52, 25 July 2013 (UTC)

I'm basically happy to approve. My only concern is edits like this, which only update the access date. Do we need the access date? Is this really necessary? What if there are 50,000 articles using the template? Do we really need to edit each of those templates solely to change the date? --Chris 11:51, 27 July 2013 (UTC)

Chris G, while I agree with you that updating just the access date every 5 days is too frequent, I would think it would be appropriate to update it once a month or even once every three months so that people who look at the article know that it is being looked at and updated on some sort of schedule. Theopolisme, what do you think about adding a feature so that if the access date is the only change, the edit is suppressed to once (a month|every three months)? Technical 13 (talk) 13:25, 27 July 2013 (UTC)
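The suppression suggested above could be sketched as follows. This only illustrates the proposal and is not a feature the bot is known to have: the record layout (`score`, `accessdate`), the 30-day default and the name `should_save` are all assumptions.

```python
from datetime import date, timedelta


def should_save(old, new, today, refresh_after=timedelta(days=30)):
    """Decide whether a subpage edit is worth making.

    old is the stored record ({"score": ..., "accessdate": date}) or None;
    new holds the freshly fetched score.
    """
    if old is None:
        return True  # first time this ID is seen
    if old["score"] != new["score"]:
        return True  # the score itself changed
    # Only the access date would change: refresh it at most once per interval.
    return today - old["accessdate"] >= refresh_after
```

Passing `timedelta(days=90)` as `refresh_after` would give the three-month variant.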

Actually, I think the only reason the access date was changed was because the script was moved from my development machine to Labs (and I forgot to move the data files with it). But that shouldn't happen normally, see [1]. Theopolisme (talk) 16:19, 27 July 2013 (UTC)

Ok, that's good then. --Chris 03:08, 28 July 2013 (UTC)

Approved. --Chris 03:08, 28 July 2013 (UTC)

The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.