Wikipedia:Bots/Requests for approval/TweetCiteBot
- The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.
Operator: TheSandDoctor (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)
Time filed: 02:18, Thursday, October 26, 2017 (UTC)
Automatic, Supervised, or Manual: automatic
Programming language(s): PHP
Source code available: https://github.com/TheSandDoctor/TweetBot
Function overview: This bot converts tweet references from either bare links (so <ref>URL</ref>) or {{cite web}} to {{cite tweet}}.
Links to relevant discussions (where appropriate): Wikipedia:Bot_requests#Bare_Twitter_URL_bot
Edit period(s): Periodic
Estimated number of pages affected: Approximately 4600-5000 mainspace articles
Namespace(s): Mainspace only.
Exclusion compliant (Yes/No): Yes
Already has a bot flag (Yes/No): No
Function details: The bot goes through a predefined list of pages in the mainspace generated by AWB and database dumps that have tweet URLs within them in order to convert them to the appropriate template ({{cite tweet}}). The bot is intelligent enough to know the difference between bare URLs and {{cite web}}, responding accordingly in each case and can deal with combinations of the two. This bot is both Assert and Exclusion Compliant. The bot can be toggled on and off at any point once started here. Of course, while it reads false at the moment, changing it to "true" won't change anything as the bot is still pointed to my testing environment installation (and will only be "pointed" to this MediaWiki if approved).
Given the large number of pages that would be affected by this change, I would suggest that the account be given the 'bot' flag/group if approved, so as to avoid cluttering up watchlists and recent changes.
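As an illustration of what "Exclusion Compliant" means in practice, here is a minimal Python sketch of a page-level opt-out check. It is a simplified reading of the {{bots}}/{{nobots}} convention, not the bot's actual PHP code; the function name `allows_bot` is hypothetical, and parameters such as `optout` are deliberately ignored.

```python
import re

# Assumption: simplified handling of {{nobots}} and {{bots|allow=...}} /
# {{bots|deny=...}} only; this is not TweetCiteBot's real implementation.
NOBOTS = re.compile(r"\{\{\s*nobots\s*\}\}", re.IGNORECASE)
BOTS = re.compile(
    r"\{\{\s*bots\s*\|\s*(allow|deny)\s*=\s*([^}|]*)\}\}", re.IGNORECASE
)


def allows_bot(wikitext: str, bot_name: str) -> bool:
    """Return True if the page text does not opt out of edits by bot_name."""
    # {{nobots}} excludes every bot.
    if NOBOTS.search(wikitext):
        return False
    match = BOTS.search(wikitext)
    if match is None:
        # No exclusion template (or a bare {{bots}}): editing is allowed.
        return True
    kind = match.group(1).lower()
    names = [name.strip().lower() for name in match.group(2).split(",")]
    listed = bot_name.lower() in names
    if kind == "deny":
        # deny=all or an explicit listing blocks the bot.
        return not ("all" in names or listed)
    # kind == "allow": only listed bots (or everyone with allow=all) may edit.
    if "none" in names:
        return False
    return "all" in names or listed
```

A bot following this convention would call such a check on each page's text before saving and skip the page when it returns False.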
Discussion
- When the bot parses bare links, is it also parsing bracketed links as well? Being involved in IABot's development, I've encountered this so many times. Does the bot handle the little exceptions the MW makes with certain plain and bracketed URLs? I ask this since this isn't using AWB to make the edits.—CYBERPOWER (Trick or Treat) 02:27, 26 October 2017 (UTC)[reply]
- Hi Cyberpower678, do you mean links like [this_is_url link]? If so, it does not currently support that; however, that is something I will look into adding ASAP if you would like (I just think that Tweets are more likely to be refs than straight URL links in this case). It currently looks for a bare tweet URL, so the form of twitter.com/account_name/status/numeric_string (either with or without www/https/http or any combination of those prefixes), together with the opening <ref> tag, and swaps the URL out (leaving the ref tag alone of course) for {{cite tweet}}, using the Twitter API to gather the necessary information. To be exact about that particular process, it takes the numeric string, which is the ID of the tweet, and then uses the API to pull in the account handle (so @username), account display name (for use on Wikipedia, that is probably their real name), text of the tweet, and the date/time it was created (which it then converts using PHP's native date-handling class, DateTime).
- For a specific example: take this tweet on Mick Jagger's Twitter ([1]) and, for the sake of argument, assume it contains some vital information and is considered an adequate source (yadayada) that resulted in it being a suitable reference in the GA. The bot would pull down the following from the Twitter API and either replace the bare ref or {{cite web}} with the following:
- {{cite tweet |author=Mick Jagger |user=MickJagger |number=923241052252844033 |date=25 October 2017 |title=Last rehearsal of the tour! It’s been an amazing run #StonesNoFilter #StonesParis}}.
- In the case of {{cite web}}, it would first check whether there are any instances of {{cite web}} on the page; if not, it would move on. If so, it would then check whether the URL parameter (|url=), regardless of where it is, so long as it is before the closing curly brackets and has a pipe just before it, contains a Tweet. If it does, it would go through basically the same process as above (analyzing the link with the Twitter API to pull the relevant info out of it, then replacing it).
- Hopefully this provides some answers and I do apologize for the long length of the response. If you have any more questions or concerns, please do let me know. The source code is available on Github (link above), with all files, except for username.php, included, so feel free to check it out. The source code is fairly well documented and I will continue to work on it and update it throughout this process to address any concerns/suggestions raised. (In case you were wondering, username.php, which is referenced in most - if not all - of the files, only contains login information and Twitter API stuff that I will NOT be making public for obvious reasons.) (As a side note, I love the "trick or treat" themed sig ) --TheSandDoctor (talk) 03:05, 26 October 2017 (UTC)[reply]
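The two conversion paths described in this reply (a bare tweet URL inside ref tags, and a {{cite web}} whose |url= points at a tweet) can be sketched roughly as follows. This is an illustrative Python sketch, not the bot's PHP source: the regexes, the function names, and the `lookup` callable that stands in for the Twitter API call are all assumptions made for the example.

```python
import re
from datetime import datetime
from typing import Callable, Optional

MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")

# twitter.com/account_name/status/numeric_string, with optional http(s)/www.
TWEET_URL = r"(?:https?://)?(?:www\.)?twitter\.com/(?P<user>\w+)/status/(?P<id>\d+)"

# A bare tweet URL wrapped in <ref>...</ref> (self-closing refs are skipped).
BARE_REF = re.compile(
    r"(?P<open><ref(?:\s[^>]*)?(?<!/)>)\s*" + TWEET_URL
    + r"[^\s<]*\s*(?P<close></ref>)",
    re.IGNORECASE,
)
# A {{cite web}} call without nested templates, and a |url= parameter in it.
CITE_WEB = re.compile(r"\{\{\s*cite web\s*\|[^{}]*\}\}", re.IGNORECASE)
CITE_WEB_URL = re.compile(r"\|\s*url\s*=\s*" + TWEET_URL, re.IGNORECASE)


def build_cite_tweet(tweet_id: str, info: dict) -> str:
    """Format {{cite tweet}} from metadata (hypothetical keys: author, user,
    text, created_at)."""
    created: datetime = info["created_at"]
    date = f"{created.day} {MONTHS[created.month - 1]} {created.year}"
    # Keep the tweet text on one line and stop "|" from splitting parameters.
    title = " ".join(info["text"].split()).replace("|", "{{!}}")
    return ("{{cite tweet |author=" + info["author"] + " |user=" + info["user"]
            + " |number=" + tweet_id + " |date=" + date
            + " |title=" + title + "}}")


def convert_tweet_refs(wikitext: str,
                       lookup: Callable[[str], Optional[dict]]) -> str:
    """Convert tweet references; `lookup` stands in for the Twitter API and
    returns None when the tweet cannot be fetched (reference left as-is)."""
    def replace_bare(match: re.Match) -> str:
        info = lookup(match.group("id"))
        if info is None:
            return match.group(0)
        return (match.group("open")
                + build_cite_tweet(match.group("id"), info)
                + match.group("close"))

    def replace_cite_web(match: re.Match) -> str:
        url = CITE_WEB_URL.search(match.group(0))
        if url is None:
            return match.group(0)  # |url= is not a tweet
        info = lookup(url.group("id"))
        if info is None:
            return match.group(0)
        return build_cite_tweet(url.group("id"), info)

    text = BARE_REF.sub(replace_bare, wikitext)
    return CITE_WEB.sub(replace_cite_web, text)
```

Called with a `lookup` that returns the Mick Jagger tweet's metadata, both `<ref>https://twitter.com/MickJagger/status/923241052252844033</ref>` and a {{cite web}} pointing at the same URL would come back as the {{cite tweet}} shown in the example above. Note that this sketch drops every other {{cite web}} parameter, including any access date, and does nothing about deleted tweets or archive links.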
- Cyberpower678 I have started work on improving the bot to recognize the use of bracketed Twitter links and have developed the regex to recognize them. With that said, I feel that that may potentially be outside the scope of the bot as it is set to convert to the {{cite tweet}} template, which bracketed links to Tweets are most likely not intended as (thinking of external links sections) and are probably not intended to be citations. I have updated the application for clarity in that the bot only edits mainspace articles. Also, please note that I have requested on my talk page that xaosflux change TweetBot's username as Jonesey95 pointed out to me that "TweetBot" could potentially be confused as being too close to WP:CORPNAME and I did not realize at the time of filing that it may be confused with a Mac app sharing the same name (ironically enough, I am writing this on a Mac and actually wrote the bot on one). --TheSandDoctor (talk) 20:18, 26 October 2017 (UTC)[reply]
- Please see Wikipedia:Changing_username for how to request a rename. — xaosflux Talk 20:22, 26 October 2017 (UTC)[reply]
- Approved for trial (50 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete.—CYBERPOWER (Trick or Treat) 00:25, 28 October 2017 (UTC)[reply]
- Thank you Cyberpower678, I will start the trial as soon as possible. --TheSandDoctor (talk) 03:55, 28 October 2017 (UTC)[reply]
- So far so good overall, but I am pausing for the moment to work out one alarming bug that was discovered (this). By all accounts, it shouldn't have happened, so I am going to go back to my testing environment on my own MediaWiki installation and test with an export of that page to see how to stop this from happening again. I will continue testing/troubleshooting ASAP (I just don't have any more time at this immediate moment). I am glad that I took this slow, only letting it do 1 or 2 edits at a time (and then manually reviewing before moving on), as that might have stopped it from causing inadvertent vandalism. Will keep this thread posted. --TheSandDoctor (talk) 06:11, 28 October 2017 (UTC)[reply]
Trial complete. Hello, the trial has been completed. After some initial errors (which I promptly corrected), the errors the bot makes have reduced significantly (and I re-ran it on the pages it had erred on, where it then worked correctly). The only issue left is that, in fixing a rare issue where the bot placed {{dead link}} in the wrong spot (due to a regex issue), the fix broke its ability to recognize the access date or access-date parameters. With that said, it is an issue I am looking to remedy as soon as possible; however, it is not an overly serious one. While the bot does (at present) remove the access date when converting, tweets, unlike Facebook posts, cannot be edited, so the version seen at the time (access date) is the same version as when the tweet was first tweeted. In the event that the tweet was deleted (causing a 404 error), the bot attempts to add an archive link (if it can't, it tags the reference appropriately) and does not convert (as it relies on the Twitter API for Tweet information). --TheSandDoctor (talk) 05:43, 11 November 2017 (UTC)[reply]
- @TheSandDoctor: Speaking from experience when developing IABot, I can say for certain relying on regexes to handle templates, is not easy. It took forever to get the perfect balance of string parsing and regexes.—CYBERPOWER (Chat) 13:49, 16 November 2017 (UTC)[reply]
- @Cyberpower678: I agree with you; however, I do like the challenge that it presents and have been able to overcome any issues that arise fairly quickly. Based on the edits to date, it appears that I have ironed out the majority (if not all) of the issues, save for the access date parameter. If you would be more comfortable with another trial, I am happy to do that. I will fix the access date param as soon as possible (just busy with my studies as finals are fast approaching). --TheSandDoctor (talk) 23:52, 16 November 2017 (UTC)[reply]
- Approved.—CYBERPOWER (Merry Christmas) 17:59, 2 December 2017 (UTC)[reply]
- The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.