Wikipedia:Bots/Requests for approval/CensusBot 2
- The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.
Operator: Logan-Census (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)
Time filed: 16:50, Monday, July 17, 2017 (UTC)
Automatic, Supervised, or Manual: Automatic
Programming language(s): Python
Source code available: https://github.com/CommerceDataService/census-wikidata-bot/blob/master/wikipedia_bot.py
Function overview: A bot that checks total population and ranking values in U.S. state and county page infoboxes and updates them with official values from U.S. Census Bureau APIs
Links to relevant discussions (where appropriate): Wikipedia_talk:WikiProject_U.S._counties#County_Infobox_Inconsistencies
Edit period(s): Twice yearly
Estimated number of pages affected: 3,000 pages (each US county)
Exclusion compliant (Yes/No): No
Already has a bot flag (Yes/No): No
Function details: This is an extension of the original CensusBot request, in order to allow the bot to edit population values for US county pages.
CensusBot is a bot produced by the Commerce Data Service in collaboration with the U.S. Census Bureau. Its main purpose is to check Wikipedia pages to make sure that they contain the most up-to-date information available from the Census Bureau and that the entries are complete.
Discussion
In addition to this bot request, I want to register this user account as an additional operator for CensusBot. I am working with Sasan-CDS. Let me know if there is anything I need to edit for the bot or the previous request to have me listed as an approved operator. -- Logan-Census (talk) 18:35, 26 July 2017 (UTC)
- Please have the original operator confirm this request.—CYBERPOWER (Around) 21:10, 27 July 2017 (UTC)
- @Cyberpower678: Hi, I can confirm that Logan-Census is now the operator of this bot. Sasan-CDS (talk) 13:52, 7 August 2017 (UTC)
- Approved for trial (50 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete.—CYBERPOWER (Chat) 18:04, 7 August 2017 (UTC)
- I note that the bot removes extra spaces that are included as part of the infobox templates ([1]), spaces which can make it easier for editors to work with infoboxes. I don't know the views of others on these minor stylistic changes, but that edit would have bordered on WP:COSMETICBOT were it not for the comma. I am concerned that this could easily become a case of CONTEXTBOT. I am further concerned by the lack of requisite Wikipedia experience needed to operate a bot in this case, as I raised in the first BRFA. In particular, the BRFA was filed incorrectly (details here). TheMagikCow (T) (C) 18:43, 10 August 2017 (UTC)
- @TheMagikCow: I also noticed that issue and ensured that I only updated the one page while working through any additional updates. I reverted the change you linked to, and you can see the updated diff now ([2]). Additionally I’ve updated the bot code so that it won’t apply the thousands separator (‘,’) in future population values. For the trial, I'm using the 50 edits as an opportunity to update one county page per state -- and I am working through it methodically to ensure each page's variance is accounted for. There are three different templates used, and they’re not used consistently across pages, which makes automated changes tricky. If you have recommendations for the bot behavior, I can incorporate that into the code. Also, yes the BRFA was initially submitted by Sasan-CDS with the wrong format. That was a mistake that I fixed at the same time that I listed myself as the bot operator. -- Logan-Census (talk) 15:03, 11 August 2017 (UTC)
- Hi Logan-Census! It's not so much the presence of the comma that I take issue with. My issue is with edits where the only change the bot makes is adding the comma, when a source and the correct population are already present. Thanks for fixing the changes above - the task seems tricky with the different templates in use! Best, TheMagikCow (T) (C) 17:13, 11 August 2017 (UTC)
@Cyberpower678: I have made 50 edits using CensusBot. Some states didn't have any pages that needed to be updated, so this represents changes to 36 different states. The 50 diffs are provided below:
- Anchorage, Alaska
- Kusilvak Census Area, Alaska
- Lafayette County, Arkansas
- Butte County, California
- Adams County, Colorado
- Baker County, Florida
- Appling County, Georgia
- Honolulu County, Hawaii
- Ada County, Idaho
- Adams County, Illinois
- Adams County, Indiana
- Adair County, Iowa
- Adair County, Kentucky
- Acadia Parish, Louisiana
- Allegany County, Maryland
- Barnstable County, Massachusetts
- Alcona County, Michigan
- Alcorn County, Mississippi
- Adair County, Missouri
- Beaverhead County, Montana
- Adams County, Nebraska
- Churchill County, Nevada
- Atlantic County, New Jersey
- Bernalillo County, New Mexico
- Albany County, New York
- Alamance County, North Carolina
- Adams County, Ohio
- Adair County, Oklahoma
- Adams County, Pennsylvania
- Bristol County, Rhode Island
- Abbeville County, South Carolina
- Anderson County, Tennessee
- Anderson County, Texas
- Beaver County, Utah
- Addison County, Vermont
- Accomack County, Virginia
- Barbour County, West Virginia
- Adams County, Wisconsin
- San Francisco
- Alamosa County, Colorado
- Arapahoe County, Colorado
- Archuleta County, Colorado
- Baca County, Colorado
- Bent County, Colorado
- Boulder County, Colorado
- Chaffee County, Colorado
- Cheyenne County, Colorado
- Clear Creek County, Colorado
- Conejos County, Colorado
- Costilla County, Colorado
As it is set up today, the bot will specifically attempt to update the Infobox_U.S._county and Infobox_settlement templates, only making a change if the raw population value doesn't match the most recent US Census population estimate. There are four U.S. counties still using the Geobox template, but I am ignoring those for now. The bot attempts to keep properties formatted with similar whitespace and order within the template, and most of that is done using the Python library mwparserfromhell. -- Logan-Census (talk) 14:53, 15 August 2017 (UTC)
- So this goes back to the formatting: why is the bot removing the comma from certain numbers? I feel they should be left alone if they are already there. It should be easy to check for the presence of a comma.—CYBERPOWER (Chat) 13:10, 16 August 2017 (UTC)
- @Cyberpower678: The bot is making that change because that's the directive within the template instructions. Those are defined here and here. They say to use the 'raw' format for numbers, and specifically mention it when discussing population. I imagine it's a better format for localization. -- Logan-Census (talk) 16:53, 16 August 2017 (UTC)
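For readers following the thread, the update rule discussed above (edit only when the stored population differs from the Census estimate, write the number in raw form, and leave surrounding whitespace alone) might be sketched roughly as follows. This is an illustrative simplification, not the bot's actual code: the thread says the real bot relies on mwparserfromhell, whereas this sketch uses only the standard library, and the function names, the `pop` parameter name, and the sample figures are all assumptions made for the example.

```python
import re


def parse_population(raw):
    """Return the integer held in a population field, or None if it is not a plain number."""
    cleaned = raw.strip().replace(",", "")  # tolerate a thousands separator
    if cleaned.isascii() and cleaned.isdigit():
        return int(cleaned)
    return None


def update_population(wikitext, param, estimate):
    """Rewrite `| param = value` only when the value differs from `estimate`.

    Returns (new_wikitext, changed). Whitespace around the parameter name and
    the equals sign is left untouched; the new value is written in raw form
    (digits only, no thousands separator).
    """
    pattern = re.compile(
        r"^(?P<prefix>[ \t]*\|[ \t]*" + re.escape(param) + r"[ \t]*=[ \t]*)"
        r"(?P<value>[^\n|}]*?)(?P<suffix>[ \t]*)$",
        re.MULTILINE,
    )
    match = pattern.search(wikitext)
    if match is None:
        return wikitext, False  # parameter not present on its own line

    current = parse_population(match.group("value"))
    if current is None or current == estimate:
        # Unparseable value (left for a human) or already correct.
        # A comma-only difference is deliberately NOT treated as a change.
        return wikitext, False

    new_text = (
        wikitext[: match.start("value")]
        + str(estimate)
        + wikitext[match.end("value"):]
    )
    return new_text, True


# Hypothetical infobox fragment and estimate, for illustration only.
sample = (
    "{{Infobox U.S. county\n"
    "| county = Example County\n"
    "| pop    = 12,345\n"
    "}}"
)
updated, changed = update_population(sample, "pop", 12400)
```

In this sketch a value such as `12,345` is compared as the integer 12345, so a page whose number is already correct is skipped rather than edited just to drop the comma, which is one way to address the cosmetic-edit concern raised earlier in the discussion.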
- Approved.—CYBERPOWER (Chat) 18:56, 16 August 2017 (UTC)
- The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.