Wikipedia:Bots/Requests for approval/ContentCreationBOT

The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was

Request Expired.

ContentCreationBOT

Operator: ThaddeusB

Automatic or Manually assisted: automatic, unsupervised

Programming language(s): Perl

Source code available: here

Function overview: fill in tables with data on prehistoric creatures

Edit period(s): one time run

Estimated number of pages affected: 13

Exclusion compliant (Y/N): N/A - this only applies to user & talk pages, correct?

Already has a bot flag (Y/N): N

Function details: Using a large database of information downloaded from http://paleodb.org and http://strata.geology.wisc.edu/jack/ the first function of this bot will be to fill in the tables found in various "list of" articles. A sample entry has been filled in here. Any data that is missing from the database will simply be left blank.

Only a tiny number of pages will be affected, but the amount of bot filled in content will be immense. As such, I am suggesting the bot trial be something along the lines of "the first 10 entries on each page" rather than a number of edits.

A copy of the database is available here (425k). The database is organized as follows:
Genus--Valid?--Naming Scientist--Year named--Time period it lived during--Approx dates lived--locations

A "1" in the valid column means it is currently listed as a valid genus, "NoData" means it couldn't be determined - most likely because there are two genera with the same name, and "No-{explanation}" means it is not currently listed as a valid genus.
Data preceded by a "*" was derived from Sepkoski's data, using the dates found here (compiled by User:Abyssal). All other data came from paleodb, using their fossil collection data for more precise dates (when available).
Spot checking of my data is encouraged, although I'm confident no novel errors have been introduced. If anyone knows of additional sources to derive similar data, let me know and I'll incorporate those sources into the database.
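For anyone who wants to spot check the file programmatically, here is a minimal sketch of reading one record. It is written in Python purely for illustration (the bot itself is Perl), and it assumes the fields are separated by a literal "--" exactly as in the layout line above, which may not match the real file. The names `GenusRecord`, `parse_status` and `parse_record` are hypothetical and do not come from the bot's source.

```python
from dataclasses import dataclass

# Field order as given in the layout line above; the literal "--" delimiter
# is an assumption about the file, not something confirmed by the source.
FIELDS = ("genus", "valid", "author", "year", "period", "dates", "locations")


@dataclass
class GenusRecord:
    genus: str
    status: str             # "valid", "unknown" or "invalid"
    status_note: str        # explanation text from "No-{explanation}", else ""
    author: str
    year: str
    period: str
    dates: str
    locations: str
    sepkoski_fields: tuple  # names of fields that carried a leading "*"


def parse_status(raw):
    """Map the 'Valid?' column onto a (status, note) pair."""
    if raw == "1":
        return "valid", ""
    if raw == "NoData":
        return "unknown", ""
    if raw.startswith("No-"):
        return "invalid", raw[len("No-"):]
    raise ValueError(f"unrecognised validity flag: {raw!r}")


def parse_record(line):
    """Split one database line into a GenusRecord (hypothetical helper)."""
    parts = [p.strip() for p in line.rstrip("\n").split("--")]
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    row = dict(zip(FIELDS, parts))
    # Remember which fields were marked as Sepkoski-derived, then drop the marker.
    starred = tuple(name for name, value in row.items() if value.startswith("*"))
    row = {name: value.lstrip("*").strip() for name, value in row.items()}
    status, note = parse_status(row["valid"])
    return GenusRecord(row["genus"], status, note, row["author"], row["year"],
                       row["period"], row["dates"], row["locations"], starred)
```

Raising an error on an unexpected field count or validity flag, rather than guessing, is a deliberate choice in this sketch: a malformed line is then flagged for a human instead of being silently written into a table.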

List of pages to be affected: (might be expanded slightly if others are found)

Discussion

Long discussion

Source code will be published shortly, although the code itself is a quite simple "read from db publish to Wikipedia" operation. --ThaddeusB (talk) 01:48, 3 September 2009 (UTC)[reply]

Now available here. --ThaddeusB (talk) 13:16, 8 September 2009 (UTC)[reply]

I have ~~spammed~~ asked the relevant projects for input: [1] --ThaddeusB (talk) 02:08, 3 September 2009 (UTC)[reply]

Wikipedia talk:WikiProject Gastropods#BOT notice.
List of genera of Monoplacophora would be fine. (Uncertain Monoplacophora/Gastropoda should be on separate list. See Monoplacophora#Fossil species.) --Snek01 (talk) 09:29, 3 September 2009 (UTC)[reply]

Bullets?

Maybe the countries should be a bulleted list to save horizontal space. IE:

Poland

Switzerland

as opposed to "Switzerland, Poland". Abyssal (talk) 15:30, 3 September 2009 (UTC)[reply]

Which is preferable: Option 1

Genus	Authors	Year	Status	Age	Location(s)	Notes
Advenaster	Hess	1955	Valid	Late Bajocian to Late Callovian	Switzerland, Poland
SampleEntry	Hess	1955	Valid	Early Cretaceous to present	Switzerland, Poland, United States, France
Bad Genus			Invalid			rank changed to subgenus, name to Genus (Sub genus)

Option 2

Genus	Authors	Year	Status	Age	Location(s)	Notes
Advenaster	Hess	1955	Valid	Late Bajocian to Late Callovian	Switzerland Poland
SampleEntry	Hess	1955	Valid	Early Cretaceous to present	Switzerland Poland United States France
Bad Genus			Invalid			rank changed to subgenus, name to Genus (Sub genus)

or Option 3

Genus	Authors	Year	Status	Age	Location(s)	Notes
Advenaster	Hess	1955	Valid	Late Bajocian to Late Callovian	Switzerland Poland
SampleEntry	Hess	1955	Valid	Early Cretaceous to present	Switzerland Poland United States France
Bad Genus			Invalid			rank changed to subgenus, name to Genus (Sub genus)

Any is fine by me. --ThaddeusB (talk) 18:49, 3 September 2009 (UTC)[reply]

Definitely 2 or 3, although I don't care which. I love that you're using the sort template. <3 Also, could you have the year link to the "year in paleontology" article? And link to the countries? Abyssal (talk) 04:35, 4 September 2009 (UTC)[reply]

No problem, I will link the date & countries. --ThaddeusB (talk) 13:45, 4 September 2009 (UTC)[reply]
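As a rough illustration of what linking the year and countries could look like in the generated wikitext, here is a small Python sketch. It is not taken from the bot's Perl source. It assumes the "year in paleontology" articles follow the title pattern "1955 in paleontology" and that locations arrive as a comma-separated string. The helper names and the `<br />` separator are hypothetical choices.

```python
def link_year(year):
    """Link a four-digit year to its assumed "<year> in paleontology" article."""
    year = year.strip()
    if year.isdigit():
        return f"[[{year} in paleontology|{year}]]"
    return year  # blank or unparseable years are passed through unlinked


def link_countries(locations, separator="<br />"):
    """Turn "Switzerland, Poland" into one wikilink per country."""
    names = [name.strip() for name in locations.split(",") if name.strip()]
    return separator.join(f"[[{name}]]" for name in names)


print(link_year("1955"))
print(link_countries("Switzerland, Poland"))
```

Passing `separator=", "` instead would give the single-line layout of Option 1.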

Hmm, WP:Context and all that - re countries. Surprise links to "year in" are not that great either. Rich Farmbrough, 09:51, 6 September 2009 (UTC).[reply]

Invalid genera should be in separate table, because it is better for general public. If there is any reason to have them together, then it could be also OK. Such tables will be also useful. --Snek01 (talk) 11:48, 4 September 2009 (UTC)[reply]

I'm neutral on linking the countries, although I will point out that such links would allow someone to easily figure out where in the world the fossil was found. I view the "year in" links as completely appropriate as naming of new genus is something that is/should be covered in those articles. --ThaddeusB (talk) 13:16, 8 September 2009 (UTC)[reply]

The bot is only going by the entries that already exist in the tables & it looks like there are only about 5 invalid entries across the dozen or so pages. As such, I'll leave it up to regular editors to pull the entries out of the main table rather than trying to write a function to do it.

Improper age sorting (fixed)

A sample update has been performed here. --ThaddeusB (talk) 13:16, 8 September 2009 (UTC)[reply]

I haven't had time to look closely at it, but I notice the age sorting isn't working right, but I can't figure out why. Any idea? Abyssal (talk) 18:31, 8 September 2009 (UTC)[reply]

Hmm, could you be more specific as it seems to work for me. The first click put it from most recent to oldest and the second from oldest to most recent. --ThaddeusB (talk) 00:57, 9 September 2009 (UTC)[reply]

The first 20% or so is alright, but after the Miocene (when viewed in ascending order) it starts listing Jurassic stages, then it goes back to Pleistocene, and for some reason Cretaceous ages are listed as if they were the oldest. It's looked this way both at home and at school. The browser I used is Firefox. Abyssal (talk) 19:09, 9 September 2009 (UTC)[reply]

OK, I figured out the problem. Apparently the {{sort}} template doesn't work properly with numbers, so everything is being sorted "alphabetically" - that is 1 < 10 < 2 etc. It was only coincidence that the first 20% is still correct (and I didn't look down far enough to realize the error). I'll add a fix for this tonight & re-upload the sample page. --ThaddeusB (talk) 20:05, 9 September 2009 (UTC)[reply]

Good sleuthing! Abyssal (talk) 22:28, 9 September 2009 (UTC)[reply]

I believe the latest upload fixes the issues. --ThaddeusB (talk) 21:19, 10 September 2009 (UTC)[reply]

How did you fix the problem? (I'll need to use the same method for other articles.) Abyssal (talk) 02:28, 11 September 2009 (UTC)[reply]

By adding enough zeros in front of the numbers to make them correctly sort as strings. That is, by converting "1.23" to "001.23", "45" to "045", etc. --ThaddeusB (talk) 20:41, 12 September 2009 (UTC)[reply]
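For editors who need the same fix elsewhere, the padding described above can be sketched as follows. This is a Python illustration, not the bot's Perl code. The function name `pad_sort_key` and the default width of three integer digits are assumptions, chosen to match the "001.23" and "045" examples.

```python
def pad_sort_key(value, int_width=3):
    """Left-pad the integer part with zeros so string order matches numeric order."""
    whole, dot, frac = value.partition(".")
    return whole.zfill(int_width) + dot + frac


ages = ["10", "2", "1.23", "45"]

# Plain string sorting reproduces the bug: "10" sorts before "2".
print(sorted(ages))

# Sorting on the padded key gives the numeric order.
print(sorted(ages, key=pad_sort_key))

print(pad_sort_key("1.23"))  # 001.23
print(pad_sort_key("45"))    # 045
```

The width has to cover the largest integer part that occurs. Three digits only works while every value is below 1000.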

Need for outside input

Just a thought, but in light of the Anybot debacle - it might be a good idea to put a call out to the WikiProjects and recruit some marine biologists/fossil guys/crustacean guys/etc. to come take a look at your trial edits and check them over with a fine tooth comb before the bot is given final approval. If Anybot taught us anything, it's how simple errors in interpreting database content can lead to masses of incorrect information going live to the 'pedia and remaining there for months, unnoticed. --Kurt Shaped Box (talk) 22:28, 10 September 2009 (UTC)[reply]

I certainly want several people to look over the data and have notified the 6 most relevant WikiProjects. So far, those notices don't seem to have attracted many people. :( --ThaddeusB (talk) 22:45, 10 September 2009 (UTC)[reply]

Can you try asking specific editors by the type of articles you intend to produce? Asking editors if they will check the data? --69.225.12.99 (talk) 02:34, 11 September 2009 (UTC)[reply]

13 pages? Why do we need to approve a bot for that? Run your program. Dump the output to a screen. Post it by hand. Preview. Save. A bot will save you a few minutes whilst vastly increasing the risk to the project. Hesperian 23:55, 10 September 2009 (UTC)[reply]

First off, let's not be ridiculous - me running a program locally & manually uploading the data is no less of a risk than me running a program locally & it automatically uploading the data. Now, there are several reasons I am requesting approval rather than just uploading the data:

Yes it will only edit a few pages, but the amount of data it will import is immense, as we are talking about the automated filling of several thousand table entries. The amount of info that is being auto generated and added deserves community consent, IMO.
I want as many people as possible to look it over to make sure the bot isn't adding inaccurate info. If I just uploaded it all in my name, it wouldn't get the same scrutiny
There is a planned second part of this task (automated creation of stubs) that will edit thousands of articles. This will be a separate BRFA, but the idea here is to get any bugs/inaccurate input data fixed on a relatively non-controversial task before moving onto a possibly controversial one. If the bot can handle accurately adding content to existing articles, then there is concrete evidence that it should be able to create stubs with prose based on the same information.

I hope that clears it up. --ThaddeusB (talk) 00:20, 11 September 2009 (UTC)[reply]

I obviously agree with Thad. This sort of tedious point-by-point extraction of information from a database is what Wikipedia bots are made for. As someone who has filled similar tables out manually, I can vouch that using a bot for this purpose is the most effective way to accomplish it. Abyssal (talk) 02:18, 11 September 2009 (UTC)[reply]

I agree with that, Abyssal. I run scripts like that myself. But you don't need a bot account to run a script against an external data source. You only need a bot account to post the formatted results to Wikipedia. Personally I prefer to run my scripts, examine the results, tweak the scripts and run them again if necessary, iterate, eventually load the results into an edit window, preview, tweak, and finally save. It is a lesser risk to do it this way. The risk is only the same if you are going to copy-paste-save without examining the results, just like a bot would do. And in that case, I cannot comprehend why a bot is necessary. Once you've generated the data, posting 13 pages by hand will take you 6½ minutes. Thaddeus, I sure hope you'll be spending more time than that implementing and testing your bot. So where is the benefit? I also dispute the scrutiny argument. People don't scrutinise bots more; they largely ignore them until they screw up royally. And no, you don't need to obtain community consent before you edit Wikipedia, even if you are posting big pages. Hesperian 05:42, 16 September 2009 (UTC)[reply]

I have already examined the input data, tested the bot, and reviewed its results fairly extensively. I've probably put around 40 hours into it in fact. I am well aware that I didn't need approval to do this task. I merely feel it is better to do it with approval than without. This vetting process has already led to some subtle improvements that likely would have never happened if I'd only reviewed the output on my own. --ThaddeusB (talk) 13:09, 16 September 2009 (UTC)[reply]

Discussion regarding some objections

In my opinion, I don't think this bot should go forward without proactive community support for the bot. This means more than no one disapproves or shows negative interest. It requires editors from relevant projects get on board for vetting uploaded data. Without a group of editors to check data, it is my opinion the potential for another AnyBot type mess exists. Yes, this is the type of work that bots should be used for, in my opinion. But it requires a human editorial community to accompany the creation of articles. I'm also not thrilled with the sideways answers to some of my questions about this bot, before the RFBA. A single straight-forward answer to a question, when first asked, is more in keeping with the kind of communication that should be done when running bots that create articles, in my opinion. --69.225.12.99 (talk) 02:33, 11 September 2009 (UTC)[reply]

To be fair, no one asked me a single question. An IP (possibly you) asked Abyssal some questions, but they seemed to be directed towards his editing activities & not this bot. Additionally, he obviously didn't know the exact details of how the bot would operate since he wasn't programming it... and frankly I didn't know the exact details either since it wasn't complete yet. I have explained what the bot plans to do in this BRFA (which is the correct place to do so), released the source code, and released the database. I personally have manually checked dozens of entries and I'm pretty sure Abyssal has as well. If you are offering your help in spot checking, then please check however many entries you want from the database, against whatever data source you can. If you find any problems, by all means please tell me so I can make adjustments to the database.

Beyond that, there really isn't anything I can do. I can't force people to spot check the data. Nor can I, or anyone else, check more than a trivial % of the total. I have to rely on the accuracy of the data I obtained from reliable sources. I can't verify every piece of data by hand, but only confirm the general integrity of the sources from which the data came. --ThaddeusB (talk) 03:06, 11 September 2009 (UTC)[reply]

And BTW, the bot isn't going to be creating any articles at this time. --ThaddeusB (talk) 03:08, 11 September 2009 (UTC)[reply]

If the bot is going to be creating tables of data on wikipedia and there is not a single editor outside of the creators interested in the data, there appears to be no desire for the bot on wikipedia. If editors aren't willing to check the data and are not interested enough to comment on the bot, who wants the bot?

According to bot policy, to gain approval, a bot must:

be harmless
be useful
not consume resources unnecessarily
perform only tasks for which there is consensus
carefully adhere to relevant policies and guidelines
use informative messages, appropriately worded, in any edit summaries or messages left for users

Consensus involves discussion and other editors. No other editors = no discussion = no consensus for the task to be done, much less by a bot. --69.225.12.99 (talk) 06:57, 11 September 2009 (UTC)[reply]

You are mistaken about the way Wikipedia works. Discussion is not needed to take action, discussion is only needed if the action taken is met with objection. Being bold is a core principle. If we had to discuss every action first, very little would actually get done. The task clearly falls under policy so there is already consensus to do the task. This discussion is to establish that my bot can do the task accurately and efficiently.

Second, you mistake low input for lack of interest. Just because few people have commented here, doesn't mean no one is interested in the data. We are talking about scientific data on prehistoric creatures here, not Britney Spears. The audience interested in this material is limited, of course, but nearly everyone would agree that this sort of information is at least as important to have on Wikipedia as pop culture, even though far fewer people are interested in it.

Third, you misunderstand what this request is for. The request is to fill in the existing tables, not create new ones. Technically (as pointed out by Hesperian) I do not even need BOT approval to do the task. The request was created in part to solicit additional eyes to make sure the data is accurate. Again, the information is 100% from reliable sources and a bot will copy the information far more accurately than a human ever could. So, I ask again are you willing to look over the data? Or are you just trying to block the task from happening? --ThaddeusB (talk) 15:28, 11 September 2009 (UTC)[reply]

No, I'm not mistaken about how wikipedia works, and I'm not mistaken about your insulting me rather than addressing the issue. The use of bots does not function by the, be bold, run a bot, make 10,000 entries, then decide if the community wants it theory. Please read the bots policy in its entirety before requesting approval for a bot. It is your responsibility as a bot operator to adhere to bot policy. You can't do that if you don't know it.

I think it's time to close this bot request for approval until there is community consensus for this task to be done. The lack of editors monitoring AnyBot was problematic enough, but that bot at least had some community consensus. This bot has absolutely none, and its operator denies there is any need for community consensus for its task. This is a bad start. Coupled with the bot operator's combative nature and inability and/or unwillingness to address issues, I can't see anything but disaster and another mess of 1000s of entries for someone else to clean up. --69.225.12.99 (talk) 16:32, 11 September 2009 (UTC)[reply]

As the creator, designer, and near sole contributor to Wikipedia's lists of prehistoric invertebrates, and also the user who "proactively" sought out someone capable of programming a bot to perform the task at hand, I am curious as to who you expect us to seek consensus from. Should I make sock puppets and then ask them if they approve? I created every single one of the lists ContentCreationBOT will be contributing to, and somewhere around 23-26 of the 28 lists of prehistoric invertebrates in total on Wikipedia. Yes, I gave a range of pages there, as in "I created so many of them that I've lost count." The community-of-people-who-contribute-to-Wikipedia's-lists-of-prehistoric-invertebrates consists nearly entirely of myself, and there is strong consensus between myself and I that this task should move ahead. It's also nice of you to dishonestly claim that we're creating stubs here. We've worked diligently for months preparing this bot and you're willing to shut us down without even having read the description of the task we're requesting approval for. And then when we don't just hump up and take it, you throw a hissy fit and demand that the discussion be closed. Wow. Abyssal (talk) 17:20, 11 September 2009 (UTC)[reply]

I didn't use the word stub, until now. I think that this type of personal attack of people ("you throw a hissy fit") who have questions and concerns about the bot, once more, bodes poorly for the use of this bot to create any type of content. --69.225.12.99 (talk) 03:39, 12 September 2009 (UTC)[reply]

I took "create 10,000 entries" as you implying the bot would engage in stub generation; if that's not what you meant, sorry. Assuming you were referring to the data-adding task, the bot isn't "creating" the entries, it's filling in blank entries in tables that already exist. Abyssal (talk) 18:15, 12 September 2009 (UTC)[reply]

1) I didn't insult you, and I am sorry you took it that way. I merely stated that you don't appear to understand what consensus means (at least as it applies to bot tasks).

2) "Perform only tasks for which there is consensus" means perform a task for which there is generally consensus. It doesn't mean we need 20 editors to come comment on every bot and say "yep, there is consensus for this task." There is already implicit consensus for adding this sort of information to articles as it has been done hundreds of times by many different editors with no objections. The bot can just do it faster and more accurately.

3) I have 3 approved bots and understand bot policy thoroughly.

4) You are arguing over semantics, not substance. I most certainly didn't claim the bot doesn't need consensus.

5) I can't "address the issue" as you have not outlined any actual issue with either the bot or the RS data it is using. You have only stated your personal opinion that you think more people should look at the data.

6) Do you have any actual policy based objection to the task of filling in existing tables with reliable source data? Or any objections to the data/code to put it on Wikipedia?

7) If the bot is rejected I will just manually upload the exact same data - as is my right as an editor - and the community will lose the benefit of explicitly knowing it was extracted from a database by a bot. Again, I didn't even have to ask for approval of this task as there is no actual need to automate the uploading. I did so for the community's benefit. --ThaddeusB (talk) 16:56, 11 September 2009 (UTC)[reply]

I would also like to add that the data has been looked over by accredited scientists or it wouldn't have been on Paleodb.org to begin with. To expect me or anyone else to manually check every entry is completely unreasonable and, in my opinion, would be more likely to introduce novel errors than improve overall accuracy even if it was possible. --ThaddeusB (talk) 17:02, 11 September 2009 (UTC)[reply]

I stand by my original "hissy fit." --69.225.12.99 (talk) 03:39, 12 September 2009 (UTC)[reply]

Does that mean you also stand by your refusal to provide any concrete objections? --ThaddeusB (talk) 17:38, 12 September 2009 (UTC)[reply]

Question for ThaddeusB and Abyssal

As a matter of interest - and this may help to assuage "the IP algae guy"'s (as I think of him, based on our previous work together cleaning up after Anybot and in lieu of a better name) doubts and concerns, how familiar are you guys with the taxonomy of prehistoric invertebrates in a non-WP context? Is this your chosen field, area of scholarly interest or hobby? IOW - do you *know* these ex-critters, or are you strictly data-processing here?

The reason I ask is so that it may be established how likely it would be that a subtle misunderstanding/misinterpretation of the data presented at Paleodb (perhaps due to incorrect assumptions being made from incomplete knowledge) could occur, go unnoticed during the transfer because no-one knows what to look for - and thus result in massive factual errors being introduced to the wiki. Going back to Anybot, one of the reasons that it failed so hard was that the BotOp didn't really 'know' algae to any great extent - but he did earnestly believe that he was capable of extracting the data automatically and formatting it into encyclopedia articles. That's all well and good, if it works - but as well as misunderstanding some fundamental algae-related terms, he incorrectly assumed (as far as I am aware) that 'number of taxa/species listed at AlgaeBase = number of taxa/species known to science' and ran with it. Then, as there was no-one else around at the time who knew better (or perhaps because no-one else even looked at the resultant articles once they'd gone live), the assumption was made that the bot's output was correct. IIRC, the systematic errors were only uncovered when "IP algae guy"'s students started handing in coursework containing WP-sourced nonsense.

This scenario may not be exactly applicable to the Paleodb dataset - but before this goes any further, I would like to gauge the likelihood of the same thought process being applied again and leading to a different, but equally-borked end result. --Kurt Shaped Box (talk) 09:09, 12 September 2009 (UTC)[reply]

I believe the main problem with Anybot was programming, not lack of knowledge - although the second obviously contributed. The programmer made several fundamental errors like not resetting variables & making it runnable from a remote location without a password. These problems were not caught because 1) the code wasn't published and 2) no one who knew what they were doing looked over the sample stubs from the trial run. My code has been published, as has the data, as has a sample page. The code is not runnable remotely.

I do not personally have any knowledge of the subject. I was solicited as a capable bot op, with (what I believe to be) a reputation for carefully checking my bots' output & correcting errors. Abyssal is the one with the knowledge of the subject and the idea for the bot. He was doing the task manually for some time, but some users contacted him to say they thought a bot would do the task faster and more accurately, which is true. A human copying and pasting data will, despite their understanding of the subject, make an occasional error that a bot wouldn't make.

The reason I have been asking User:Abyssal about the bot is because there are problems with the fish stubs he/she created en masse. I am still waiting for him/her to respond to a question about the fish stubs I put on the user's talk page on May 30th.

While Abyssal claims to be the only working invertebrate paleontologist on wikipedia that is incorrect. I edit invertebrate paleo articles as do a number of my colleagues. Ultimately I will be more concerned about the stubs, but, thank you Kurt Shaped Box for reminding me how I met Abyssal: correcting problems with stubs. Oh, by the way, I'm not really algae guy, as I've said before, I'm marine invertebrate paleo guy. --69.225.12.99 (talk) 09:29, 12 September 2009 (UTC)[reply]

You are putting words in Abyssal's mouth. He didn't claim to be the "only working invertebrate paleontologist." He claimed to be the only one working on the specific articles for which this bot will provide data. Additionally, the error you found is precisely why this task should be done by a bot. No human is going to be able to copy gobs of data without introducing novel errors. A well-made bot won't introduce novel errors, although obviously it won't correct any errors in the original data either. (However, by Wikipedia policy we really should be going by what the source says anyway, not using our own knowledge.) If you have any source to cross-check the paleodb data against, I'd love to hear it. Otherwise, I say that this is the best available data and that there is absolutely nothing wrong with reproducing it.

Again, why are you trying to block this task? You say you have knowledge of the subject, yet you refuse to offer your help looking over the data. You demand I find people willing to do this, yet you yourself are a prime candidate to help & refuse. Why? --ThaddeusB (talk) 17:38, 12 September 2009 (UTC)[reply]

If anyone has issues with the stubs I made before, all they have to do is ask. If they've asked and I've forgotten to respond, all they have to do is ask again and remind me. That's much nicer than asking, then waiting for months before bringing it up as an act of passive aggression. Also, the substubs I created are not only irrelevant to the discussion on face, but they also bear little resemblance to the fuller, more complete stubs that ContentCreationBOT may create in the future and serve as a poor analogy for such. Abyssal (talk) 18:30, 12 September 2009 (UTC)[reply]

Okay - 'Marine Invertebrate Paleo Guy' it is, then... :) By the way, is this the user talkpage post you're talking about? If so, Abyssal did reply to you. --Kurt Shaped Box (talk) 09:41, 12 September 2009 (UTC)[reply]

No, in fact he/she didn't. The last question I posed has been ignored since it was posted--reread the post about the two articles with similar names. He/she only responded to the first part, agreeing I had corrected his data for the one article, and thanking me for doing so, but not for the question of whether both articles for what appear to be a single organism should be on wikipedia. This last is precisely the type of mistake that needs to be reviewed and corrected by humans with content creation bots. This bot's owner and assistant have resorted to bullying. Bullying by the bot operators coupled with failure to act = giant mess on wikipedia that someone else has to clean up. It took months and probably a dozen wikipedia editors to clean up the AnyBot mess. In my opinion it's time to put an end to this RfBA as a place for User:ThaddeusB to post his personal attacks, since he doesn't have what is necessary for running a bot of this nature, and is focused on attacking me rather than getting the bot together. --69.225.12.99 (talk) 18:17, 12 September 2009 (UTC)[reply]

The only one attempting to bully people here is you with your whole I don't agree with this so let's shut down discussion right now attitude. You have repeatedly demanded this not take place, but still have yet to offer a single constructive comment. You say I am focused on "attacking you rather than getting the bot together." Um, the bot is together. There is nothing to "get together." Again, you have yet to offer a single actionable complaint with the actual bot that I can address.

You claim to be interested in making sure the bot doesn't make any errors, yet you refuse to help. I think it is pretty clear that your objection is either philosophical against this sort of task ever being done, or is motivated by personal dislike for me and/or Abyssal.

I have not made a single personal attack against you. I merely commented on your comments, just as you have commented on mine. Somehow it is perfectly acceptable for you to distort others' comments and say whatever crap you want about them, but if they dare mention you in a reply they are personally attacking you?

Finally "this is the kind of mistake that needs to be reviewed by humans" is an irrelevant comment because this mistake was made by a human, not a bot. In fact, this example is proof why the task should be done by a bot - humans will always make some mistakes when copying large amounts of data. --ThaddeusB (talk) 19:05, 12 September 2009 (UTC)[reply]

I'm male, no need to use the double pronoun thing. As for bots adding errors, the articles won't be set in stone after creation, they will be subjected to the same scrutiny and incremental revisions and fact-checking that all other Wikipedia articles are. It's almost certain that bot generated content will introduce some errors, however, our human editors do substantial amounts of that as well. If a human added 99% good information and 1% inaccurate information, we would think of them as doing a good job. It's illogical to demand more from an automated contributor than from a flesh-and-blood one, but you seem to be expressing that double standard anyway. "Failure to act"? We've already run successful demonstrations and made the full code public! What would it take to please you? As for us being bullies, well, an old proverb comes to mind. Abyssal (talk) 18:51, 12 September 2009 (UTC)[reply]

I understand that all bots can hiccup and make mistakes from time to time. Unless they start crapflooding, blanking or overwriting a huge number of articles, it's a pretty simple matter to put right. I'm more concerned about systematic errors that could result in non-apparent-except-to-experts factual inaccuracies across the majority of the bot-generated content. How confident are you with the subject matter at hand and the interpretation of the database content that this may be avoided - or if it did occur, that you'd be able to spot it quickly? I don't want it to come across as though I'm picking on you and ThaddeusB here - but Anybot has left me wary of bots that autogenerate content in this manner, wary enough at least to be thorough in asking questions. --Kurt Shaped Box (talk) 20:17, 12 September 2009 (UTC)[reply]

I think that thorough testing and a bit of preliminary fact-checking will demonstrate whether or not ContentCreationBOT can successfully utilize the database to generate new content for Wikipedia. If it does prove successful in drawing from the database, then any errors will be on the database's side and thus out of our control. However, since the database was compiled and is operated by scientists, I have confidence that there will be no major problems. At this point it's really just a matter of testing. Abyssal (talk) 15:22, 14 September 2009 (UTC)[reply]

One of the reasons the anybot mess wound up so spectacularly bad was poor communication on the part of the bot operator and unwillingness to respond to concerns. These two users see expressions of concerns about their bot as an opportunity to attack someone for expressing concerns. This will make communication hard to impossible. Poor communication means it won't matter how a mistake is made, because the response will be to attack those who raise issues. And keep attacking and attacking them. Then come back and attack them some more. In my opinion, it simply doesn't matter, once an attitude of this manner has taken hold of the bot operator, there will be no means for issues of concern about the bot to be raised, no means for problems with the data to be pointed out. All such actions will get is an attack. And another attack, and an attack from a different angle, and a new attack. --69.225.12.99 (talk) 06:43, 13 September 2009 (UTC)[reply]

For the dozenth time, do you have any actual objection to express or are you just trying to block the bot? Also, for the dozenth time I can't address "your concerns" until you actually express something concrete. And no that isn't a personal attack despite what you seem to think. --ThaddeusB (talk) 13:06, 13 September 2009 (UTC)[reply]

Oh yeah, I see what you mean. Abyssal - could you check to see whether Graphiuricthys and Graphiurichthys (with an extra 'h') are supposed to be two separate articles? --Kurt Shaped Box (talk) 20:23, 12 September 2009 (UTC)[reply]

They're the same animal, so far as I can tell, but both have been used in the technical literature. I'm not sure which one is correct. Abyssal (talk) 15:16, 14 September 2009 (UTC)[reply]

What to do, then? Pick the most commonly used name and redirect the other article to it (Google Scholar would suggest that 'Graphiurichthys' is the way to go)? I know that these two were human-created articles - but this is exactly the sort of thing that the bot must not be permitted to do, if it starts creating stubs. --Kurt Shaped Box (talk) 19:47, 14 September 2009 (UTC)[reply]

Sepkoski spelled it wrong. It seems I added both the correct and incorrect spellings, forgot about it, and accidentally created an article for both. The problem seems to be pure human error on my part, and therefore unlikely to be duplicated by a bot. Abyssal (talk) 13:03, 15 September 2009 (UTC)[reply]

The problem with anybot was not necessarily the database. The data at algaeBase are fine, and the means of gathering data are identified. Part of what led to a huge mess, that caused the deletion of over 4000 articles and a couple of thousand redirects, was the lack of understanding of the data by the human coder and no community involvement in checking and verifying the articles.

Add to this an operator who would not deal with the problem articles as they were pointed out and you get a couple dozen other editors having to sort out the mess and delete the content.

In spite of the accusation of my being "passive-aggressive" two problem articles generated by Abyssal have been on wikipedia for a long time. He's willing to throw accusations at me, but still hasn't risen to the occasion of correcting the error. If he's going to leave articles that need deleted or corrected up, and these are just two, maybe he's expecting that someone else will clean up after the bot.

These are just two articles with little information in them, and one article is wrong and needs to be either a redirect or deleted. If this bot contributes 10,000 data items, who's going to check for accuracy? If there is any inaccuracy who's going to clean it up?

It seems ThaddeusB is going to blame me for not cleaning up the articles he wants to create-no, I'll do my own volunteer work on wikipedia, not yours, ThaddeusB. Let me know when you're going to start creating the articles I want. And Abyssal is going to throw accusations at the reporters of errors, but not going to correct errors.

Anybot generated errors due to human mistakes. Bots are subject to human errors. An unwillingness to address or correct errors is not an indicator for responsible bot running. --69.225.3.119 (talk) 05:18, 17 September 2009 (UTC)[reply]

Yet again do you have any actual objection I can address? Anybot's code was never checked & it seems was riddled with errors. I am sorry that happened, but its operator's problems are not a reflection on me. This bot's code has been checked & verified that it will copy the data exactly as planned. Further, there is no special knowledge required to copy, for example, the naming scientist from a database to a table.

I most certainly will listen to complaints if you have any that I can actually address. So far your complaints consist of 1) I won't manually check every entry and 2) I allegedly won't respond to complaints. The first is an unreasonable demand that would defeat the point of the bot. The second is merely speculation on your part, and runs contrary to my actual history on Wikipedia.

Then there is your constant stretching the truth\drawing unreasonable conclusions. E.g., "two problem articles generated by Abyssal" somehow equates to this bot screwing up massively. Wow, a human that made two errors in over 2000 articles. (OK, he probably actually made a few more, but clearly the error rate is very low.) One of which was copying Sepkoski's non-standard spelling, which is hardly a serious error. That is hardly reason to shut down this bot. And again, it is completely disingenuous to compare stub creation to filling in a table - the two are hardly the same thing.

Finally, I am not asking you to check 10,000 items I am just asking you to be reasonable and not expect me to check every item either. I have checked about 100 items and found no errors. That is a reasonable spot check. Others have checked some items as well. If you are unwilling to check even one item, then you have no right to complain that others haven't checked enough. --ThaddeusB (talk) 12:23, 17 September 2009 (UTC)[reply]

I'd feel a little less concerned about all this if Abyssal had fixed those duplicate articles already. It's been a few days now since they were pointed out to him. Now, if the bot buggers up the current task, fixing it will be a simple matter of a few one-click reverts. However, if something goes wrong if/when the bot starts creating stubs and we end up with another Anybot-type mess, I'd hope that A. would be much more enthusiastic in trying to put it right (being the guy with the subject knowledge) than he seems to be WRT the above. --Kurt Shaped Box (talk) 23:17, 17 September 2009 (UTC)[reply]

I went ahead and redirected it myself. Rest assured that if the stub creation (which obviously isn't being approved in this task) were to go awry I wouldn't hesitate to "delete all" first and then go and find the problem before starting over from scratch. --ThaddeusB (talk) 01:59, 18 September 2009 (UTC)[reply]

My impression from following this discussion (and participating slightly) is that the bot operator is reasonably careful and conservative, and appreciates the concerns being raised in the aftermath of the AnyBot debacle. There is no reason to tar Thaddeus with that brush. So long as Thaddeus continues to bear in mind the concerns raised here, and works slowly, and works closely with Abyssal or someone else who has a solid grasp of the field, and is willing to put the brakes on if and when problems and issues are raised by others, then I am not opposed to this going ahead. Hesperian 23:52, 17 September 2009 (UTC)[reply]

I will certainly be cautious with this. I view myself as directly responsible for every edit my bots make & always proceed with caution. I always comb my bots' contributions and try to stamp out even the tiniest errors before releasing them on a larger scale. I assure everyone reading this that I most certainly will take any and all complaints about data integrity seriously.

Furthermore, I am well aware the reputation of bots that produce content has been severely tarnished by Anybot. This is part of the reason I brought this minor task here to begin with. Sure, I could have just uploaded the tables manually and no one would have ever questioned it. However, I want accurate data and I want to start re-building the community's trust that bots can build content. Thus I came here. --ThaddeusB (talk) 01:52, 18 September 2009 (UTC)[reply]

Thank you. I still think we should run some more trials before going ahead, just to be safe. Abyssal (talk) 01:04, 18 September 2009 (UTC)[reply]

I disagree with you, Hesperian. The attitude by ThaddeusB and Abyssal is: they are not responsible for the mistakes the bot makes. I was accused of being "passive aggressive" for failing to parent Abyssal through a correction of an article mistake he made. I don't think this is a team that will clean up after themselves.

The correct response to the problem with the two articles, to show good faith effort toward dealing with future problems with bots, would have been for one of them to correct the articles immediately. But, no, it was more important to call someone names ("passive aggressive") than to make the encyclopedia accurate.

There is no community support for this bot. ThaddeusB is weirdly trying to bully me into being the bot's monitor. If he can't get anyone to check the bot, and he is not able to, and Abyssal won't, and the community isn't interested, why should this bot go forward?

The way to deal with someone who disagrees with something you want and to gain their support is to address their concerns, stay rigorously on target on the issue, and don't tell them they are "passive-aggressive," "throwing hissy fits," "mistaken about how wikipedia operates." All of these comments are personal issues about me. If they are more important than the data, maybe the data aren't that valuable or useful to the encyclopedia.

ThaddeusB and Abyssal have established how they will act already: They will make personal accusations against people raising issues about the bot.

This bot is a disaster in the making because of its operating team. That's my passive aggressive, mistaken-user, can't-raise-substantive-issues, hissy-fitting opinion. --69.225.3.119 (talk) 05:09, 18 September 2009 (UTC)[reply]

If you don't have any concrete objections (and you have yet to offer any), and no actual evidence of how I'll address complaints, then this is just your personal opinion and nothing more. And if you look at my actual record with my actual bots, you will see that I do address actual complaints in a timely manner.

No one is trying to force you to do anything, but you posted here and said you think the bot will screw up but offered no evidence. Of course I am going to respond to that by telling you to check the data if you think it'll mess up. I have personally already checked it & found it to be accurate, but that isn't good enough for you.

Yet again, I can't respond to some theoretical eventual complaint until one actually surfaces. Yet again, do you have an actual complaint with the bot or is this just a philosophical objection and/or personal vendetta?

P.S. Saying someone is mistaken about something isn't a "personal issue" and I take no responsibility for the other two comments, which I didn't make. --ThaddeusB (talk) 13:18, 18 September 2009 (UTC)[reply]

The correct response to the problem with the two articles would have been to tell you to shut up and stay on topic, but I tried to be more diplomatic. I said part of the reason your problems with edits I made (that are irrelevant to ContentCreationBOT's approval) were not addressed is because I get busy and sometimes forget about messages left on my talk page. Further, I said, if you had problems with me not addressing those issues, all you had to do was remind me about them on my talk page. Instead you waited weeks and weeks and only brought the subject up when you could use it to beat me over the head in an unrelated discussion, namely, this one. "Name-calling" or not, I stand by my description of your actions as "passive aggressive."

I have little reason to believe that you raised the issue out of legitimate concern because even after you expressed the complaint you were not very helpful in the matter of getting your own problems addressed. No progress was made towards resolving your own issues until Kurt Shaped Box stepped in. Not that any of this matters, because this is the ContentCreationBOT request for approval discussion page, not the "whine about Abyssal making an error that wasn't even entirely his own fault in a tiny article on an obscure genus of prehistoric fish" discussion page.

There is no community support for this bot? What? Who should we be asking? The guy who started the List of graptolites? That was me. What about its chief contributor? Me, again. List of prehistoric starfish? That's me as well. List of prehistoric barnacles? Another one by me. Crap! List of crinoid genera? Uh oh, it looks like a pattern is emerging. Turns out I'm both the creator and sole major contributor to every single page that the bot is slated to edit. Every. single. one. If you can find another major contributor to the articles, please do invite them to see if we can form a consensus.

Thad trying to bully you into monitoring the bot? Come on. You claim to be an invertebrate paleontologist and you're on a website based around volunteering to edit encyclopedia articles. So, when we come here with a plan to add a lot of information to encyclopedia articles on prehistoric invertebrates, and then you oppose the addition of information to articles in your field and complain that no one would be monitoring the data, it's only natural that we stare at you in disbelief. Regarding my willingness to fact-check, considering that I've explicitly called for more fact checking and testing before we proceed with the bot, even though we've both performed successful tests and received tentative approval from another member, your claim that I'm unwilling to check the data rings very hollow.

I'd love to "rigorously" stay on topic, but someone keeps raising issues about stub creation and something to do with a typo in the title of an article I created about a prehistoric fish no one has ever heard of. I'd love to address your very serious objections, but for the life of me I can't remember them. I remember a complaint about a lack of consensus, but when I pointed out that I was the only one who contributed any meaningful content to the articles the bot would edit, and that I both supported and actively solicited the creation of the bot, you ignored me. Other than that, all I remember is a long series of complaints that we weren't taking your complaints seriously enough.

Congratulations. You've cast a dark shadow over the topic and single-handedly discolored the entire discussion. Sadly, the useful input given by Kurt Shaped Box, Anomie, and Hesperian has gotten somewhat lost in the resulting din. Abyssal (talk)

--69.225.3.119 (talk) 21:23, 18 September 2009 (UTC)[reply]

I see no issues with this bot's proposed work, and all the legitimate issues raised have been addressed. Unless the anon IP wishes to raise a useful objection that has to do with this specific bot approval request, I don't see any further issues which need to be addressed. I'm in favor of approving the bot as it currently stands for this test run on the 13 lists given above (and perhaps the additional ones listed below if they are similar enough and the source database contains information which could be added to them). ···日本穣^? · 投稿 · Talk to Nihonjoe 14:18, 23 September 2009 (UTC)[reply]

I've raised issues, and ThaddeusB and Abyssal have played word games and delivered personal insults and criticisms against me as a person. If this is the response to issues before it's running, this is, imo, how they'll respond when it's running: insult the person who raises the issue (personal attacks), play word games (wikilawyering), insult the level of wikipedia knowledge of the person raising the issue (biting the newbie--although I'm not new), and demand that if someone has a problem with the data they should devote their wiki career to monitoring the bot's input.

No. That's my opinion. --69.225.3.119 (talk) 22:45, 23 September 2009 (UTC)[reply]

You have absolutely refused to make any concrete objection that can be addressed. Instead you merely repeat the same line over and over about this bot will obviously screw up because Abyssal and I are bad people.

According to your own words, you have the ability to provide expert advice on the material. Your advice on the data would be appreciated, but apparently all you want to do is criticize others and offer nothing. It's a shame that you want to play petty games ("they tried to bully me into helping, so I won't help") rather than helping to improve Wikipedia. --ThaddeusB (talk) 23:39, 23 September 2009 (UTC)[reply]

I see that despite your fervent insistence that this proposal not go forward, Mr. IP, that my challenge for you to remind us of just one of your many very informed and serious objections continues to go unanswered. Abyssal (talk) 00:11, 24 September 2009 (UTC)[reply]

Given the work generated for volunteers and the risks to Wikipedia's credibility when these bots go badly wrong, in my view there should be almost a standard of proof applied to bots wishing to undertake such work. If it is met, fine, then it goes ahead. But it appears one is not being applied (note I am not blaming present parties for this, I am blaming BAG for making themselves the sole arbiters and granters of such things then failing to take responsibility). This is quite dangerous given the scale of the Anybot fiasco, and some others I have seen at times where semi-automated accounts and bots have gone on mass creation sprees which have had to be deleted. Is it possible to generate a dataset or even a list with the bot (as in, not articles) with all data to be included that some appropriate person with the requisite knowledge can check, and if we can do say 500 or 1,000 of those and they basically work fine then go create the articles as per the original plan. That way we get the articles but they're credible at the end of the process. I believe Abyssal is acting in good faith, the problem is not that but the lack of a checking process by people who know the content area or a meaningful approval process - even I wouldn't feel comfortable applying under such circumstances. Orderinchaos 02:04, 24 September 2009 (UTC)[reply]

Code review

ThaddeusB asked me for a code review, so here it is. Not much to mention, really:

The error checking could use some work. You properly check for HTTP errors, but not for API errors in the initial page query ~~or for json decoding errors (i.e. a truncated response)~~ (never mind, from_json just dies on error).
$timestamp2 will not have a value unless you run into a maxlag error when querying the edit token (check rvprop in the first query). That would probably give an error in the action=edit request.
Will it output a period such as "Mid Ashgill to Mid Ashgill"? If so, wouldn't that be better as just "Mid Ashgill"?
It looks like it will screw up the location field if the last entry is not a one-word location name; that may be left over from changing from plain text to a bulleted list. Should the <br /> and the substr($line, 0, -4); line just be removed?
I note that the bot will wipe out whatever content is currently in the tables, even those marked "NoData". This may not matter, as I don't know whether there is any such content currently in the tables for those entries. It also seems that it will die if any of the tables contains a genus not in the database, which is an appropriately safe failure mode.

Anomie ⚔ 16:41, 13 September 2009 (UTC)[reply]

Thanks for the help. I fixed all the errors. The "Mid Ashgill to Mid Ashgill" thing was something I meant to correct, but apparently forgot to do. The location thing was indeed left over from changing to a bulleted list. All of the tables are currently blank, so overwriting them isn't an issue. If I needed to rerun the task for some unforeseen reason I'd change the code to be more cautious at that time. --ThaddeusB (talk) 02:55, 15 September 2009 (UTC)[reply]

Not a programmer, so I can't say much, but thanks for reviewing the code. Also, the "Ashgill to Ashgill" thing has been bothering me, too. Is it possible to remove the duplicate? Abyssal (talk) 15:14, 14 September 2009 (UTC)[reply]
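
For illustration, the duplicate-period fix discussed in this review could look roughly like the sketch below. It is written in Python purely as an example; the bot itself is written in Perl, and the function name `format_period` and its arguments are hypothetical, not taken from the bot's source.

```python
def format_period(start: str, end: str) -> str:
    """Build the display text for a time range.

    Hypothetical helper: when both endpoints name the same stage,
    emit the stage once instead of "X to X".
    """
    start, end = start.strip(), end.strip()
    if not start:
        # Only an end stage (or nothing at all) is known.
        return end
    if not end or start.lower() == end.lower():
        # Same stage on both sides, or no end stage: show it once.
        return start
    return f"{start} to {end}"
```

Under these assumptions, `format_period("Mid Ashgill", "Mid Ashgill")` would give "Mid Ashgill", while two different stages would still be joined with "to".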

Additional pages

The bot may also be useful on the following pages:

These pages would work well with the bot if put into the table format:

Abyssal (talk) 16:37, 17 September 2009 (UTC)[reply]

Trial?

Is this ready for a trial? Mr.Z-man 00:43, 24 September 2009 (UTC)[reply]

So, this is a 100% dismissal of all objections to the bot? Why? --69.225.3.119 (talk) 01:04, 24 September 2009 (UTC)[reply]

Oh, so this is what your bad faith-filled tirade on my talk page and the village pump is about. No, it's a request for someone to summarize the huge amount of text above. If it was a dismissal of concerns, I probably would have actually done something other than ask a simple question. Mr.Z-man 01:45, 24 September 2009 (UTC)[reply]

No, no, it's a hissy fit, not a tirade. And, if you had read the discussion you would have known that.

You asked elsewhere if there was a summary for another bot. Here, you don't ask for a summary, you ask if it's ready for trial. This implies that the next step is a trial. --69.225.3.119 (talk) 01:54, 24 September 2009 (UTC)[reply]

My apologies for not being perfectly consistent. That's what you get with unpaid labor. Mr.Z-man 01:59, 24 September 2009 (UTC)[reply]

Sorry for the bold, but I want to point out that successful trials have already been run. A link to the results of the test is here. Abyssal (talk) 15:10, 24 September 2009 (UTC)[reply]

It would seem from reading the text that it is not yet ready for a trial. Orderinchaos 01:56, 24 September 2009 (UTC)[reply]

On what basis? I will be happy to address any actual concerns but the only objection to date is 69.225's philosophical objection that 1) bots shouldn't do this sort of task unless every scrap of data is pre-approved and 2) I am a bad person who won't address concerns when they are raised. --ThaddeusB (talk) 02:00, 24 September 2009 (UTC)[reply]

I raised issues. I think that bots should be used for content creation. You missed my anybot arguments. I was the editor in support of bots being used for content creation. I think User:Hesperian was against it. And, ThaddeusB, I think you're the one telling me I'm having hissy fits, I'm passive-aggressive, and now this ridiculous comment. You can continue to ignore all of my issues and call them invalid. But, my issues stand. --69.225.3.119 (talk) 02:15, 24 September 2009 (UTC)[reply]

I obviously am too stupid to get your objections because I have told you 10 times I don't see any that aren't "there is no consensus" or "you must check every fact manually" (neither of which I can address because they are both your opinion only). I have requested that you be specific 10 times and you have ignored me 10 times, so who exactly is ignoring who? Oh and for the record I didn't make a single one of those comments you attributed to me, so I would appreciate it if you strike that part of the comment. --ThaddeusB (talk) 02:24, 24 September 2009 (UTC)[reply]

Yes, of course, if I say something is a problem, and you say it ain't, it must not be an issue. Oh, I'm sorry, are they Abyssal's comments? Well, still, you're an administrator, you've approved this discussion being held while one party is being personally attacked. So, no, I won't strike it. I will render a correction: the personal attacks and insults are above. They were issued by Abyssal without any desire to see them stricken by this administrator.

Call my issues what you want. Dismiss them. Allow others to call me names. I raised objections. You still choose to ignore them. If you choose to ignore my issues and then claim I'm ignoring you, that's just a game. I will continue to not play your game, while you dismiss my issues. No community consensus. Bot operator aggressively ignoring issues, encouraging personal attacks. --69.225.3.119 (talk) 02:29, 24 September 2009 (UTC)[reply]

Yes, I ignored Abyssal's personal comments. That doesn't mean I approved of them. I also ignored various personal comments you directed at me and Abyssal. That doesn't mean I approved of them either. --ThaddeusB (talk) 02:40, 24 September 2009 (UTC)[reply]

Please also note that while some of the things I've said were a bit uncivil, and maybe even in poor taste, I only said them after the anonymous IP had displayed a pattern of rudeness and condescension both here and on my talk page that potentially stretched back months. I would hereby like to apologize for anything inappropriate I've said thus far. Abyssal (talk) 15:23, 24 September 2009 (UTC)[reply]

I've reviewed the discussion and the anon's complaints are for the most part invalid. The only issue that I see is limited input from the affected wikiprojects, something that we cannot force. The only thing I can suggest is to make a post to ANI and see if that brings in a wider group for input. Otherwise I think a small trial would be useful. β^_command 02:05, 24 September 2009 (UTC)[reply]

My issues are valid. If you can only claim they aren't without naming them and identifying their invalidity, just to parrot Thaddeus, you haven't increased the support for this bot. It doesn't matter if the projects don't support it. Fact is, Thaddeus hasn't gotten anyone who supports the creation of data in paleontology tables. --69.225.3.119 (talk) 02:15, 24 September 2009 (UTC)[reply]
- Ok let me explain the facts to you since you obviously ignored everyone else's comments. The information in question is coming from a very reliable source. The bot operator is not creating articles, just expanding a few lists. The operator has spot checked and confirmed that the data in question is reliable. Several other users who work in related areas have confirmed that the information is correct, and the programming of the bot is accurate so that there will be no issues with the imported information. The only real complaint that I see left is the fact that you did not like anybot's method of operation. On that point we agree. This bot however will not be creating articles but rather filling in tables from a reliable database on existing articles. Please stop trying to raise the drama level. If you have any issues that have not been addressed the best method is a numbered list with a short explanation. β^_command 02:26, 24 September 2009 (UTC)[reply]
  - No, I didn't ignore it, and others won't get it either because the credits for the information are incorrect. I understand the bot is putting data into lists (thanks for not reading my comments, but saying you've read everyone else's instead). The bot operator is not a vertebrate paleontologist. He hasn't confirmed the reliability of the data. What other users have confirmed? The ultimate purpose of "ContentCreationBot" is to create content.
    The drama level? Ignored, called names, personally attacked gets drama. Don't issue personal attacks, don't dismiss my legitimate complaints. No drama. And, stay on target. That helps no drama also. --69.225.3.119 (talk) 02:32, 24 September 2009 (UTC)[reply]
    - I see no proof other than your word that you're anything more than a 13 year old child who is attempting to make a point by forum shopping, leaving uncivil comments, and attacking others. The information that the bot is adding is reliable. You have yet to prove otherwise. So unless you can actually make a logical statement and prove that the content and database the bot will be using is wrong (besides a few typos) I see no reason for your behavior. β^_command 02:39, 24 September 2009 (UTC)[reply]
      - Ahem! Enough, Betacommand! You've been warned about this before. This is not the way for you to interact with people. Stick to the actual 'bot issue at hand (Goodness knows! There's been enough diversion from the core focus of the discussion, already.), and do not give us your guesses about who participants in the discussion may be. Uncle G (talk) 03:43, 24 September 2009 (UTC)[reply]
    - I am not asking for approval to create stubs at this time, so the bot's "ultimate purpose" is irrelevant to this BRFA. --ThaddeusB (talk) 02:38, 24 September 2009 (UTC)[reply]

Doesn't matter, since I'm a hissy-fitting, drama-mongering, passive-aggressive 13-year-old. I suggest you now delete all the anybot articles I saved, because you don't really want a hissy-fitting, drama-mongering, passive-aggressive 13-year-old writing articles. That's a good one, though, since I'm 13 I'm incompetent. I missed the age limit earlier. My bad. --69.225.3.119 (talk) 02:46, 24 September 2009 (UTC)[reply]

No one has criticized or doubted the quality of your work outside the BRFA or how valuable you are to Wikipedia as an editor. My issues, and so far every issue I've seen raised against you, have regarded your conduct here. Please stop trying to play the victim here. Abyssal (talk) 15:17, 24 September 2009 (UTC)[reply]

No need for approval

A "bot" that runs one time only on 23 pages is not distinguishable from a human editor and does not require approval. In fact, you could generate the content for the 23 pages on your local computer and then save them by hand using your usual account. So there is not really a need for a discussion here at all. Just do the edits, and then discuss them on the talk pages of the articles involved like you would discuss any other edits. — Carl (CBM · talk) 02:54, 24 September 2009 (UTC)[reply]

Nonetheless, ThaddeusB should be applauded for seeking approval anyway, so that experts can review the data sources, and we don't repeat history. Learning from history, and acting upon that learning, is a good thing. In all of the above, it seems that none of the self-declared subject experts have provided the necessary evaluation of the data source. Uncle G (talk) 03:33, 24 September 2009 (UTC)[reply]
- I agree with both comments here. The actual edits could very well be made by a human; it may have even been easier to leave a note on a few WikiProject's talk pages and discuss it there, but no matter. This will have to do as an alternative outlet for discussion about the task. Easily noticed by reading the discussion above is the fact that this has morphed into less of a conversation about the actual edits the bot will make, and more of an unconstructive argument between various parties. I just hope we can return to discussing the task itself. Regards, The Earwig (Talk | Contribs) 03:54, 24 September 2009 (UTC)[reply]
  - That's the issue. When we are talking about 23 edits total, which is the current scope of this request, there is nothing for BAG to review. The edits themselves can be reviewed so easily that it's counterproductive to spend too long trying to do a technical review of the code. Moreover, this forum is not ideal for discussion of content issues, as the discussion above painfully highlights. A discussion on a wikiproject page would be much more likely to be productive. BAG is not intended to review the quality of data sources. — Carl (CBM · talk) 10:46, 24 September 2009 (UTC)[reply]
Please note that I have said on several occasions that I was aware approval wasn't actually required. The reason I sought approval was to make it explicit where the content came from (a bot pulling RS data). I could have just done the actual edits manually and avoided the drama. In retrospect, maybe that would have been best. However, I will say this process has resulted in some improvements to the output, so it wasn't a total waste. --ThaddeusB (talk) 12:58, 24 September 2009 (UTC)[reply]

Note: I will be modifying the code to fix the problem outlined here. I thank 69.255 for pointing out this error, and kindly ask him/her to restrict future comments to indicate specific problems that can be addressed. --ThaddeusB (talk) 01:04, 25 September 2009 (UTC)[reply]

For the record, here is my reply to the issue which the IP reverted as "taunting" --ThaddeusB (talk) 15:24, 25 September 2009 (UTC)[reply]

Could you explain, in layman's terms, how that error occurred? Is this a widespread problem, affecting a significant percentage of the trial run output? If so, I suggest that a fresh trial run be carried out following the fix. I'm not going to comment on the drama that occurred over the last day or so, to which I was completely oblivious until now - save to say that we should probably just attempt to put it in the past and move on from it. --Kurt Shaped Box (talk) 02:04, 25 September 2009 (UTC)[reply]

The bot fills the "Age" column based on the fossil record information returned by Paleodb. In this case, the fossil record says 14ma to 4ma; however, the genus is actually extant and thus obviously the fossil record is insufficient. Since over 90% of the genera in the db are extinct, this particular problem should be quite rare. However, it does raise a question about the accuracy of using the fossil record in general.

I used the fossil record to estimate ages because I feel this is, in general, the best estimate available. However, the fossil record is very much incomplete (and paleo's db is far from a complete record of the known fossil record either) - a fact which is not obvious to the casual observer. As such, I will address this concern via the following adjustments:

Any genus that is extant will get "present" as the end date regardless of the fossil record
The column will be renamed to "estimated time period" (or alternate upon suggestion)
A footnote disclaimer will be added to state the estimates are based off the fossil record, which by nature makes them imperfect.

I am also open to suggestions about alternative sources for age range estimates.

Once the code is adjusted, I will re-upload the demo page. (I suggest everyone use this terminology to describe that page, as it isn't a "trial" in the BAG sense of the word.) --ThaddeusB (talk) 03:00, 25 September 2009 (UTC)[reply]
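
For readers following the fix, the first adjustment listed above can be illustrated with a minimal sketch. This is written in Python purely for illustration (the bot itself is stated to be written in Perl), and every name in it (`GenusRecord`, `estimated_time_period`, the field names, the footnote wording) is a hypothetical stand-in, not the bot's actual code or schema.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical footnote text; the real wording was still open to suggestion.
FOOTNOTE = "Estimates are based on the fossil record and are therefore imperfect."


@dataclass
class GenusRecord:
    """Hypothetical shape of one database row (not the bot's real schema)."""
    name: str
    first_ma: Optional[float]  # oldest fossil occurrence, millions of years ago
    last_ma: Optional[float]   # youngest fossil occurrence, millions of years ago
    extant: bool               # True if the genus is still living


def estimated_time_period(rec: GenusRecord) -> str:
    """Build the text for the renamed "estimated time period" column."""
    if rec.first_ma is None:
        return ""  # missing data is simply left blank
    start = f"{rec.first_ma:g} Ma"
    if rec.extant:
        # An extant genus gets "present" regardless of its last fossil date.
        return f"{start} to present"
    if rec.last_ma is None:
        return start
    return f"{start} to {rec.last_ma:g} Ma"


# A made-up extant genus whose fossil record alone would read 14 Ma to 4 Ma.
print(estimated_time_period(GenusRecord("Examplegenus", 14.0, 4.0, True)))
```

The point of the sketch is only that the extant flag overrides the fossil-derived end date; where that flag would come from in the real database is not specified here.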

There is no significant percentage of the output. There are only 23 pages, so even 100% would be an insignificant percentage. That's why the whole idea of "trial runs" is flawed in this case. — Carl (CBM · talk) 02:25, 25 September 2009 (UTC)[reply]

You may have missed it, but in the original request (way up there somewhere :)) I suggested having the bot fill in only a small percentage of each table for the trial rather than the normal "X pages" approach. --ThaddeusB (talk) 03:00, 25 September 2009 (UTC)[reply]

As long as it fits with the size limits, the size of the edit really isn't the issue. A "trial" is warranted when the bot is going to make a lot of actions, so that it would be painful to have to undo them all. The expectation is that the bot operator will carefully review every edit in the trial to make sure there are no technical problems. The size of the edits during the trial is entirely up to the bot operator. The difficulty here is that this project, regardless of any other merits it might have, simply doesn't fit into the framework for approving bots. — Carl (CBM · talk) 10:55, 25 September 2009 (UTC)[reply]

I am well aware of the "rules", including WP:IAR. I thought it would be beneficial to get approval for the bot, even though none was technically required, and thus I filed the request. Can we please stop the wikilawyering now? --ThaddeusB (talk) 15:24, 25 September 2009 (UTC)[reply]

I'm saying there are more appropriate forms of review than BAG for this task (for example, the talk pages of the articles involved, or a wikiproject page). As you say below, you are not even looking for a bot flag — so why create a "bot" account, for a task that has none of the attributes of a bot? I wanted to bring up this point in case other people see this nomination and mistakenly think it represents our best practice for when to ask for bot approval. — Carl (CBM · talk) 02:20, 26 September 2009 (UTC)[reply]

Fair enough, thanks for clarifying. --ThaddeusB (talk) 03:05, 26 September 2009 (UTC)[reply]

Chaunax

69.225.5.4 directs my attention to the original version of the article Chaunax, posted by Abyssal back in May. I must say I find this a disappointingly poor effort at a cookie-cutter stub. Problems include

Unsubstituted {{pagename}} templates;
Omission of class, order and family from the taxobox;
Specifying (but leaving blank) taxobox parameters that are inappropriate for a genus article, such as "binomial";
The absence of a fossil range, a piece of information that I would have thought was critical to the decision to post a stub like this;
The incorrect claim that it is extinct;
The redundancy of referring to it as both "extinct" and "prehistoric" in the same sentence. "Prehistoric" implies "extinct"; extant genera that are present in the fossil record are never referred to as prehistoric.
The absence of references.

Naturally the false claim that the genus is extinct is the biggest problem.

This sums up the problem with content creation bots. (1) They introduce errors; and (2) even when they don't introduce errors, they produce clunky, incomplete, redundant articles that utterly fail to communicate in an interesting or informative manner.

I don't want to beat a straw man here; but it does seem reasonable to assume that the purpose of ContentCreationBOT is to enable the creation of these dreadful cookie-cutter stubs on a grand scale. Am I wrong? If so, what steps have or will be taken to ensure that the content produced is more useful than the example above?

Hesperian 12:16, 25 September 2009 (UTC)[reply]

Thad and I have prepared at least a tentative template that is much more thorough than my "cookie cutter stubs." Further input and ideas are appreciated.

The possibility of introducing inaccurate claims to Wikipedia is a real one, and could come from two sources: (1) faulty information in the database and (2) the bot mishandling the data. Source one is unlikely, since the database is maintained by experts. Source two is preventable and is the reason we'll have to do test runs. We recognize the possibility of things going very much awry, and we have always intended to proceed cautiously. It's one of the reasons we decided to try the data-table filling process: to prove the bot could handle the data properly.

I don't claim to have made good stubs, but surely you wouldn't suggest that a very short "substub" is worse than having no article at all? If all the stub did was set up the basic framework for the article, then it still would justify their creation. Say you wanted to create the Chaunax article. First you'd have to go to another animal article, copy the taxobox, replace the data, etc. Then you'd have to find out the best stub template to use and add it. Then add the name, portal templates, links, etc. Let's just say that it takes three minutes of time to do that. Now, if every prehistoric fish was done manually, the community would have to spend three minutes for every article. Let's say I made 500 stubs the cookie-cutter way. Because I'm just copy-pasting, article creation time is instantaneous. Therefore, if I had created 500 practically instantaneously, I saved the community approximately 25 man-hours' worth of work. It's the same idea with the bot. If the bot creates 5,000 stubs (which would all be much higher in quality than the one I made, see the template), the amount of work saved would be 250 man-hours even if all it had done was build the basic set-up of the article.

I'm not going to respond to specific criticisms of the Chaunax article, not because they aren't valid (they were, although I'd still personally refer to extant ancient taxa as prehistoric), but because the bot-generated stubs will be so much better in quality than my "cookie cutter" types that they aren't really relevant.

To answer your last question, we do intend to create large numbers of relatively high-quality stubs eventually; however, this particular discussion is supposed to be only about the data-table completion. We will start another Request for Approval when we feel that we're more prepared to handle the much larger task of stub-creation. Thanks for the input! Abyssal (talk) 15:51, 25 September 2009 (UTC)[reply]

"I don't want to beat a straw man here" well that is exactly what you are doing.

1) The "horrible stub" wasn't created via any sort of automation but rather by hand by Abyssal

2) I am not asking for approval to create stubs at this time --ThaddeusB (talk) 15:26, 25 September 2009 (UTC)[reply]

I did ask "Am I wrong? If so, what steps have or will be taken...?" You haven't answered that. Abyssal has, but the link he has provided, User:ThaddeusB/PAC template, fails to re-assure me.

Re (2), If I posted a request here that said "I am requesting permission to scratch my backside... but later I might want to deploy my bot to correct spelling errors... but right now I am only requesting permission to scratch my backside", I'm pretty sure discussion would centre on my proposal to deploy a bot to correct spelling errors. This is only natural. Hesperian 05:52, 26 September 2009 (UTC)[reply]

Considering I am not asking for approval to make stubs, am not ready to make them, and if/when I am, I would most certainly have to post a new request to do so, it is completely reasonable for me not to want to debate them here. --ThaddeusB (talk) 15:11, 26 September 2009 (UTC)[reply]

It's an example of how articles created by people without knowledge in the subject area aren't useful. I had corrected a few hundred of Abyssal's fish stubs, making them more useful by adding class, when I was interrupted by the Anybot mess. This article problem is relevant to this discussion because the article was added by Abyssal, who is strongly advocating for this bot and worked with ThaddeusB on creating the bot.

Adding 10,000 pieces of data to 23 pages is worse than not having the data, when those adding the data are not reading the database correctly (see Abyssal's sample "successful" upload above) and admit (above) they don't have the necessary expertise to read the database correctly. If wikipedia editors don't know if the data are correct, they do not belong on wikipedia for any amount of time. They do not belong uploaded by a bot or by a human.

As I said early on, until this bot has experts (whether paleontologists or wikipedia enthusiasts on the taxa) on board, its task is inappropriate. It is not supported by the community. The community is not asking for unvetted data to be uploaded. Abyssal and Thaddeus don't know the data, can't tell when it's incorrect, and they don't act quickly when they create articles that are incorrect. --69.225.5.4 (talk) 20:26, 25 September 2009 (UTC)[reply]

I think also that Abyssal's comment that the test run can prove the bot can handle the data correctly should be remarked upon, because, what the test run did was prove exactly my problem with this bot: it doesn't matter if the bot can handle the data correctly when there is no one available who can vet the data. --69.225.5.4 (talk) 20:37, 25 September 2009 (UTC)[reply]

The fish articles aren't relevant, no matter how hard you insist that they are. You might as well choose any random project I've engaged in on Wikipedia. The fish articles will not reflect the quality of the created stubs because we are using a different template for the article design. They were not created by the same process that will be used here. We are not even supposed to be discussing the planned stub creation process here. Also your language is misleading. If you were just adding class information, that's not "correcting," that's just "adding."

We can read the database just fine. It doesn't matter if we understand the content, it's just a matter of making sure the content added to the article is the same as is in the database. If the generated article on Abyssalgenus says it's a member of the Thadidae while the PBDB says it's a Kurtboxid, then we know an error has been made regardless of whether we understand the basics of either taxon's anatomy/classification/lifestyle/etc. The only skill needed to ensure the validity of the final result is the ability to compare the data in the article to the data listed in the database; either the words will be exactly the same or an error will have occurred. Expertise is irrelevant.

What do you mean, implying that the bot handled the data correctly? The bot didn't handle the data correctly, it failed to verify the "age range" information for Cryptoplax with the "basic information" data in the first tab. That mishandling was supposedly the basis of the complaint you raised yesterday on your talk page. It has nothing to do with my or Thad's ability to read the database. Even as a non-programmer I can see an easy solution for this: use the Sepkoski age range data for extant species and the PBDB "age range" information solely for extinct taxa. Had we foreseen that, the problem would have never occurred, but you can't foresee everything, which is why we ran the test in the first place. Abyssal (talk) 23:34, 25 September 2009 (UTC)[reply]
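
The word-for-word comparison described above can be sketched roughly as follows. This is an illustration in Python only (the bot itself is stated to be written in Perl); the function name `find_mismatches` and the dictionary layout are assumptions made for the example, and the sample values reuse the made-up taxa from the comment above.

```python
def find_mismatches(article_row: dict, database_row: dict) -> dict:
    """Return {field: (article_value, database_value)} for each differing field.

    Both arguments are hypothetical field-name -> text mappings; a field that
    is missing on one side is treated as blank.
    """
    mismatches = {}
    for field in sorted(set(article_row) | set(database_row)):
        article_value = article_row.get(field, "").strip()
        database_value = database_row.get(field, "").strip()
        if article_value != database_value:
            mismatches[field] = (article_value, database_value)
    return mismatches


# The made-up case from the comment: the article and the database disagree.
article = {"genus": "Abyssalgenus", "family": "Thadidae"}
database = {"genus": "Abyssalgenus", "family": "Kurtboxid"}
print(find_mismatches(article, database))
```

A check of this kind only reports differences between the two rows it is handed; whether the right database fields were chosen for comparison in the first place is a separate question that it does not address.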

You're the one who said you ran the trial to prove the bot can handle the data correctly, see your above post.

"It doesn't matter if we understand the content, it's just a matter of making sure the content added to article is the same as is in the database."

Yes, it matters if you understand the content. You didn't, so you posted a "successful" trial that included wrong information. The information in the database was correct. It still is correct. It lists the species as Late Miocene to recent. --69.225.5.4 (talk) 00:08, 26 September 2009 (UTC)[reply]

I thought you were going to stay on topic? I guess that was either an empty promise or you are completely incapable or unwilling to do so.

At least a dozen times you have said Abyssal and I have admitted to having no knowledge about the subject. That is entirely 100% untrue. I have stated I am not an expert, but that is not the same thing as "having no clue." Abyssal has never said anything at all about not having knowledge of the subject and indeed he has contributed more to paleobiology on Wikipedia than nearly anyone else.

Abyssal has created more than 1000 articles on prehistoric genera. You have to date found 3 that contained errors. Wow, a human with only a 99.7% accuracy rate must clearly be a complete fool who doesn't know a thing about the subject matter. Right?

Every post you make is a half-truth or distortion of the facts. You repeatedly make insulting claims like I ignore all feedback, or that I asked for you to be blocked, that have absolutely no basis in fact. You claim to be an expert yet you refuse to provide a single concrete criticism until after 3 weeks of bickering and a block for being disruptive. You claim you want a bot to supply this data, but your actions say otherwise. Tell me, what is your real motivation here? If you want to help, then please do so. If you want to argue, then please go someplace else.

I 100% absolutely want every shred of data produced by this bot to be accurate. I will listen to any concrete complaints you or anyone else has, and will fix any actual problems that are identified. However, your standard, some magic human that can review 10000 items and instantly know item 245 is an error, is impossible to meet. There is not one person on the entire planet with the ability to do what you demand. Anybot was a horrible POS, but I had absolutely nothing to do with that, so please stop taking out your rightful hatred of that bot on me. --ThaddeusB (talk) 23:37, 25 September 2009 (UTC)[reply]

Take the personal comments elsewhere, ThaddeusB. --69.225.5.4 (talk) 00:08, 26 September 2009 (UTC)[reply]

LOL, your refusal to be truthful is directly relevant to this conversation. You can't just imply I'm incompetent, post outright lies about what I & Abyssal have said previously, claim I never listen to people, and ignore 90% of everything that is said and harp on the 10% that looks bad, and not expect me to comment on it. You hate the very idea of this bot (despite your claims to the contrary) and your actions make it quite clear you are interested only in derailing the bot, not in making it work correctly. --ThaddeusB (talk) 00:16, 26 September 2009 (UTC)[reply]

Note: after a productive conversation with 69.225, I have sent out requests for more expert input. --ThaddeusB (talk) 03:40, 26 September 2009 (UTC)[reply]

Let it drop

Considering you're only proposing to make 30-odd edits, which you're entirely capable of doing through your user account; and considering this request has been utterly derailed, for better or for worse; it seems to me that the best way forward for you is to let this request drop, and go ahead and produce these lists, if you still want to. If and when the time comes that you want to do something that actually requires approval, a fresh start to this approval process would be useful for everyone concerned. By that time you will have learned from the experience of posting these lists, and you'll go into the approval process knowing that some of us are still smarting from the last debacle, and need concrete reassurance. Hesperian 05:52, 26 September 2009 (UTC)[reply]

I still think it is more appropriate to make bot edits under a bot account rather than hiding them under my own user name. If I had just put them under my own name to begin with, of course, none of this would have ever happened, but I really don't think that would have been better. For example, no one would have questioned anything and whatever errors that might have occurred most likely would never have been caught. --ThaddeusB (talk) 15:05, 26 September 2009 (UTC)[reply]

I agree with Thad. Despite the drama we did get feedback that saved us from very serious errors. Abyssal (talk) 16:16, 26 September 2009 (UTC)[reply]

Don't "hide bot edits under your own name". Make human edits. There are only 30 of them. Hesperian 06:47, 27 September 2009 (UTC)[reply]

I meant hide the fact that table was generated by a script - I didn't mean make the script upload them under my name. --ThaddeusB (talk) 16:25, 27 September 2009 (UTC)[reply]

There is no reason to worry about whether the content of a single edit is generated by a script or not. If the content of the edit is good, it doesn't matter if a script made it, and if the content is flawed, it also doesn't matter. I use scripts somewhat often to create citation templates, for example, but there is no reason I need to indicate that in the edit summary. Indeed, it would even be valid in this case to let the script upload the content in your name, provided that you review the content yourself. — Carl (CBM · talk) 03:21, 30 September 2009 (UTC)[reply]

Bot flag?

Are you asking for a bot flag? There seems to be no need for one. --Apoc2400 (talk) 21:15, 25 September 2009 (UTC)[reply]

If it doesn't get one, that is fine by me. --ThaddeusB (talk) 23:37, 25 September 2009 (UTC)[reply]

FWIW, I would support a bot flag if one was given. I agree that one is not necessary for this test, however. ···日本穣^? · 投稿 · Talk to Nihonjoe 06:46, 30 September 2009 (UTC)[reply]

Task approval

Let's just approve this task! It's a handful of edits, and it's good that it's been through the process. Or approve a trial run of 25 edits... They can be reverted and re-run if there are any problems. Rich Farmbrough, 01:56, 13 October 2009 (UTC). {{BAGAssistanceNeeded}}[reply]

It is currently stalled probably because of some issues that arose from expert input about one of the lists. It's only a handful of edits, but each edit is hundreds of lines in a table, for a total of thousands of lines of information.

As there is an issue about the validity of the genera in the lists that should be addressed first, there's no point in pushing the bot operator to get the bot going to create data that will be mirrored and is incorrect.

Which reminds me, I have to delete a made up organization from an article that shows up in 77 google hits, all wiki mirrors. It's much politer to not do this in the first place, meaning not create articles with faulty data to begin with, rather than go and correct them after the fact. There's no hurry here. --69.225.5.4 (talk) 04:53, 13 October 2009 (UTC)[reply]

Fixing the bug won't be difficult, but I haven't had a chance to do it yet because I've been busy with more pressing tasks. Once I've fixed it, I'll update here. --ThaddeusB (talk) 14:33, 13 October 2009 (UTC)[reply]

Note: Before we actually run this thing we'll have to recreate the table for the List of prehistoric barnacles. A user has insisted it be removed until it can be filled. Abyssal (talk) 03:58, 4 December 2009 (UTC)[reply]

Starting next week, I'm on real-life vacation, so I should have a lot more on wiki time. I should be able to make the necessary changes to the bot, re-run the test, and seek appropriate input then. --ThaddeusB (talk) 04:20, 4 December 2009 (UTC)[reply]

Awesome. Abyssal (talk) 04:35, 4 December 2009 (UTC)[reply]

{{OperatorAssistanceNeeded}} News? MBisanz ^talk 01:26, 3 January 2010 (UTC)[reply]

I'll be expiring this soon if it isn't going to run. MBisanz ^talk 05:46, 10 January 2010 (UTC)[reply]

I just returned to Wikipedia after an unexpected month long absence. Will look into fixing/re-running this within the next few days. --ThaddeusB (talk) 05:11, 15 January 2010 (UTC)[reply]

Awesome. Ale_Jrb ^talk 20:57, 17 January 2010 (UTC)[reply]

Thanks for the update. MBisanz ^talk 22:35, 29 January 2010 (UTC)[reply]

If everything is all fixed up and ready to run, let's see a trial when you get a chance.

Approved for trial (5 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. (X! · talk) · @860 · 19:38, 3 February 2010 (UTC)[reply]

A user has requested the attention of the operator. Once the operator has seen this message and replied, please deactivate this tag. (user notified) News? Josh Parris 06:33, 23 February 2010 (UTC)[reply]

This BRFA has been open for far too long. I will expire this soon if the operator does not start the trial. — The Earwig ^(talk) 02:17, 4 March 2010 (UTC)[reply]

Don't you think that's a bit unfair considering our original attempt to get this approved was obstructed by a troll for so long that the project pretty much lost steam? Abyssal (talk) 13:22, 4 March 2010 (UTC)[reply]

Examining User_talk:ThaddeusB it appears the bot lied. Please wait while I try to raise the operator... Josh Parris 13:54, 4 March 2010 (UTC)[reply]

Abyssal, actually, I do not think it's unfair. While the bot started out with some obstruction, keep in mind it has been in trial without any obstructive comments for over a month, and the operator has been active during that time. I don't see the problem with expiring it if the operator does not respond or start the trial. And by the way, Josh Parris, while the bot didn't notify ThaddeusB when you posted the operator assistance request, I had notified him three days earlier, so the operator has been sufficiently notified in my opinion. — The Earwig ^(talk) 02:50, 9 March 2010 (UTC)[reply]

No operator attention for a month. Request Expired. Josh Parris 03:00, 9 March 2010 (UTC)[reply]

The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.