Wikipedia:Bots/Requests for approval/Ganeshbot 10
- The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.
Operator: Ganeshk (talk · contribs)
Time filed: 01:27, Thursday August 18, 2011 (UTC)
Automatic or Manual: Automatic supervised
Programming language(s): AutoWikiBrowser and CSVLoader plugin
Source code available: Yes
Function overview: The function of the bot is to automate basic stub creation for sea snails and slugs. This will be done under the supervision of the Gastropod project. The stubs will use the World Register of Marine Species as a reference.
Links to relevant discussions (where appropriate):
- Wikipedia_talk:WikiProject_Gastropods/Archive_3#Conus_sample_stub
- Wikipedia:Village_pump_(proposals)/Archive_64#Species_bot
- Wikipedia_talk:Bot_Approvals_Group/Archive_7#Wrong_way_of_the_close_a_BRFA
- Wikipedia_talk:WikiProject_Gastropods/Archive_3#Phalium_articles
- Wikipedia_talk:Bots/Requests_for_approval/Archive_4#Wikipedia:Bots.2FRequests_for_approval.2FGaneshbot_4
- Wikipedia:Bots/Requests for approval/Ganeshbot 4
- Wikipedia:Bots/Requests for approval/Ganeshbot 5
- Wikipedia_talk:WikiProject_Tree_of_life/Archive_28#The_Great_Bot_Debate:_item_2.2C_value_of_species_stubs
Edit period(s): One time run for each approved task
Estimated number of pages affected: 100 stubs per run
Exclusion compliant (Y/N): Yes
Already has a bot flag (Y/N): Yes
Prior disclosure: This is a revised version of the task detailed on the Ganeshbot 4 and Ganeshbot 5 pages. On Ganeshbot 4, I had mistakenly taken the bot approval for a single genus as a blanket approval for creating any number of stubs. I stopped creating the stubs immediately after being notified of the problem. That was back in August 2010. Ganeshbot 5 was a request to allow the bot to create an unlimited number of species and genus stubs with oversight from the Gastropod project. The request was rejected by BAG after a thorough review of the task. I have taken the lessons learnt from the previous bot request and have tried to come up with a request that satisfies both the community and the BAG requirements. This request has safeguards built in to ensure the correctness of the stub articles.
Function details: The bot will create basic stubs for snails and slugs under the supervision of the Gastropod project. The bot has already created 15,000 stubs. There are still many snail and slug families that are missing articles on Wikipedia. These starter articles will help others expand the information that is available. The articles will be sourced from the World Register of Marine Species database, which is considered to be reliable by the project members. The bot will create both species and genus stubs. Alvania is an example of a genus. Alvania aartseni is an example of a species.
Step 1 - Project request
The project will request the bot to create species and genus stubs in a gastropod family or genus. This will be done on the Gastropod project talk page under a new heading. The request will include the family/genus name and the intro sentence. Here is an example:
Rissoidae
.... is a species of minute sea snail, a marine gastropod mollusk or micromollusk in the family Rissoidae.
The request will need to be reviewed and approved by a project member other than the requester. Approved
Step 2 - Bot run
The bot operator will run an offline program to download species data for the current genus/family from the WoRMS database. This offline program uses web services to extract the data. The extracted data will be used as the input for the CSVLoader program.
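A minimal sketch of this extraction step, assuming the current WoRMS REST interface (the bot itself used the SOAP webservice of the time, so the endpoint names and pagination below are illustrative rather than the bot's actual code):

 import csv
 import requests
 
 BASE = "https://www.marinespecies.org/rest"
 
 def fetch_children(aphia_id):
     """Yield all direct child taxa of a WoRMS taxon, following pagination."""
     offset = 1
     while True:
         r = requests.get(f"{BASE}/AphiaChildrenByAphiaID/{aphia_id}",
                          params={"offset": offset})
         if r.status_code == 204:   # empty response: no (more) records
             return
         batch = r.json()
         yield from batch
         if len(batch) < 50:        # WoRMS pages results 50 at a time
             return
         offset += 50
 
 def dump_taxon_to_csv(aphia_id, path):
     """Write the accepted species under a genus/family to a CSV for CSVLoader."""
     fields = ["AphiaID", "scientificname", "authority", "status", "family"]
     with open(path, "w", newline="", encoding="utf-8") as f:
         writer = csv.DictWriter(f, fieldnames=fields)
         writer.writeheader()
         for rec in fetch_children(aphia_id):
             if rec.get("rank") == "Species" and rec.get("status") == "accepted":
                 writer.writerow({k: rec.get(k) for k in fields})
 
 # dump_taxon_to_csv(APHIA_ID_OF_GENUS, "genus.csv")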
In preparation for the bot run, the intro sentence on the generic template will be replaced with the project-approved introduction for the genus/family.
Using AWB and the CSVLoader plugin, the bot will create the first 100 stubs for the current family/genus.
The bot will list the species/genus page names on a user page for project review. Here is an example: User:Ganeshbot/Animalia/History/Rissoidae.
Step 3 - Project review
A project member will review each of the 100 stubs and mark them as approved. Approved
Step 4 - Pending stubs
If there are species remaining to be created, the bot will create the next 100 stubs and wait for project review (steps 2 and 3). This will continue until all the species/genus under the family/genus have been created.
Step 5 - Ongoing verification
Periodically (once per quarter) the bot compares the stubs with the WoRMS database to check if they are still valid. The results are stored on the Wikipedia:WikiProject Gastropods/Unaccepted page. The page lists species that are considered invalid or synonyms of another species. Project members review this page and update the Wikipedia pages with the correct status.
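A hedged sketch of what this verification pass could look like, assuming each stub's stored WoRMS ID can be checked against the database one record at a time (the endpoint name follows the current REST interface; the report format is illustrative):

 import requests
 
 def worms_status(aphia_id):
     """Return (status, valid_name) for one WoRMS record."""
     r = requests.get(
         f"https://www.marinespecies.org/rest/AphiaRecordByAphiaID/{aphia_id}")
     rec = r.json()
     return rec.get("status"), rec.get("valid_name")
 
 def unaccepted_report(stubs):
     """stubs: iterable of (article title, WoRMS ID) pairs harvested on-wiki."""
     lines = []
     for title, aphia_id in stubs:
         status, valid_name = worms_status(aphia_id)
         if status != "accepted":
             lines.append(f"* [[{title}]] - {status}, valid name: {valid_name}")
     return "\n".join(lines)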
If this task is approved, as a sample task the bot can create the genus Turbonilla and all the species under it. See WoRMS.
Discussion
For something of this nature, considering past controversies with stub-creating bots, it is my opinion that links to relevant discussions are mandatory and broader (beyond the Project) community input must be sought before requesting bot approval.
How many stubs overall, not just in groups of 100? Are we talking 10,000, 100,000, or 1,000,000 stub gastropod articles at some point?
Some good prep work on having articles reviewed at different stages by knowledgeable and interested editors, though. --72.208.2.14 (talk) 14:14, 19 August 2011 (UTC)[reply]
- I have added the links above. I had already requested community input as part of the previous bot request. The response was positive. I cannot put an exact number; there may be 5,000-15,000 stubs remaining to be created. Thanks. — Ganeshk (talk) 23:03, 19 August 2011 (UTC)[reply]
- I would like to see community input for this bot request, not for past bot requests. If this is the same bot request, then you don't need to repost the bot request. If it's a different bot request, please seek community input for this bot request. --72.208.2.14 (talk) 14:56, 20 August 2011 (UTC)[reply]
- With all due respect, I would like to wait for a BAG member to tell me that another round of community input is a requirement for this bot request. — Ganeshk (talk) 15:23, 20 August 2011 (UTC)[reply]
- You can, of course, do whatever you want. Your lack of willingness to discuss this bot creating up to 15,000 stubs with the community is not a good sign, in my opinion. Wikipedia community members expressed concerns in prior stub-creation bot RFBAs and in prior discussions about your failure to follow direction(s). Being proactive about community involvement in running this bot would have been a good sign, again, my opinion. --72.208.2.14 (talk) 23:25, 20 August 2011 (UTC)[reply]
Note: I have requested that this request be featured as sidebar news in the Wikipedia community newspaper. That will make this request visible to the wider community. — Ganeshk (talk) 01:17, 21 August 2011 (UTC)[reply]
- That sounds like a good way to get community input. In my opinion you have addressed most of what I see as issues with bots creating stubs by having humans request the stubs, verify the database at a general level and check the stubs after creation. --72.208.2.14 (talk) 06:20, 21 August 2011 (UTC)[reply]
- 72.208.2.14, Thanks for your input. — Ganeshk (talk) 12:11, 21 August 2011 (UTC)[reply]
- To the I.P., if the bot creates draft articles in non-mainspace (say hosted somewhere on WikiProject Gastropod), and waits for human reviews, then there's really no need for some huge community discussion on this.
- Headbomb, the bot will create the stubs in the article namespace. The last 100 stubs created will be listed on a sub page of the User:Ganeshbot/Animalia/History. A project member will review the stubs and add the Approved template once at the end of the list. See example. — Ganeshk (talk) 12:11, 21 August 2011 (UTC)[reply]
- So what exactly does the reviewer review? I'm really not sure I'm following how the bot would work... In my mind, this is what you proposed (and please correct me if I got it wrong):
- Genus list is compiled
- Bot does some crunching and creates drafts
- Users review drafts, give thumbs up/down on each of them
- Upon thumbs up, draft is moved to mainspace
- Headbomb {talk / contribs / physics / books} 13:26, 21 August 2011 (UTC)[reply]
- The reviewer will check if the taxonomy and categorization are correct. These will be the variables on the generic template. The fields that are enclosed in ## symbols are the ones that will be populated from the CSV file (see the sketch after the process list below).
- Here is the process:
- The source data is extracted from WoRMS into a CSV file.
- The bot will use the CSV file and the generic template to create articles in the mainspace. This will be done using the CSVLoader plugin on AWB.
- The articles in the mainspace will be reviewed by a project member. If there are any issues found, they will be addressed before any new articles are created.
- If no issues are found, the bot will create the next 100 stubs and wait for review. This will continue until all the remaining species are created.
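For illustration, the ##field## substitution described above amounts to something like the following sketch (CSVLoader is an AWB plugin, so this Python version only mirrors the idea; the column names are assumptions):

 import csv
 import re
 
 def render_stub(template_text, row):
     """Replace every ##column## placeholder with the matching CSV value."""
     return re.sub(r"##(\w+)##",
                   lambda m: row.get(m.group(1), m.group(0)),
                   template_text)
 
 TEMPLATE = ("'''''##scientificname##''''' is a species of minute sea snail, "
             "a marine gastropod mollusk in the family ##family##.")
 
 with open("genus.csv", newline="", encoding="utf-8") as f:
     for row in csv.DictReader(f):
         title = row["scientificname"]
         text = render_stub(TEMPLATE, row)
         # the bot would then create the page `title` with content `text`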
- The bot creates good articles; you can review the 15,000 articles that have been created so far. I don't see the need to create the stubs in a namespace other than the mainspace. That will require an additional move operation by a project member. The intent is to avoid monotonous work for humans (the manual review will be tedious as it is). --— Ganeshk (talk) 14:47, 21 August 2011 (UTC)[reply]
- I personally would prefer if the bot created the articles in project space; I understand that this requires more work on the part of the project members, however this adds an extra level of human review, as well as the possibility for editors to add more information than the bot provides (the articles I spot-checked all had "empty section" cleanup tags in two sections; the only true content was the boilerplate sentence such as the example you give for the project request, and the infobox). This also avoids the issue of having to mass delete a bunch of articles if something does go wrong with the database. Hersfold (t/a/c) 21:36, 21 August 2011 (UTC)[reply]
- Hersfold, I appreciate your concern, but I think you are being over-cautious. The bot has created 15,000 articles so far; it has not had any major errors. There have been times when some articles needed deletion because the classification on WoRMS was incorrect and the status was changed to deleted. These are few and far between. The editors will add content when additional information becomes available. — Ganeshk (talk) 21:51, 21 August 2011 (UTC)[reply]
- Understood, but this is nonetheless my opinion. I'd appreciate other BAG or community comments on the matter, though. Hersfold (t/a/c) 21:56, 21 August 2011 (UTC)[reply]
- Let's not put "This section is empty" templates in empty sections on stubs. Just omit the sections, please. --72.208.2.14 (talk) 01:09, 22 August 2011 (UTC)[reply]
- Sure, can do that. Would it be okay if I leave the section headings in? As in Alvania aartseni. — Ganeshk (talk) 01:13, 22 August 2011 (UTC)[reply]
- They offer the reader nothing. They make the article ugly. You've designed an article only for editing, no consideration of the reader, in my opinion. What purpose do the section headings serve, other than as placeholders for possibly years? --72.208.2.14 (talk) 10:09, 22 August 2011 (UTC)[reply]
- The sections were providing a structure to the article and inviting the new editors to expand the article. It is not a problem. I have removed the empty sections from the generic template. — Ganeshk (talk) 23:31, 22 August 2011 (UTC)[reply]
Strong objection. There are several serious problems with this proposal:
- You can only automatically create an article once. After the article is created, it is virtually impossible to automatically add body content from other databases without running into problems due to formatting inconsistencies. Thus the only time we should ever, ever consider automatic stub creation is when we can actually create a decent amount of content from several database sources at once. This bot only uses a single source to create basically useless one-sentence stubs. The future opportunity cost of this action is enormous. What if 10 years from now, we have a bot that can digest all of JSTOR and create real non-stub articles? We won't be able to use it for gastropods, because they will all have existing stubs in varying degrees of clean-up and/or development. Imagine if someone had created 100,000 one-sentence U.S. city stubs before Ram-Man had a chance to automatically generate his more comprehensive articles. I would at least like to see a proposal that is more ambitious than one-sentence stubs.
- Why does this bot use the old style taxobox instead of the new automatic taxobox? The entire reason the automatic taxobox was created was for situations like this, where taxonomy updates will affect hundreds of articles at once. Once again, it will be enormously easier to get this right the first time than having editors hand-update 5,000-15,000 stubs after the fact.
- Who's going to watch all of these bot-created articles for vandalism? No one, because no one has any investment in them.
Kaldari (talk) 23:28, 22 August 2011 (UTC)[reply]
- (edit conflict) Kaldari, The gastropod project members are better qualified to answer these questions. I will give them a shot.
- We need these articles to be created now; not 10 years later. The editors will have existing articles to add images and additional information to. I have heard there are more images on Commons than articles available here on Wikipedia.
- I had asked about implementing the automatic taxobox on the project talk page. The project members had decided against it for the time being. See Wikipedia_talk:WikiProject_Gastropods/Archive_4#Automatic_taxobox.
- Vandalism is not an issue that the few project members can deal with. I am encouraged by the newer bots that have better algorithms to detect vandalism. This should not be a deterrent for progress. — Ganeshk (talk) 23:45, 22 August 2011 (UTC)[reply]
- No, we don't need these articles created now, and having such a short-sighted perspective on Wikipedia content has real opportunity costs in the long term. If Wikipedia is better 10 years from now because we didn't run this Bot, I don't believe we should run it. Wikipedia is a long-term project, and its content should be built with the long term in mind, not just instant gratification. Do you really feel like one-sentence stubs are the best we can do? Kaldari (talk) 00:03, 23 August 2011 (UTC)[reply]
- The proposal you reference was not in regard to this Bot action, but in regard to the WikiProject adopting automatic taxoboxes in general. Obviously, adding 5,000-15,000 new articles changes the equation. Having that number of articles using the old-style taxobox is going to be completely unmanageable in the future (considering we are still in the infancy of molecular phylogeny). Kaldari (talk) 00:03, 23 August 2011 (UTC)[reply]
- Adding thousands of new articles that no one is going to watch is like creating a giant playground for vandals. The beauty of organically grown articles is that you are fostering editor investment in the content of the encyclopedia. This is one of the reasons why Wikipedia actually works even though it seems impossible on paper. Kaldari (talk) 00:03, 23 August 2011 (UTC)[reply]
- The idea behind getting these stubs on Wikipedia is that they grow here. This will invite new editors to join. It is done with a long term plan in mind. The bot periodically checks the stubs to make sure they are valid.
- The automatic taxoboxes are not a standard yet. They cannot be added without the project approval.
- Wikipedia will never have as many editors as the number of articles out here. That did not stop us from getting to 3 million articles.
- I see no way of bridging our divide. I respect your opinion. Thanks for your input. — Ganeshk (talk) 00:14, 23 August 2011 (UTC)[reply]
- There are plenty of databases already out there that you (or someone else) could use to create fuller articles. For example, you could use the Database of Western Atlantic Marine Mollusca to create new articles for all of the Western Atlantic species that actually incorporate all the information from that database: geographic range, depth, maximum reported size, type locality, and full lists of references. If all of those articles already exist (in varying states), this won't be possible (or at least it will be exceedingly difficult). Why isn't this Bot proposal incorporating other sources, rather than just filling articles with {{expand section}} templates? Surely, you can offer something better than this. Kaldari (talk) 00:23, 23 August 2011 (UTC)[reply]
- Then get approval. Otherwise, as someone who will probably have to help change them at some point, I must oppose. Kaldari (talk) 00:23, 23 August 2011 (UTC)[reply]
- You don't need as many editors as articles, but you do need to have most articles on editors' watchlists otherwise vandals will realize that Wikipedia is ripe for invading. Kaldari (talk) 00:23, 23 August 2011 (UTC)[reply]
- I will look into your request to further expand the stub. I have access to the offline download of the WoRMS database. It is a MySQL database. I am currently limited by what is available on the webservice. I will see if I can combine the webservice and the offline database to add locality and distribution information to the stubs. Will you be willing to accept it then? Thanks. — Ganeshk (talk) 00:30, 23 August 2011 (UTC)[reply]
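For what it's worth, the combination described would presumably reduce to a join over the dump's tables; a sketch, with the caveat that the dr/gu column names below are guesses for illustration and not the dump's real schema:

 import mysql.connector
 
 conn = mysql.connector.connect(user="worms", password="...", database="worms")
 cur = conn.cursor()
 
 def distribution_for(taxon_id):
     """Locality names recorded for one taxon (dr = distribution records,
     gu = gazetteer units; table roles assumed from their names)."""
     cur.execute(
         "SELECT gu.name FROM dr JOIN gu ON gu.id = dr.gu_id "
         "WHERE dr.tu_id = %s",
         (taxon_id,))
     return [name for (name,) in cur.fetchall()]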
- I would need to see the end result, but I would be much more likely to support if you aren't just creating thousands of identical 1 sentence articles. Kaldari (talk) 01:03, 23 August 2011 (UTC)[reply]
- It looks like JoJan has proven my point #3 wrong.[1] Kaldari (talk) 16:47, 23 August 2011 (UTC)[reply]
- Obscure species articles are not the biggest targets for vandalism on Wikipedia. In the future, could you link to examples of real vandalism to make your point rather than vandalizing an article yourself? Wikipedia has readers, not just editors. --72.208.2.14 (talk) 18:32, 23 August 2011 (UTC)[reply]
- I myself, with help from another Project member, have checked through many thousands of earlier stubs: 2,000 old ones created by Polbot plus many thousands created by hand, and I can confidently say that obscure snail stubs do not appear to attract vandalism. I believe I found none in over 9,000 stubs. Invertzoo (talk) 21:56, 23 August 2011 (UTC)[reply]
- According to User:Ganeshbot/Animalia/History this bot has already created 15,083 stubs. How was this approved without wider community input? Kaldari (talk) 23:35, 22 August 2011 (UTC)[reply]
- That is the reason for this request. Please read prior disclosure on the top. — Ganeshk (talk) 23:45, 22 August 2011 (UTC)[reply]
- In my opinion, the connections and contributions to editing that species stubs make are worth having thousands of one-sentence articles about slugs and snails. They provide the taxonomy to readers searching the web; they provide a place for pictures; and they provide drafts that are cumbersome to make for even more experienced editors. If it is possible to find a way to coordinate adding data from additional databases, though, I agree that it would be much better for the articles. Can you look into this Ganeshk? --72.208.2.14 (talk) 12:48, 23 August 2011 (UTC)[reply]
It seems to me that as well as the points that are being openly discussed, there are some unspoken prejudices at work here:
1. It appears to be the case that a few years ago, in a completely different part of Wikipedia, a nasty experience with a large number of bot-created stubs (many of which had to be deleted) has soured some long-time editors against the whole idea of generating numerous bot stubs. But that fear and the subsequent avoidance is not necessarily appropriate here.
2. We have the usual disagreement between the Wikipedia immediatists and the eventualists. This is never easy to reconcile.
3. It is worth remembering however that not so long ago most of Wikipedia consisted of stubs. It was a risk then and it is a risk now, but so far, so good.
4. I for one see every article in Project gastropods as part of my immediate responsibility, I do not feel that way only about the hand-made ones, or only about the ones I wrote myself.
5. I think it is unfair to call the bot stubs "one-sentence articles". The taxobox alone contains an enormous amount of information in a very concentrated form.
If I come across a scientific name of a snail or a slug and I google it, if I get the Wikipedia bot-generated stub, the taxobox tells me right away what kind of snail it is, what family it belongs to, and who named it, a lot of extremely helpful information. If someone adds a correctly-identified image to that stub, I then have a vast amount more information, even though the article might still be "one sentence".
6. For many species of gastropods, the amount that is known is pitifully small, and thus the article may have to be quite short for that reason. However, for quite a lot of snail and slug species, a taxobox and one sentence may be a lot more than is currently available elsewhere on the web!
7. Admittedly I know nothing about bots and coding, but it doesn't sound right that a bot could not replace a stub with a longer stub if that becomes necessary at some point in the future.
Thanks to everyone for their input on all these matters, it is much appreciated. Invertzoo (talk) 23:04, 23 August 2011 (UTC)[reply]
One more thing:
8. I am concerned that trying to integrate info from other mollusk databases simultaneously with the WoRMS info might be very difficult indeed, because gastropod taxonomy and nomenclature have been changing radically over the last 10 years and experts have been disagreeing... Many current databases are using older taxonomies and nomenclature. It's rather crazy. I would think if you try to use more than one database there would usually be a large percentage of entries that won't match up from one database to another, or do match up, but in misleading ways.
Generating a number of much fuller short articles using the Malacolog online database (by itself) sounds like a really promising idea, however the bot would have to be completely rewritten to do this.
Invertzoo (talk) 23:18, 23 August 2011 (UTC)[reply]
Wrapping up
Ok, at this point it looks like the primary opposition to this bot is in regard to the amount of information it is able to produce on these stubs. Ganeshk, how long/difficult do you think it would be to incorporate more database information into this task as has been requested? If it's not going to take too long, I'd like to proceed with trial runs, as I don't think we can get much more useful discussion out of this without seeing what this version of the bot is capable of. Hersfold (t/a/c) 23:27, 23 August 2011 (UTC)[reply]
- Hersfold, It will take at least 3 weeks to check the feasibility of adding additional information. Can this request be left open until that time? I will get back with a status by September 15. — Ganeshk (talk) 01:22, 24 August 2011 (UTC)[reply]
- That should be fine. The issue of bots updating already-created articles has also come up. I've been meaning to ask, how does this bot update its articles? You mention in the function details that it will regularly check the articles against the database; can it only do this if the article format is the same as when it was created, or if nobody else has edited, or what? - Kingpin13 (talk) 02:37, 24 August 2011 (UTC)[reply]
- The bot does not update existing articles. It picks up the unique identifier (WoRMS ID) from the articles and checks them against the WoRMS database. It can only do this for articles that use the {{WRMS species}} template. The bot will write out the species pages that are invalid. I manually refresh the Wikipedia:WikiProject Gastropods/Unaccepted page with the extract. The actual steps are detailed here. A project member will review the unaccepted page, correct the issue, and mark each item as completed. — Ganeshk (talk) 02:47, 24 August 2011 (UTC)[reply]
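A sketch of the harvesting half of that, assuming the {{WRMS species}} template carries the numeric ID as its first parameter (the parameter layout and the ID in the test are illustrative):

 import re
 
 WRMS_RE = re.compile(r"\{\{\s*WRMS\s+species\s*\|\s*(\d+)", re.IGNORECASE)
 
 def worms_id(wikitext):
     """Return the WoRMS ID embedded in an article's wikitext, or None."""
     m = WRMS_RE.search(wikitext)
     return int(m.group(1)) if m else None
 
 assert worms_id("... {{WRMS species|123456|Alvania aartseni}} ...") == 123456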
A few more remarks
- We rely on the database World Register of Marine Species because it is maintained by professional taxonomists and is kept updated every day. All other databases with respect to gastropods are usually behind or even far behind in taxonomic accuracy and can't be relied on for use by a bot; they must be considered manually on a case-by-case basis. The mixing of several databases by a bot would create a terrible mess.
- We have a bot that monitors in real time all edits to any gastropod article (also to bot-generated articles). Any vandalism is quickly dealt with (see on my talk page: "A barnstar for you").
- We desperately need all these articles so that we can add information on a daily basis. It spares us the tedious chore of having to create new articles all the time.
- Changes in taxonomy, as registered in WoRMS, are being picked up regularly by another bot and are being dealt with manually. JoJan (talk) 05:27, 24 August 2011 (UTC)[reply]
- All in all, gastropod articles are accurate and we do our best to keep them accurate (sometimes with a delay of a few days).
To the Bot Approvals Group member: The last request for approval was heavily affected by somebody who registered on Wikipedia only for the purpose of stopping this bot and then stopped editing. If such a situation occurs with newbies or anonymous users in this discussion, I would appeal to the BAG member to evaluate them much more carefully. Thank you. --Snek01 (talk) 15:07, 24 August 2011 (UTC)[reply]
Support. I will describe my personal EXPERIENCE with this: it helps organize existing and newly added information on Wikipedia; it helps add information about newly described species; it helps make Wikipedia more unbiased. It helps keep organized about 50,000 synonyms of gastropods. How many articles need to be created? There are 28,000 (valid) species of extant Gastropoda on WoRMS out of about 30,000 existing species of extant marine gastropods. There are 22,351 articles on Wikipedia, 15,000 of them created by Ganeshbot. Ganeshbot will be able to newly create only about 6,000-13,000 (theoretical range; my guess is about 8,000 articles). This request for approval covers about 10% of the scope of WikiProject Gastropods. All articles that are/will be created by Ganeshbot are/will be kept updated in their names and synonyms in a very easy way, because editors are automatically informed when something new happens with the actual name of the snail. If an editor creates an article manually, then WikiProject Gastropods members will in practice only update its status after many years, or they will overlook it. There is no need to be afraid of this bot, because we know that it works very practically, it is practical for expanding its articles, and nobody has suggested a better way. If anybody opposes this bot and does not immediately create a more efficient alternative, then he/she supports keeping ~45% of marine gastropods biased. One half of marine gastropods have been successfully created this way and we are satisfied with this. The bot can finish the much smaller remaining part of its task. --Snek01 (talk) 15:07, 24 August 2011 (UTC)[reply]
Support
- I also support; see my numbered comments 1 to 8 above. We have so far never had a bad experience of any kind with this bot. If perhaps in a year or two some longer, fuller, bot-generated articles can be created (for example using the database Malacolog), can a new bot be set up in such a way that it will be able to replace some of these current short stubs with newly created longer articles? Will a bot be able to do that only as long as no additional info was added in the interim? I don't know much about that side of things, but it seems likely that could be done? If so, then I don't see a problem for the future, but if someone (User:Kaldari?) knows more about this, please do tell us. Invertzoo (talk) 22:39, 24 August 2011 (UTC)[reply]
- The difficulty of automatically adding more information to existing articles, rather than at the point of creation, is an order of magnitude greater, and the process is much more prone to error. In order to add information to an existing article, you have to write code to read the existing article, parse it in some meaningful way, and figure out where to insert the new information without disrupting the existing article. And figuring out whether that information has already been added to the article by someone manually is virtually impossible for a bot, so sometimes it will end up adding duplicate information. If you are even considering using a bot to add information from other databases, it should be done at the beginning, not after article creation. Kaldari (talk) 17:30, 26 August 2011 (UTC)[reply]
- Currently we are not actually considering that possibility at all; I believe that was an idea you raised, you being concerned about the future of all the gastropod articles. However, if we are talking about what might happen on Wikipedia in a few years' time, it seems to me that rather than trying to add information into preexisting stubs, perhaps a bot could be built that could recognize any old (still untouched) bot-generated stubs and simply replace them with fuller, better stubs, assuming of course that an appropriate public domain database could be found that we could use! But basically our idea right now is that humans will do all of the tricky work that is too complicated for a bot to figure out, whereas the bot would be fabulously useful in doing the tedious repetitive work of creating new stubs, which is a waste of time for human editors to do. You see, it has been happening that new editors who have numerous very good gastropod images to add are not adding their images, because starting a large number of new species articles is too much work. However, an almost complete range of preexisting stubs would make it very simple for editors to add images or any other info. This simple expansion process is already happening in the existing bot-generated stubs. Invertzoo (talk) 18:11, 4 September 2011 (UTC)[reply]
- A couple of people have objected to the idea of incorporating info from other databases into the work of this bot due to concerns that other databases do not have up-to-date taxonomy. I really don't see this as being an issue, at least for species articles. We should rely on the World Register of Marine Species for 100% of the taxonomy information but also take morphological and ecological information from other databases to fill out the species articles. There is no reason to try to merge competing taxonomic info from more than one database if we know that the World Register of Marine Species is the most up to date. We can, however, match up species information with a high degree of accuracy by just cross-referencing species names from the register in other databases. Kaldari (talk) 17:30, 26 August 2011 (UTC)[reply]
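A sketch of that cross-referencing idea, with placeholder field names: taxonomy is taken only from WoRMS, and a second database is consulted strictly by exact species-name match for supplementary fields:

 def merge_records(worms_records, other_db_by_name):
     """Keep WoRMS as the sole taxonomic authority; add extra fields by name."""
     merged = []
     for rec in worms_records:
         extra = other_db_by_name.get(rec["scientificname"], {})
         merged.append({
             **rec,                                    # authoritative taxonomy
             "max_size_mm": extra.get("max_size_mm"),  # supplementary data only
             "depth_range": extra.get("depth_range"),
         })
     return merged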
- Kaldari, the terms and conditions of use at Malacolog.org forbid us from reproducing their content. Please review [2]. HTML scraping is usually illegal. WoRMS provides a formal way to access their database using a web service and an offline download. And they allow the content to be reused under CC-BY-SA 3.0. Please see citations at [3]. The other databases do not have these. If you are familiar with other sources, let me know. — Ganeshk (talk) 22:03, 26 August 2011 (UTC)[reply]
- Ganeshk is right. As far as I know, WoRMS is the only reliable database under a CC-BY-SA 3.0 license. Content in the articles in our project is being added manually on a daily basis using, in one's own words, content from handbooks, scientific journals, etc., plus images in the public domain and text in the public domain, especially from the Manual of Conchology (1879-1898) by George Washington Tryon. This is an enormous work, but the workload is made a lot easier by this bot. JoJan (talk) 12:03, 28 August 2011 (UTC)[reply]
- Also, even assuming that relevant public domain resources were available, trying to add info from other databases would force the editor to make his or her own interpretations about synonyms and other nomenclatural decisions. It is not nearly as easy as it sounds. Gastropods are not nearly as well studied as jumping spiders, and experts very frequently disagree about species. Invertzoo (talk) 18:11, 4 September 2011 (UTC)[reply]
Support I think that Invertzoo raises the most important points about the benefits of this bot, weighing them from the perspective of Wikipedia being an encyclopedia and providing information to its readers.
- "The taxobox alone contains an enormous amount of information in a very concentrated form. If I come across a scientific name of a snail or a slug and I google it, if I get the Wikipedia bot-generated stub, the taxobox tells me right away what kind of snail it is, what family it belongs to, and who named it... (And I can click on the family in the taxobox and find out more information about the snail in general from that article.) If someone adds a correctly-identified image to that stub, I then have a vast amount more information, even though the article might still be "one sentence". For many species of gastropods, the amount that is known is pitifully small, and thus the article may have to be quite short for that reason. However, for quite a lot of snail and slug species, a taxobox and one sentence may be a lot more than is currently available elsewhere on the web!"
This is the goal of an encyclopedia, and of Wikipedia: to provide information. If the best we can do is provide a one-sentence stub and the taxonomy for 50,000 snails, we've enhanced the utility of the web by those 50,000 taxoboxes. It would take a decade for human editors to increase the utility of Wikipedia by that much. Wikipedia and the web benefit from this bot. --72.208.2.14 (talk) 19:19, 7 September 2011 (UTC) (the persecuted)[reply]
Update
I have completed my research into adding additional data elements to the stub. I was able to harvest the distribution information from the offline download. Here are some sample stubs:
- Alvania aartseni
- Lobatus costatus (distribution information from synonyms)
- Acochlidium fijiense
- Diaphana minuta (notes)
I did not see anything else that can be downloaded. Please review this data structure in Internet Explorer (set the zoom at the bottom right to 400%) and let me know if any other information will be useful. Only a subset of the tables from the data structure is available in the offline download. Here is a list of what is available: actions, context, dr, dr_context, fossil, gu, languages, notes, notes_context, notes_sources, ranks, sessions, sources, sources_context, sourceuses, status, tu, tu_context, tu_sessions, tu_sources, vernaculars, vernaculars_context, vernaculars_sources. — Ganeshk (talk) 11:17, 16 September 2011 (UTC)[reply]
- Possible data elements:
- Type locality (typelocality_flag in the dr table): We can add a line in the distribution section, "The type locality for this species is X."
- Habitat (brackish, fresh, terrestrial, fossil flags on tu table): Are these fields useful? If yes, provide a section and the sentence.
- Notes (notes table): Can we add notes if available? The text is available as CC 3.0. Random example, http://www.marinespecies.org/aphia.php?p=taxdetails&id=205853.
- You can add notes as in the random example, but not detailed descriptions of the species (as they are sometimes given, based on recent publications). JoJan (talk) 13:19, 19 September 2011 (UTC)[reply]
- See notes section below. — Ganeshk (talk) 03:07, 20 September 2011 (UTC)[reply]
- You can add notes as in the random example, but not detailed descriptions of the species (as they are sometimes given, based on recent publications). JoJan (talk) 13:19, 19 September 2011 (UTC)[reply]
- I have a couple of other remarks :
- 1) Taxobox : species : this should be L. costatus instead of Lobatus, costatus (as seen in the sample stub)
- 2) When mentioning synonyms in the taxobox : use a bulleted list this time instead of <br />. This makes it much easier to read and check in the edit mode. JoJan (talk) 18:08, 16 September 2011 (UTC)[reply]
- Done. I have addressed the two issues. See Lobatus costatus. — Ganeshk (talk) 00:34, 17 September 2011 (UTC)[reply]
- Very cool. A couple minor issues:
- The synonym list for Lobatus costatus didn't seem to work.
- There should be no space between 'locations:' and '[1]'
- I wonder if a prose location list would look better than a bullet list. Thoughts?
- Kaldari (talk) 18:19, 16 September 2011 (UTC)[reply]
- I have fixed the synonym issue.
- I have removed the space after "locations:".
- I will wait to hear what's decided on the location list. — Ganeshk (talk) 00:34, 17 September 2011 (UTC)[reply]
- A prose location list would indeed look better. But I wonder if a bot could perform this. When expanding an article, I usually give the distribution of a species like this: "This species occurs in the Gulf of Mexico, the Caribbean Sea and the Lesser Antilles", without going into a detailed list. Such a list is never all-encompassing (and can result in a discussion about the exact location of the original find) and it is better, in my opinion, to give the above broader wording (except when the species is endemic). JoJan (talk) 18:47, 16 September 2011 (UTC)[reply]
- Hmm, perhaps it should only create a location list if the list is short. Like if there are 5 locations or fewer. That would avoid creating any overly-detailed lists I would imagine. If a species has more than 5 locations, chances are some of them should be generalized or grouped together manually. Also, I like the idea of using the habitat flags. Could you create a demo for that? Kaldari (talk) 22:45, 16 September 2011 (UTC)[reply]
- I need someone to tell me how the habitat flags can be used. — Ganeshk (talk) 01:03, 17 September 2011 (UTC)[reply]
- How about adding "This species is found in freshwater and terrestrial habitats." onto the end of the lead section? I would probably ignore the "fossil" flag for now, and translate "fresh" to "freshwater". Kaldari (talk) 03:06, 18 September 2011 (UTC)[reply]
- I have added a habitat section to the article. See Acochlidium fijiense. — Ganeshk (talk) 03:07, 19 September 2011 (UTC)[reply]
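The flag-to-sentence step is simple enough to sketch, assuming the brackish/fresh/terrestrial flags from the tu table and following the wording agreed above ("fresh" rendered as "freshwater", "fossil" ignored):

 def habitat_sentence(flags):
     """Build the habitat sentence from the tu-table flags (names assumed)."""
     names = []
     if flags.get("brackish"):
         names.append("brackish")
     if flags.get("fresh"):
         names.append("freshwater")   # per the discussion above
     if flags.get("terrestrial"):
         names.append("terrestrial")
     if not names:
         return ""
     if len(names) == 1:
         habitats = names[0]
     else:
         habitats = ", ".join(names[:-1]) + " and " + names[-1]
     return f"This species is found in {habitats} habitats."
 
 assert (habitat_sentence({"fresh": 1, "terrestrial": 1})
         == "This species is found in freshwater and terrestrial habitats.")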
- I have added the TaxonIds template to the external links section. Let me know if that is okay. — Ganeshk (talk) 00:35, 17 September 2011 (UTC)[reply]
- That looks great. Kaldari (talk) 03:07, 18 September 2011 (UTC)[reply]
- This looks horrible = this is controversial and unneeded. See below. --Snek01 (talk) 22:35, 21 September 2011 (UTC)[reply]
- One more thing. In HTML, a tag such as <ref name="WoRMS"/> should be written as <ref name="WoRMS" /> . In other words : with an additional space before the slash. See also : [4]. JoJan (talk) 17:21, 18 September 2011 (UTC)[reply]
- I have added the extra space. See Lobatus costatus. — Ganeshk (talk) 03:00, 19 September 2011 (UTC)[reply]
- I have split the distribution section. See Lobatus costatus. Does that address the problem if the locations exceed 10? — Ganeshk (talk) 03:47, 19 September 2011 (UTC)[reply]
- Well Ganeshk, you're doing a fine job. Whenever there is a long list, a table is always to be preferred. The bot, as it stands now, seems OK to me. JoJan (talk) 13:15, 19 September 2011 (UTC)[reply]
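Combining the two suggestions into one sketch: prose when there are five or fewer localities, a table otherwise (the threshold and the markup are assumptions, not necessarily what the bot does):

 def distribution_section(locations, prose_limit=5):
     """Render the Distribution section as prose or as a wikitable."""
     if not locations:
         return ""
     if len(locations) <= prose_limit:
         if len(locations) == 1:
             where = locations[0]
         else:
             where = ", ".join(locations[:-1]) + " and " + locations[-1]
         body = f"This species occurs in {where}."
     else:
         rows = "\n".join(f"|-\n| {loc}" for loc in locations)
         body = '{| class="wikitable"\n! Locality\n' + rows + "\n|}"
     return "==Distribution==\n" + body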
- Well Ganeshk, you're doing a fine job. But you cannot expect to apply for approval for one bot, receive some support, and then implement additional CONTROVERSIAL things. That seems to me highly unethical behavior. That makes me a bit nervous. I will cite from Wikipedia:Bot policy: "larger changes should not be implemented without some discussion. Completely new tasks usually require a separate approval request." For example, ALL controversial things that were or will be mentioned on Wikipedia talk:WikiProject Gastropods cannot be automatically implemented by this bot. I want this bot to be approved, but I have to formally oppose the application of all controversial things that were added after the request for approval (because I do not know when the bot approval will be closed). --Snek01 (talk) 21:25, 21 September 2011 (UTC)[reply]
- You don't mention it, but I can infer that you are not happy about the use of the {{TaxonIds}} template in the External links section. Aren't we still discussing what should be on the stub? I have been trying to add more sections like Description, Habitat, etc. based on input from Kaldari and others. I have added a note above saying "I have added the TaxonIds template to the external links section. Is that okay?". That was a call for discussion about it. BTW, there is no controversy over the use of the TaxonIds template. It was created based on wide consensus. You were the only one supporting deletion in the deletion discussion. I have added it in the external links section since that is an important element of the stub. It will be useful for harvesting the WoRMS ID from the articles. It would be nice if there were no personal attacks. Thanks. — Ganeshk (talk) 22:34, 21 September 2011 (UTC)[reply]
Kaldari, because you are an author of Wikipedia:Persondata, I would like to ask you for help/advice/consultation about an issue that unintentionally appeared during this bot approval. Discussion is at Wikipedia_talk:WikiProject_Gastropods#WRMS_species. Thank you. --Snek01 (talk) 21:25, 21 September 2011 (UTC)[reply]
- Hmm. Why is BAG assistance needed now? — Ganeshk (talk) 22:35, 21 September 2011 (UTC)[reply]
- I deactivated the tag for now. Feel free to re-activate it when BAG is needed. --slakr\ talk / 04:22, 5 October 2011 (UTC)[reply]
- I have started a discussion about using Automatic taxoboxes on the project page. — Ganeshk (talk) 03:07, 20 September 2011 (UTC)[reply]
The geography is going to be a problem. The bot should not list both the Lesser Antilles and Aruba, Bonaire, and Curaçao, for example. It should just list the Lesser Antilles if the species is found throughout, or specify the Leeward Antilles if that is the case, or just specific islands in the Leeward Antilles or the Lesser Antilles, whatever is correct. Maybe geographical listings are not within the capabilities of this bot without an algorithm to check inclusions/exclusions. --68.105.141.221 (talk) 20:37, 21 September 2011 (UTC)[reply]
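An inclusion check along those lines could look like this sketch, given a hand-curated parent-region map (the fragment below is illustrative only):

 REGION_PARENT = {                 # tiny illustrative fragment, curated by hand
     "Aruba": "Leeward Antilles",
     "Bonaire": "Leeward Antilles",
     "Curaçao": "Leeward Antilles",
     "Leeward Antilles": "Lesser Antilles",
 }
 
 def ancestors(region):
     """Yield each broader region containing the given one."""
     while region in REGION_PARENT:
         region = REGION_PARENT[region]
         yield region
 
 def drop_redundant(locations):
     """Drop any locality whose broader region is already listed."""
     listed = set(locations)
     return [loc for loc in locations
             if not any(a in listed for a in ancestors(loc))]
 
 assert drop_redundant(["Lesser Antilles", "Aruba"]) == ["Lesser Antilles"]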
I assume ERMS means European Register of Marine Species, but neither the acronym nor the ERMS is a Wikipedia article. Acronyms that refer to something not in the article should be wikilinked to the correct article. The bot should not be posting the unknowable. --68.105.141.221 (talk) 20:41, 21 September 2011 (UTC)[reply]
- IP, thanks for your input. The bot is using the data available on the WoRMS website. It can either display it or not display it based on what is decided here. — Ganeshk (talk) 22:39, 21 September 2011 (UTC)[reply]
- Well, it looks like the consensus at WikiProject Gastropods is to not use Automatic taxoboxes. This is somewhat disappointing, but acceptable. Thanks for giving it another try Ganeshk! Kaldari (talk) 22:08, 21 September 2011 (UTC)[reply]
Notes
Here are the possible note types:
- Additional information: 574019
- Authority: 138404
- Biology: 141111
- Classification: 567325
- Date of publication: 153944
- Depth range: 342124
- Description: 211091
- Diagnosis: 382350
- Diet: 160071
- Dimensions: 139557
- distribution: 160193
- Editor's comment: 564799
- Etymology: 557262
- Fossil range: 564306
- Habitat: 450008
- Holotype: 153073
- Importance: 157025
- Length: 381295
- Nomenclature: 465711
- Original Combination: 413910
- Predators: 157025
- Publication date: 208000
- Remark: 138255
- Reproduction: 159871
- Specimen: 389447
- Spelling: 138054
- Status: 342381
- Stratigraphy: 407788
- Subgenus: 519682
- synonymy: 211097
- Taxonomic Remark: 467215
- Taxonomic status: 534232
- Taxonomy: 137882
- Type: 153084
- Type locality: 527208
- Type material: 388313
- Type species: 410291
- Validity: 574019
Let me know which ones are needed. Thanks. — Ganeshk (talk) 23:50, 19 September 2011 (UTC)[reply]
- "Description" may look problematic as the text may rely on recently published copyrighted material. JoJan (talk) 04:41, 20 September 2011 (UTC)[reply]
- Does this mean that we exclude description and use the rest? — Ganeshk (talk) 05:21, 20 September 2011 (UTC)[reply]
- In my opinion, the more that can be added by a bot to the article the better (excluding "description"). It saves us a lot of work afterwards. I also wonder if the bot could add the "sources" as in [5]. JoJan (talk) 14:00, 20 September 2011 (UTC)[reply]
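If the project settles on "everything except Description", the filter itself is trivial; a sketch over (note type, text) rows from the notes table, matching case-insensitively since the type strings above mix cases:

 EXCLUDED_NOTE_TYPES = {"description"}   # may paraphrase copyrighted sources
 
 def usable_notes(notes):
     """notes: iterable of (note_type, text) rows from the notes table."""
     return [(note_type, text) for note_type, text in notes
             if note_type.strip().lower() not in EXCLUDED_NOTE_TYPES]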
- Discussion about "sources"
- Yes, the bot can pick up all the sources. It is currently built only to pick up the original description if available. Do you want to pick up the rest of the sources as well? Can you give me an example of how they can be placed in the article? — Ganeshk (talk) 14:58, 24 September 2011 (UTC)[reply]
- In the following example of Bathyancistrolepis trochoideus (Dall, 1907) we have different kinds of sources:
- basis of record: Fraussen K. (2010). Buccinidae checklist. Pers. Com. [details] : A personal communication should not be allowed in Wikipedia
- from synonym: Hasegawa K. & Okutani T. (2011) A review of bathyal shell-bearing gastropods in Sagami Bay. Memoirs of the National Sciences Museum, Tokyo 47: 97-144. [15 April 2011] [details] [view taxon] : remove [details] [view taxon]
- from synonym: Hasegawa K. (2009) Upper bathyal gastropods of the Pacific coast of northern Honshu, Japan, chiefly collected by R/V Wakataka-maru. In: T. Fujita (ed.), Deep-sea fauna and pollutants off Pacific coast of northern Japan. National Museum of Nature and Science Monographs 39: 225-383. [details] [view taxon] : idem, remove [details] [view taxon]
- Another example : Clinopegma magnum unicum (Pilsbry, 1905). The sources are :
- basis of record: Hasegawa K. (2009) Upper bathyal gastropods of the Pacific coast of northern Honshu, Japan, chiefly collected by R/V Wakataka-maru. In: T. Fujita (ed.), Deep-sea fauna and pollutants off Pacific coast of northern Japan. National Museum of Nature and Science Monographs 39: 225-383. [details] : remove : "details"
- from synonym: Fraussen K. (2010). Buccinidae checklist. Pers. Com. [details] [view taxon] : a personal communication is not allowed.
- A third example : Beringius Dall, 1887 : The sources are
- basis of record: Gofas, S.; Le Renard, J.; Bouchet, P. (2001). Mollusca, in: Costello, M.J. et al. (Ed.) (2001). European register of marine species: a check-list of the marine species in Europe and a bibliography of guides to their identification. Collection Patrimoines Naturels, 50: pp. 180-213 (look up in IMIS) [details] : remove : (look up in IMIS) [details]
- additional source: Vaught, K.C. (1989). A classification of the living Mollusca. American Malacologists: Melbourne, FL (USA). ISBN 0-915826-22-4. XII, 195 pp. (look up in IMIS) [details] : remove (look up in IMIS) [details]
- Can you show me an example of how these can be used in a Wikipedia article? Please feel free to update User:Ganeshk/sandbox/Beringius behringi. — Ganeshk (talk) 22:41, 26 September 2011 (UTC)[reply]
- I've added the sources without a hyperlink in "references", and those with a hyperlink in "external links". I've omitted the personal communication of Fraussen. In the first source, I've omitted "details"; in the second source: "Look up in IMIS" and "details", while taking care that "page(s): 93" follows the previous text and is no longer on a following line. In external links, I've made the normal wiki hyperlink to the internet with the url first, followed (after a space) by text such as "ITIS database". Personally, I also put the references in chronological order and don't mention "basis of record" or "additional source" or "from synonym". This gives an idea of how I have been working all along. Additional ideas are welcome. JoJan (talk) 05:14, 27 September 2011 (UTC)[reply]
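These clean-up rules are mechanical enough to sketch (the residue strings follow the examples above; treating "Pers. Com." as the marker for personal communications is an assumption):

 import re
 
 RESIDUE = re.compile(r"\s*(\[details\]|\[view taxon\]|\(look up in IMIS\))")
 
 def clean_sources(sources):
     """Drop personal communications and strip the WoRMS link residue."""
     kept = []
     for s in sources:
         if "Pers. Com." in s:     # personal communications are not allowed
             continue
         kept.append(RESIDUE.sub("", s).strip())
     return kept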
- One more remark : after the addition of so much data, the article is no longer a stub, but can be considered as class = start in the template on the discussion page. In other words, the way the bot will operate, the number of stubs won't rise too much. This will take away the criticism that the bot only makes new stubs. JoJan (talk) 05:19, 27 September 2011 (UTC)[reply]
- I will look into adding these. I am also working on scraping the ITIS and EOL links from the webpage. I will get back to you in a few days. Thanks. — Ganeshk (talk) 12:04, 28 September 2011 (UTC)[reply]
- Done. See Links and sources section below. — Ganeshk (talk) 11:13, 3 October 2011 (UTC)[reply]
- Discussion about "other notes"
- It looks like a lot of these note fields can contain extensive prose and thus might not be safe to import for copyright reasons. Kaldari (talk) 22:26, 21 September 2011 (UTC)[reply]
- The entire page is available under CC 3.0. Does it not cover all the notes that are posted? — Ganeshk (talk) 22:40, 21 September 2011 (UTC)[reply]
- Yes, the whole content of WoRMS is under the CC-BY-SA-3.0 license. Could anybody provide at least one example where copyrighted text is used? If not, the text can be used. More likely, the situation will arise that the text is incorrect. WoRMS is such a large database that there ARE errors. For example, I have seen the same text used on a larger number of species where it appears not to be correct even for the family. Despite that, I hope that all texts can be used by the bot. The bot operator will announce suspicious examples to the WikiProject, and WikiProject members hopefully know that there may be errors in every possible resource. --Snek01 (talk) 22:57, 21 September 2011 (UTC)[reply]
- I have updated the bot to include the notes. See Diaphana minuta. — Ganeshk (talk) 15:24, 24 September 2011 (UTC)[reply]
- Looks good to me. Can you make sure that headers are only created if you actually have data for a particular section? I see that the Habitat section in your example above is completely empty. Kaldari (talk) 04:22, 25 September 2011 (UTC)[reply]
- I have removed the empty sections now, but the page now has additional blank lines in between. This is a limitation of parser functions; they trim out spaces and paragraph marks. I would really like to add spaces and paragraph marks based on a section flag. See Acochlidium fijiense. Are the additional empty lines acceptable? Also, can you strike out the Strong objection !vote above? — Ganeshk (talk) 12:37, 25 September 2011 (UTC)[reply]
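For readers following along, the header-suppression fix amounts to emitting a section only when WoRMS actually supplied text for it. A minimal sketch in Python rather than AWB parser functions (the field names are illustrative, not the bot's actual column names):

```python
def build_sections(record: dict) -> str:
    """Assemble optional article sections from one WoRMS record."""
    parts = []
    for heading, field in (("Description", "description"),
                           ("Distribution", "distribution"),
                           ("Habitat", "habitat")):
        text = (record.get(field) or "").strip()
        if text:  # no data -> no header, avoiding empty "Habitat" sections
            parts.append(f"== {heading} ==\n{text}")
    return "\n\n".join(parts)

# A record with no habitat text produces no Habitat header at all:
print(build_sections({"description": "Shell small, conical.", "habitat": ""}))
```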
Links and sources
I have added programming to pull the links and the sources. See Beringius behringi and Diaphana minuta. Please let me know if there are any other modifications required. — Ganeshk (talk) 02:14, 3 October 2011 (UTC)[reply]
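By way of illustration, pulling the ITIS and EOL links can be as simple as scanning the taxon page for outbound identifiers. A hedged sketch in Python, not the bot's code: the taxdetails URL pattern is real, but the assumption that the page markup exposes plain itis.gov and eol.org hyperlinks is mine:

```python
import requests
from bs4 import BeautifulSoup

def external_ids(aphia_id: int) -> dict:
    """Collect ITIS and EOL links from a WoRMS taxon page."""
    url = f"http://www.marinespecies.org/aphia.php?p=taxdetails&id={aphia_id}"
    html = requests.get(url, timeout=30).text
    links = {}
    for a in BeautifulSoup(html, "html.parser").find_all("a", href=True):
        if "itis.gov" in a["href"]:
            links.setdefault("ITIS", a["href"])
        elif "eol.org" in a["href"]:
            links.setdefault("EOL", a["href"])
    return links

print(external_ids(138849))  # hypothetical AphiaID, for illustration only
```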
- Thanks Ganeshk, I think you've achieved our goal. This bot creates real articles (class = start) and saves us a lot of work. No one can claim now that the bot will create thousands of stubs. Thanks to this bot we can finally start the real work of adding content from scientific journals and other reliable sources wherever possible. The only criticism left is something the bot cannot help: typos in the original text in WoRMS and double entries in WoRMS. JoJan (talk) 17:41, 3 October 2011 (UTC)[reply]
- Thanks JoJan. — Ganeshk (talk) 00:12, 4 October 2011 (UTC)[reply]
Genera
One more question. Can this bot, at a later stage, in the same way (with sources, notes etc.) also create articles for genera, with a list of accepted names and a list of synonyms, alternate representations, nomina dubia and species inquirendae? JoJan (talk) 17:41, 3 October 2011 (UTC)[reply]
- Genera are within the scope of this request. I will work on creating some sample pages and post back here in a week's time. — Ganeshk (talk) 00:12, 4 October 2011 (UTC)[reply]
- Done. I have created a sample page for the genus Conus. Please check and let me know what you think. On a related note, why don't we create subspecies articles here? For example, the subspecies Conus navarroi calhetae is accepted over Conus calhetae. — Ganeshk (talk) 10:20, 6 October 2011 (UTC)[reply]
- By choosing Conus, you just chose one of the most complex genera. This gave me the opportunity to make some changes to the article in order to make it more palatable and more pleasing to the eye. I wonder if your bot could perform this. I've nothing against separate articles about subspecies, as long as these subspecies are also mentioned under a heading "Subspecies" in the genus article. Anyway, there is usually not much known about subspecies, and those articles are likely to remain stubs. JoJan (talk) 13:05, 6 October 2011 (UTC)[reply]
- Done. I have updated the program to split the species list as per your request. Please check the Conus page. Thanks. — Ganeshk (talk) 13:30, 16 October 2011 (UTC)[reply]
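The list-splitting just mentioned boils down to grouping the child taxa of a genus by their WoRMS status. A minimal sketch, assuming status strings along the lines of the WoRMS vocabulary (the exact labels are a guess, not confirmed field values):

```python
from collections import defaultdict

# Section heading for each (assumed) WoRMS status label.
SECTION_FOR = {
    "accepted": "Species",
    "unaccepted": "Synonyms",
    "alternate representation": "Alternate representations",
    "nomen dubium": "Nomina dubia",
    "taxon inquirendum": "Species inquirendae",
}

def split_species_list(children):
    """children: iterable of (name, status) pairs from the WoRMS download."""
    sections = defaultdict(list)
    for name, status in children:
        sections[SECTION_FOR.get(status, "Other")].append(name)
    return {heading: sorted(names) for heading, names in sections.items()}

# Statuses below are illustrative, not actual WoRMS data:
print(split_species_list([("Conus marmoreus", "accepted"),
                          ("Conus permagnus", "nomen dubium")]))
```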
- I have created Turbonilla as well. I have skipped loading subgenera and subspecies at this time. Is that okay? — Ganeshk (talk) 13:42, 16 October 2011 (UTC)[reply]
- This looks OK, except that I can't render the columns in the section "Accepted species" in Conus and in Turbonilla (I'm using Firefox 7.0.1). However, I can see these columns in preview mode, but no longer when I save the page. Strange, because it works in the other sections, such as "Synonyms" or "Nomina nuda". I tried a different arrangement in the text of the columns template, but it doesn't help. JoJan (talk) 18:22, 16 October 2011 (UTC)[reply]
- It is displaying fine for me on Chrome (14.0) and Firefox (3.6.6). I don't have much control over it anyway. — Ganeshk (talk) 22:35, 16 October 2011 (UTC)[reply]
Family
I have completed the programming to download all the genus/species pages for a selected family. I have tried it with Pyramidellidae [6]. Here is a log of trial articles from Aartsenia to Chemnitzia:
- User:Ganeshk/sandbox/Aartsenia
- User:Ganeshk/sandbox/Aartsenia arctica
- User:Ganeshk/sandbox/Aartsenia candida
- User:Ganeshk/sandbox/Actaeopyramis
- User:Ganeshk/sandbox/Afroturbonilla
- User:Ganeshk/sandbox/Afroturbonilla hattenbergeriana
- User:Ganeshk/sandbox/Afroturbonilla multitudinalis
- User:Ganeshk/sandbox/Atomiscala
- User:Ganeshk/sandbox/Atomiscala islandica
- User:Ganeshk/sandbox/Bacteridiella
- User:Ganeshk/sandbox/Bacteridium
- User:Ganeshk/sandbox/Bacteridium bermudense
- User:Ganeshk/sandbox/Bacteridium resticulum
- User:Ganeshk/sandbox/Besla
- User:Ganeshk/sandbox/Besla dheeradiloki
- User:Ganeshk/sandbox/Boonea
- User:Ganeshk/sandbox/Boonea bisuturalis
- User:Ganeshk/sandbox/Boonea cincta
- User:Ganeshk/sandbox/Boonea impressa
- User:Ganeshk/sandbox/Boonea jadisi
- User:Ganeshk/sandbox/Boonea kinpana
- User:Ganeshk/sandbox/Boonea scymnocelata
- User:Ganeshk/sandbox/Boonea seminuda
- User:Ganeshk/sandbox/Boonea somersi
- User:Ganeshk/sandbox/Boonea suoana
- User:Ganeshk/sandbox/Careliopsis
- User:Ganeshk/sandbox/Careliopsis clathratula
- User:Ganeshk/sandbox/Careliopsis modesta
- User:Ganeshk/sandbox/Careliopsis styliformis
- User:Ganeshk/sandbox/Charilda
- User:Ganeshk/sandbox/Chemnitzia abbotti
- User:Ganeshk/sandbox/Chemnitzia biangulata
- User:Ganeshk/sandbox/Chemnitzia nodai
- User:Ganeshk/sandbox/Chemnitzia plana
This brings up a question: what do we do when the genus is unaccepted but its species are accepted, as in Chemnitzia?[7] The other issue is that Charilda is accepted, but its species are not.[8] — Ganeshk (talk) 01:09, 17 October 2011 (UTC)[reply]
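For the record, the family-level run above is a simple two-level walk: every genus in the family becomes a page, then every species under each genus. A minimal sketch with canned data standing in for the WoRMS download (get_children is a hypothetical stand-in, not a real API call):

```python
def pages_for_family(family: str, get_children) -> list:
    """Return page titles for every genus and species in one family."""
    pages = []
    for genus in get_children(family):      # genera of the family
        pages.append(genus)                 # one genus article
        pages.extend(get_children(genus))   # one article per species
    return pages

# Canned data modelled on the trial log above:
fake_db = {
    "Pyramidellidae": ["Aartsenia"],
    "Aartsenia": ["Aartsenia arctica", "Aartsenia candida"],
}
print(pages_for_family("Pyramidellidae", lambda name: fake_db.get(name, [])))
# -> ['Aartsenia', 'Aartsenia arctica', 'Aartsenia candida']
```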
- I've checked a number of these try-outs and they look OK to me. They render the content of WoRMS in an exact manner. None of these articles is the finished product; in most cases, each is just a beginning and an invitation to add more.
- The genus Chemnitzia indeed poses a problem, and it is not alone in this. I have frequently encountered such genera, where the genus name has become a synonym while all, or a number of, its species are still recorded under the genus name. I suppose this arises because there hasn't been any research in this direction. This has to be dealt with on a case-by-case basis. For example, Chemnitzia abbotti was dealt with in WoRMS by Dr. Bouchet as recently as 15 May 2011 and is considered by him an accepted name. OBIS, on the other hand, considers it a synonym and gives as accepted name Turbonilla (Chemnitzia) abbotti (see: [9]). We can only state this in our article; doing otherwise would be original research. But it would be handy if you could give a list of such cases each time this happens.
- Charilda represents another problem (and again, it is not the only genus with this problem): all the names of the species have become synonyms, while the name of the genus is still accepted (probably a nomen conservandum). Again, in such a case, follow WoRMS and make an article under the genus name with the species mentioned as synonyms, as you have done in this particular case. As you can see, taxonomy is full of surprises. JoJan (talk) 14:33, 17 October 2011 (UTC)[reply]
- I have updated the program to flag the species where the genus is invalid. — Ganeshk (talk) 03:36, 18 October 2011 (UTC)[reply]
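The flag amounts to a status check across each genus/species pairing: report any accepted species sitting under an unaccepted genus rather than creating it silently. A minimal sketch with assumed field names; this mirrors the Chemnitzia case but is not the bot's actual implementation:

```python
def flag_status_mismatches(genera) -> list:
    """genera: list of dicts with 'name', 'status' and child 'species'."""
    flags = []
    for genus in genera:
        for sp in genus["species"]:
            if genus["status"] != "accepted" and sp["status"] == "accepted":
                flags.append(f"{sp['name']}: accepted species in "
                             f"unaccepted genus {genus['name']}")
    return flags

# The Chemnitzia situation from the discussion above:
print(flag_status_mismatches([{
    "name": "Chemnitzia", "status": "unaccepted",
    "species": [{"name": "Chemnitzia abbotti", "status": "accepted"}],
}]))
```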
Closing
- To BAG: Thanks for letting this page be open this long. I am done with the programming and ready for a trial if that is okay with you. Please approve. — Ganeshk (talk) 03:35, 18 October 2011 (UTC)[reply]
Ok, wow, that was a lot of reading. While I do see some objections, it looks like you have enough support from WikiProject Gastropods to warrant a trial. Approved for trial. Please provide a link to the relevant contributions and/or diffs when the trial is complete. Report back when you have an accurate sample of the bot's activities. --Chris 06:46, 23 October 2011 (UTC)[reply]
- Thanks Chris. I have asked the Gastropods project to approve the lead sentence and taxonomy. — Ganeshk (talk) 15:39, 23 October 2011 (UTC)[reply]
- Trial complete. I have created 50 articles. Please review. — Ganeshk (talk) 04:10, 26 October 2011 (UTC)[reply]
- A user has requested the attention of a member of the Bot Approvals Group. Once assistance has been rendered, please deactivate this tag by replacing it with {{t|BAG assistance needed}}.
- A project member reviewed each article and confirmed they are OK. Please approve this task. — Ganeshk (talk) 22:51, 26 October 2011 (UTC)[reply]
Approved. --Chris 08:18, 31 October 2011 (UTC)[reply]
- The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.