Wikipedia talk:Automated taxobox system


This talk page can be used to discuss issues with the automated taxobox system that are common to the entire system, not just one of its templates. Discussions of this nature prior to 2017 can be found at Template talk:Automatic taxobox.

Those familiar with the system prior to mid-2016 are advised to read Notes for "old hands".

30 June 2024 use stats update

30 June update

Project | Auto | Manual | Total taxa | Percentage auto | # auto added since 30 December 2023 | # manual subtracted
Algae | 2280 | 160 | 2440 | 93.4 | 117 | 67
Amphibians and Reptiles | 22711 | 199 | 22910 | 99.1 | 187 | 7
Animals | 11596 | 915 | 12511 | 92.7 | 429 | 243
Arthropods | 11355 | 2719 | 14074 | 80.7 | 581 | 348
Beetles | 26514 | 11994 | 38508 | 68.9 | 1783 | 1427
Birds | 14405 | 48 | 14453 | 99.7 | 47 | 14
Bivalves | 1696 | 28 | 1724 | 98.4 | 22 | 4
Cephalopods | 2020 | 558 | 2578 | 78.4 | 11 | 8
Dinosaurs | 1624 | 0 | 1624 | 100 | -19 | 0
Diptera | 15081 | 1565 | 16646 | 90.6 | 921 | 600
Extinction | 796 | 31 | 827 | 96.3 | NA | NA
Fishes | 25302 | 960 | 26262 | 96.3 | 894 | 711
Fungi | 12194 | 3932 | 16126 | 75.6 | 1539 | 1239
Gastropods | 32419 | 2972 | 35391 | 91.6 | 4909 | 4252
Insects | 61302 | 18450 | 79752 | 76.9 | 3324 | 2269
Lepidoptera | 83659 | 14801 | 98460 | 85.0 | 9028 | 8965
Mammals | 8401 | 124 | 8525 | 98.5 | 100 | 20
Marine life | 8990 | 527 | 9517 | 94.4 | 267 | 145
Microbiology | 7675 | 5393 | 13068 | 58.7 | 704 | 637
Palaeontology | 15506 | 3198 | 18704 | 82.9 | 727 | 276
Plants | 81558 | 188 | 81746 | 99.8 | 1638 | 423
Primates | 983 | 0 | 983 | 100 | 4 | 0
Protista | 778 | 150 | 928 | 83.8 | 398 | -70
Rodents | 3161 | 25 | 3186 | 99.2 | 24 | 3
Sharks | 833 | 38 | 871 | 95.6 | 4 | 7
Spiders | 10110 | 0 | 10110 | 100 | 70 | 0
Tree of Life | 100 | 0 | 100 | 100 | 11 | 6
Turtles | 760 | 0 | 760 | 100 | 1 | 0
Viruses | 1736 | 55 | 1791 | 96.9 | 14 | 0
Total | 407991 | 57001 | 464992 | 87.7 | 24103 | 19707

Mammal subprojects with articles tagged for both mammals and subproject:

Project | Auto | Manual | Total taxa | Percentage auto
Cats | 185 | 0 | 185 | 100
Cetaceans | 445 | 0 | 445 | 100
Dogs | 241 | 0 | 241 | 100
Equine | 109 | 0 | 109 | 100

Methods and caveats (copy-pasted from previous update)

Method: For the most part I use Petscan to search for articles with a talk page banner for a particular WikiProject and either {{Taxobox}}, or any of {{Automatic taxobox}}+{{Speciesbox}}+({{Infraspeciesbox}} and/or {{Subspeciesbox}} (depending on whether botanical/zoological code is relevant)), and record the results. Example search for algae with automatic taxoboxes (search terms are in the Templates&Links tab in Petscan). For viruses, I search for {{Virusbox}} rather than the other automatic taxobox templates. For plants, I sum the results for the Plants, Banksia, Carnivorous plants and Hypericaceae projects. "Total" is derived from the Template Transclusion Count tool (e.g. results for Speciesbox: https://templatecount.toolforge.org/index.php?lang=en&namespace=10&name=Speciesbox#bottom), and is not actually the sum of the results for individual projects (some articles have talk page banners for multiple WikiProjects, and would be counted twice if rows were summed). I started compiling these stats in April 2017, and have been updating roughly every six months since December 2017. I've kept my method consistent; perhaps I should have included all of the automatic taxobox templates (Hybridbox, Ichnobox, etc.), but I didn't do so at the beginning, and the other templates aren't used in very many articles.
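As a rough illustration of the arithmetic behind the table (not the actual Petscan workflow), the sketch below shows how a "Percentage auto" figure follows from the Auto and Manual counts, and why summing per-project rows overcounts articles that carry more than one project banner. The function name and the article sets are hypothetical and exist only for this example.

```python
def percent_auto(auto: int, manual: int) -> float:
    """Share of taxobox articles that use an automatic taxobox, to one decimal place."""
    total = auto + manual
    return round(100 * auto / total, 1)


# Counts taken from the Algae and Beetles rows of the table.
algae_pct = percent_auto(2280, 160)
beetles_pct = percent_auto(26514, 11994)
print(algae_pct, beetles_pct)

# Hypothetical talk-page tagging: "Apis mellifera" carries both project banners.
insects = {"Apis mellifera", "Bombus terrestris"}
arthropods = {"Apis mellifera", "Limulus polyphemus"}

summed = len(insects) + len(arthropods)  # counts the shared article twice
distinct = len(insects | arthropods)     # counts each article once
print(summed, distinct)
```

This is why the "Total" row comes from a separate transclusion count rather than from adding up the project rows.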

Caveat: The remaining manual taxoboxes in projects with a high percentage of automatic taxoboxes mostly have some kind of "problem". I have periodically reviewed all the manual taxobox articles in projects with fewer than 207 manual taxoboxes, and chose not to convert them to automatic taxoboxes at that time (however, it has been a while since my last review, so there are probably a few recently included articles I haven't reviewed). "Problems" may include:

  • Fossil taxa; fossil classifications may be derived from multiple sources, and the present classification on Wikipedia may include mutually incompatible hypotheses. Fossil taxa are often not linked from extant parent taxa.
  • Synonymy; there is some obvious synonymy issue; e.g., a species is in a genus which redirects (as a synonym) to another genus; maybe the species article needs to be moved or maybe the genus should be reinstated.
  • Common names; articles with common name titles may not correspond to taxa, but still have manual taxoboxes. In some cases {{Paraphyletic group}} may be appropriate; in others the taxobox should be removed.
  • Parasites and pathogens; articles on parasites and pathogens may be tagged for the WikiProject of the organisms they infect. Higher level taxonomy templates for the parasites may not yet exist, and the classification presented in manual taxoboxes may not be up to date.

I've added WikiProject Extinction to the table this time. WikiProject Protista continues to have tags added to existing articles, with a net increase in the number of tagged articles with a manual taxobox. WikiProject Dinosaurs recently merged a bunch of largely redundant articles for nodes in a cladogram, resulting in a net decrease in the number of articles tagged for that project. Plantdrew (talk) 17:12, 30 June 2024 (UTC)

Thanks for doing these updates. Good to see progress. Did you include {{WikiProject Cacti}} in with the Plants totals? It doesn't look like that template automatically adds it to the parent WP like the other plants subprojects. awkwafaba (📥) 19:01, 1 July 2024 (UTC)
@Awkwafaba:, I did not include WikiProject Cacti in the totals. However, for the past several years, I've been running the "Taxon pages not tagged in WP ToL clade projects" query on your user page to ensure all taxobox articles are tagged for a project immediately before I start compiling an update of these numbers (and in general I run your query every couple of weeks, but haven't made it a priority to tag redirects). I did pick up several cacti articles and added WikiProject Plants tags before I started this update. None of the plant subprojects get picked up in a Petscan search for {{WikiProject Plants}}, so I have always done a separate search for Banksia/Carnivorous plants/Hypericaceae and added those results to the results for Plants when presenting these numbers (the other 3 subprojects aside from Cacti do contribute to the numbers reported in the assessment table for WikiProject Plants). Plantdrew (talk) 20:07, 1 July 2024 (UTC)

Automatic child taxa?

I don't really know how else to title this. I'm one of the editors on a wiki which focuses on recording fictional species made for a large collaborative speculative evolution project, and at some point, for much the same reason you all did, we came up with an automated taxonomy system to reduce the pain of updating taxonomy for hundreds, even thousands of species. However, ours works a bit differently from Wikipedia's, storing all taxonomy data in a centralized place--a JSON file. As all the data is in one place, it also allowed us to easily reverse the direction and display, for instance, all descendant taxa as well.
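To make the "reverse the direction" idea concrete, here is a minimal sketch in Python. It assumes a hypothetical JSON layout in which each taxon name maps to its parent's name; the real file on our wiki is not shown here, and the taxon names and function names below are purely illustrative.

```python
import json
from collections import defaultdict

# Hypothetical data: each taxon name maps to the name of its parent taxon.
TAXONOMY_JSON = """
{
  "Felidae": "Carnivora",
  "Felinae": "Felidae",
  "Pantherinae": "Felidae",
  "Felis": "Felinae",
  "Panthera": "Pantherinae"
}
"""


def build_child_index(parent_of):
    """Invert the child -> parent mapping into parent -> list of children."""
    children = defaultdict(list)
    for taxon, parent in parent_of.items():
        children[parent].append(taxon)
    return children


def descendants(taxon, children):
    """Return every descendant of `taxon` in depth-first order."""
    found = []
    stack = list(reversed(children.get(taxon, [])))
    while stack:
        current = stack.pop()
        found.append(current)
        stack.extend(reversed(children.get(current, [])))
    return found


parent_of = json.loads(TAXONOMY_JSON)
children = build_child_index(parent_of)
print(descendants("Felidae", children))
```

Because every parent link lives in one file, the inverted index can be rebuilt in a single pass whenever the data changes.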

Looking at how Wikipedia does taxonomy, I noticed that there are places where it would make sense to automatically generate a list of descendant taxa. Most notably, the subdivision section of the automatic taxobox, and perhaps various other lists of genera and species around the wiki. I can't imagine pages like the list of Asteraceae genera being anything short of a nightmare to update and maintain, assuming its reputation among botanists is earned, and I could see it being worse for decently large mid-level taxa that are in a state of flux due to several new studies being published.

I think that the current system Wikipedia is using might make generating lists of child taxa impractical, but on the other hand, I wonder if the changes needed to support it would actually be considered worthwhile to those involved in this wikiproject. I know that for the aforementioned wiki I'm part of, this also made it much easier to browse taxonomy in general because readers and editors alike could reliably access related and descendant taxa from anywhere. And while editing is moderated on our wiki so we haven't had need for this, I can't help but imagine it would make it a bit easier to spot and fix vandalism as well because it would be plainly visible from higher taxa (which one might be more likely to view in some cases). Any thoughts on the idea? Disgustedorite (talk) 21:24, 20 July 2024 (UTC)

@Disgustedorite:, when automatic taxoboxes were first being developed (ca. 2011), there was an attempt to include automatic child taxa that was eventually abandoned. I don't know the details about why it didn't work.
There is a script (User:Jts1882/taxonomybrowser.js) that allows you to see the taxonomy in a tree view, with children.
However, not all articles are using automatic taxoboxes (~88% are using them, but that still leaves 50,000+ articles with manual taxoboxes). And Wikipedia doesn't have articles for every genus, let alone every species. Plantdrew (talk) 22:44, 20 July 2024 (UTC)
On our wiki we actually manually update some lower taxa on a species by species basis while the higher taxa are what is automated, since for a collaborative speculative evolution project with nearly 10 times as many species as there are dinosaurs and upwards of 200 more added every year, the frequency at which those are defined and updated can make dinosaur researchers jealous.
I just skimmed the source code of the taxonomy browser and...well, I suppose the processing impact doesn't matter that much when it's run on your own machine, lol. I will say using the search API and taking advantage of taxonomy being stored exclusively within the template namespace is pretty smart. Our strategy was to index child taxa and then search that index taxon by taxon, though having far fewer species than have been described in real life (and not actually having a page for every member of a genus of insects) gives us the advantage of not needing to actually maintain an index by hand (we have few enough taxa that it's economical to index it over again every time).
If I were to take a guess, I can see that Wikipedia has no extensions like Semantic MediaWiki (understandable given its current state), Cargo (I wouldn't use it either), or even DynamicPageList3 (performance hell), which leaves the search API and checking each and every result as basically the only option, which, even if it was possible in a module, I could imagine hitting memory limits fast. On our wiki, we're looking into making a sort of poor man's Semantic MediaWiki using an autonomous bot that records and indexes information about pages in various dedicated JSON files...but a bot-dependent system wouldn't fly here, right?
Although, Wikipedia does have CategoryTree, which I think is what our wiki attempted to use for taxonomy browsing...in 2007, when there were only a few hundred species. But in any case, using the various different parameters of its parser function, it might be possible to twist it into something like a poor man's DPL3 by having each taxonomy template page automatically be added to a category like "child taxon of Parentsnameidae" and then using several instances of the CategoryTree parser function (or just one that's been quite heavily altered by the Lua script after it was generated, if that's possible) to display it. But that's also a lot of potential categories... Disgustedorite (talk) 00:33, 21 July 2024 (UTC)
While I agree it would be nice to have automatic child taxa, it really isn't practical for Wikipedia. Wikipedia has to be open for everyone to edit, which is why we have the template system over centralised JSON or Lua module methods, and NPOV means we have to be able to show alternative taxonomies over one agreed system.
The taxonomy browser was developed as a tool to manage the taxonomy templates. It picks up the parent-child relationship of templates, which most of the time is a taxonomic relationship but gets more confusing where there are alternative taxonomies. And any JS additions have to be opt-in. For a wiki that could impose one taxonomy with a centralised JSON source, a JS addition to the taxobox would be possible. Jts1882 | talk 10:37, 21 July 2024 (UTC)