Wikipedia talk:Typo Team/moss/Archive 1
Early Cornish texts
I've added {{lang}} tags to Early Cornish texts so that the Cornish text is explicitly marked. Will that be enough to remove it from the next set of "moss" results? If not, I suggest it go on an explicit list of articles to be skipped, since no one will be adding 2000+ Cornish words to Wiktionary anytime soon. -- John of Reading (talk) 16:52, 8 January 2015 (UTC)
- @John of Reading: Oh yes, that's perfect! That will certainly get excluded in the next update. And that actually leads to an excellent TODO item, which is to make sure that words tagged with a particular language are words in that language and not just in any language. Then we could actually restrict regular prose to English words, which would identify any text that needed {{lang}}. This will significantly increase recall of misspellings, plus make the project much more accessible for text-to-speech users. Hurray! 8) -- Beland (talk) 18:00, 8 January 2015 (UTC)
Bug in the word-splitting algorithm
1772 in poetry is listed here as containing the word "ation". In fact, the wikitext reads [[wikt:attest|attest]]ation, and the "ation" should be taken as part of the preceding word. Tricky! -- John of Reading (talk) 13:51, 12 January 2015 (UTC)
- Looks like that's still a bug. I put it on my fix list. -- Beland (talk) 08:53, 14 April 2018 (UTC)
- FTR, I just confirmed that this has since been fixed. -- Beland (talk) 06:37, 3 June 2022 (UTC)
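For readers wondering how this kind of splitting bug can be avoided, here is a minimal Python sketch. It is not the actual moss code; the names `WIKILINK`, `visible_text`, and `words` are hypothetical. It assumes the approach of replacing each wikilink with its displayed label before tokenizing, so that letters directly after the closing brackets stay attached to the label.

```python
import re

# Matches [[target]] or [[target|label]]; the label defaults to the target.
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")

def visible_text(wikitext: str) -> str:
    """Replace each wikilink with its displayed label.

    Letters that follow the closing brackets are left in place, so they
    end up joined to the label, as they are on the rendered page.
    """
    def label(match: re.Match) -> str:
        target, shown = match.group(1), match.group(2)
        return shown if shown is not None else target
    return WIKILINK.sub(label, wikitext)

def words(wikitext: str) -> list[str]:
    """Split the visible text into alphabetic words."""
    return re.findall(r"[A-Za-z]+", visible_text(wikitext))
```

With this sketch, `words("[[wikt:attest|attest]]ation")` yields a single word rather than a stray "ation". Namespaced links such as categories and files are not handled and would need separate treatment.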
Possible misuse of "Not a typo"
The instructions here seem to recommend that if no other fix applies, then the article should be edited to add either {{Proper name}} or {{Not a typo}} around the relevant word to take it off the "moss" report.
I think that this would be seen as disruptive if done on a large scale. I add these templates only when there is a genuine risk that editors will accidentally damage the text, or when text has already been damaged by incorrect edits. See the history of Holy anointing oil for an example.
In particular, large scale addition of {{Not a typo}} using AutoWikiBrowser would be contrary to the fourth of its rules of use.
This does mean that entries won't be cleared from the list when the report is re-run. My pragmatic workaround for that is that when the page is re-created from a new database dump, the list should start at 1880 in organized crime where the current list ends. We'll then have hundreds of new articles to check. -- John of Reading (talk) 22:19, 12 January 2015 (UTC)
- @John of Reading: Hmm, well I am definitely going to rotate the starting letter of various todo lists, because the dumps come out infrequently enough that many fixes can be made in the time between dump creation and publishing of a new todo batch. There shouldn't be too many things that are written in English that aren't listed in the dictionary, don't have a Wikipedia entry, aren't a species, and aren't capitalized like a proper noun. (I can ignore all those things.) If we do need to use {{not a typo}} or {{Proper name}} frequently, it would be good to see if there are any patterns we can use to reduce the number of false alarms even further. -- Beland (talk) 09:03, 14 April 2018 (UTC)
Scientific terms
what are we to do with scientific terms? adding notatypo tags to each systematic chemical name seems a bit... inelegant --Probablysomeoneelse 13:12, 27 April 2015 (UTC) — Preceding unsigned comment added by Annekcm (talk • contribs)
- Well, if we think they deserve Wikipedia articles all their own, we can create stubs. We could also create redirects if they don't deserve their own article but are on a list somewhere, or the systematic name refers to a chemical that has an article under a different name? I could also make a dedicated systematic chemical name detector if none of those solutions seem scalable. -- Beland (talk) 09:14, 14 April 2018 (UTC)
- There are also a lot of chemical name fragments, because the full chemical name may be split with - or (). I have been writing some chemical articles to neutralise some of these, and adding redirects for chemical groups that don't warrant an article, but there really are higher priority articles to write! It would be good if we could come up with a pattern matcher to identify chemical names, which could then go in a separate list, to avoid cluttering the spelling mistake list. (Chemical names have errors too.) Other terms showing up are medical or mathematics terms, but they too can have redirects or short articles. Graeme Bartlett (talk) 13:05, 29 April 2018 (UTC)
- Okey, I'll ponder how to do that for the next run. -- Beland (talk) 01:53, 27 July 2018 (UTC)
- FTR, chemical terms are now reported separately. -- Beland (talk) 07:16, 3 June 2022 (UTC)
Instructions for editors
"For foreign (non-English) language words: Edit the article and use the {{lang}} template to mark all foreign-language passages."
Ain't nobody got time for that! Well, not nobody, but most of us have got better things to do with our lives than jump through your hoops.
I picked a Hungarian word at random from the list: napjainkig. What I want to do is simply flag that this is not a typo. Instead I'm being instructed to fix the encyclopaedia according to your rather anal procedures. What I've done instead is create a Hungarian Wiktionary page for napjainkig, which should be enough to help editors realise that the word is not a typo. Dadge (talk) 09:33, 9 June 2015 (UTC)
- Thanks for creating the dictionary entry, that's actually the most useful thing to do. I'm not looking anywhere other than the English Wiktionary for now, but if things get fancier in the future it's useful to have reliable underlying data. I think in the long run, we do want to have {{lang}} tags on all non-English text in the English Wikipedia. If you've ever used a text-to-speech system to read articles out loud, you'll know it can sound rather ridiculous if the system tries to pronounce a foreign language with English rules. We'd also like to know which language the word is intended to be in; I'm sure many misspellings of English words are valid spellings of some other word in some other language. If we want to catch all the misspellings, we have to not only confine our search to the intended language, but also the intended dialect (like US vs. UK), which is a feature I hope to add some day. -- Beland (talk) 09:19, 14 April 2018 (UTC)
- FTR, all Wiktionaries are now consulted for spell-checking purposes, but adding to Wiktionary is still the most useful thing to do. -- Beland (talk) 07:16, 3 June 2022 (UTC)
B-Y?
@Beland: This is a great project, but I was wondering where articles with a singular misspelling starting from B to Y are. Are they omitted to keep the list shorter? Could they perhaps be added on other pages? Thanks, Darylgolden(talk) Ping when replying 06:55, 29 April 2018 (UTC)
- There is enough to do already! You might notice that the A's are not complete either. Graeme Bartlett (talk) 13:06, 29 April 2018 (UTC)
- @Darylgolden: Oh, hey, sorry I missed this message earlier. Yeah, there are over 300,000 articles with a single typo. I only posted a sampling to keep the page a reasonable size. It also helps if we alternate the letters we focus on so that we don't duplicate work; the dumps take a few weeks to process on the server side, and then another day on my side, and in that time if we've fixed a bunch of "A" spellings we'd want to start "B" so we don't try to re-do the "A" ones that were fixed during processing. Someday I hope to have a database where we can just keep going continuously until we hit the bottom of the barrel, but that will take significant engineering. In the meantime, if we do ever get low on misspellings, just ping me and I'll be happy to add thousands more. And thanks for your help on this project! -- Beland (talk) 00:04, 19 July 2018 (UTC)
July 20 dump report has arrived
I left some of the sections from the July 1 dump (like the entirety of "Y") up to avoid yanking it out from under people who were working on them. Because the changes we make today won't necessarily be reflected in the next dump (since it's probably already been started but takes a couple weeks to finish), I have to leave a given letter dormant for at least one dump after we've stopped working on it. I have posted a bunch more sections fresh from the July 20 dump, including all of "X" (because it's a small letter). If that's too much to have up all at once, feel free to drop the sections from the older dump, especially if they seem stale in comparison. I'm also using a new algorithm to auto-chunk the listings into smaller sections, since it seems easier to avoid edit conflicts and more fun to finish off entire small sections than trudge through one big one. Let me know if that's not working out! -- Beland (talk) 01:59, 27 July 2018 (UTC)
@Beland: Should we remove the section once everything is done? AmericanAir88 (talk) 20:17, 31 July 2018 (UTC)
- I would suggest that we leave this for Beland to do, to avoid edit conflicts when sections suddenly get renumbered at random times. Also leaving it there for a few days will let others see what has happened, rather than thinking something got lost or missing. Graeme Bartlett (talk) 22:29, 31 July 2018 (UTC)
@Graeme Bartlett: I meant remove the header also. Removing the struck entries helps reduce the page size and Beland does that already. AmericanAir88 (talk) 01:43, 1 August 2018 (UTC)
It seems someone is already removing headers.
- I hope the software wouldn't generate an edit conflict even if a section is removed completely (and someone else is editing a different section), but I haven't tested to see if that happens. I do sometimes see that if I haven't refreshed, I'll click on an edit link and get the wrong section because one above me has been removed. There are a lot of small sections, so removing them completely does help clean up the TOC, though it's also a good point that sometimes people get a boost from seeing good work that was done, which is why I started emptying sections but leaving the header, at least in the middle of a dump cycle (just following what other people were doing). It doesn't matter either way to me, since I go back and look at the version of the page at the time I did the last dump to figure out what letters should be posted this time, so I'm not confused by removed sections. -- Beland (talk) 01:59, 1 August 2018 (UTC)
@Beland: Would it mess with the mechanism though? Also am I allowed to do it? AmericanAir88 (talk) 13:52, 1 August 2018 (UTC)
- @AmericanAir88: I post the listings manually, so remove away if you like. -- Beland (talk) 14:53, 1 August 2018 (UTC)
- @Beland: Thank you. AmericanAir88 (talk) 15:06, 1 August 2018 (UTC)
Lang tags - where to get advice/approval
I've made some edits to Heonjong of Joseon, which has one section where the same/similar text is repeated in three ways: as transliterated hangul, in hangul, and in hanja.
Hangul is easy - I just used {{lang|ko|...}}
For the version using Han characters I used {{lang|ko-hani||...}}, but I don't know if that is an appropriate classification.
For the version using Latin characters, I punted and used {{transl|ko|...}}, because I couldn't tell if {{lang|ko-Latn||...}} would really be okay to do.
Any ideas where to find the experts who could say yea or nay on these, and/or suggest the correct codifications? Shenme (talk) 22:31, 18 August 2018 (UTC)
- Hi Shenme. Template:Korean should do the trick. It includes parameters for hangul, hanja, Revised Romanization, and McCune–Reischauer romanization. Lenoresm (talk) 01:36, 19 August 2018 (UTC)
More specific search links?
Would it be possible to express the search links a bit more precisely? When choosing the present item efficiencyEfficient we then see a search results page with "Results 1 – 21 of 10,090".
If I instead ask for "efficiencyEfficient" I get a results page saying "Result 1 of 1" and Milivoje Kostic with text "... with emphasis on efficiencyEfficient energy use and ..."
This problem is also seen elsewhere, for example with herophiliHerophilus where only 1 of 3 hits was interesting.
The addition of the double-quotes, e.g. search=%22efficiencyEfficient%22 , makes a lot of difference. Shenme (talk) 23:15, 20 August 2018 (UTC)
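As an illustration of the quoted form suggested above, here is a small Python sketch of how such a link could be generated. The base URL and the function name `exact_search_url` are assumptions made for the example; how moss actually builds its links is not shown in this discussion.

```python
from urllib.parse import urlencode

# Assumed base URL for the on-wiki search page.
SEARCH_URL = "https://en.wikipedia.org/w/index.php"

def exact_search_url(term: str) -> str:
    """Build a search link that wraps the term in straight double quotes.

    urlencode percent-encodes the quote marks as %22, which asks the
    search engine for the exact string instead of related word forms.
    """
    return SEARCH_URL + "?" + urlencode({"search": f'"{term}"'})

print(exact_search_url("efficiencyEfficient"))
```

Dropping the quote marks from the f-string gives the broader search, which is still useful when hunting for similar typos.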
- Well you can apply the quotes yourself if the search result is too broad. For many of the typos the broader search brings up similar typos and errors that should also be fixed. Every time there is an article with a listed error suspect, we should search for and fix the same error elsewhere in Wikipedia articles. Graeme Bartlett (talk) 02:45, 21 August 2018 (UTC)
- Heh, sometimes I cruise RecentChanges just to find someone fixing a misspelling once, where it is present many times over. Nice calming work item before signing off. Shenme (talk) 22:25, 21 August 2018 (UTC)
- @Shenme: I have changed the code so that quote marks will show up in "find all" links after the next run (currently waiting for the September 1 dumps to be posted for download). Hopefully that will help; thanks for the suggestion! -- Beland (talk) 19:00, 6 September 2018 (UTC)
- @Graeme Bartlett: First, thanks for the prolific typo-fixing you've been doing! You definitely have a copyeditor's instinct in wanting to find all of the instances of a given typo. My goal is to make editors as productive as possible, but to get the project started I mostly just tried a bunch of easy things, so there's definitely room for improvement. So far the most-used list seems to be the one that sorts typos by article. Traditionally it's been articles with a single typo, but recently I added a list of articles with five typos, in case that makes people more productive or people like it. The idea is that by showing all the typos in a given article, the number of edits is minimized, which theoretically makes typo-fixing editors the most productive in the long run. I didn't provide "find all instances of the same typo" links because any other instances would eventually come around when the articles that contain that typo get posted. Those articles might have more than one typo, but the listing for the article would give those so they could all be fixed in a single edit.
- However, if you find it more satisfying to fix all the instances of a given typo, I can produce more useful lists that link to all the articles a given typo is found on, so I can save you the trouble of having to type in the search term yourself. I have found a way to make the "most common typo" list useful for this purpose. The current problem is that the "typos" most commonly found in Wikipedia articles are mostly words that actually just need to be added to Wiktionary. However, I have hooked in an algorithm that checks "typos" to see if they are very close to words that are already in the dictionary. This lets me break down the "most common typos" list into "probably most common actual typos" and "probably most common new words" lists, the former of which should be faster for the sort of task you're talking about. I'll post these in the next moss run; let me know how you like them and if there's anything else I can do to speed up anyone's workflow. -- Beland (talk) 19:00, 6 September 2018 (UTC)
- Re: all instances of a given typo: there are times when only by looking at several occurrences can one become sensitized to the situation that there are multiple words a typo could be. Scanning quickly I see in my history 'councel'. Well, sometimes it was 'council' and sometimes it was 'counsel'. I've seen the situation where one typo string could be as many as four different words! So I like the availability of those notes.
- The drawback is the quantity. One item may be dozens or more. Anybody want to finish up the last 400+ articles needing 'tahsil' -> 'tehsil'? ;-) Shenme (talk) 19:39, 6 September 2018 (UTC)
- Well this project has triggered me to use WP:AWB which is great at repetitive corrections. So now instead of a link added for all the errors, it is better to just have the number of them show up. If it is over 3 then AWB is good. If less then editing a couple of pages is easier. It also will fix up some other errors on the way. But it is not foolproof so you still have to check, and may need to manually correct. I could easily do 'tahsil' -> 'tehsil' a few hundred times, but I don't even know what that is! Graeme Bartlett (talk) 21:55, 6 September 2018 (UTC)
- Ah, interesting to know on both counts. I guess it's good we're keeping humans in the loop, even if the number of fixes is a bit daunting sometimes. At least we're making good progress on the worst offenders. -- Beland (talk) 18:58, 7 September 2018 (UTC)
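The split between "probably actual typos" and "probably new words" described earlier in this thread can be sketched roughly as follows. This is an illustrative Python example only: it uses the similarity ratio from the standard library's `difflib`, which may well differ from the closeness test moss uses, and the function name `classify` and the cutoff value are assumptions.

```python
from difflib import get_close_matches

def classify(unknown_words, dictionary, cutoff=0.8):
    """Split unknown words into likely typos and likely new words.

    A word counts as a likely typo when at least one dictionary word is
    close to it; the close words are kept as candidate corrections.
    """
    likely_typos = {}
    likely_new = []
    for word in unknown_words:
        near = get_close_matches(word.lower(), dictionary, n=3, cutoff=cutoff)
        if near:
            likely_typos[word] = near
        else:
            likely_new.append(word)
    return likely_typos, likely_new

typos, new_words = classify(
    ["councel", "tahsil"],
    ["council", "counsel", "efficient"],
)
print(typos)      # 'councel' is close to more than one dictionary word
print(new_words)  # 'tahsil' has no close neighbour in this tiny dictionary
```

Keeping every near match, not just the best one, reflects the point made above that a single typo string can stand for several different intended words, so a human still has to choose.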
Another parse/splitting problem?
Under section Baba-Babc is
Wiktionary has wikt:nagajuban, so I figured there was a problem where the 's' was somehow misplaced in the article. But when I look in that article I find
- ''[[nagajuban]]s''
where nagajuban is a redirect to Kimono#Accessories and related garments. Redirect is from 2010 and wikt entry from 2016.
Where did the 'nagajubans' come from? Shenme (talk) 05:10, 9 September 2018 (UTC)
- The parse looks OK to me. "nagajubans" is what appears on the screen. So for it not to be listed in this exercise there would need to be nagajubans in Wiktionary or as a redirect/article. Is it valid as a plural? Graeme Bartlett (talk) 07:09, 9 September 2018 (UTC)
- I suppose the English plural wikt:nagajubans could be created, but as someone was explicitly trying to avoid the problem here at WP (i.e. creating a redirect for nagajubans) but rather explicitly coding [[nagajuban]]s -- that is, reference the defined term in link then append the 's' -- I think we're missing the point of parsing what was coded, the isolated defined term 'nagajuban'.
- I'm encountering quite a number of problems with piped text or links plus appended text. For instance [[Frisian handball|kaatser]], where someone has explicitly said that 'kaatser' is actually defined somewhere else. (So *don't* go look for wikt:kaatser) Or [[calabash|calabass]], where the author/group has a variant spelling, but we know the correct place to find the definition.
- If we force into Wiktionary every possible spelling variation of a term, or every other (regional) alternate word for a term, we'll end up degrading Wiktionary. Does Wiktionary have disambiguation pages?
- If we force changing every string here to be found/defined in Wiktionary we'll deform the prose here, even when the editor here *knows* the word is variant and has pointed to the correct place to find it.
- In the three cases I mentioned above, we have a perfectly good mechanism to avoid impinging/warping/tromping on Wiktionary. Why can't we respect the link intentions? Shenme (talk) 05:08, 10 September 2018 (UTC)
- kaatser should be added to Wiktionary. It appears to be the Dutch word for the handball player. So you can lang|nl it. kaatsen is the name for the ball game in Dutch. Frisian handball appears as fy:keatsen in the Frisian language, but my Frisian dictionary says that playing the sport is keatse, and that keatsen is the ball game and keatser is the player. All are missing from Wiktionary. On the topic of piped links, being piped does not stop it from having a spelling error. So I think it is good to check the words out. "calabass" may need a not a typo or some other markup, as I think just being in the link is not really enough to keep it out of the spelling errors, which it must be. Graeme Bartlett (talk) 11:57, 10 September 2018 (UTC)
- Yeah, Wiktionary wants all regional variations of spelling, and all inflected forms (plural, past tense, etc.). wikt:Wiktionary:Criteria for inclusion has a complete explanation if you're curious, but in general it includes all word-forms in all languages except for most proper nouns (which are in the encyclopedia if they are notable, not the dictionary) and systematic constructions like large numbers. Proper nouns in English are almost always capitalized, so those are ignored by the spell checker. Wikipedia only has one article per topic, so you'll see things like [[car|automobile]] since those are two words that mean the same thing, but obviously both "car" and "automobile" are words that should be in the dictionary. -- Beland (talk) 18:28, 29 September 2018 (UTC)
It's not a definition, it's a kind of thing
Babb's Bridge - wikt:queenspost
This term is used as a single word - queenspost truss bridge
That can get you to Truss_bridge#Queenpost_truss which then gets you to Queen post
Note that there and at Truss#King_post_truss there is mention:
- The queen post truss, sometimes queenpost or queenspost, is ...
So... what to do with this variant word, for which there is a definition here, but not a wikt? Shenme (talk) 04:41, 12 September 2018 (UTC)
- Looking more broadly on google, it normally has a capital Q. It is used enough in books to get to Wiktionary, but perhaps only with a Q and not a q. I have added a redirect, so the next run should not feature this as an error. But also we could change the capitalisation. Graeme Bartlett (talk) 06:34, 12 September 2018 (UTC)
- Mmm, lowercase looks fine to me. -- Beland (talk) 07:16, 3 June 2022 (UTC)
spaced ndash valid word separator
I believe some cases of "word1{{spaced ndash}}word2" are being interpreted as "word1word2". Of course, there is a clear separation. Can this usage be removed from the "missing whitespace" test? The template has several aliases. David Brooks (talk) 02:20, 11 October 2018 (UTC)
- Oh, I just noticed Beland's "There was a problem not handling character substitution templates like {{mdash}}", so I assume {{snd}} is covered by that. As you were. David Brooks (talk) 02:26, 11 October 2018 (UTC)
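To make the issue concrete, here is a minimal Python sketch of expanding character-substitution templates before splitting text into words. It is not the moss implementation; the mapping `SEPARATOR_TEMPLATES` lists only a few template names as an assumed example, and the rendered characters are approximations.

```python
import re

# Hypothetical mapping from template names to the text they render as.
# Only a handful of names are listed; the real templates have more aliases.
SEPARATOR_TEMPLATES = {
    "spaced ndash": " \u2013 ",
    "snd": " \u2013 ",
    "mdash": "\u2014",
}

# A template call with no parameters, such as {{snd}}.
_TEMPLATE = re.compile(r"\{\{\s*([^{}|]+?)\s*\}\}")

def expand_separators(wikitext: str) -> str:
    """Replace known separator templates with the characters they stand for."""
    def render(match):
        name = match.group(1).lower()
        # Unknown templates are left untouched.
        return SEPARATOR_TEMPLATES.get(name, match.group(0))
    return _TEMPLATE.sub(render, wikitext)

def split_words(wikitext: str) -> list[str]:
    """Split into alphabetic words after separator templates are expanded."""
    return re.findall(r"[^\W\d_]+", expand_separators(wikitext))
```

Because the dash is substituted before tokenizing, `split_words("Paris{{snd}}London")` gives two separate words instead of one run-together string.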
Lists D-Z
I'm curious why the lists are only for numbers and A-C. — Preceding unsigned comment added by Aurornisxui (talk • contribs) 16:23, 10 December 2018 (UTC)
- Hi, I'm not the creator, nor have I been here for very long by any stretch, but I think I asked a question like yours when I first started, and the answer I discovered is:
- While the other lists do exist, there are enough articles to edit on the current one to fill up the month of time they're up. If we ever do collectively manage to polish one off, the account who runs the bot said they would happily put up another list.
- Best as I can figure. Cheers! Elfabet (talk) 18:42, 14 December 2018 (UTC)
- Yeah, if I posted the whole alphabet, the last letters would be very stale by the time we got to them, if we went in order. If we don't go in order, it's a bit difficult to avoid re-listing typos that were fixed between the time the last dump was frozen and when the results were posted (which can take over a week). -- Beland (talk) 04:53, 18 January 2019 (UTC)
Thank you for this moss project
To Beland and everyone else who works on this typo project, it is impressive. Well done and thank you! Ken K. Smith (a.k.a. Thin Smek) 01:18, 16 December 2018 (UTC) — Preceding unsigned comment added by Thin Smek (talk • contribs)
Tagging User:Beland. Ken K. Smith (a.k.a. Thin Smek) (talk) 17:33, 17 January 2019 (UTC)
- Hey, without all the work put in by editors who are reading the reports and fixing typos and researching words and updating the dictionary, none of what I did would have been useful at all. So thanks to all of you! -- Beland (talk) 03:12, 17 February 2019 (UTC)
Animal sounds
There are quite a few animal sounds showing on the list, particularly in the repeated patterns section. Is there a template to mark those? Graeme Bartlett (talk) 03:44, 31 December 2018 (UTC)
- I've just been using {{not a typo}}. -- Beland (talk) 09:41, 23 February 2019 (UTC)
Two general cases to add to instructions
1) What's the best way to handle cases where a quote is adjusted to reflect a change in capitalization due to sentence restructure? e.g.: "[O]ther than that..."
2) Companies making up words (mostly of social media software applications) that effectively emulate a noun or verb for their not-actually-unique capabilities. (flattr, most recently for example).
I personally don't believe they deserve a Wiktionary article until they have "clearly widespread use" (per wikt:CFI), but that's still a judgement call. So until they do, do we try to write them out per WP:JARGON? Or is it preferred to just {notatypo} them?
I don't know if these are the type of thing that can be addressed in the searches/rules for list inclusion, or maybe just a quick addendum to current editor guidelines moving forwards? Elfabet (talk) 15:09, 14 January 2019 (UTC)
- For (2) I would suggest that we have a redirect on Wikipedia to the company/product page. At least then people can easily find what that word means. We could add the notatypo template if it looks like a spelling error that others may "fix" inappropriately. Graeme Bartlett (talk) 07:40, 15 January 2019 (UTC)
- For (1), if only the capital has changed then I think that we should just use the capitalization as found in the original source and so use "other than that...". If I see [O], I would think that the word use was not "other", but perhaps "Another" or "father". Graeme Bartlett (talk) 07:40, 15 January 2019 (UTC)
- In case (2), it usually happens on their page. Flattr, for example, uses the word 'flattr' as a noun and verb multiple times on their page, in trying to describe their product/service. (At least it did, before I went through and tried to word them out.) It's difficult though, because it's "an act of indicating you're willing/wishing to designate the site/artist as worthy of receiving a portion of your monthly subscription service payment" or just "the denotation that ..." (in noun form). Which is junky.
- For case 1, if you do that, you're changing the intention of the quote, which we should not do. The only case is when the grammar or clarity of the quote is increased when taken out of context. wp:QUOTE
Consider the full quote: "I'd like to think that, other than that issue with Mr. Johnson, the day was a rousing success".
- If you want to shorten that to take the meaning of it without the extra fluff preposition at front, and still quote it directly as a new sentence, you'd make it: "[O]ther than that..." because of the need for a capital leading the sentence. This is a semi-common enough occurrence in my opinion, that while it's good to check the rest of the word for spelling, it'd be nice if the parser that finds the misspellings could strip it of its single ]'s and test it as a whole word.
- So I suppose I'm asking, are there other cases where a word is split by a ']' and they shouldn't be checked for spelling? If not, can we have that functionality added to the searcher? Elfabet (talk) 14:15, 15 January 2019 (UTC)
- @Elfabet: If you are seeing the spell checker complain about something inside a direct quotation, then most likely there is a problem with the quote marks. Per MOS:QUOTEMARKS, they must be "straight double" and not “curly” nor `backtick´ nor 'single quote' except in particular circumstances described in MOS:DOUBLE. The spell checker ignores everything inside straight double quotes, presuming they are correctly paired. So with correct quote marks, something like "[O]ther than that" should be ignored, and there shouldn't be any need to put in a special feature for it. Though MOS:CONFORM does seem to say "Other than that" is preferred even if the original text says "other". -- Beland (talk) 05:03, 18 January 2019 (UTC)
- @Beland: Very interesting. I guess I'll look to change the quote type next time I see them, or bring similar but different situations to your attention as they come up. Thanks, Elfabet (talk) 13:02, 18 January 2019 (UTC)
I agree, making a redirect from something like flattrd to Flattr will cause "flattrd" to be marked as correctly spelled, and also help readers. -- Beland (talk) 15:33, 18 January 2019 (UTC)
- Aren't there rules against making self-referential links? The only place 'flattrd' shows up is on Flattr. Elfabet (talk) 22:20, 18 January 2019 (UTC)
- @Elfabet: As long as the text "flattrd" in the article isn't a link, it won't be a circular reference, nor would it be a double redirect, if that's what you're thinking of. -- Beland (talk) 21:33, 13 February 2019 (UTC)
- @Beland: I was misinterpreting the suggestion, forgive my confusion. I didn't understand that it might be worth while to make a page dedicated to redirecting a word, each time it shows up on our list. Is it a fairly straightforward process? Elfabet (talk) 21:58, 13 February 2019 (UTC)
- @Elfabet: Sure, the minimum required is to go to e.g. flattrd and put in the text "#REDIRECT [[Flattr]]". -- Beland (talk) 13:28, 15 February 2019 (UTC)
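The quote-handling rule described earlier in this thread (text inside correctly paired straight double quotes is skipped, while curly quotes are not recognised) can be illustrated with a short Python sketch. This is an assumed simplification for illustration, not the actual spell-checker code, and the name `strip_quotations` is hypothetical.

```python
import re

# A passage enclosed in straight double quotes on a single line.
QUOTED = re.compile(r'"[^"\n]*"')

def strip_quotations(text: str) -> str:
    """Drop passages in straight double quotes before spell-checking.

    Curly quotes and other quote styles are not matched, so their
    contents would still be checked word by word.
    """
    return QUOTED.sub(" ", text)

straight = 'He said "[O]ther than that" loudly'
curly = 'He said \u201c[O]ther than that\u201d loudly'
print(strip_quotations(straight).split())
print(strip_quotations(curly).split())
```

In the first case the bracketed word never reaches the checker; in the second it does, which is why fixing the quote marks in the article is usually enough.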
Full update with new reports
Just wanted to let folks know that after many months of partial updates, as of today all sections on the main moss page have been updated from the 2019-02-01 dump report, the most recent available. Hopefully this will provide a fresh batch of words for adding to Wiktionary, identification mysteries, and other fun queues.
Some folks may be curious about the new sections in the "main listings" for the by-article reports. As we've been making relatively fast progress on the pages with only T1s, I've started to worry about separating all the other potential typos into likely vs. unlikely piles. The T2 and T3 typos are also likely real misspellings, but less likely than the T1s. Rather than slow us down with all the T2s and T3s for now I'm just including pages that have T1s and also only T2s and T3s. It turns out the TS typos (which look like two English words jammed together, sometimes with punctuation between) are also very often real typos. I've started including TS+dot, which are almost all someone forgetting to put a space after a period. I'm waiting to include the remaining TS typos until we cycle back around to before A, at which point there will be enough room on a single-letter page to include them all (since we'll have fixed an enormous number of T1s). I think most of the remaining unclassified "typos" are actually just untagged words from other languages, so my next big programming task will be hooking in all the non-English Wiktionaries and trying to automatically filter those out.
Many thanks to everyone who has been chipping away at problem reports, especially the Wiktionary word sleuths and other folks I've seen working on the main page over the past months, including -sche, Darylgolden, XY3999, Graeme Bartlett, Wayne aus, Sherlotte, AmericanAir88, Wire723, and Schazjmd. Ira Leviton and Elfabet have been clearing away sections and sections on the main listings with help from a growing crowd too big to thank here, but including bradleyagin and MarkZusab who specifically requested to be notified of the new dumps. All the numbers I track are telling me we're getting closer to knowing all the words and spelling them all correctly, even as the total amount of text on Wikipedia keeps growing around us. I hope that process is as satisfying for you as it is for me. -- Beland (talk) 09:29, 23 February 2019 (UTC)
- Neat! Elfabet (talk) 04:01, 24 February 2019 (UTC)
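For readers curious what a TS+dot check could look like, here is a minimal Python sketch. It is not moss's actual code: the tiny word set and the function name `find_ts_dot` are assumptions for illustration, standing in for a real dictionary lookup.

```python
import re

# Hypothetical stand-in for a real dictionary; an actual checker would use a far larger word list.
KNOWN_WORDS = {"the", "cat", "sat", "down", "then", "it", "slept"}

# A lowercase word, a period, then a capitalized word with no space in between.
TS_DOT = re.compile(r"\b([a-z]+)\.([A-Z][a-z]+)\b")

def find_ts_dot(text):
    """Return tokens such as 'down.Then' where both halves are known words."""
    hits = []
    for match in TS_DOT.finditer(text):
        left, right = match.group(1), match.group(2)
        if left in KNOWN_WORDS and right.lower() in KNOWN_WORDS:
            hits.append(match.group(0))
    return hits

print(find_ts_dot("The cat sat down.Then it slept."))
```

Requiring both halves to be dictionary words is what keeps things like domain names out of the results in this sketch; a real report would need more exclusions than that.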
A tip and a question
Something I just realized that helped me and maybe some others: if you middle-click to open the link for the article, also middle click on the wikt entry for each word so that you can see the words you are looking for in your open tabs, without referring back to the list page. Helps me anyway, because I'm mostly doing TS+DOT. Which brings me to my question: does anyone know how to make the Find box automatically pop up (in Firefox) when I'm editing a page? I know wikiEd has a search box, but my wikiEd doesn't work unless I reload the page after clicking edit page. Just trying to cut back on steps it takes. —Amiodarone talk 16:36, 2 April 2019 (UTC)
- I'm considering binding Ctrl+F to one of the buttons on my mouse or keyboard, and adopting *.Th* as my typo. :) —Amiodarone talk 16:40, 2 April 2019 (UTC)
- I found this for your question, at the header Find while typing on a page. I guess that tip works, but if you're worried about one more Ctrl+F for finding a page, then I would be equally bothered by another two middle mouse clicks (or one and a Ctrl+W to close it). :D. But you're welcome to that typo, so long as you don't mind me taking it part-time - almost always easier than fighting strange misspellings Elfabet (talk) 18:24, 2 April 2019 (UTC)
- I got one of those mice with a thumb button and I made it hit Ctrl+F. Works really quick, and if I click it on the 2017 editor box at the top it opens its own find/replace box. I just find it easier than reaching for the Ctrl key I guess :) —Amiodarone talk 18:31, 2 April 2019 (UTC)
Would anyone happen to know what search term to use to find *.Th* (* wildcard) for purposes of a search link? —Amiodarone talk 19:16, 2 April 2019 (UTC)
- If I understand correctly, I think you're looking for something like "insource:/\.Th/" if you're using the built-in Wikipedia search engine. For example [1] which just now reported 17,496 matching articles (though the search takes so long it times out, so there might be other matches that will show up once some of those are fixed). I have also found that if I copy a good string (like ". Th") with Ctrl+c, then I can put the bad string e.g. ".Th" in the Firefox find box (triggered by Ctrl+f), click on the wikitext contents, and then Ctrl+g followed by Ctrl+v will replace the bad string with the new one. That's easy to repeat for when there are multiple instances on the same pages, maybe not so much for one-per-page typos. -- Beland (talk) 00:41, 13 April 2019 (UTC)
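A search link for a pattern like this can be assembled by URL-encoding the `insource:` query. The Python sketch below is only an illustration: the helper name `insource_search_url` is hypothetical, and it assumes the usual `index.php` search form with `title=Special:Search` and a `search` parameter.

```python
from urllib.parse import urlencode

def insource_search_url(regex):
    """Build an English Wikipedia search link for an insource: regex query."""
    query = f"insource:/{regex}/"
    # Assumption: the standard Special:Search form accepts the query in "search".
    params = {"title": "Special:Search", "search": query}
    return "https://en.wikipedia.org/w/index.php?" + urlencode(params)

# The backslash makes the period literal, so this looks for ".Th" with no space.
print(insource_search_url(r"\.Th"))
```

As noted above, regex searches of this kind are slow and may time out, so the result count from such a link should be treated as a lower bound.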
how to fix misspellings in references?
Hi, I was working on the list of articles with "otherworldly" misspelled as "otherwordly." However, one issue I'm finding is that I can't fix the error because it's part of the references. According to the [sic] template page, you can't combine the sic template with citation styles since it breaks things. I've come across this issue a few times now and I am not sure how to handle it, or alternatively, how to make sure moss doesn't pick it up every time it scans. Any ideas? Cinnamingirl (talk) 17:02, 30 April 2019 (UTC)
- Wikipedia:Typo Team/moss#How the lists are made says that everything in references is ignored anyway - is that still true? My practice, now that {{sic}} is deprecated in citation templates, is to add an HTML comment after the word: otherwordly<!--sic-->, and some of my AWB rules are coded to ignore misspellings followed by a comment of this form. -- John of Reading (talk) 17:45, 30 April 2019 (UTC)
- Thanks for the reply, John of Reading - I hadn't seen that part of the list, but I'm looking at the Wikipedia:Typo_Team/moss#Likely_misspellings_by_frequency_(n-z) which doesn't have a date but I suppose is fairly recent? It's probable that when I am clicking the search all button I'm getting things that include references even though Moss hasn't found them. — Preceding unsigned comment added by Cinnamingirl (talk • contribs) 02:49, 2 May 2019 (UTC)
- @Cinnamingirl: Yes, the "find all" links just search for any kind of match in mainspace, even those that the moss project has carefully excluded. -- John of Reading (talk) 08:13, 2 May 2019 (UTC)
- @John of Reading: Great, thank you for clarifying! ^_^ Cinnamingirl (talk) 18:26, 10 May 2019 (UTC)
- Note that most misspellings in references are just errors on Wikipedia, so they are worth closer examination to check and fix, but there are plenty for sure that need the sic tag. Graeme Bartlett (talk) 09:52, 29 June 2019 (UTC)
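The idea of skipping a misspelling that is followed by a sic comment can be sketched with a negative lookahead. This is a Python illustration only, not the actual AWB rules mentioned above; the function name `find_unmarked` is hypothetical.

```python
import re

def find_unmarked(word, wikitext):
    """Return start offsets of `word` that are NOT followed by <!--sic-->."""
    pattern = re.compile(
        r"\b" + re.escape(word) + r"\b(?!\s*<!--\s*sic\s*-->)",
        re.IGNORECASE,
    )
    return [m.start() for m in pattern.finditer(wikitext)]

sample = "An otherwordly<!--sic--> glow; an otherwordly sound."
for offset in find_unmarked("otherwordly", sample):
    # Only the second occurrence is reported; the first is protected by the comment.
    print(offset, sample[offset:offset + 20])
```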
Makeshift semi-automated way of fixing typos
Hello people, I recently came up with a very makeshift method of fixing typos more quickly. This method only helps to reduce the time it takes to navigate to each typo; all fixing is still done manually. This only works on Wikipedia:Typo Team/moss/I for now but can very easily be extended to other pages.
- Install JWB and go to Wikipedia:AutoWikiBrowser/Script (it's a red link, but should work).
- Under the "setup" tab, click the "Generate" button under Page list.
- Tick "Links on page" and enter "Wikipedia:Typo Team/moss/I", then click "Generate".
- Under the editing tab, fill in an appropriate edit summary.
- Copy the text in this page and paste it in the "Replace:" text box.
- Enter "lfixl" (or some placeholder word) into the "Replace with:" text box.
- Check the "Regular Expression" checkbox.
- Click "More replace fields" and enter the same rule a few more times (for some reason, each rule only appears to work on one instance of a typo on a page).
- Click "start", and Ctrl+F "lfixl" or your placeholder word to find it in the editing window. Look at the left diff to figure out what the typo was, then replace the placeholder word with the corrected word. Click "save" and repeat.
You'll still have to remove the words from the project page manually, but it shouldn't be too hard for someone to create an AWB script to do it. This process is still very inefficient, and anyone with some AWB and/or programming experience will probably be able to improve it significantly, but I think it's still a lot more efficient than navigating to typos in a browser. Darylgolden(talk) Ping when replying 09:35, 7 February 2019 (UTC)
- I think a tool like citation hunt would be very helpful, but that would take a lot of development. Darylgolden(talk) Ping when replying 11:44, 7 February 2019 (UTC)
- I hope we'll eventually have such a tool; that looks awesome! You're right, though, it does take work to make labor-saving devices sometimes. 8/ -- Beland (talk) 09:36, 23 February 2019 (UTC)
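The placeholder trick described in the steps above can be sketched outside JWB as well. The following Python is only an illustration under stated assumptions: the typo list is made up, and `mark_typos` is a hypothetical helper, not part of JWB or AWB.

```python
import re

PLACEHOLDER = "lfixl"  # same placeholder word as suggested in the steps above

def mark_typos(wikitext, typos):
    """Replace every listed typo with the placeholder so it is easy to find."""
    if not typos:
        return wikitext
    # One alternation covering the whole list, matched on word boundaries.
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(t) for t in typos) + r")\b")
    return pattern.sub(PLACEHOLDER, wikitext)

print(mark_typos("Teh cat adn teh dog", ["teh", "adn"]))
```

In this sketch a single pass replaces every occurrence on the page, and matching is case-sensitive, so a capitalized "Teh" would need its own entry in the list.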
New script allows correcting typos in one click
Hi all, I created User:Uziel302/typos.js to enable correcting obvious typos in one click.
Just like this project, I find words that aren't on Wiktionary and are similar to common words.
You can see an example of how it works in this video.
After adding: importScript('User:Uziel302/typos.js'); to User:NAME/common.js you will have a "Replace" button on User:Uziel302/Typos.
Would appreciate any help correcting the lists of typos my program finds in Wikipedia. Uziel302 (talk) 17:50, 27 April 2019 (UTC)
Usage of {{typo help inline}} and {{which lang}}
@Beland: I think I'm seeing some editors who are putting the templates on the moss lists instead of on the words in the actual articles. (e.g. Maralith here didn't have any template on the word in question in the article.)
Could we maybe clarify the usage note at the top of the pages where the templates should be used? Elfabet (talk) 12:57, 16 May 2019 (UTC)
- Hmm, it seemed pretty clear to begin with, but I added italics to emphasize that they should go in the article. Feel free to tweak further if you think it would help; it's transcluded from Wikipedia:Typo Team/moss/quick start. Maybe sometimes it just takes some practice to get the hang of a somewhat complex procedure. 8) -- Beland (talk) 17:13, 16 May 2019 (UTC)
"More! More! the editors cried as they came a-knocking at the lists...~"
"...But don't let them in until it's all pared up/ Somebody shouted MacIntyre."
@Beland: do we have another list ready, or are we waiting on a dump? Elfabet (talk) 17:35, 11 June 2019 (UTC)
- @Elfabet: I posted Wikipedia:Typo Team/moss/O a while ago, but it looks like we'll probably finish that before the Jun 20 dump is ready (yay!) so I'll post "P" as well. -- Beland (talk) 17:46, 11 June 2019 (UTC)
Ruby slippers?
Looking over ideas, I was very surprised to see under HTML tags the tag <ruby> and friends in the list of tags. I mean wow templates have gotten sophisticated if they can do ruby text without browser support! Alas, I really don't think they can. Is there any centralized place for documentation of template equivalents for HTML tags?
Though I must admit that seeing <tt> replaced with {{mono}} rather grates. Where is that need mentioned? Is it purely non-conforming-features mentioned in the template's doc? Ahh, I see Wikipedia:HTML 5 and Wikipedia:HTML 5#tt, which has more nuanced directions than the {{mono}} page. Oh dear, I hope people don't blanket change tt to mono without these nuances. So confusing. Shenme (talk) 06:21, 29 June 2019 (UTC)
- Templates like {{ruby}} do in fact use tags like <ruby>. The templates are theoretically just a more editor-friendly way to write them, and makes it easy to change the actual tags used en masse if needed in the future. All the useful documentation I know about is collected on or linked from Category:Articles with HTML markup. -- Beland (talk) 18:33, 4 November 2019 (UTC)
A serving of "Old Malay" anyone?
So I looked at 100+ words under Articles with the most possibly misspelled words and ended up at Talang Tuo inscription. The long quoted inscription is well-documented as being transliterated from "Old Malay". Unfortunately Old Malay has no language code to be used with {{lang}} or {{transl}}. Using 'ms' for Malay would be wrong (see Old Malay).
Am I really stuck with {{not a typo}}?
And why don't these lovely templates ever allow |notes=blahblah or |reason=blahblah or the like? It would be wonderful to add a note as to why one is using an inappropriate template. Or if someone wanted to use {{lang|und}}, some explanation. Shenme (talk) 05:00, 30 June 2019 (UTC)
- @Shenme: You can always put an HTML comment near the tag to explain to editors what's going on. For example: {{lang|und}}<!-- no language code for Old Malay-->
- I will ask on Template talk:Lang if "und" is the right code in this situation. -- Beland (talk) 21:29, 4 November 2019 (UTC)
- @Shenme: The answer was that we can use {{lang|mis}} for uncoded languages. I'd still add an HTML comment indicating what language the snippet actually is.
Updated from 2019-08-20 dump
@Schazjmd, Bradleyagin, Darylgolden, MarkZusab, Amiodarone, Zojomars, and Anarhistička Maca: This is your official update notification - I've updated all sections from the 2019-08-20 dump, in addition to the much more frequent mini-updates to the main listings. As usual, there are many notes from older dumps and code improvements planned for the next go-round. -- Beland (talk) 19:07, 4 September 2019 (UTC)
Now that we are down to only one letter, it is time to redo some earlier letters, particularly now that the pages selected for listing are different. It would be time to have a listing for "A" again. Graeme Bartlett (talk) 01:25, 17 September 2019 (UTC)
Mineral words and flinty problems
Scanned down the list in Mineral words looking for my favorite stumper: 'sesquicarbonate'. Strangely, wikt does have sesquicarbonates, but that points to a non-existent entry for sesquicarbonate. sesquicarbonates was recently created 4 August 2019, perhaps as a consequence of typo/moss? How much effort should we put into defining things? That is, wrapping up all the loose ends (rocks?) before using scissors on the list entries? Shenme (talk) 03:36, 13 October 2019 (UTC)
- Interesting: Sesqui Shenme (talk) 03:38, 13 October 2019 (UTC)
- @Shenme: Personally, I don't create dictionary entries unless I know the meaning of the word well enough to supply a good definition. Usually the process of finding out what the word means is a good way to verify that the spelling given is the preferred one. Words shouldn't be removed from the mineral list unless they're added to Wiktionary or a Wikipedia article or redirect is created. Being on the mineral list here is otherwise what is preventing these words from showing up in reports as misspelled. -- Beland (talk) 17:18, 14 November 2019 (UTC)
- I have added in wikt:sesquicarbonate. Perhaps we need wikt:sesquibi- as sesquibicarbonate seems to be a thing too. Graeme Bartlett (talk) 05:03, 22 April 2020 (UTC)
Moss typo?
Moss Team, may I mention a moss? Sphagnum is often spelt "spagnum" or even "spaghnum" and suchlike. Could one look for things that are close to "sphagnum", and prevent Wikipedia from gathering misspelt moss? HLHJ (talk) 21:17, 14 March 2020 (UTC)
- Spagnum is a redirect, so it's being suppressed from listings at the moment. It is in Category:Redirects from misspellings, though, so I can add some code that will mark all of those as misspellings. If present in prose, "spaghnum" will show up as a T2 in the main listings, though due to size I don't post all the listings on every run. In the last run (on article contents as of 2020-02-20), it was only detected in one article:
- -- Beland (talk) 02:46, 18 April 2020 (UTC)
Non-English words | Dohas
Hello Typo Team, I've joined the Typo Team today and I need help in marking Non-English words that are wrongly categorized as Typos.
For example, I came across the word dohas: 23 - wikt:dohas - 40 (number), Abahattha, Abdul Rahim Khan-I-Khana, Caesura, Doha (Indian literature) ... find all
dohas is a plural of the Hindi word doha https://en.wikipedia.org/wiki/Doha_(Indian_literature) & https://en.wikipedia.org/wiki/Doha_(poetry)
So how do we take this word off the Typo list? Kulwinder Rishi (talk) 18:50, 18 April 2020 (UTC)
- Mmm, Hindi does not form plurals by adding "s", so I'd categorize "dohas" as an English word that's borrowed from Hindi. Probably the right thing to do is to add entries for English words at wikt:doha and wikt:dohas. -- Beland (talk) 08:04, 21 April 2020 (UTC)
AWB compatibility
Can I somehow use the same database dumps that Moss uses on AWB? If so, how? --I dream of horses (talk page) (Contribs) Remember to notify me after replying off my talk page. 03:07, 17 May 2020 (UTC)
- @I dream of horses: Well, moss has a script that downloads some of the files posted twice a month at [2] and then digests them to produce the reports I post. From Wikipedia:AutoWikiBrowser/Database Scanner it looks like some of the same files can be fed to AWB by editors who use it. One of the features I hope to work on soon is to have moss produce custom JWB configuration files so editors can perform certain correction tasks very quickly. I'm not sure how easy or helpful it will be to port that to AWB, but it's mostly just regular expressions, so who knows. -- Beland (talk) 08:41, 3 June 2020 (UTC)
Updated reports from 2020-05-20 dump
@Jake The Great 908, Sun Creator, Puddleglum2.0, Schazjmd, Bradleyagin, Darylgolden, MarkZusab, Amiodarone, Zojomars, Anarhistička Maca, Clovermoss, JaAlDo, and Creativecreatr: This is your official notice that reports from a fresh dump are available. I've updated pretty much all the sections in Wikipedia:Typo Team/moss#Misspellings - lists of things to fix from the 2020-05-20 dump (the latest available). As we make progress fixing typos, some reports no longer seem useful, so I've consolidated here and there.
You'll see a lot more typos on the main listing pages (like Wikipedia:Typo Team/moss/A where work is currently underway) because I'm no longer hiding e.g. likely misspellings if there are other suspected typos that moss is uncertain about. (I was trying to avoid having folks editing the same article twice, but most of the unsorted words are in languages other than English, and it'll take us a year or two before we come back to deal with the leftovers, anyway.) I was also unable to keep up with clearing out the manual notes by investigating or tagging as needed, so to prevent duplicate work I made moss smart enough to suppress listings for articles that are already on the main listings subpages. This should let me post new reports as fast as the typo team is ready to take them. I decided to start over at the beginning of the alphabet, since we're now scouring articles more completely, and we can now more easily finish up the letters where we only processed part of a dump before the report got stale. Anyway, I hope these and various other minor upgrades work well for you, and thanks again for helping to correct all the typos in the sum of human knowledge. 8) -- Beland (talk) 09:04, 3 June 2020 (UTC)
TS statistics
So I've spent most of my time on the typo team clearing TS sections, and I've mentally broken the errors into a few categories:
1. The boring category: errors which are most likely just regular old typos
2. The other boring category: false positives
3. Errors that aren't necessarily apparent in the source view
   - Errors obscured in the source view by a ref tag
   - Errors where one or both of the words separated by the erroneous punctuation are formatted with markup
4. Systematic errors throughout an article, seeming to stem from a genuine lack of knowledge of English punctuation rules on the part of an editor
I'd be curious to see a statistical breakdown of these errors. The bot wouldn't be able to discern any of these categories of course (except for discerning category 3 from the rest, I guess) but maybe in one of the letters of the next dump some of us could collect data manually? Idk, would anyone else be interested in this or is it just me?
Voidify (talk) 10:41, 6 June 2020 (UTC)
- @Voidify: I'm not actively collecting stats on these different subtypes, but I am trying to make some of them more actionable. I know why ref tags and certain markup often makes the visible text not match the wikitext; that's the markup that moss intentionally removes before it does the spell checking. Stats might tell us whether humans are more likely to make a whitespace typo when there's markup in the way or not, but we detect and fix all of them anyway.
- In particular for (4), the work queue for the Guild of Copy Editors is running low, and they are interested in identifying articles to copyedit. They are interested in pages where someone needs to read an entire article or section to fix grammar or spelling, but nothing else is terribly wrong. If you see articles like that, you can add {{copyedit}} to the top and drop the article from the list here. Sometimes I put the detected typos in an HTML comment or fix them myself, to help out the copy editor. They only really want articles with no unrelated problem tags at the top of the article, for example requesting references.
- For (2), I'm trying to reduce false positives, but most typo fixers leave those behind with notes, and going through those has helped me find patterns that have informed code changes. I'm afraid a lot of them are simply going to have to be tagged as not English. If you have any ideas about how to automatically exclude some of the false positives, feel free to ping me! -- Beland (talk) 02:48, 16 June 2020 (UTC)
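To illustrate why category 3 errors are hard to spot in the source view, here is a small Python sketch of markup stripping. It is a simplification under stated assumptions, not the actual moss code; the regex and the name `strip_refs` are hypothetical.

```python
import re

# Matches <ref ...>...</ref> pairs and self-closing <ref ... /> tags.
REF_TAG = re.compile(r"<ref\b[^>/]*>.*?</ref>|<ref\b[^>]*/>", re.DOTALL)

def strip_refs(wikitext):
    """Remove ref tags so the remaining text resembles what readers see."""
    return REF_TAG.sub("", wikitext)

source = 'It ended.<ref name="a">Smith 2001</ref>Then it began.'
# In the wikitext the period and "Then" are far apart; after stripping,
# the missing space becomes obvious.
print(strip_refs(source))
```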
More typos when?
Hey so User:Beland— ETA on typos from the new dump? Like is the dump just taking an unusually long time to process, or are you waiting for the A page to reach a certain level of completeness first, or what? I am very new to this so I don’t know a thing about the usual protocol or whatever, but what I do know is that I want to be helpful and what I also know is that the way my brain works means I’m significantly less capable of being helpful while there are only T2-3s left. Like as soon as the new dump is posted I will absolutely go ham on its TSes for hours of hyperfocus at a time, tabbing back and forth, but right now any given section of A would require metric kilospoons of mental energy to clear. I understand if there are reasons why the dump can’t be posted yet, but what I’m saying is, I’d really like some clarity on what these reasons are. Thanks, Voidify (talk) 01:25, 16 June 2020 (UTC)
- @Voidify: Greetings, and thanks for your help clearing typos! Usually the dump that's snapshotted on the first of the month (which includes full page histories) takes at least two weeks to become available for download. You can see the progress if you search for "enwiki" on [3]. After I notice that's ready, it takes 7-8 hours to download everything and then 43-44 hours to run checks on spelling and everything else. The dump that's snapshotted on the 20th of every month only takes a few days to become ready for download, but then it's the same 2-3 day download and processing time. I usually wait until it's clear that the next letter needs to be posted before the next dump will have finished processing, just so the listings are as fresh as possible. (That way we aren't trying to fix typos other editors have found and fixed on their own.) But I'm glad you wrote, because I was wondering if some typo fixers enjoyed the easy ones more, especially since I think the T2s and T3s have gotten less likely to be actual typos now that moss is reporting them whether or not there are other non-reportable typos on the page.
- It looks like "B" is going to be a bit of a beast, and I expect the 2020-06-01 dump will be ready in a few days. The reports need to rest for one dump cycle between posts to avoid stepping on our own toes, so I think I'll hold off on posting all of "B" until the next dump is ready, just in the interest of freshness. But since you're interested in working on TS typos, I can certainly post another letter from the 2020-05-20 dump so you can "go ham" to your heart's content. 8) It looks like "F" is the letter that has least recently been worked on, and it's a reasonable size, so why not! I've just posted an update to Wikipedia:Typo Team/moss/F skipping the less interesting T2s and T3s to maintain freshness. (We'll get to those later.) Feel free to ping me if you ever want more posted; the entire backlog of 200,000+ typos is available to me after every dump is processed. Enjoy, and thanks again! -- Beland (talk) 02:25, 16 June 2020 (UTC)
- @Beland: Thanks! Voidify (talk) 05:31, 16 June 2020 (UTC)
- @Beland: The end few letters (U onwards) are either completely empty or only have a few things in the case notes. Could you put more in there and update the link for new participants? ty --Xurizuri (talk) 12:13, 21 January 2021 (UTC)
- @Xurizuri: Thanks for the ping! Looks like we're out of fresh typos, so I just posted J from the January 1 dump which I just finished processing. I'm trying to go through in alphabetical order as much as possible, mostly just to get the typos that have been waiting around the longest. J thru N in particular have many typos that have never been listed, due to algorithmic changes in the middle of our last run through the alphabet. Sometimes I do post some typos from the end of the alphabet out of order if we need a small batch to tide us over until the next dump is ready. -- Beland (talk) 00:42, 23 January 2021 (UTC)
cquote
Can templates get retired? Given that MOS says cquote shouldn't be used, why is it still an active template that people can add to articles? I've fixed about 100 articles using cquote and the idea of people adding more to fill that back up makes me want to cry Xurizuri (talk) 13:44, 23 December 2020 (UTC)
- @Xurizuri: Good news! Template:cquote has been fixed so it's now in compliance with MOS:BLOCKQUOTE. We don't have to fix those anymore. -- Beland (talk) 00:46, 23 January 2021 (UTC)
Update 'chemical formulas' to recommend {{chem2}} instead of (or in addition to) {{chem}}?
In my editing, I've found that using {{chem2}} is much easier to learn and use over {{chem}}. A simple example:
- C2H3O−2 ({{chem|C|2|H|3|O|2|-}}) vs.
- C2H3O−2 ({{chem2|C2H3O2(-)}})
I think {{chem2}} is intended to replace {{chem}} (but I'm not in the WP:Chemistry community, I'm not sure). sbb (talk) 19:19, 1 March 2021 (UTC)
- Don't use them at all! I work with chemical pages all the time and I hate both these templates and recommend against their use. The reason is that the text produced cannot be copied to look the same as what you see on the screen, and you cannot search for the formula say using "C2H3O2". This is an ACCESSIBILITY problem. Perhaps the output of these templates could be improved to get rid of spaces. We should not recommend using either template. Perhaps we need a chem1 template written that works properly. Graeme Bartlett (talk) 21:37, 1 March 2021 (UTC)
- disagree. It may be an absolute accessibility problem, but on the margin, {{chem2}} appears to be better than {{chem}}. At least for the most part, in source searching, "C2H3O2" can be searched for (in most cases, such as in the example I've shown. I know it doesn't work for more extended cases). But what is the current suggested replacement then, for accessibility AND for typo markup purposes? At least for both {{chem}} and {{chem2}}, a <span class="chemf"> is wrapped around the jumble of letters, numbers, superscripts, etc. If you don't suggest either of them, then from an accessibility standpoint, what do you suggest? sbb (talk) 02:24, 2 March 2021 (UTC)
- I have changed the recommendation to use chem2 instead of chem, as the output now seems to be better than it was before. The issue for this task is to correct the unformatted formulae like "H2O" or "CO2", so if it is already formatted using another way we don't have to touch it. Graeme Bartlett (talk) 21:59, 25 March 2021 (UTC)
- Thanks, greatly appreciated. BTW, the reason I went through and changed everything in Langbeinites was because the spellchecker tagged a huge number of just the first parts of several of the formulae (the ones that had 2-letter symbols (i.e., Mg, Mn) without subscripts bumped up directly against the parentheses of the sulfate group). I was unable to correlate which exact instances of that pattern triggered the checker and which didn't, so I decided to make the entire article consistent and used {{chem2}} everywhere in it.
- Also BTW, {{chem2}} inserts <br> tags seemingly after every sub/superscript invocation (as does {{chem}}), which is partially why copying {{chem}}- or {{chem2}}-formatted text and pasting it is so broken. As I noted in [[Template talk:chem2#Why does this this template emit <br> tags?]], using <br> for non-breaking purposes like that is semantically wrong HTML. Hopefully there's some motion or interest to fix that. sbb (talk) 22:39, 25 March 2021 (UTC)
Fresh listings
@Jake The Great 908, Sun Creator, Puddleglum2.0, Schazjmd, Bradleyagin, Darylgolden, MarkZusab, Amiodarone, Zojomars, Anarhistička Maca, Clovermoss, JaAlDo, Creativecreatr, Voidify, Doghouse09, Spazure, Idell, and Fehufanga: This is your official notification that all the periodically updated sections on Wikipedia:Typo Team/moss now have listings from the 2021-03-20 dump. Work on Wikipedia:Typo Team/moss/L is also underway from the previous dump. -- Beland (talk) 06:54, 2 April 2021 (UTC)
- @Jake The Great 908, Puddleglum2.0, Schazjmd, Bradleyagin, Darylgolden, MarkZusab, Amiodarone, Zojomars, Anarhistička Maca, Clovermoss, JaAlDo, Creativecreatr, Voidify, Doghouse09, Spazure, Idell, Fehufanga, Triethylborane, Littleb2009, Normal Name, Amazomagisto, TreeReader, and Alivemussel: I've done another round of updates on the main page, and posted the next main listing subpage, which this week is Wikipedia:Typo Team/moss/V. -- Beland (talk) 00:29, 14 November 2021 (UTC)
July update
@Jake The Great 908, Sun Creator, Puddleglum2.0, Schazjmd, Bradleyagin, Darylgolden, MarkZusab, Amiodarone, Zojomars, Anarhistička Maca, Clovermoss, JaAlDo, Creativecreatr, Voidify, Doghouse09, Spazure, Idell, Fehufanga, Triethylborane, Littleb2009, Normal Name, Amazomagisto, and TreeReader: Main-page sections have been updated from the 2020-07-20 dump, and work on Wikipedia:Typo Team/moss/R and Wikipedia:Typo Team/moss/S continues. -- Beland (talk) 02:45, 3 August 2021 (UTC)
New word categorization scheme
Greetings, everyone, and thanks for your ceaseless work zapping typos! I've had some time recently to do some software upgrades, so you'll notice some changes, especially in the main listings starting with today's update to U. The new "T/" section contains suspected violations of MOS:SLASH. The code is working a bit harder to keep math and legitimate constructs like "moon(s)" out of the complaints list. I've also plugged in some AI that tries to classify the big unsorted pile of typos by language without using a dictionary, so you'll see the new "TE" section based on that, which is where the AI thinks the words look like they're trying to be English, and it's up to us humans to figure out if they actually are, or if they are misspelled, or what. Non-English words are being tagged "TF", not included in the main listings yet. That works better for some languages (Greek) than others (Korean), but I'm hoping to post by-language reports soon, maybe for folks who want to add words in a specific language to Wiktionary. If anyone is excited about any particular language, feel free to ping me and I'll see if I can give you something useful. We're also now using Wiktionary as the dictionary against which to find the "T1" (edit distance 1) typos, and the compound-finding algorithm has been improved, so you may notice some changes there as well. If there's anything noticeably worse, feel free to give me a ping, and I'll do some QA. Thanks again! -- Beland (talk) 07:39, 18 October 2021 (UTC)
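For readers unfamiliar with the "T1" idea, here is a minimal sketch of an edit-distance-1 test. It is not the actual moss code: the function names are hypothetical, and it assumes plain Levenshtein distance (one insertion, deletion, or substitution, with no special handling of transposed letters), which may differ from what moss really does.

```python
def is_edit_distance_one(word: str, candidate: str) -> bool:
    """True if the strings differ by exactly one insertion, deletion, or substitution."""
    if word == candidate:
        return False
    if abs(len(word) - len(candidate)) > 1:
        return False
    # Make `short` the shorter (or equal-length) string.
    short, long_ = (word, candidate) if len(word) <= len(candidate) else (candidate, word)
    # Skip the common prefix.
    i = 0
    while i < len(short) and short[i] == long_[i]:
        i += 1
    if len(short) == len(long_):
        # Substitution: everything after the mismatched character must agree.
        return short[i + 1:] == long_[i + 1:]
    # Insertion/deletion: drop one character from the longer string.
    return short[i:] == long_[i + 1:]


def nearby_dictionary_words(word: str, dictionary: set[str]) -> list[str]:
    """Hypothetical helper: dictionary words exactly one edit away from `word`."""
    return sorted(w for w in dictionary if is_edit_distance_one(word, w))
```

With a toy dictionary, `nearby_dictionary_words("unchaged", {"unchanged", "changed"})` would suggest only "unchanged", whereas a word two edits away from every dictionary entry (such as "facities" versus "facilities") would not land in a T1-style bucket under this definition.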
Shouldn't page titles be ignored in plural form as well?
This change of fexpr peppers the article with many uses of {{Not a typo|fexprs}}. Having to tag every such occurrence looks like a maintenance headache to me. "Fexprs" is obviously the plural form of the page title. Would it not make sense for the typo bot/filter to ignore such regular plural forms of page titles? --RainerBlome (talk) 00:26, 26 November 2021 (UTC)
- A way around this is to make fexprs be a redirect to fexpr. Another way that I prefer is to markup fexpr in <code> tags, as it is not a word to read by humans, but something for a machine. Graeme Bartlett (talk) 04:04, 26 November 2021 (UTC)
- Thanks for suggesting the workarounds. I have created the redirect and eliminated the corresponding uses of {{Not a typo|fexprs}}. Regarding the suggested workaround of using <code>fexprs</code> instead, I think it is a misunderstanding that "fexpr" is something for a machine. None of the languages listed on fexpr#See also as supporting fexprs have an fexpr keyword. I have now described this in some detail in the article. "Fexpr" is a concept, and the article uses the word mostly in the sense of the concept.
- Workarounds aside, the question of whether it would be a good idea if the spellchecking bot ignored such plural forms still remains. --RainerBlome (talk) 15:11, 28 November 2021 (UTC)
- @RainerBlome: I've not added code to automatically ignore plural forms for two main reasons: 1. the system has no way to know what the correct plural form is, as just adding "s" or "es" is incorrect for some words, and 2. Wiktionary lists plural forms separately, so for common nouns they would want to know if the entry for the plural form is missing. Capitalized proper nouns are already ignored by the spell checker, so a redirect is only needed when there's a common noun that is ineligible for inclusion in Wiktionary, or a stylized lowercase proper noun. I expect such redirects are actually helpful for searches for these weird words, anyway. There may be other factors I'm missing, but in any case, thanks for your attention to detail on this! -- Beland (talk) 23:21, 3 December 2021 (UTC)
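The first reason above can be illustrated with a small sketch. This is purely hypothetical code, not anything moss does: it assumes a checker that accepts a word whenever it equals a page title plus "s" or "es", and shows where that rule goes wrong.

```python
def naive_plural_candidates(title: str) -> set[str]:
    """Guess plurals by appending 's' or 'es' only (knowingly incomplete)."""
    return {title + "s", title + "es"}


def is_naive_plural_of_title(word: str, titles: set[str]) -> bool:
    """Hypothetical check: would `word` be skipped as a plural of some page title?"""
    return any(word in naive_plural_candidates(title) for title in titles)


titles = {"fexpr", "mouse", "sheep"}

# Works for the regular case raised in this thread.
fexprs_ok = is_naive_plural_of_title("fexprs", titles)

# Misses a correct irregular plural, so "mice" would still be reported.
mice_ok = is_naive_plural_of_title("mice", titles)

# Wrongly accepts an incorrect plural, hiding a real error.
sheeps_ok = is_naive_plural_of_title("sheeps", titles)
```

Under these assumptions the rule handles "fexprs" but both under-accepts ("mice") and over-accepts ("sheeps"), which is why a redirect or an explicit tag is the safer signal.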
"convert special characters"
Bibliography of Wikipedia
@Beland: You'll see from the article history that your recent edit has accidentally undone another editor's good faith contribution. Can you find a way to mark the article or to change the moss reporting process so that this doesn't happen again? -- John of Reading (talk) 07:24, 11 January 2022 (UTC)
- @John of Reading: Ah, thanks for the note! @JPxG: It looks like it was your edits I've been stomping on. Sorry about that; I was trusting that a question would be raised if the same articles appeared again on whatever tool is being used. I couldn't find the maintenance report you were using on-wiki...in some cases I dropped some brackets or made other changes which I was hoping made it clear that these were intentionally part of article text. I was also hopeful that maybe whatever tool could be tuned to be more sensitive, since some of the instances had what to a human are red flags, but I know that's not always easy with things like on-wiki searches that like to give partial results for complex regexes. In the event that it's still necessary to mask "citation needed" constructions, I was going to suggest using {{not a typo}}, which can in the most extreme cases split words in order to intentionally break source code searches. That has the benefit of making the intention clear without a long explanation, avoids the need for complicated syntax, and also won't show up on HTML entity maintenance reports. (That's also actually what I use to prevent intentional HTML entities from showing up on the report, should that become necessary in the course of events.) Anyway, I'm curious what tool you're using, and glad that someone is cleaning up bad markup. If I had a dime for every template I've had to un-subst...I'd still be very grumpy at whoever is over-substing. 😃 -- Beland (talk) 08:28, 11 January 2022 (UTC)
- @Beland: No problem -- I haven't done a run in the last couple weeks anyway, so I'll save my indignation for then. I didn't realize that my typo fix was going to fuck up someone else's typo fix! It does help a little (there are some articles where it seems impossible for this to work). Generally, I use JWB with this search and other bespoke variations I've come up with. I have been meaning to write a software to assist in this, but I've been distracted lately. At any rate, however, I'm sure that if we both posted which regexes/methods we were using, we would be able to figure something out that didn't trigger either of our processes. jp×g 21:25, 11 January 2022 (UTC)
- Basically, it seems to me like the {{not a typo}} template would probably work fine -- all I'm trying to do is keep "citation needed" from showing up in the source text. If typo team scripts are set to ignore that, it seems like that's a natural replacement for the HTML entity thing (I had no idea that was creating work for someone). By the way, I appreciate what you all are doing here, and I think the "moss" term is cute :) jp×g 21:28, 11 January 2022 (UTC)
- Would the best practice for this be something like {{not a typo|citation needed}} or cita{{not a typo}}tion needed?
- @JPxG: Normally I'd expect something like {{not a typo|cita|tion needed}} or Wikipedia:Citation needed</nowiki>. This does not appear to work for filenames, but I was able to exclude the only file I see with "citation needed" in the name with this modified search. (FTR, it appears regex searches don't support negative lookaheads, and I had trouble getting the Lucene "~" operator to work.) That search does pick up minor lint, like newline inside a tag or double pipes, but I'm happy to go through and fix those instances. I think all the instances where someone is either requesting that citations be added or citation needed tags not be added can be improved without needing "not a typo", so "not a typo" will be needed where "citation needed" is in a legitimate internal link or the title of a work. Which I think means we shouldn't need HTML entities at all.
- Also FTR, the code that moss uses to find HTML entities is mostly here in moss_entity_check.py. -- Beland (talk) 02:51, 12 January 2022 (UTC)
- These seem to me like they'd work well -- I'd have no problem with going through and substituting these instead of the previous HTML-entity fix. If those edits have been reverted, I'll just find them again the next time I do a run and I can get to them then. jp×g 04:56, 12 January 2022 (UTC)
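As an offline illustration of the masking idea discussed in this thread, here is a minimal sketch of scanning wikitext for "citation needed" while skipping text wrapped in {{not a typo}}. It is not the on-wiki search or the JWB regex used above; the function name is hypothetical, it assumes the template contains no nested templates, and it does not distinguish stray text from legitimate {{citation needed}} template calls, which a real search would need to filter separately.

```python
import re

# Assumption: {{not a typo|...}} never contains nested braces.
NOT_A_TYPO = re.compile(r"\{\{\s*not a typo\s*\|[^{}]*\}\}", re.IGNORECASE)
CITATION_NEEDED = re.compile(r"citation needed", re.IGNORECASE)


def unmasked_citation_needed(wikitext: str) -> list[int]:
    """Return offsets of 'citation needed' that are not inside {{not a typo}}."""
    # Blank out masked spans with same-length filler so the returned
    # offsets still refer to positions in the original text.
    blanked = NOT_A_TYPO.sub(lambda m: "\0" * len(m.group(0)), wikitext)
    return [m.start() for m in CITATION_NEEDED.finditer(blanked)]
```

Both suggested forms, {{not a typo|citation needed}} and {{not a typo|cita|tion needed}}, come back with no hits under this sketch, while an unwrapped "citation needed" elsewhere in the same text is still reported.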
Probably OK not OK?
Hi - I was working on Wikipedia:Typo Team/moss/V today and there seemed to be more errors than usual with the "(probably OK:" thing.
- Veracruz moist forests - (not OK: wikt:cnake = snake)
- Veraval - wikt:bhois - unclear; (not OK: wikt:smaj = samaj)
- Verband Forschender Arzneimittelhersteller - (not OK: wikt:facities = facilities)
- Verkehrsverbund Rhein-Ruhr - (not OK: wikt:unchaged = unchanged)
- Verkhneuslonsky District - (not OK: wikt:sproduces = produces)
- Vicente Todolí - (not OK: wikt:grastronomical = likely gastronomical)
- Vigier Guitars - wikt:iMetal - ok; (not OK: wikt:deepm - likely deep)
Not sure if a problem or not but wanted to make you aware. Thank you for creating & updating these lists! Sct72 (talk) 22:14, 11 December 2021 (UTC)
- Your edits look good @Sct72:. Just ignore that "probably OK" because it probably isn't. For iMetal or other brands or names with strange capitalization you could mark these with {{Proper name|}}. Graeme Bartlett (talk) 22:39, 31 December 2021 (UTC)
From Wikipedia talk:Typo Team/moss/D
Is there anything left to do on this 'D' page? - Jkgree (talk) 17:53, 11 March 2020 (UTC)
- Start looking through the notes? The ones that are genuine words or uses need to be tagged, or have Wiktionary entries created. The other ones need to be researched to see if they are valid or not. JaAlDo (talk) 14:21, 12 March 2020 (UTC) Beland (talk) 01:06, 4 June 2022 (UTC)
Question about search criteria
(archived from Wikipedia talk:Typo Team/moss/L) @Beland: Can you take a look at this edit and describe why the list contained only one instance of a.Kr as a misspelling? Or does it just have to do with edits that were made between the running of the script and mine? Thanks for your insights, Cheers! Elfabet (talk) 12:37, 28 March 2019 (UTC)
- @Elfabet: Yeah, it looks like someone dropped a period from in front of the other instance in the time since the dump was snapshotted. Weird that didn't get reported as a second typo, but the NLTK tokenizer may have done something weird with that since words don't normally have both initial and internal punctuation. Good catch finding both of them! I do at least have an offline report of typos with weird punctuation, and I'm in the (slow) process of more finely categorizing more of them into things that are OK and things that need to be fixed. -- Beland (talk) 17:24, 28 March 2019 (UTC)
Fictitious Words - Round 2+
(archived from Wikipedia talk:Typo Team/moss/L)
e.g. * 3 - List of Wheel of Time characters - wikt:stedding, wikt:stedding, wikt:stedding
(defined on page, not at first instance) @Beland: What're we doing with fictitious words not appropriate for the dictionary (per WT:FICTION), that don't have a common page to link to which defines them (unlike the last time we asked this kind of question and you cleverly suggested setting up a redirect)? T'anks again, Elfabet (talk) 20:00, 29 March 2019 (UTC)
- @Elfabet: In this case I'd make a redirect to the Ogier entry on that page. If there's no dictionary entry, it seems Wikipedia has to define the word somewhere or not use it at all, because otherwise readers won't know what it means. Alternatives would be to put it in double quote marks or to use {{not a typo}}. -- Beland (talk) 20:16, 29 March 2019 (UTC)