- bugzilla:1591 not 1691
- This bug is a "duplicate" of bugzilla:65.
- Bug #65 is fixed now, see bugzilla:65#c17. Thanks Brion!
- Bug #563 is fixed now, see bugzilla:563#c11. Thanks Brion!
- reported to pyWikipediaBot-users (see also meta:PyWikipediaBot)
examples
- ro:Constantin Brâncusi
- ro:Constantin Brancuşi
- ro:Constantin Brâncuşi
- ro:Constantin Brâncuşi
- w:ro:Constantin Brâncusi
- w:ro:Constantin Brancuşi
- w:ro:Constantin Brâncuşi
- w:ro:Constantin Brâncuşi
- ro:Wikipedia:Caterogizare/Categorie Orase în Slovacia
- ro:Wikipedia:Caterogizare/Categorie Oraşe in Slovacia
- ro:Wikipedia:Caterogizare/Categorie Oraşe în Slovacia
- ro:Wikipedia:Caterogizare/Categorie Oraşe în Slovacia
- w:ro:Wikipedia:Caterogizare/Categorie Orase în Slovacia
- w:ro:Wikipedia:Caterogizare/Categorie Oraşe in Slovacia
- w:ro:Wikipedia:Caterogizare/Categorie Oraşe în Slovacia
- w:ro:Wikipedia:Caterogizare/Categorie Oraşe în Slovacia
11:56, 2005 Feb 28 (UTC)
explanations
- Here are different links. Please look at what each link looks like and which title it targets.
- If you look at this page and compare #3 and #4 you will not see any difference. The difference shows up only if you edit the page.
- The difference is that #3 uses â or î directly, while #4 uses the &#nnnn; encoding for these characters too.
- The examples are using three types of characters:
- 7 bit
- 8 bit
- UTF-8 characters
- It is very strange that you can use either 8-bit characters or UTF-8 characters in interlanguage links (and in InterWiki w:... links, at en: only); see links #1 and #2 (and #5 and #6).
- If you click on link #3 the target will be something else.
- Only link #4 works.
- This behaviour is not transparent to users who insert interlanguage links by copy and paste. It discriminates against many languages that use combined character types and should be considered a critical error. Users will not be aware that the common method #3 will fail, that the very technical method #4 is required, or that their interlanguage links will be removed sooner or later. Gangleri | Th | T 17:06, 2005 Feb 25 (UTC)
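The three character types named above can be told apart by code point. The following is a minimal sketch, not part of MediaWiki or pyWikipediaBot; the function names `char_class` and `classify_title` are made up for this illustration.

```python
def char_class(ch: str) -> str:
    """Return which of the three character types a single character belongs to."""
    code_point = ord(ch)
    if code_point < 128:
        return "7 bit"
    if code_point < 256:
        return "8 bit"
    return "beyond 8 bit"  # needs UTF-8 or a numeric reference, no single Latin-1 byte


def classify_title(title: str) -> list:
    """List every character above 127 in a title together with its type."""
    return [(ch, char_class(ch)) for ch in title if ord(ch) > 127]


# "Constantin Brancusi" with \u00e2 (a-circumflex) and \u015f (s-cedilla)
print(classify_title("Constantin Br\u00e2ncu\u015fi"))
```

A title that mixes an "8 bit" character with a "beyond 8 bit" one, as this example does, is the kind of title the explanations above are about.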
additional tests
- same examples at
- at another Latin-1 type Wikipedia: sv:Användare:Gangleri/tests/bugzilla:65
- at a UTF-8 type Wikipedia: de:Benutzer:Gangleri/tests/bugzilla:65
- #1, #2 and #4 work properly at sv: but #3 does not
- #1 - #4 work properly at de:
- #5 - #8 will all fail because "w:" is used
see also
- nl: Categorie:Stad in Nederland&diff=0&oldid=846275 Compare shows the difference too.
- Mircea_cel_Batran&diff=7149984&oldid=6595353 Compare changed by User:Robbot
test links for pyWikipediaBot-users
- Notes:
- to document the links here, "what you see as documentation" is coded differently from "what is coded in the links"; the usual method is used:
- &amp;#nnnn; stands for &#nnnn;
- &amp;#xnnnn; stands for &#xnnnn;
- &#37; stands for %
an alternative would be: &#x25; stands for %
- all links have been inserted with the copy and paste method
if you make a preview you will see links
- changed to &#nnnn; encoding and
- containing characters in the range 128 - 255
- you should know that they will fail
- there are more "workarounds" to fix the links
- using &#nnnn; encoding for all characters > 127
- using &#xnnnn; encoding for all characters > 127
- using hardcoded %nn for all characters > 127
- a mixture of the methods above
- only #1 is described below
- see also: character encoding at User:Gangleri/tests/Unicode ISO 8859-1/Table of Unicode characters, 128 to 999
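The three workarounds listed above can be sketched in Python as follows. This is only an illustration: the function names are made up, and the percent form assumes UTF-8 bytes, which is what the de: examples on this page generate; a Latin-1 type wiki may expect different bytes.

```python
from urllib.parse import quote


def to_decimal_refs(title: str) -> str:
    """Workaround 1: &#nnnn; for every character above 127."""
    return title.encode("ascii", "xmlcharrefreplace").decode("ascii")


def to_hex_refs(title: str) -> str:
    """Workaround 2: &#xnnnn; for every character above 127."""
    return "".join(ch if ord(ch) < 128 else f"&#x{ord(ch):X};" for ch in title)


def to_percent(title: str, encoding: str = "utf-8") -> str:
    """Workaround 3: hardcoded %nn for every character above 127 (encoding is an assumption)."""
    return "".join(ch if ord(ch) < 128 else quote(ch, encoding=encoding) for ch in title)


title = "Ba\u0161ka (Slowakei)"  # \u0161 is s-caron
print(to_decimal_refs(title))  # Ba&#353;ka (Slowakei)
print(to_hex_refs(title))      # Ba&#x161;ka (Slowakei)
print(to_percent(title))       # Ba%C5%A1ka (Slowakei)
```

All three functions leave the 7-bit characters untouched, so the results stay readable in the wikitext.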
links to items from sk:Category:Slovenské mestá
- important note:
- Unicode offers multiple ways to go.
- "optically" the following two characters "seem" to be the same
- uppercase letters
- Š Š Š Š
- Š Š Š Š
- probably other more or less advanced Unicode or HTML coding
Š Š (see alanwood.net)
- lowercase letters
- š š š š
- š š š š
- probably other more or less advanced Unicode or HTML coding
š š (see alanwood.net)
- because of "the exact match" for accessing titles with MediaWiki only one is allowed:
- sk:Hnúšťa fails coded as [[:sk:Hnúšťa]]
- fails also as [[:sk:Hnúšťa]] coded as [[:sk:Hnúšťa]]
- works as sk:Hnúšťa coded as [[:sk:Hnúšťa]]
- works also for all titles containing only characters A-Z, a-z and "-"
- sk:Category:Slovenské mestá works - only Latin-1
- sk:Category:Banská Bystrica works - only Latin-1
- sk:Category:Bratislava
- sk:Category:Fiľakovo works - UTF-8
- sk:Category:Humenné works - only Latin-1
- sk:Category:Poprad
- sk:Category:Sečovce works - UTF-8
- sk:Category:Žilina works - only Latin-1
- sk:Banská Bystrica works - only Latin-1
- sk:Banská Štiavnica works - only Latin-1
- sk:Bardejov
- sk:Bojnice
- sk:Bratislava
- sk:Brezno
- sk:Brezová pod Bradlom works - only Latin-1
- sk:Bytča works - UTF-8
- sk:Bánovce nad Bebravou works - only Latin-1
- sk:Detva
- sk:Dobšiná works - only Latin-1
- sk:Dolný Kubín works - only Latin-1
- sk:Dubnica nad Váhom works - only Latin-1
- sk:Dudince
- sk:Dunajská Streda works - only Latin-1
- sk:Fiľakovo works - UTF-8
- sk:Galanta
- sk:Gbely
- sk:Gelnica
- sk:Giraltovce
- sk:Handlová works - only Latin-1
- sk:Hanušovce nad Topľou fails coded as [[:sk:Hanušovce nad Topľou]]
- sk:Hlohovec
- sk:Holíč fails coded as [[:sk:Holíč]]
- sk:Hriňová fails coded as [[:sk:Hriňová]]
- sk:Humenné works - only Latin-1
- sk:Hurbanovo
- sk:Ilava
- sk:Jelšava works - only Latin-1
- sk:Kežmarok works - only Latin-1
- sk:Kolárovo works - only Latin-1
- sk:Komárno works - only Latin-1
- sk:Košice works - only Latin-1
- sk:Kremnica
- sk:Krompachy
- sk:Krupina
- sk:Krásno nad Kysucou works - only Latin-1
- sk:Kráľovský Chlmec fails coded as [[:sk:Kráľovský Chlmec]]
- sk:Kysucké Nové Mesto works - only Latin-1
- sk:Leopoldov
- sk:Levice
- sk:Levoča works - UTF-8
- sk:Lipany
- sk:Liptovský Hrádok works - only Latin-1
- sk:Liptovský Mikuláš works - only Latin-1
- sk:Lučenec works - UTF-8
- sk:Malacky
- sk:Martin
- sk:Medzev
- sk:Medzilaborce
- sk:Michalovce
- sk:Modra
- sk:Modrý Kameň fails coded as [[:sk:Modrý Kameň]]
- sk:Moldava nad Bodvou
- sk:Myjava
- sk:Nemšová works - only Latin-1
- sk:Nitra
- sk:Nová Baňa fails coded as [[:sk:Nová Baňa]]
- sk:Nová Dubnica works - only Latin-1
- sk:Nováky works - only Latin-1
- sk:Nové Mesto nad Váhom works - only Latin-1
- sk:Nové Zámky works - only Latin-1
- sk:Námestovo works - only Latin-1
- sk:Partizánske works - only Latin-1
- sk:Pezinok
- sk:Piešťany fails coded as [[:sk:Piešťany]]
- sk:Podolínec works - only Latin-1
- sk:Poltár works - only Latin-1
- sk:Poprad
- sk:Považská Bystrica works - only Latin-1
- sk:Prešov works - only Latin-1
- sk:Prievidza
- sk:Púchov works - only Latin-1
- sk:Rajec
- sk:Rajecké Teplice
- sk:Revúca works - only Latin-1
- sk:Rimavská Sobota
- sk:Rožňava fails coded as [[:sk:Rožňava]]
- sk:Ružomberok works - only Latin-1
- sk:Sabinov
- sk:Senec
- sk:Senica
- sk:Sereď works - UTF-8
- sk:Sečovce works - UTF-8
- sk:Skalica
- sk:Sliač works - UTF-8
- sk:Sládkovičovo fails coded as [[:sk:Sládkovičovo]]
- sk:Snina
- sk:Sobrance
- sk:Spišská Belá works - only Latin-1
- sk:Spišská Nová Ves works - only Latin-1
- sk:Spišská Stará Ves works - only Latin-1
- sk:Spišské Podhradie works - only Latin-1
- sk:Spišské Vlachy works - only Latin-1
- sk:Stará Turá works - only Latin-1
- sk:Stará Ľubovňa fails coded as [[:sk:Stará Ľubovňa]]
- sk:Stropkov
- sk:Strážske works - only Latin-1
- sk:Stupava (Slovensko)
- sk:Svidník works - only Latin-1
- sk:Svit
- sk:Svätý Jur works - only Latin-1
- sk:Tisovec
- sk:Tlmače works - UTF-8
- sk:Topoľčany works - UTF-8
- sk:Tornaľa works - UTF-8
- sk:Trebišov works - only Latin-1
- sk:Trenčianske Teplice works - UTF-8
- sk:Trenčín works - UTF-8
- sk:Trnava
- sk:Trstená works - only Latin-1
- sk:Turzovka
- sk:Turčianske Teplice works - UTF-8
- sk:Tvrdošín works - only Latin-1
- sk:Veľké Kapušany fails coded as [[:sk:Veľké Kapušany]]
- sk:Veľký Krtíš fails coded as [[:sk:Veľký Krtíš]]
- sk:Veľký Meder fails coded as [[:sk:Veľký Meder]]
- sk:Veľký Šariš fails coded as [[:sk:Veľký Šariš]]
- sk:Vranov nad Topľou works - UTF-8
- sk:Vrbové works - only Latin-1
- sk:Vráble works - only Latin-1
- sk:Vrútky works - only Latin-1
- sk:Vysoké Tatry - Mesto works - only Latin-1
- sk:Zlaté Moravce works - only Latin-1
- sk:Zvolen
- sk:Čadca works - UTF-8
- sk:Čierna nad Tisou works - UTF-8
- sk:Šahy works - only Latin-1
- sk:Šamorín works - only Latin-1
- sk:Šaľa fails coded as [[:sk:Šaľa]]
- sk:Šaštín - Stráže works - only Latin-1
- sk:Štúrovo works - only Latin-1
- sk:Šurany works - only Latin-1
- sk:Žarnovica works - only Latin-1
- sk:Želiezovce works - only Latin-1
- sk:Žiar nad Hronom works - only Latin-1
- sk:Žilina works - only Latin-1
- references: Slovak language
things to discuss
- it looks to be necessary to have an "alias" translation table for pywikipediabot; hopefully only one for Latin-1 and one for UTF-8 type wikis and not one for every language;
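One way to read the "alias" idea is that several codings of a link should resolve to the same title. A minimal sketch of that for percent-encoded forms, assuming UTF-8 is tried first and Windows-1252 is the fallback; `decode_percent_title` is a made-up name, not a pywikipediabot function:

```python
from urllib.parse import unquote_to_bytes


def decode_percent_title(raw: str) -> str:
    """Map a percent-encoded title to Unicode: try UTF-8, then fall back to cp1252."""
    data = unquote_to_bytes(raw)
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: non-UTF-8 bytes come from a Latin-1 type wiki or a Windows client.
        return data.decode("cp1252")


# Both codings used in the de: examples on this page resolve to the same title.
print(decode_percent_title("Ba%C5%A1ka") == decode_percent_title("Ba%9Aka"))
print(decode_percent_title("Aai%C3%BAn") == decode_percent_title("Aai%FAn"))
```

Trying UTF-8 first is a design choice: a Latin-1 byte sequence can by coincidence also be valid UTF-8, so a real table would need to know the type of the target wiki. The few byte values that cp1252 leaves undefined still raise an error here.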
some links to de:
ú
- de:Aaiún
- all of the above work, generating http://de.wikipedia.org/wiki/Aai%C3%BAn
- en:Aaiún (nl:Aaiún, sv:Aaiún, etc.)
- translated to forms similar to http://en.wikipedia.org/wiki/Aai%FAn
š
- de:Baška (Slowakei)
- coded as [[Baška (Slowakei)]]
- coded as [[Baška (Slowakei)]]
- coded as [[Baška (Slowakei)]]
- coded as [[Ba%C5%A1ka (Slowakei)]]
- coded as [[Ba%9Aka (Slowakei)]]
- [[:de:Ba%9Aka (Slowakei)]]
- all of the above work, generating http://de.wikipedia.org/wiki/Ba%C5%A1ka_%28Slowakei%29
- en:Baška (nl:Baška, sv:Baška, etc.)
- translated to forms similar to http://en.wikipedia.org/wiki/Aai%9An
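The URL forms quoted in this section differ only in the byte encoding applied before percent-encoding. A small sketch that reproduces them, assuming the target wiki's encoding is known in advance; `wiki_url` is a made-up helper, and the use of cp1252 for the %9A form is an assumption based on that byte value:

```python
from urllib.parse import quote


def wiki_url(lang: str, title: str, encoding: str) -> str:
    """Build an article URL, percent-encoding the title in the given byte encoding."""
    path = quote(title.replace(" ", "_"), safe="", encoding=encoding)
    return f"http://{lang}.wikipedia.org/wiki/{path}"


print(wiki_url("de", "Aai\u00fan", "utf-8"))               # .../wiki/Aai%C3%BAn
print(wiki_url("en", "Aai\u00fan", "latin-1"))             # .../wiki/Aai%FAn
print(wiki_url("de", "Ba\u0161ka (Slowakei)", "utf-8"))    # .../wiki/Ba%C5%A1ka_%28Slowakei%29
print(wiki_url("en", "Ba\u0161ka", "cp1252"))              # .../wiki/Ba%9Aka
```

With `encoding="latin-1"` the s-caron title raises `UnicodeEncodeError`, because s-caron has no ISO 8859-1 byte; the %9A form only exists in Windows-1252.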
š failures
- coded as [[Baška (Slowakei)]]
- [[:de:Baška (Slowakei)]]
- coded as [[Baška (Slowakei)]]
- [[:de:Baška (Slowakei)]]
- fails generating http://de.wikipedia.org/wiki/Ba%C2%9Aka_%28Slowakei%29
- coded as [[Baška (Slowakei)]]
- fails generating http://de.wikipedia.org/wiki/Ba%26scaron%3Bka_%28Slowakei%29
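Both failing URLs can be reproduced, which suggests one plausible explanation (an assumption, not confirmed MediaWiki behaviour): in the first case the byte value 154 is taken literally as code point U+009A and then UTF-8 encoded, in the second the named reference is never decoded and its "&" and ";" are percent-encoded as ordinary characters.

```python
from urllib.parse import quote


def url_path(title: str) -> str:
    """Percent-encode a title as UTF-8, the way the de: URLs on this page are built."""
    return quote(title.replace(" ", "_"), safe="")


# Code point U+009A (a control character) instead of s-caron U+0161
print(url_path("Ba\u009aka (Slowakei)"))    # Ba%C2%9Aka_%28Slowakei%29

# The named reference left undecoded in the title
print(url_path("Ba&scaron;ka (Slowakei)"))  # Ba%26scaron%3Bka_%28Slowakei%29
```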
from bugzilla:65#c17
- Brion:
- NEVER use &#154; or &#x9a; for s-caron. Numeric character references always refer to Unicode code points, and U+009A is a reserved control character, *not* s-caron. It might appear to work sometimes due to a fluke and crappy workarounds for compatibility with a Windows bug, but should definitely not be relied upon. Use the real Unicode number, &#353;. The same goes for the other characters in the Windows CP1252 extended range (see ISO 8859-1#Windows-1252 ).
- For the moment the only named character references that will work in links are the ISO 8859-1 ones (s-caron does not appear in ISO 8859-1). Stick with the numbers for now.
- From the examples above it can be seen that &<x>acute; is supported by MediaWiki and &<x>scaron; is not.
- As you can see, scaron is used in the code and titles:
- en: Josef_Hir%26scaron%3Bal
- en: Edvard_Bene%26scaron%3B
By the way: why is Edvard Beneš redirected to Edvard Benes? OK! If the en: community wants it this way, that is fine with me.
- Which of the HTML coding methods (see en:Category:Diacritics, [1]) are supported by MediaWiki and which are not? Which are corrected by meta:PyWikipediaBot?
- Regards Gangleri | Th | T 05:47, 2005 Feb 26 (UTC)
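Brion's advice above could be applied mechanically to existing wikitext. The following is a hedged sketch only: `fix_c1_refs` is a made-up name, and it assumes that every numeric reference in the range 128 to 159 was meant as a Windows CP1252 byte.

```python
import re

# Matches decimal (&#154;) and hexadecimal (&#x9a;) numeric character references.
_NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def fix_c1_refs(text: str) -> str:
    """Rewrite references in the CP1252 extended range to real Unicode numbers."""

    def replace(match: re.Match) -> str:
        value = int(match.group(1), 16) if match.group(1) else int(match.group(2))
        if 0x80 <= value <= 0x9F:
            try:
                char = bytes([value]).decode("cp1252")
            except UnicodeDecodeError:
                return match.group(0)  # byte undefined in CP1252: leave as is
            return f"&#{ord(char)};"
        return match.group(0)  # already a real Unicode number

    return _NUMERIC_REF.sub(replace, text)


print(fix_c1_refs("Ba&#154;ka (Slowakei)"))   # Ba&#353;ka (Slowakei)
print(fix_c1_refs("Ba&#x9a;ka (Slowakei)"))   # Ba&#353;ka (Slowakei)
```

References outside that range, such as &#250; or &#353;, pass through unchanged.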