Talk:ISO/IEC 8859-1

Latest comment: 2 months ago by Polluks in topic Seikosha MP-1300AI

windows-1252

Why does windows-1252 redirect to this page? 1252 is NOT the same as iso-8859-1. Redirecting references to windows-1252 to this page (I think) reinforces the mistaken impression that 1252 and 8859-1 are the same (and they are most definitely not). Perhaps this deserves an entire page, but Microsoft's loose labeling of email with (close but not exact) MIME character sets is a BIG problem.

You're free to make the relevant section a (linked) article of its own. The section, however, tries to make it clear that although both CP1252 and Latin-1 are supersets of ISO/IEC 8859-1, they differ in the 8x and 9x area. The German version takes a slightly different approach, emphasizing the differences.
The advantage of keeping CP-1252 and MacRoman on the ISO-8859-1 page is that the differences can be made clearer. And Windows (and IIS) misreports CP-1252 as ISO-8859-1 by default, so most people falsely assume CP-1252 *is* ISO-8859-1. Latin-1 of course is a valid implementation of ISO 8859-1, as it is nothing but an alias for ISO-8859-1. Jor 13:24, 12 Mar 2004 (UTC)
"Latin-1 of course is a valid implementation of ISO 8859-1, as it is nothing but an alias for ISO-8859-1." Wrong. ISO/IEC 8859-1 (the standard) only specifies the characters for the 20-7E and A0-FF byte ranges. The characters for bytes 00-1F and 7F-9F are left undefined. The ISO-8859-1 character map registered with the IANA fills the missing spots with the C0 and C1 control sets (defined elsewhere), thus covering 00-FF. This map's approved aliases are: ISO_8859-1:1987, iso-ir-100, ISO_8859-1, ISO-8859-1 (preferred MIME name), latin1, l1, IBM819, CP819, and csISOLatin1. - mjb 22:07, 12 Mar 2004 (UTC)
So remove the dash. Latin1 *is* a valid alias for ISO-8859-1, which is an encoding based on ISO 8859-1. Jor 22:17, 12 Mar 2004 (UTC)
Like it or not, in the internet world ISO-8859-1 is now always interpreted as windows-1252. Characters from windows-1252 were used by editors here all the time, with no reported problems, before the switch to UTF-8. Sometimes you just have to accept that formal standards and reality aren't the same thing. Plugwash 6 July 2005 10:53 (UTC)
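The distinction drawn in this thread can be checked with a short Python 3 sketch (Python and its standard `codecs` module are an illustrative choice; nobody in the discussion relies on them). Python's `latin-1` codec follows the IANA ISO-8859-1 map, so all 256 bytes decode to U+0000-U+00FF, and the aliases mjb lists all resolve to that one codec. Its `cp1252` codec differs only in the 80-9F range. The helper names are hypothetical.

```python
import codecs

# Some of the IANA aliases for the ISO-8859-1 charset quoted above.
IANA_ALIASES = ["ISO_8859-1:1987", "iso-ir-100", "ISO_8859-1", "ISO-8859-1",
                "latin1", "l1", "IBM819", "CP819", "csISOLatin1"]


def latin1_is_identity_map() -> bool:
    """True if every byte 0x00-0xFF decodes to the code point with the same
    number (so 00-1F and 80-9F come out as C0 and C1 controls)."""
    decoded = bytes(range(256)).decode("latin-1")
    return [ord(ch) for ch in decoded] == list(range(256))


def cp1252_vs_latin1() -> dict[int, tuple[str, str]]:
    """Bytes where Python's cp1252 and latin-1 codecs disagree, mapped to
    (cp1252 character, latin-1 character). Bytes that cp1252 leaves
    unassigned (0x81, 0x8D, 0x8F, 0x90, 0x9D) are skipped, because the
    strict cp1252 decoder rejects them."""
    diffs = {}
    for b in range(256):
        raw = bytes([b])
        try:
            as_cp1252 = raw.decode("cp1252")
        except UnicodeDecodeError:
            continue
        as_latin1 = raw.decode("latin-1")
        if as_cp1252 != as_latin1:
            diffs[b] = (as_cp1252, as_latin1)
    return diffs


if __name__ == "__main__":
    # Every alias resolves to the same Python codec name.
    print({codecs.lookup(alias).name for alias in IANA_ALIASES})
    print(latin1_is_identity_map())
    diffs = cp1252_vs_latin1()
    print(len(diffs), "bytes differ, from", hex(min(diffs)), "to", hex(max(diffs)))
```

In other words, windows-1252 is a superset of ISO/IEC 8859-1's graphic characters, but it is not the same as the IANA ISO-8859-1 charset, which puts C1 controls where windows-1252 puts €, curly quotes and so on.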

"However, the draft HTML 5 specification requires that documents advertised as ISO-8859-1 actually be parsed with the Windows-1252 encoding." The citation gives no indication of whether this statement is true. 194.232.128.102 (talk) 14:47, 11 September 2009 (UTC)

The current HTML5 draft does mention it in section 8.2.2.2. Of course, these things are subject to change at any time. Plugwash (talk) 17:07, 30 January 2012 (UTC)
More than a year on, the current HTML5 draft still does not mention it. I feel it is unlikely to change back. As it stands, this sentence has been promoting wrong information for over a year to everyone who did not check the references. I recommend removing or at least rewording this sentence to show that it is no longer true. Greatrsg (talk) 09:20, 3 July 2013 (UTC)

I just wanted to agree with the many other commenters here who also seem to have noticed that iso8859-1 is not cp1252. WHATWG cannot define that, and they don't. They merely say that a user agent could render charset iso8859-1 as cp1252. cp1252 is not a strict superset of iso8859-1, nor is iso8859-1 a strict subset of cp1252. They are not equivalent. Also, any string encoded as iso8859-1 can be validly encoded as UTF-8 as well (a canonical UTF-8 string, in NFC); that is _not_ true of many possible strings encoded with cp1252. Essentially, WHATWG is saying: there exists this pervasive bug, try handling it this way. They are not saying, and cannot say, that the two are the same thing. They are saying that if an application declares iso8859-1 in this place, perhaps don't entirely trust it: assume it could be cp1252 labeled by mistake (rather than, say, UTF-8 labeled by mistake). Reading the standard as "iso8859-1 is cp1252" is totally wrong. -- Garick 108.16.106.24 (talk) 18:17, 10 October 2022 (UTC)
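To make the distinction in the last comment concrete, here is a minimal Python sketch of a browser-style decoder. It assumes the behaviour of the WHATWG Encoding Standard as I understand it: labels such as iso-8859-1 and latin1 select the windows-1252 decoder, and the five bytes windows-1252 leaves unassigned decode to the C1 control with the same number. The label set is deliberately incomplete and the function name is hypothetical. Note that this changes how a label is interpreted; it does not make the two encodings identical.

```python
# Hypothetical, partial label set: a few of the labels that the WHATWG
# Encoding Standard maps to windows-1252 (the full list is longer).
WINDOWS_1252_LABELS = {"iso-8859-1", "iso8859-1", "latin1", "l1",
                       "ascii", "us-ascii", "cp1252", "windows-1252"}


def decode_like_browser(data: bytes, label: str) -> str:
    """Sketch of how an HTML5 user agent treats a declared charset.

    Labels in WINDOWS_1252_LABELS are decoded as windows-1252. The five
    bytes windows-1252 leaves unassigned (0x81, 0x8D, 0x8F, 0x90, 0x9D)
    fall back to the C1 control with the same number. Any other label is
    handed to Python's own codec lookup, which does not follow WHATWG rules.
    """
    if label.strip().lower() not in WINDOWS_1252_LABELS:
        return data.decode(label)
    chars = []
    for b in data:
        try:
            chars.append(bytes([b]).decode("cp1252"))
        except UnicodeDecodeError:
            chars.append(chr(b))  # unassigned byte -> U+0081 etc.
    return "".join(chars)


if __name__ == "__main__":
    payload = b"\x93quoted\x94"
    print(repr(payload.decode("latin-1")))                   # strict ISO-8859-1: C1 controls
    print(repr(decode_like_browser(payload, "ISO-8859-1")))  # browser-style: curly quotes
```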

correct quotation marks

In this respect ISO-8859-1 wasn't worse than a typewriter, or am I mistaken? So only special characters for typesetting are missing from ISO-8859-1 (also ligatures, like ff, fi, ...). Pjacobi 23:02, 17 Sep 2004 (UTC)

You're not making yourself very clear, please elaborate. -- Ævar Arnfjörð Bjarmason 23:42, 2004 Sep 17 (UTC)

As far as I know, mechanical or electric typewriters didn't provide symbols for typesetting either. They lacked the different quotation marks, the (typographic) dashes of different lengths, ligated versions of "ff", etc. Not that it would make much sense on a monospacing machine. Only the specialised input machines for typesetting had all those.

So, when setting coded character sets for use on computers, I don't think that the lack of those signs can be viewed as not supporting any languages which would use these signs in typesetting. Typesetting was done using specialised markup, shortcuts or automatic conversion (as done by troff).

Even today, Unicode considers some typesetting-related issues to be out of scope for coded character sets, a step back from the Adobe approach, which even put some "ff" ligatures in the "Expert Character Set".

In summary, I consider the remark "missing correct quotation marks" to be slightly misleading.

Pjacobi 12:51, 18 Sep 2004 (UTC)

In German (and I believe other languages as well) it is considered an orthographic error to use something like "speech" instead of „speech“, although »speech«, of which Latin1 is capable, can be acceptable, too. Width of dashes/hyphens and interword/intercharacter spaces is a different story completely. Crissov 18:40, 18 Sep 2004 (UTC)
No, it is considered bad typography. In handwriting or as typoscript, it is perfectly acceptable. -- Pjacobi 19:35, 18 Sep 2004 (UTC)
Especially in handwritten German it is not acceptable; ask any German teacher or the Duden. You could compare it to ")foo)" or "(foo(", which no-one would claim is correct. It can only be acceptable in technically limited environments. In English, "foo" and “foo” are probably similar enough.
The Duden is no authority on glyph shapes. Using " as the glyph shape for both punctuation characters, 'German begin of direct speech' and 'German end of direct speech', is no orthographic error, at least not in Hamburg, Germany. Likewise, the Duden doesn't regulate the glyph shapes for small "s" and "t", where there is much variation in German handwriting. Pjacobi 18:16, 19 Sep 2004 (UTC)
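Which quotation styles Latin-1 can represent is easy to check mechanically. A small Python sketch (the helper name is hypothetical) tries to encode each style in ISO-8859-1 and in windows-1252: »speech« fits in both, while „speech“ only fits in windows-1252, where „ is byte 0x84 and “ is 0x93.

```python
def encodable(text: str, encoding: str) -> bool:
    """Return True if every character of `text` exists in `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# Quotation styles discussed above (illustrative strings).
SAMPLES = {
    "straight": '"speech"',
    "guillemets": "\u00bbspeech\u00ab",  # »speech«
    "low-high": "\u201espeech\u201c",    # „speech“
}

if __name__ == "__main__":
    for name, text in SAMPLES.items():
        print(f"{name:10} latin-1={encodable(text, 'latin-1')} "
              f"cp1252={encodable(text, 'cp1252')}")
```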

Mac-Roman

Is it really best to remove the comparative chart between Mac-Roman and ISO-8859-1? Is it really true that Mac-Roman has no relation to ISO 8859-1 or ISO-8859-1? Also, there appears to be hyphenation differences throughout this page (e.g. MacRoman vs. Mac-Roman, CP1252 vs. CP-1252, etc.). GPHemsley 00:50, Mar 29, 2005 (UTC)

The Macintosh Roman character sets, Mac-Roman and MacRoman, both inherit the ASCII characters, but have nothing else in common with ISO-8859-1. Mac-Roman was introduced with the first Mac in 1984, so I don't think it could possibly be a descendant of ISO Latin. MacRoman changed one character from Mac-Roman (added the Euro). I'll update the confusing text in this article. Michael Z. 2005-03-29 01:30 Z
IIRC they do, however, cover much the same characters, which should probably be mentioned and possibly detailed somewhere.
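The relationship described here (same ASCII half, a largely similar repertoire, different byte positions) can be explored with Python's `mac_roman` and `latin-1` codecs. This is a sketch: the helper name is hypothetical, and the mapping is whatever Python's `mac_roman` codec implements, not necessarily every historical Mac Roman revision.

```python
def latin1_graphics_missing_from(encoding: str) -> list[str]:
    """ISO 8859-1 characters in 0xA0-0xFF that `encoding` cannot represent."""
    missing = []
    for b in range(0xA0, 0x100):
        ch = chr(b)  # Latin-1 byte values equal their Unicode code points
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


if __name__ == "__main__":
    # Both share ASCII in 0x00-0x7F ...
    print(bytes(range(128)).decode("mac_roman") == bytes(range(128)).decode("latin-1"))
    # ... but place shared letters at different bytes, e.g. é:
    print("é".encode("latin-1"), "é".encode("mac_roman"))  # b'\xe9' b'\x8e'
    print("Latin-1 characters absent from Mac Roman:",
          "".join(latin1_graphics_missing_from("mac_roman")))
```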

lead section

The previous lead section was a one-liner, far shorter than Wikipedia:Guide_to_writing_better_articles#Lead_section recommends. Furthermore, it didn't even introduce two important variations (iso-8859-1 and windows-1252) which redirect here. I tried to expand it and was reverted by mjb (whose removals I have now reverted). mjb claimed it was redundant, which is true, but Wikipedia:Guide_to_writing_better_articles#Lead_section clearly states "If the article is long (more than one page), the remainder of the opening paragraph should summarize it." A summary is by definition redundant with the more detailed information in the rest of the article. Plugwash 6 July 2005 10:15 (UTC)

But your "summary" was terrible. It introduced concepts and dove into technical details that are not required to achieve a basic understanding of the ISO/IEC 8859-1 standard. I'm not saying the article can't use another sentence in the intro, but if you re-read the article from the beginning, it sounds very sloppy when you immediately start talking about there being no control codes and certain code value ranges being reserved/unassigned — these topics were not even introduced yet and seem completely out of context at that point. I would also disagree with taking too strict an interpretation of the style guide; an intro paragraph does not need to summarize every topic that is mentioned in the article; if it can't introduce a topic without repeating or requiring one to read the whole article, then further simplification of the statements is advisable. — mjb 6 July 2005 11:12 (UTC)

ISO-8859-1 and windows-1252 redirect here and are not just misspellings, so they need to be introduced in the summary. If we don't do so, then we are misleading users into thinking they are the same thing as ISO 8859-1. If you can think of a way of doing so without mentioning technical details, then go for it. Plugwash 6 July 2005 11:14 (UTC)

Ah, see, that's the real issue; there are these redirects, and there is discussion in the article about these oft-confused character maps that are based on the standard. We can offer the reader this information without getting into any details that would require them to have already read the article. I've put one in, but perhaps it could be further improved. — mjb 6 July 2005 11:29 (UTC)

On another topic, do you have an opinion about "maintained by ISO and IEC"? I think it sounds awkward to say "ISO and IEC" rather than "the ISO and the IEC," but it seems equally awkward to put two "the"s in there. Is there a policy or style guide for using definite articles with organizations known by their initials? (The main point I was trying to make in the first sentence was that for a while, the standard was just "ISO 8859-1" and this is what everyone knows it as, but at some point the IEC became involved and any formal citation, especially an encyclopedia entry that has a responsibility not to perpetuate common errors, must say "ISO/IEC 8859-1".) — mjb 6 July 2005 11:34 (UTC)

Character table format

I've been seeing more and more 8-bit character table formats popping up in various articles. There are currently three different styles in use on this page alone. ASCII has two more, and Code page 437 has yet another. I think the template-based approach in the Code page 437 article is a good idea, but I'm not sure it's flexible enough to accommodate the kind of ad-hoc linking we have going on. Also, the auto-scaled 100% table widths are not ideal for all media. Other issues to consider are where to link each character to, and how much info to try to cram into each cell. We are discussing character linking over on Talk:Unicode. Questions to consider are below. — mjb 6 July 2005 12:02 (UTC)

  • Should we standardize the 8-bit character code chart formats?
  • What info should the charts contain?
  • What should the charts look like? Are column/row headings important?
  • Where should character entries link to? (see Talk:Unicode#Nifty_resource.)
  • What's the ideal representation of things like space, soft hyphen, and control codes?
  • What about difference highlighting? Keep?
  • Can we achieve these goals with a template?
My preferred way to handle character linking is to just let the character link to a page titled with itself and then redirect that page to the most appropriate place. This allows all references to a character to be updated to point to the same place at once, and also lets users enter those characters in the search box and be taken straight to the appropriate place. Plugwash 6 July 2005 22:54 (UTC)
Well, we could use templates, but I'm not sure the approach in Code page 437 is the best. There Template:chset-cell is used, which looks like this:
  • <span style="font-size: large; font-family: serif">&#x{{{1}}};</span><br /><small>{{{1}}}</small>
Those hexadecimal character references are in fact converted to UTF-8 by the MediaWiki software.
Furthermore, it uses Template:chset-tableformat, Template:chset-left and Template:chset-ctrl, and all are put into a table by hand. We could do the same, and better, with something like Template:8-bit charset, which would be used something like this:
  • {{8-bit charset|Name=ISO 8859-1|{{C0 control codes}}|{{ASCII character codes}}|7F|{{C1 control codes}}| A0|A1|A2|A3|A4|A5|A6|A7|A8|A9|AA|AB|AC|AD|AE|AF| B0|B1|B2|B3|B4|B5|B6|B7|B8|B9|BA|BB|BC|BD|BE|BF| C0|C1|C2|C3|C4|C5|C6|C7|C8|C9|CA|CB|CC|CD|CE|CF| D0|D1|D2|D3|D4|D5|D6|D7|D8|D9|DA|DB|DC|DD|DE|DF| E0|E1|E2|E3|E4|E5|E6|E7|E8|E9|EA|EB|EC|ED|EE|EF| F0|F1|F2|F3|F4|F5|F6|F7|F8|F9|FA|FB|FC|FD|FE|FF}}
where the templates contain just the hexcodes, e.g. Template:C0 control codes (Control character#Tables):
  • 00|01|02|03|04|05|06|07|08|09|0A|0B|0C|0D|0E|0F| 10|11|12|13|14|15|16|17|18|19|1A|1B|1C|1D|1E|1F
Oh, I just realized that we would have to take special care of control codes (and a few others), because they do not work with links and display, maybe:
  • NUL|SOH|STX|ETX|EOT|ENQ|ACK|BEL|BS|HT|LF|VT|FF|CR|SO|SI| DLE|DC1|DC2|DC3|DC4|NAK|SYN|ETB|CAN|EM|SUB|ESC|FS|GS|RS|US
I think giving alternatives with &#124; does not work (well) in templates. Anyhow, the 8-bit charset template would then build a 16×16 table out of the 256+1 arguments it received; {{{Name}}} would be put into the caption (|+). How that table should look (hex, dec, oct and/or bin headers, U+ codes [probably by reusing Template:chset-cell]) is open to discussion, but all those code page and charset tables would look the same. The then-unnecessary chset-* templates should be deleted. Christoph Päper 7 July 2005 15:46 (UTC)
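A rough Python model of the proposed Template:8-bit charset layout, assuming the template simply places byte hi*16+lo at row hi, column lo. It reuses the C0 abbreviations listed above; the SP/DEL/NBSP/SHY labels and the choice to leave C1 controls and unassigned bytes blank are assumptions for illustration, not part of the proposal. The function name is hypothetical, and it only makes sense for encodings that keep SP, DEL, NBSP and SHY at their ISO 8859-1 positions, such as latin-1 and cp1252.

```python
import unicodedata

# C0 abbreviations, in byte order, as listed in the proposal above.
C0_NAMES = ("NUL SOH STX ETX EOT ENQ ACK BEL BS HT LF VT FF CR SO SI "
            "DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM SUB ESC FS GS RS US").split()

# Assumed labels for positions whose character would be invisible in a cell.
SPECIAL = {0x20: "SP", 0x7F: "DEL", 0xA0: "NBSP", 0xAD: "SHY"}


def charset_grid(encoding: str = "latin-1") -> list[list[str]]:
    """Return 16 rows of 16 cell labels: row = high nibble, column = low nibble.

    C1 controls and bytes the codec rejects are left as empty strings
    (the gray cells)."""
    grid = []
    for hi in range(16):
        row = []
        for lo in range(16):
            b = hi * 16 + lo
            if b < 0x20:
                row.append(C0_NAMES[b])
            elif b in SPECIAL:
                row.append(SPECIAL[b])
            else:
                try:
                    ch = bytes([b]).decode(encoding)
                except UnicodeDecodeError:
                    ch = ""  # unassigned in this encoding
                row.append("" if ch and unicodedata.category(ch) == "Cc" else ch)
        grid.append(row)
    return grid


if __name__ == "__main__":
    print("    " + "".join(f"_{lo:X}".ljust(5) for lo in range(16)))
    for hi, row in enumerate(charset_grid()):
        print(f"{hi:X}_  " + "".join(cell.ljust(5) for cell in row))
```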

History of CP1252

I'm having a hard time finding what year Microsoft introduced code page 1252. I'm particularly interested in MS's support for the curved apostrophe and quotation marks ‘ ’ “ ”. The best I could find so far was "around 1986". — Hippietrail 01:41, 22 July 2005 (UTC)

My guess is it came in with the Windows concept of ANSI code pages. I don't know how far back that dates, though. (P.S. I notice that whatever font is used for standard Wikipedia text doesn't seem to differentiate between the opening and closing quotes, but the font I see in the edit box does.) Plugwash 02:09, 22 July 2005 (UTC)
A minor point of interest is that the IANA did not accept Windows-1252 in its charset registry until early 2000, based on a proposal made in December 1999. The other Windows-125x code pages were accepted by the IANA in 1996 after being proposed by someone at Microsoft's Russian branch. — mjb 03:00, 23 December 2005 (UTC)
And windows-874 still isn't in the IANA's list despite being actively used by at least Outlook 2000. I pointed this out on the iana-charsets list, but they didn't seem to care. Plugwash 08:46, 7 October 2006 (UTC)

Merge request

Someone tagged the article with a merge request. They apparently did not realize that this article forked off of the ISO/IEC 8859-1 article a while ago. Please present a case for the merge or the request will be removed. — mjb 03:00, 23 December 2005 (UTC)

It was part of a mass split done a while back by a fairly new user that created a LOT of small, ugly stubs. I've linked all the merge tags to a proposal on the main ISO 8859 talk page. Please comment there if you don't want me to go ahead with the mass re-merging. Plugwash 17:16, 12 January 2006 (UTC)
These should NOT be remerged. They are (were) separate for a reason: ISO/IEC 8859-n is in no case identical to ISO-8859-n (when both exist, which is not always the case). These entries should be split again, with appropriate cross-references. Keka (who happens to have been involved with character set standardisation for many years), 2006-04-23.
True in a sense, but you could say the same about, say, JPEG and JFIF. One is the formal standard, left incomplete by standards-body politics. The other is the equivalent real standard in use. Also, in most cases the IANA defines ISO_8859-? the same as ISO-8859-?, and an underscore is the standard substitute for a space where a space can't be used. Plugwash 15:54, 26 April 2006 (UTC)

In the table in the section "Related character maps", at position (-4, 8-), the table links to the disambiguation page for index (it says "IND"). This shouldn't happen. However, I have absolutely no idea what kind of index it's referring to, so I didn't change it. Could somebody who knows more about this please change it to link to the specific type of index it refers to? E946 04:59, 6 April 2006 (UTC)

The two pipe symbols

In the character chart, both the character | (value 7C) and the character ¦ (value A6) linked to the article about Pipe_(computing); but as far as I can see, that article only talks about the character | (7C).

I have changed both links to Vertical bar, which I believe gives more relevant information. --Oz1cz 14:57, 10 November 2006 (UTC)
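Looking the two characters up in Python's `unicodedata` module confirms they are distinct: 0x7C is VERTICAL LINE, the character used for pipes, while 0xA6 is BROKEN BAR. A small sketch; the helper name is hypothetical.

```python
import unicodedata


def bar_characters() -> list[tuple[str, str, str]]:
    """(byte, code point, Unicode name) for the two bars in the ISO 8859-1 chart."""
    rows = []
    for byte in (0x7C, 0xA6):
        ch = bytes([byte]).decode("latin-1")
        rows.append((f"0x{byte:02X}", f"U+{ord(ch):04X}", unicodedata.name(ch)))
    return rows


if __name__ == "__main__":
    for row in bar_characters():
        print(*row)
```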

Line Feed / Newline

Why is there no encoding for "line feed / new line" in this standard? How does that work? 83.118.38.37 09:06, 9 February 2007 (UTC)

There are control characters from 00 to 1F designed for functions like this. 2A01:119F:21D:7900:E56A:6E91:A341:16E1 (talk) 18:49, 29 October 2018 (UTC)
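In practice, ISO-8859-1 text relies on those C0 controls for line breaks: LF (0x0A) and CR (0x0D), in whatever combination the platform uses. A minimal Python sketch, with a hypothetical helper name, that normalizes the three common conventions (CR LF, LF, CR) before decoding:

```python
# Line endings come from the C0 control set that the IANA ISO-8859-1 map adds.
LF = 0x0A  # LINE FEED
CR = 0x0D  # CARRIAGE RETURN


def split_lines(data: bytes) -> list[str]:
    """Split ISO-8859-1 text on CR LF, LF or CR, then decode each line.

    A trailing line break produces a final empty string."""
    normalized = data.replace(bytes([CR, LF]), bytes([LF])).replace(bytes([CR]), bytes([LF]))
    return [line.decode("latin-1") for line in normalized.split(bytes([LF]))]


if __name__ == "__main__":
    print(split_lines(b"Gr\xfc\xdfe\r\nCaf\xe9\rEnd"))
```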

ISO 8859-1 vs. UTF-8

I think someone of knowledge in this field should write a section with that name.

OK, the background: I installed a server software distro (Apache2Triad) and everything was working just fine and so I copied some folder with webpages in it I had on another server (XAMPP) and to my shock and appallment this thing was displaying letters with diacritics like all those crappy sites from 1990's, that I've noticed are not even capable of displaying apostrophes on pages in English. And I've noticed a coincidence of pages not being able to display apostrophes (nor any diacritics) and the page having a charset like ISO 123456 or something (instead of UTF-8) in its HEAD section.

And yeah, when I changed the line specifying ISO 8859-1 to UTF-8 (in the httpd.conf file) I got all my diacritics (namely, Latvian) and nothing appears to have been broken.

So, basically: why would anyone need a charset like this when there's UTF-8? What are the inclusion criteria for languages (lol)? It's not that I would have any issues with a charset not displaying Latvian diacritics, but there are carons (š) in Czech, for instance, and macrons (ā) in Japanese rōmaji, and those are legit languages (not to mention the apostrophes). Basically, the article seriously lacks a rationale section explaining why anyone would use this encoding. 354d 22:05, 26 September 2007 (UTC)

I think the article ISO/IEC 8859 might answer your concerns! Theo 194.222.199.109 17:30, 30 September 2007 (UTC)

I believe in the standard of no standardization. —Preceding unsigned comment added by 24.121.199.103 (talk) 20:15, 7 April 2009 (UTC)
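The symptoms described in the opening post are easy to reproduce. A Python sketch (helper names hypothetical): UTF-8 text read as ISO-8859-1 or windows-1252 turns each multi-byte character into several single-byte characters, and ISO-8859-1 simply has no code for letters such as ō, š or ā.

```python
def mojibake(text: str, wrong_encoding: str = "latin-1") -> str:
    """What a reader sees when UTF-8 bytes are decoded with `wrong_encoding`."""
    return text.encode("utf-8").decode(wrong_encoding)


def fits_latin1(text: str) -> bool:
    """True if ISO-8859-1 can represent every character of `text`."""
    try:
        text.encode("latin-1")
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    print(mojibake("é"))            # UTF-8 C3 A9 read as two Latin-1 characters
    print(mojibake("’", "cp1252"))  # the familiar â€™ for a curly apostrophe
    for word in ("café", "rōmaji", "šťastný", "ābele"):
        print(word, fits_latin1(word))
```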

ISO-8859-1 table

This page is about ISO/IEC 8859-1. I think the single character table on this page should not represent the ISO-8859-1 printable character and control character set. 92.78.138.134 (talk) 20:27, 3 October 2010 (UTC)

Most of the other pages about character encodings show the control characters. I copied the comment from one of them. However, a few, such as ISO-8859-3, show gray for the control characters. In any case, there certainly should not be two tables, which is what was here before. Most casual readers would never figure out that the only difference is that the gray cells are switched to control characters, and would be looking for differences in the letters! Spitzak (talk) 20:40, 4 October 2010 (UTC)
Yes, all pages should show the control characters of the encoding they are representing. My point is that ISO/IEC 8859-1:1998 has no control characters. So this surplus of information is wrong on the one hand and misleading on the other. It adds to the confusion around ISO/IEC 8859-1:1998, Windows Codepage 1252 and ISO-8859-1. There is no benefit in catering to "casual readers" who never look at those tables anyway. —Preceding unsigned comment added by 88.75.191.146 (talk) 17:25, 5 October 2010 (UTC)
It would seem that the correct approach is to show the actual characters of the character set being presented, and to leave all other code points blank (gray). So for this particular article, no control codes should be shown in the table. But in addition, there should be some mention of (and link to) the ISO control codes that the character set is generally used with. Interested readers can then click through to get more specific details about the "complete" character set code points as it is typically implemented. — Loadmaster (talk) 15:05, 6 October 2010 (UTC)
Link is at the top as C0 and C1 controls. Spitzak (talk) 19:53, 6 October 2010 (UTC)

I don't think it is a good thing to have an article about ISO-8859-1 and not have a table which shows all the characters in it. As I take it, right now I'm supposed to look at the table for ISO 8859-1, then read that ISO-8859-1 has more characters, then head to another article and figure out how those characters fit into this table. Very convenient. HotXRock (talk) 19:45, 12 October 2010 (UTC)

I did that originally, but there seems to be a consensus that the description of the ISO character sets should not show the control characters. It is also true that in most use the majority of those values are not interpreted in any way by any definition of the assigned control characters, except for CR, LF, and perhaps TAB. All others tend to render representations or get interpreted as CP1252. Spitzak (talk) 00:15, 13 October 2010 (UTC)
Okay, if this article is not the place for ISO-8859-1, where should it be described then? Windows-1252 has a different set of control characters, and it is definitely less appropriate for describing ISO-8859-1. By the way, Windows-1252 lists differences from ISO-8859-1, but currently ISO-8859-1 doesn't have a corresponding table anywhere on Wikipedia to compare with. Should we create a separate article for ISO-8859-1 then? HotXRock (talk) 10:43, 13 October 2010 (UTC)
I suppose it's a choice between creating a section in this article to discuss ISO-8859-1 and its differences from ISO/IEC 8859-1, or creating an entirely separate article for it. I lean towards the former choice, since it's probably the most expected for users searching for either term and who probably don't know there is a difference between the two. — Loadmaster (talk) 19:46, 13 October 2010 (UTC)
This article "discusses the differences" several times. It says that ISO-8859-1 is this table with the addition of the C0 and C1 controls, and if you follow that link there are several tables of the control characters.
I did in fact try to make the table have the control characters, but this idea was rejected. I do NOT want to see a return to the redundant two tables, which lead novices to believe that some of the letters themselves are different and make the whole subject look more complex than it is. Spitzak (talk) 23:14, 13 October 2010 (UTC)
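Whichever table design wins, the split between the positions ISO/IEC 8859-1 itself assigns and the ones only the IANA ISO-8859-1 charset fills with controls is mechanical. A Python sketch that uses the Unicode general category Cc to detect controls (an implementation choice, not something the discussion specifies; the function name is hypothetical):

```python
import unicodedata


def cell_kind(byte: int) -> str:
    """'assigned' if ISO/IEC 8859-1 itself places a character at `byte`,
    'control' if only the IANA ISO-8859-1 map fills it (with C0, DEL or C1)."""
    ch = bytes([byte]).decode("latin-1")
    return "control" if unicodedata.category(ch) == "Cc" else "assigned"


if __name__ == "__main__":
    counts = {"assigned": 0, "control": 0}
    for b in range(256):
        counts[cell_kind(b)] += 1
    print(counts)  # {'assigned': 191, 'control': 65}
```

The 65 control positions are exactly the ranges 00-1F and 7F-9F that the standard leaves undefined.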

Hexadecimal character codes

I removed the use of a "0x" prefix on hexadecimal character codes, because it appears that in these character set articles decimal values are predominantly used, and when hexadecimal values are used they are labeled as such (e.g., "hex 80"). The "0x" prefix is not universally recognized as implying hexadecimal, in spite of the widespread use of C-like programming languages. In ISO and RFC documents for character sets, in fact, either decimal only is used (e.g., "128"), or a "row/column" notation (e.g., "8/0") is used. — Loadmaster (talk) 04:40, 7 October 2010 (UTC)
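For reference, the three notations mentioned are simple conversions of the same byte value. A Python sketch with a hypothetical helper, following the "8/0" example above in putting the high four bits first:

```python
def notations(byte: int) -> tuple[str, str, str]:
    """Decimal, 'hex NN', and the x/y style mentioned above (high four bits first)."""
    return str(byte), f"hex {byte:02X}", f"{byte >> 4}/{byte & 0xF}"


if __name__ == "__main__":
    print(notations(128))   # ('128', 'hex 80', '8/0')
    print(notations(0xE9))  # ('233', 'hex E9', '14/9')
```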

Dutch language support

The explanation of Dutch language support is utter nonsense. The very existence of the mentioned Ĳ/ĳ as a separate symbol is barely even known to Dutch-speaking people. The symbol doesn't even appear on Dutch or Belgian keyboards. As far as I can see, ISO/IEC 8859-1 supports the Dutch language completely. — Preceding unsigned comment added by Nyerguds (talk · contribs) 07:34, 1 July 2011 (UTC)

It was taught to us as a separate symbol in primary school. We called it de lange ij. We all type it as I + J, but we also all know that this isn't perfect and that software's ability to convert it into the proper glyph can leave a lot to be desired. — Preceding unsigned comment added by 82.139.87.39 (talk) 23:42, 27 January 2012 (UTC)
The same goes for Czech (and Slovak) ch. It is one character for purposes of sorting, for example (it sorts between H and I), but it is written with two letters. Ceplm (talk) 14:41, 9 January 2014 (UTC)
You can't even see the difference between ĳ and ij. You have to select it to see the difference, and you barely save memory if you use it. Maybe there are some fonts in which ĳ would look better as a separate symbol, but otherwise it's useless. 80.101.107.21 (talk) 16:03, 23 April 2012 (UTC)
FWIW (if anything, now many years later), I could clearly see the difference between ĳ and ij in your sentence above, even before selecting them. It was a surprise, though, that the former was the ligature (if that's the correct term?), because the visual distance between its parts was longer than in the following ij digraph. I would have assumed the single-character version would bring them closer together -- that this was pretty much the whole purpose of its existence -- but at least the default font my browser displays Wikipedia in has tighter kerning between the separate characters than the combined one. Life is full of little surprises... (And yes, given the discussion in some sections higher up on this page, the irony of my double hyphens for em-dash here is not lost on me.) --CRConrad (talk) 16:12, 11 August 2021 (UTC)
Whether it saves memory or looks better is completely beside the point. There is a world of difference between what we commonly type and what would be "correct" if computers only supported it. Example: today most people use a single "-" to mean any of figure dash, en dash, em dash and so on, because all that ASCII / ISO-8859 supports is "-". That doesn't make it "right", and I seriously doubt a typographic system like TeX would drop "all those useless dashes" any time soon. -- DevSolar (talk) 10:03, 22 June 2012 (UTC)
As a Dutch person, I concur with the OP. We're only taught to write the 'lange ij' in one space when writing in 'blokletters', and, as mentioned, because it sometimes looks better. The latter can also be argued for the ligature ﬁ, which looks better than fi when viewed close-up in Times New Roman-style fonts, yet I've only seen real typesetting software (like LaTeX) actually converting fi to ﬁ when printing/exporting. The ligature ĳ is only of historical interest and for vertical writing. The only places I encounter it in daily life are where it's incorporated into logos or markings which are vector-designed (like http://www.jacktummers.nl/wp-content/uploads/200907/lijnbus.jpg or http://www.wingene.be/_STUDIOEMMA_WWW/uploads/Image/DuurzaamheidGisEnMobiliteit/DeLijn_kleur_logo.jpg). Furthermore, we have many other digraphs/diphthongs (ou, ei, ie, oe, ui, au), and they aren't written with a ligature or in a single space, because, what do you know, it just happens those can't be placed inside each other. --Zom-B (talk) 10:28, 6 August 2013 (UTC)
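Unicode itself classes ĳ (U+0133) and ﬁ (U+FB01) as compatibility characters: NFKC normalization turns them into the plain two-letter sequences, which ISO-8859-1 can hold. A Python sketch using `unicodedata.normalize` (the function names are hypothetical; characters with no Latin-1 compatibility form are left unchanged):

```python
import unicodedata


def fits_latin1(text: str) -> bool:
    """True if ISO-8859-1 can represent every character of `text`."""
    try:
        text.encode("latin-1")
    except UnicodeEncodeError:
        return False
    return True


def latin1_fallback(text: str) -> str:
    """Replace characters ISO-8859-1 lacks by their NFKC compatibility form
    when that form fits in ISO-8859-1; keep anything else unchanged."""
    out = []
    for ch in text:
        if fits_latin1(ch):
            out.append(ch)
            continue
        alt = unicodedata.normalize("NFKC", ch)
        out.append(alt if fits_latin1(alt) else ch)
    return "".join(out)


if __name__ == "__main__":
    print(latin1_fallback("\u0132sselmeer"))  # Ĳsselmeer -> IJsselmeer
    print(latin1_fallback("\ufb01ets"))       # ﬁets -> fiets
```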

English (UK and US)

Why do we need to mention UK and US in parentheses in the list of supported languages? Even though some spellings do differ, there are no symbols unique to either of the two. IMHO it could be interesting to note that symbols adopted by smaller communities, such as the character é in, say, latté, or the character ö in, say, coöperation (as used by The New Yorker), are supported. elpincha (talk) 21:25, 16 November 2011 (UTC)

Yes, there are differences, primarily the '£' symbol, which is neither in 7-bit ASCII nor on a US keyboard. It is on a UK keyboard. This is why ASCII is not adequate for UK English usage; you need ISO8859-1, Windows 1252, etc.
There are also some imported words which have retained their accents, at least in British English, for example café. There are also some other cases, such as the famous Brontë sisters. TiffaF (talk) 16:27, 3 July 2012 (UTC)
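A quick Python check of the characters mentioned in this section (the helper name is hypothetical): each needs a byte above 0x7F, so 7-bit ASCII cannot carry it, but ISO-8859-1 has a single byte for each.

```python
def latin1_bytes_beyond_ascii(text: str) -> list[tuple[str, int]]:
    """Non-ASCII characters of `text` with their ISO-8859-1 byte value.
    Raises UnicodeEncodeError if such a character is not in ISO-8859-1."""
    return [(ch, ch.encode("latin-1")[0]) for ch in text if ord(ch) > 0x7F]


if __name__ == "__main__":
    for ch, byte in latin1_bytes_beyond_ascii("£5 at the café, reading Brontë"):
        print(ch, hex(byte))
```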

Ǿ/ǿ vs Ø/ø

I've been Danish all my life, but I've never heard of any usage of an accented Ø (except, maaaybe, for rockdots). Can anyone give any such example? If not, I propose to remove that note from the page. JanGB (talk) 13:30, 30 May 2016 (UTC)

As another Dane: yes, that letter is not used (or at least is extremely uncommon) in Danish... — Preceding unsigned comment added by 193.106.166.70 (talk) 21:57, 29 January 2017 (UTC)

"Retskrivningsordbogen", the official norm for correct Danish spelling, previously (in its 1996 edition) included this example:
Hunden gǿr dagen lang (The dog barks the whole day)
This example includes an accent to distinguish it from
Hunden gør dagen lang (The dog poops the whole day)
However, accents are always optional in Danish.
--Oz1cz (talk) 10:22, 9 September 2020 (UTC)
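Since accents are optional in Danish, a Latin-1-only system can fall back from ǿ to ø. In Unicode, ǿ (U+01FF) decomposes canonically into ø plus COMBINING ACUTE ACCENT, so dropping the combining mark leaves a Latin-1 character. A Python sketch; the function name is hypothetical and the '?' substitute is an arbitrary choice.

```python
import unicodedata


def drop_unsupported_accents(text: str) -> str:
    """For characters outside ISO-8859-1, strip combining marks from the
    canonical decomposition (e.g. ǿ -> ø); use '?' if that still does not fit."""
    out = []
    for ch in text:
        if ord(ch) <= 0xFF:
            out.append(ch)
            continue
        base = "".join(c for c in unicodedata.normalize("NFD", ch)
                       if unicodedata.category(c) != "Mn")
        out.append(base if base and all(ord(c) <= 0xFF for c in base) else "?")
    return "".join(out)


if __name__ == "__main__":
    print(drop_unsupported_accents("Hunden g\u01ffr dagen lang"))  # Hunden gør dagen lang
```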

Polish language support

It looks like the article doesn't mention the Polish language: not in the section on fully supported languages, and not in the section on languages with partial support. 165.225.72.120 (talk) 11:18, 15 December 2017 (UTC)
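A quick check of a common Polish test phrase shows how little of Polish's diacritic letters ISO-8859-1 covers: of ą ć ę ł ń ó ś ź ż, only ó is present. A Python sketch (the helper name is hypothetical):

```python
def unsupported_in_latin1(text: str) -> list[str]:
    """Distinct characters of `text` outside ISO-8859-1 (above U+00FF),
    in order of first appearance."""
    seen: list[str] = []
    for ch in text:
        if ord(ch) > 0xFF and ch not in seen:
            seen.append(ch)
    return seen


if __name__ == "__main__":
    # A common Polish test phrase; only ó fits in ISO-8859-1.
    print(unsupported_in_latin1("Zażółć gęślą jaźń"))
```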

Romanian language support / work-arounds

I don't know Romanian very well, so I won't edit it, but a work-around for ș (S-comma) is "sh" in ASCII and a work-around for ț (T-comma) is "ts" in ASCII. The Romanians I know live in an English-speaking country, and this is a very English-influenced work-around, so it may be considered horrific inside Romania; I don't know. Fluoborate (talk) 11:26, 22 December 2017 (UTC)
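The suggested work-around can be expressed as a Python translation table. This is a sketch: the table and function names are hypothetical, the capital forms are an assumption added for completeness, and, as the comment says, whether these spellings are acceptable inside Romania is unverified. Other Romanian letters are left as they are.

```python
# ASCII fallbacks suggested in the comment above (ș -> sh, ț -> ts).
# The capital forms are an assumption added for completeness.
ROMANIAN_FALLBACK = str.maketrans({
    "\u0219": "sh", "\u0218": "Sh",  # ș, Ș (S with comma below)
    "\u021b": "ts", "\u021a": "Ts",  # ț, Ț (T with comma below)
})


def romanian_ascii_fallback(text: str) -> str:
    """Apply the sh/ts substitutions; all other characters pass through."""
    return text.translate(ROMANIAN_FALLBACK)


if __name__ == "__main__":
    print(romanian_ascii_fallback("Bucure\u0219ti"))  # Bucureshti
```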

Wrong Color in chset-color-intl - F6

Page "ISO/IEC_8859-1"

Hi, I am sorry for my bad English language skills.

The color shown for F6 is not the same in the chart and in the legend box; I think this is because of a typo in the color code string.

"FFEFAF" (the colored square) is used for "Graphic character", and I misread it when moving to the chart because it was misleading; maybe it is a mistake.

Cya — Preceding unsigned comment added by 93.82.108.221 (talk) 10:16, 6 July 2018 (UTC)

Proposed new format for character set tables

ISO/IEC 8859-1
_0 _1 _2 _3 _4 _5 _6 _7 _8 _9 _A _B _C _D _E _F
0_
1_
2_ SP ! " # $ % & ' ( ) * + , - . /
3_ 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
4_ @ A B C D E F G H I J K L M N O
5_ P Q R S T U V W X Y Z [ \ ] ^ _
6_ ` a b c d e f g h i j k l m n o
7_ p q r s t u v w x y z { | } ~
8_
9_
A_ NBSP ¡ ¢ £ ¤ ¥ ¦ § ¨ © ª « ¬ SHY ® ¯
B_ ° ± ² ³ ´ µ ¶ · ¸ ¹ º » ¼ ½ ¾ ¿
C_ À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï
D_ Ð Ñ Ò Ó Ô Õ Ö × Ø Ù Ú Û Ü Ý Þ ß
E_ à á â ã ä å æ ç è é ê ë ì í î ï
F_ ð ñ ò ó ô õ ö ÷ ø ù ú û ü ý þ ÿ
  Unassigned, replaced with C0/C1 control codes in ISO-8859-1
  The codepoints 215 (0xD7) and 247 (0xF7) were still undefined in the first release of ECMA-94 (1985)

This was reverted by User:Matthiaspaul with the following comment: "Good faith, but per Talk:ASCII changed back to established table format: Standard layout (used almost everywhere else as well), color grouping, indication of variances, more and directly readable codes (no tooltips, which don't show at all over here, and would require a mouse anyway)."

IMHO his objections have no merit:

  • Standard layout (used almost everywhere else as well): incorrect; this is based on the tables used for the Unicode Code Blocks, which are used many more times in Wikipedia. Also, I intended to change *all* the uses of the current tables, so this format would become the "standard layout".
  • Color grouping: My number one priority was to eliminate this hideous coloring which is also incorrect for all non-ASCII characters! Besides being ugly, they make it impossible to use colors to indicate interesting characters, thus requiring boxes which are ambiguous and only allow one set of interesting characters.
  • More codes: that "code" is the decimal number of the table entry, and people have continuously mangled these tables to change it to the decimal version of the Unicode code point, or vice-versa to mangle the Unicode code point. Removing this, and putting "U+" in front of the code points, should help a lot with removing this confusion. And it is stupid to waste so much space on what really amounts to a hex->decimal conversion table.
  • Directly readable codes: this one is true but it means the Unicode name of the character is not available. Also the lack of the U+ prefix is leading to many people misreading the table and/or doing destructive edits to "fix" it. I also suspect the interest in the Unicode code point number is being vastly over-estimated.

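To make the "hex->decimal conversion table" point concrete: the decimal number in the old cells is 16 × row + column, and for ISO-8859-1 specifically that same number is also the Unicode code point (every byte 0x00–0xFF maps to U+0000–U+00FF). A Python sketch, assuming Python's "latin-1" codec as the reference mapping; `position` and `latin1_code_point` are hypothetical helpers:

```python
# Sketch: the decimal "code" is just 16*row + column; for ISO-8859-1 the
# byte value equals the Unicode code point, written here with a U+ prefix.
def position(row: int, col: int) -> tuple[int, str]:
    value = 16 * row + col
    return value, f"U+{value:04X}"


def latin1_code_point(byte: int) -> str:
    return f"U+{ord(bytes([byte]).decode('latin-1')):04X}"


print(position(0xF, 0x6))       # (246, 'U+00F6')
print(latin1_code_point(0xF6))  # U+00F6

# Every byte 0x00-0xFF decodes to the code point with the same number.
assert all(ord(bytes([b]).decode("latin-1")) == b for b in range(256))
```

Writing the hex value with "U+" makes it hard to confuse with the decimal 246, which is the mix-up described above.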
I also made the following in an attempt to address his concerns, though I think preserving the colors and the Unicode code points still makes it pretty hideous, and I would prefer the above design. The main changes here are: letters change from green to white, the colors follow actual Unicode categories, the decimal number is eliminated, U+ is added to the code point, and the 100% width is removed:

ASCII (1977/1986)
_0 _1 _2 _3 _4 _5 _6 _7 _8 _9 _A _B _C _D _E _F
0_ NUL U+0000  SOH U+0001  STX U+0002  ETX U+0003  EOT U+0004  ENQ U+0005  ACK U+0006  BEL U+0007  BS U+0008  HT U+0009  LF U+000A  VT U+000B  FF U+000C  CR U+000D  SO U+000E  SI U+000F
1_ DLE U+0010  DC1 U+0011  DC2 U+0012  DC3 U+0013  DC4 U+0014  NAK U+0015  SYN U+0016  ETB U+0017  CAN U+0018  EM U+0019  SUB U+001A  ESC U+001B  FS U+001C  GS U+001D  RS U+001E  US U+001F
2_ SP U+0020  ! U+0021  " U+0022  # U+0023  $ U+0024  % U+0025  & U+0026  ' U+0027  ( U+0028  ) U+0029  * U+002A  + U+002B  , U+002C  - U+002D  . U+002E  / U+002F
3_ 0 U+0030  1 U+0031  2 U+0032  3 U+0033  4 U+0034  5 U+0035  6 U+0036  7 U+0037  8 U+0038  9 U+0039  : U+003A  ; U+003B  < U+003C  = U+003D  > U+003E  ? U+003F
4_ @ U+0040  A U+0041  B U+0042  C U+0043  D U+0044  E U+0045  F U+0046  G U+0047  H U+0048  I U+0049  J U+004A  K U+004B  L U+004C  M U+004D  N U+004E  O U+004F
5_ P U+0050  Q U+0051  R U+0052  S U+0053  T U+0054  U U+0055  V U+0056  W U+0057  X U+0058  Y U+0059  Z U+005A  [ U+005B  \ U+005C  ] U+005D  ^ U+005E  _ U+005F
6_ ` U+0060  a U+0061  b U+0062  c U+0063  d U+0064  e U+0065  f U+0066  g U+0067  h U+0068  i U+0069  j U+006A  k U+006B  l U+006C  m U+006D  n U+006E  o U+006F
7_ p U+0070  q U+0071  r U+0072  s U+0073  t U+0074  u U+0075  v U+0076  w U+0077  x U+0078  y U+0079  z U+007A  { U+007B  | U+007C  } U+007D  ~ U+007E  DEL U+007F
8_

  Letter   Number   Punctuation   Symbol   Other   undefined

Would very much appreciate any comments from anybody else about this.

Thank you. Spitzak (talk) 17:42, 30 July 2018 (UTC)

Okay, it appears that tables inserted into the talk page do not get any responses: I left this version here for 1.5 months with no comments, so I assumed it was OK and started making some rather massive changes to every table, which unfortunately were not purely mechanical, as I wanted to fix the many examples where widths or layout were wonky. The changes again got reverted by User:Matthiaspaul with the rather questionable claim that the answer to 16*y+x is "vital" information. In any case, it looks like the only way to get discussion is to edit the page itself, so I am editing it again with a series of potential fixes. Please give some constructive criticism here. If "information has been deleted", please say exactly what that information is, perhaps naming the character you are talking about and putting the removed information into the comment. Thank you. Spitzak (talk) 21:18, 23 September 2018 (UTC)

WP:DENY. --Guy Macon (talk) 02:14, 3 November 2018 (UTC)

In my opinion, the color coding is not that useful, but the Unicode codes can obviously be helpful. Should they be in the character chart, in a separate chart, or in <!-- --> comments in the page? WP:DENY. --Guy Macon (talk) 02:14, 3 November 2018 (UTC)
Yes, I wanted to put the Unicode code point, plus the full Unicode character name, into the tooltip (try pointing at the cells in the first table but away from the letters themselves; unfortunately I found no way to keep the link working while also putting arbitrary text in the tooltip). Matthiaspaul complained that this did not work on phones. Spitzak (talk) 19:56, 29 October 2018 (UTC)
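For reference, the tooltip text being discussed (code point plus full Unicode character name) can be derived mechanically. A Python sketch using the standard `unicodedata` module; the exact "U+XXXX NAME" format and the `tooltip` helper are assumptions, and control characters, which have no formal Unicode name, fall back to "<control>":

```python
import unicodedata


# Sketch: tooltip text for one ISO-8859-1 byte, i.e. the code point plus
# the Unicode character name. The output format is an assumption.
def tooltip(byte: int) -> str:
    ch = bytes([byte]).decode("latin-1")
    name = unicodedata.name(ch, "<control>")  # C0/C1 controls have no name
    return f"U+{ord(ch):04X} {name}"


print(tooltip(0xF6))  # U+00F6 LATIN SMALL LETTER O WITH DIAERESIS
print(tooltip(0x85))  # U+0085 <control>
```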

WP:DENY. --Guy Macon (talk) 02:14, 3 November 2018 (UTC)

Yes, I was just suggesting that a tooltip is a little better than a comment inside the wikitext, as the previous poster suggested. In both cases you can hit "edit" and see the text. What is your opinion on whether this information should be visible without hitting "edit"? The basic problem is that adding this information bloats the tables, so it is much harder to identify glyphs and see patterns. For instance, nobody is proposing making the full Unicode name visible in the table, despite that being far more informative than the Unicode code point. Spitzak (talk) 16:35, 30 October 2018 (UTC)
Such a "minimal" layout does not work for many character sets due to the irregular widths of the characters and the fact that the borders between them are difficult or impossible to see. I think dividing lines between the cells are necessary. I am unsure whether anything else is needed, even the row and column headers. Spitzak (talk) 16:39, 30 October 2018 (UTC)

I certainly think Spitzak's proposed format looks a lot better than the current garish table. Who needs colour coding to grasp that 4 is a number and Ä is a letter?!? Fricking weird that anyone could be against this. If the "But it's standard!" argument is because the overwrought format is used on other pages, that's an argument for changing it there too. CRConrad (talk) 16:27, 11 August 2021 (UTC)

Thank you. As you may have noticed, I succeeded in getting rid of the decimal answer to y*16+x in almost all the tables (for CP437 I could not convince them to remove it, because it is the Alt code). I also changed the color used by the majority of the characters to white. I tried really hard to fade the other colors so they were not so garish, but this is gradually getting reverted by people. IMHO this should really switch to my first version, with nothing except a glyph in each box, so that colors can be used for actually interesting information (such as which version the codes appeared in). Even with the little I was able to achieve, the tables are still INCREDIBLY ugly. Spitzak (talk) 17:49, 11 August 2021 (UTC)
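The CP437 exception comes from how Alt codes work: on a PC whose OEM code page is 437 (assumed here, e.g. a US-English DOS/Windows setup), holding Alt and typing the decimal number on the numeric keypad enters the character at that position. A Python sketch using the built-in "cp437" codec; `alt_code` is a hypothetical helper:

```python
# Sketch: the decimal position in CP437 is the number typed with Alt on the
# numeric keypad (assuming the OEM code page is 437). Note that the decimal
# value differs from the Unicode code point, unlike in ISO-8859-1.
def alt_code(number: int) -> str:
    """Describe the character produced by Alt+<number> under code page 437."""
    ch = bytes([number]).decode("cp437")
    return f"Alt+{number} -> {ch} (U+{ord(ch):04X})"


print(alt_code(130))  # Alt+130 -> é (U+00E9)
print(alt_code(156))  # Alt+156 -> £ (U+00A3)
```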

Well, I see it's far too late for any productive discussion, as all of the work done on the character tables before Jan 2022 has been edited away from all of their articles. At the very least, the template mark-ups could have been left in place, and the templates themselves changed, instead of simply removing them from all of the tables.

As one of the small group of editors who made the changes toward the "garish" table format, let me point out some of the history and rationale behind that table format:

  • The cell colors allow characters of the same type (letter, digit, control, etc.) to be visually discernible as groups in the table itself. Cf. ASCII and EBCDIC, for example, which have very different concepts of groupings for alphabetic characters.
  • The Unicode equivalents are there to show in the table itself what the characters map to, using the now-universal standard encoding (U+dddd). Some characters look alike but have different code points, and it's easier to see what the code is in the table than having to click on the character itself to find out.
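The ASCII/EBCDIC contrast in the first point can be checked directly: ASCII places A–Z in one contiguous run, while EBCDIC splits them into three. A Python sketch, assuming EBCDIC code page 037 as provided by Python's "cp037" codec (other EBCDIC variants share this letter layout, but that is not checked here); `runs` is a hypothetical helper:

```python
import string


# Sketch: find the contiguous runs of byte values occupied by A-Z.
def runs(encoding: str) -> list[tuple[int, int]]:
    """Return (first, last) byte values of each contiguous run of A-Z."""
    codes = sorted(string.ascii_uppercase.encode(encoding))
    result = []
    start = prev = codes[0]
    for code in codes[1:]:
        if code != prev + 1:  # gap: close the current run
            result.append((start, prev))
            start = code
        prev = code
    result.append((start, prev))
    return result


for enc in ("ascii", "cp037"):
    print(enc, [f"{a:02X}-{b:02X}" for a, b in runs(enc)])
# ascii ['41-5A']
# cp037 ['C1-C9', 'D1-D9', 'E2-E9']
```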

I will say that a few points against the format are well-taken, such as eliminating the 100% width formatting. And I like the tooltip idea, even though it's hard to use properly in such a small cell space (there actually being two tooltips in each cell). But on the whole, the "garish" tables were meant to convey useful information about the characters themselves, and the arrangement of the characters within the tables in particular, in a reasonable format. Surely there's a reasonable middle ground somewhere that we can agree on. — Loadmaster (talk) 16:59, 8 May 2022 (UTC)

"Romanised East-Asian languages"

I removed the claim: "It is also commonly used in most standard romanizations of East-Asian languages." It is not true in any obvious reasonable sense: first, "it" refers to the encoding scheme used for data transfer, and the amount of data transfer occurring in Romanised East-Asian languages must be truly negligible. If it meant to refer to the IEC's selection of roman letters, whatever they call it, then it is false, because for example Japanese needs macronned vowels, which are not included. Imaginatorium (talk) 02:15, 31 December 2020 (UTC)
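The macron point is easy to verify: ISO-8859-1 covers only U+0000–U+00FF, and vowels such as ō (U+014D) or Ō (U+014C), as used in Hepburn romanization, lie outside that range. A Python sketch; `outside_latin1` is a hypothetical helper, and the pinyin example with tone marks is my own addition for comparison:

```python
# Sketch: list the characters of a string that ISO-8859-1 cannot encode.
# Latin-1 maps bytes 0x00-0xFF to U+0000-U+00FF, so anything above
# U+00FF is missing.
def outside_latin1(text: str) -> list[str]:
    return [ch for ch in text if ord(ch) > 0xFF]


print(outside_latin1("Tōkyō"))    # ['ō', 'ō']
print(outside_latin1("Ōsaka"))    # ['Ō']
print(outside_latin1("Běijīng"))  # ['ě', 'ī']
```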

"IsO-8859-1" listed at Redirects for discussion

  The redirect IsO-8859-1 has been listed at redirects for discussion to determine whether its use and function meets the redirect guidelines. Readers of this page are welcome to comment on this redirect at Wikipedia:Redirects for discussion/Log/2023 December 6 § IsO-8859-1 until a consensus is reached. Utopes (talk / cont) 08:34, 6 December 2023 (UTC)

Seikosha MP-1300AI

The manual doesn't mention ECMA-94. - Polluks 20:28, 27 September 2024 (UTC)