Talk:ISO/IEC 2022

ISO 2022 vs ISO 646

To represent large character sets, ISO 2022 builds on ISO 646's property that 1 byte can define 94 graphic (printable) characters (in addition to space and 33 control characters).

So the control characters are always available no matter which character table is currently shifted in? --Abdull 20:17, 7 June 2007 (UTC)

I don't have the ISO 2022 text, but according to JIS X 0202 (which corresponds to ISO 2022), you can designate control character sets (C0, C1) by escape sequences. See the control character set registrations at http://www.itscj.ipsj.or.jp/ISO-IR/ . ESC is guaranteed to be the same code in all control character sets. --Fukumoto 16:42, 8 June 2007 (UTC)
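
The counts in the quoted sentence, and the designation mechanism mentioned in the reply, can be sketched roughly as follows. This is only an illustration: `designate_control` is a hypothetical helper, and the final byte is left as a parameter because it depends on which registered control set is being selected.

```python
# Breakdown of the 7-bit code table that ISO 646 / ISO 2022 assume.
ESC = 0x1B

c0_controls = range(0x00, 0x20)   # 32 C0 control positions (ESC is one of them)
space = 0x20                      # SPACE
graphics = range(0x21, 0x7F)      # 94 graphic positions, 0x21..0x7E
delete = 0x7F                     # DEL, counted together with the controls

counts = {
    "control": len(c0_controls) + 1,  # C0 plus DEL
    "space": 1,
    "graphic": len(graphics),
}


def designate_control(which: str, final: bytes) -> bytes:
    """Build a control-set designation: ESC 0x21 F for C0, ESC 0x22 F for C1.

    Hypothetical helper; `final` is the registered final byte of the set.
    """
    intermediate = {"C0": 0x21, "C1": 0x22}[which]
    return bytes([ESC, intermediate]) + final
```

The three counts add up to the 128 positions of a 7-bit code, and ESC sits inside the C0 range whichever set is designated.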

Comparison with other encodings

The comment appears to indicate that ISO-2022 is not useful except with 7-bit displays. Since both GL and GR are mapped, it applies to 8-bit and 7-bit displays (with the latter requiring extra effort on the part of the application developer). Tedickey (talk) 22:15, 25 July 2009 (UTC)

The comment regarding disadvantages is also misleading, since (applying to cut/paste - apparently), it ignores the actual terminal implementations which may pass selections around as UTF-8. Tedickey (talk) 22:18, 25 July 2009 (UTC)

If a text processor needs random access to the character data, it basically has two options:
  • Normalize the text by repeating the current shift code before every character,
  • Convert everything to UTF-8 — but then why not use UTF-8 in the first place?
Both methods are unwieldy enough to be regarded as a disadvantage.
--Yecril (talk) 13:38, 25 September 2009 (UTC)

"Display" is the wrong word; "system" is slightly better. IIRC, the typical PC console driver is 9-bit (512 glyphs can be used at a time, or so) plus 8 bits of colour (4 background, 4 foreground) plus a few bits for bold/underline/blink, plus some more stuff I've forgotten.

  • ISO-2022-JP is presumably useful over 7-bit transports (traditional SMTP comes to mind). In fact, all the examples listed in "ISO 2022 character sets" appear to be 7-bit.
  • You can always convert generic 8-bit ISO 2022 text (i.e. text that uses GR) into equivalent 7-bit ISO 2022 text by inserting the appropriate control codes and using GL instead. I don't think anyone uses plain ISO 2022, but this may be an advantage. I'm ignoring C1 control codes and DOCS.
  • It's actually easier for a developer to use only the 7-bit range because there's less choice (I have GL mapped to G0 and GR mapped to G1, and now I need a character from G2. What do I do?).
Optimise ...
  • "Actual terminal implementations" — which ones? Why specifically terminals? And no, it's about text processing, not copy/paste (which is simple):
    • A Perl script is parsing text. It reads the next byte, which is an "e". But what shift state am I in? Which character is that? How do I represent a "character"? Some bytes need to be accompanied by a shift state, a character number needs to be accompanied by a charset number, and control codes are a right pain...
    • Your mail client is searching for some text ("Hello world!"). But wait — it might be in GL or GR. It might have random shift codes in the middle. Sigh.
  • Any text encoding which has support for arbitrary future extensions (ESC % / in particular) is broken — it's practically impossible to write an implementation that will fail gracefully when, for example, you switch to EBCDIC. And what's "use ESC % @ to return"? Is that the relevant bytes in EBCDIC or ASCII?
ISO 2022/ECMA-35 is defined in terms of bit patterns, so it's 0x1B 0x25 0x40 whatever the character set. 90.195.73.4 (talk) 21:58, 3 October 2011 (UTC)
  • How nice of them to support "private use F bytes". Suddenly, there's an unknown blob in your string, and you can't even let the user select inside it because you don't know where the character boundaries are. And how are you supposed to compare two private-use blobs for equality?
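
The 8-bit to 7-bit conversion mentioned in the second bullet can be sketched as below. It is a simplified illustration with several assumptions: G0 is invoked in GL and G1 in GR, G1 is a 94-character set, the input contains no C1 controls and no SO/SI of its own, and `to_seven_bit` is a hypothetical name. SO and SI act as the locking shifts that bring G1 and G0 into GL.

```python
SO = 0x0E  # shift out: invoke G1 into GL
SI = 0x0F  # shift in: invoke G0 into GL


def to_seven_bit(data: bytes) -> bytes:
    """Rewrite GR bytes (0xA1..0xFE) as SO + GL bytes, restoring G0 with SI."""
    out = bytearray()
    g1_in_gl = False
    for b in data:
        if 0x21 <= b <= 0x7E:            # G0 graphic character
            if g1_in_gl:
                out.append(SI)
                g1_in_gl = False
            out.append(b)
        elif 0xA1 <= b <= 0xFE:          # G1 graphic character in GR
            if not g1_in_gl:
                out.append(SO)
                g1_in_gl = True
            out.append(b & 0x7F)
        elif b < 0x80:                   # C0, SPACE, DEL: unaffected by shifts
            out.append(b)
        else:
            raise ValueError("C1 or 0xA0/0xFF not handled by this sketch")
    if g1_in_gl:
        out.append(SI)                   # leave the stream in the G0 state
    return bytes(out)


seven_bit = to_seven_bit(b"A\xc1\xc2 B")
```

Every byte of the result stays below 0x80, but the same byte value now means different characters depending on the shift state, which is the complication the following bullets complain about.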

The article has many problems, such as the introduction saying that it's a 7-bit encoding (helpfully contradicting the rest of the article!), but not as many problems as ISO 2022. ⇌Elektron 04:13, 29 January 2010 (UTC)
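
The parsing and searching problem raised in the bullets above can be reproduced with Python's built-in `iso2022_jp` codec. This is a small sketch, not a statement about any particular mail client: the helper names are made up, and it assumes the codec wraps the kanji in ESC $ B ... ESC ( B as it does for this input.

```python
def naive_contains(encoded: bytes, needle: str) -> bool:
    """Byte-level search that ignores the shift state."""
    return needle.encode("ascii") in encoded


def decoded_contains(encoded: bytes, needle: str) -> bool:
    """Search after decoding, so the shift state is respected."""
    return needle in encoded.decode("iso2022_jp")


text = "日本語"
encoded = text.encode("iso2022_jp")

# Drop the three-byte designations at both ends; what remains are the
# 7-bit bytes of the kanji, which look exactly like ASCII characters.
ghost = encoded[3:-3].decode("ascii")

false_positive = naive_contains(encoded, ghost)   # the bytes are there
real_match = decoded_contains(encoded, ghost)     # the characters are not
```

The byte-level search reports a match for an ASCII string that never appears in the text, because the bytes of a two-byte character are indistinguishable from ASCII without the shift state.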

But I agree with the rest of your rant; ISO 2022 is EVIL 90.195.73.4 (talk) 21:58, 3 October 2011 (UTC)
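
The bit-pattern point made in the reply above can be checked with Python's `cp037` codec, used here only as one convenient EBCDIC code page. The variable names are illustrative.

```python
# ESC % @ as fixed bit patterns, independent of the character set in use.
DOCS_RETURN = bytes([0x1B, 0x25, 0x40])

as_ascii = DOCS_RETURN.decode("ascii")       # reads as ESC, "%", "@"
as_ebcdic = DOCS_RETURN.decode("cp037")      # same bytes, other characters

# Spelling the characters ESC, "%", "@" in EBCDIC gives different bytes,
# which would not be the sequence the standard defines.
ebcdic_spelling = "\x1b%@".encode("cp037")
```

Under that reading, an implementation switching back from EBCDIC has to look for the three byte values, not for the EBCDIC encodings of the characters "%" and "@".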

DICOM ISO 2022 variation

Reference 4, "DICOM ISO 2022 variation", has an incorrect URL. It points to a simple test email message in a SourceForge project which does not appear to have any relation to DICOM. I've searched to try to find the correct link, with no success. I'd be very interested in the correct target of this link if it could be found. Dlmason (talk) 12:25, 7 April 2012 (UTC)

Rather than that link, this may be useful TEDickey (talk) 13:31, 7 April 2012 (UTC)
Thanks, those examples are helpful, but are largely directed to VRs of type PN. There should (I hope) be a link that talks about other differences or issues in general DICOM encodings compared with ISO 2022 -- for example, the DICOM standard forbids certain control characters and shifts. I'm hoping there's a nice summary somewhere to list all the differences in simpler language than is used in the DICOM standard. Dlmason (talk) 12:31, 8 April 2012 (UTC)

Missing a history section

Missing a history section. 84.97.14.22 (talk) 19:01, 19 July 2012 (UTC)

removing POV tag with no active discussion per Template:POV

I've removed an old neutrality tag from this page that appears to have no active discussion per the instructions at Template:POV:

This template is not meant to be a permanent resident on any article. Remove this template whenever:
  1. There is consensus on the talkpage or the NPOV Noticeboard that the issue has been resolved
  2. It is not clear what the neutrality issue is, and no satisfactory explanation has been given
  3. In the absence of any discussion, or if the discussion has become dormant.

Since there's no evidence of ongoing discussion, I'm removing the tag for now. If discussion is continuing and I've failed to see it, however, please feel free to restore the template and continue to address the issues. Thanks to everybody working on this one! -- Khazar2 (talk) 04:26, 27 June 2013 (UTC)

Hello fellow Wikipedians,

I have just modified one external link on ISO/IEC 2022. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:

When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.

This message was posted before February 2018. After February 2018, "External links modified" talk page sections are no longer generated or monitored by InternetArchiveBot. No special action is required regarding these talk page notices, other than regular verification using the archive tool instructions below. Editors have permission to delete these "External links modified" talk page sections if they want to de-clutter talk pages, but see the RfC before doing mass systematic removals. This message is updated dynamically through the template {{source check}} (last update: 5 June 2024).

  • If you have discovered URLs which were erroneously considered dead by the bot, you can report them with this tool.
  • If you found an error with any archives or the URLs themselves, you can fix them with this tool.

Cheers. --InternetArchiveBot (Report bug) 02:27, 8 April 2017 (UTC)

UTF-1

The article currently says "UTF-1, the multi-byte Unicode transformation format compatible with ISO/IEC 2022". I don't think it's "compatible" in the way the statement implies, because while the standard allows multi-byte encodings, it requires the elements to have a constant length (e.g. all elements have to be 3 bytes), while UTF-1 is variable-length. UTF-1 is registered as a "Coding system different from ISO 2022" (see: https://www.itscj.ipsj.or.jp/itscj_english/iso-ir/ISO-IR.pdf), just like UTF-8 is. Escape sequences are provided for both UTF-8 and UTF-1, but using the "designate other coding system" method. If UTF-8 is not "compatible" then neither is UTF-1. --157.52.11.237 (talk) 14:24, 24 October 2017 (UTC)

I agree this statement is confusing. UTF-1 is certainly not ISO-2022 conformant. "Compatible" means... what, exactly? 204.225.215.56 (talk) 03:27, 29 June 2019 (UTC)
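
The "designate other coding system" sequences discussed above can be sketched as follows. Treat this as an illustration under assumptions: `classify_docs` and `KNOWN` are hypothetical names, and the final bytes shown for UTF-1 and UTF-8 are taken to be the ones listed in the ISO-IR register and should be checked against it.

```python
# Sequences assumed from the ISO-IR register; verify before relying on them.
KNOWN = {
    b"\x1b%@": "return to ISO 2022",
    b"\x1b%B": "UTF-1",
    b"\x1b%G": "UTF-8",
}


def classify_docs(seq: bytes) -> str:
    """Classify a DOCS sequence by shape: ESC % F or ESC % / F."""
    if not seq.startswith(b"\x1b%") or not (0x30 <= seq[-1] <= 0x7E):
        raise ValueError("not a DOCS escape sequence")
    if len(seq) == 3:
        return "with standard return"
    if len(seq) == 4 and seq[2] == 0x2F:
        return "without standard return"
    raise ValueError("not a DOCS escape sequence")


shapes = {name: classify_docs(seq) for seq, name in KNOWN.items()}
```

In this sketch both UTF-1 and UTF-8 are reached through the same escape mechanism that leaves the ISO 2022 coding system, which matches the point that neither is an ISO 2022 encoding in its own right.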

Disadvantages list

 "Because of its escape sequences, it is possible to construct attack byte sequences that round-trip from ISO/IEC 2022 to Unicode and back."

This statement is incredibly scary and also confusing, which is not a good combination. What kinds of attacks? Why do escape sequences make them possible? What other encodings is this in contrast to (UTF-8?)? Why is round-tripping relevant?

There is a link, but there's not any more detail on the destination page, only discussion about popularity of encodings and discussion around measuring if they are in use in a specific piece of software.

Can someone please clarify this part of the article?

01:15, 31 July 2018 (UTC) — Preceding unsigned comment added by 178.197.231.171 (talk)

Introduction

The introduction in the article is far too technical to be an introduction, and misses the things that ought to be in one. E.g., is UTF-8 replacing the need for ISO 2022? Is it expected that ISO 2022 encodings will be phased out? When was the standard written? There are too many standards being referred to without an explanation of what they are (the reader of an article should not have to refer to lots of other articles to understand what they are reading); e.g., rather than just referring to a standard, use a sentence to describe what it is, followed by the standard's name in parentheses. FreeFlow99 (talk) 15:42, 27 April 2022 (UTC)

I've now reworked the lede and initial sections; is this an improvement? --HarJIT (talk) 07:51, 28 April 2022 (UTC)

Encodings and conformance

The present Encodings and conformance paragraph, whose current phrasing was introduced here, is kind of messy, because of how it speaks about letters "absent in"[sic] the ISO Basic Latin alphabet. In truth, the ISO Basic Latin alphabet too is one of the short alphabets (writing systems) trivially representable with 256-character encodings. The present paragraph tries to simultaneously talk about how there's this class of small writing systems to be contrasted with larger ones (that really require ISO 2022 or similarly expansive solutions), but then it also somewhat messily tries to convey or imply that these small writing systems already require extended ASCII because their charset exceeds what's in plain US-ASCII. That's trying to fight a two-front battle, which is really not very enlightened – or easy to fix. But I hope someone has a good idea how to fix this, because I'm not sure how. --ReadOnlyAccount (talk) 07:50, 19 December 2023 (UTC)

The intent is that the ISO Basic Latin alphabet can be represented with only ASCII (i.e. only requiring seven bits or 128 code points, not eight bits unless you want to use the eighth as a parity bit), while other Latin alphabets (or e.g. Cyrillic alphabets) require either modified ASCII (e.g. DIN 66003, YUSCII, KOI-7), extended ASCII (e.g. Windows-1250, KOI-8) or both (e.g. VNI for DOS). I'm ignoring EBCDIC since it isn't directly relevant to PCs, only to mainframes, and would only confuse the issue. But I take your point that it isn't clear. --HarJIT (talk) 13:17, 19 December 2023 (UTC)
I've reworked the paragraph somewhat now. --HarJIT (talk) 15:01, 19 December 2023 (UTC)
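
The distinction drawn above between alphabets that fit in ASCII and those that need an extended ASCII can be illustrated with Python's built-in codecs. The sample strings and the `describe` helper are arbitrary choices for this sketch; `cp1250` and `koi8_r` stand in for Windows-1250 and one member of the KOI-8 family.

```python
def describe(text: str, codec: str) -> dict:
    """Report whether text fits in 7 bits and how it encodes in `codec`."""
    encoded = text.encode(codec)
    return {
        "fits_in_ascii": text.isascii(),
        "bytes": len(encoded),
        "uses_high_bit": any(b >= 0x80 for b in encoded),
    }


basic_latin = describe("Quick brown fox", "ascii")
polish = describe("Łódź", "cp1250")      # Latin alphabet beyond ISO basic Latin
russian = describe("Жук", "koi8_r")      # Cyrillic alphabet
```

All three samples take one byte per character, but only the first stays within seven bits; the other two depend on the eighth bit, which is what makes them extended ASCII.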