Talk:Universal Coded Character Set

Computing High‑importance

This article is within the scope of WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as High-importance on the project's importance scale.

Character Set vs. Character Encoding

We must be very clear about the distinction between a character set and a character encoding. A character set defines a set of characters... um... how better to explain that? For instance, a character set could be the set containing the first four characters of the English alphabet: {a, b, c, d}. An encoding is how the characters in a specific character set are actually stored as binary data. UTF-8, for example, encodes every code point of the UCS as a sequence of one to four 8-bit bytes.

Anyway, my point is: the first sentence of this article previously equated UCS with character encodings. This is a really (relatively) grave error, since it could confuse the bejeesus out of people. GodzillaWax 15:37, 20 September 2007 (UTC)

I suppose so. In a computer context, most often we don't separate them. If there is no coding for a character, there is no way to get it into the computer in the first place. Gah4 (talk) 04:31, 6 October 2016 (UTC)
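To illustrate the set-versus-encoding distinction discussed in this thread, here is a minimal Python sketch that uses only the standard library. Each abstract character has a single code point in the UCS, but the bytes that represent it depend on which encoding is chosen. The function name `describe` is purely illustrative.

```python
def describe(ch: str) -> dict:
    """Show one character's code point and its bytes under three encodings."""
    return {
        # Character-set view: the number the UCS assigns to the character.
        "code_point": f"U+{ord(ch):04X}",
        # Encoding view: how that number is serialized as bytes.
        "utf-8": ch.encode("utf-8").hex(" "),
        "utf-16-be": ch.encode("utf-16-be").hex(" "),
        "utf-32-be": ch.encode("utf-32-be").hex(" "),
    }


for ch in ["a", "\u20ac", "\U0001F600"]:
    print(describe(ch))
```

For example, the euro sign U+20AC takes three bytes in UTF-8, two in UTF-16 and four in UTF-32. It is still the same member of the same character set.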

On a deleted sentence

At the end of the section on the differences between Unicode and ISO 10646, I had written the following sentence:

The Firefox browser and the OpenOffice.org suite can handle such characters, on Linux too, supporting Unicode and not just ISO 10646.

That sentence was deleted with the notice "Deleted a non-pertinent sentence which looked like advertisement." I beg to differ with that verdict.

I can concede that the sentence was not worded the best way. However, it is neither nonpertinent nor an advertisement. I wrote it to contrast applications which support Unicode (Mozilla and OpenOffice.org) with applications which support only ISO 10646 (Linux xterm). It is fully germane to this article and section, and any intention of advertising those applications was totally absent from my mind.

I will leave things as they currently are, but I wish it to be known that the charges are incorrect, and I hope some other user with the inclination for it would rewrite it in a wording that does not lend itself to those charges. --Shlomital 21:29, 2005 Feb 20 (UTC)

Article title change

Since Universal Character Set is a proper noun/proper name, as of today I have moved the article from Universal character set to Universal Character Set. — mjb 23:26, 20 Jun 2005 (UTC)

Many years later, the title is not 'Universal Character Set' but 'Universal Coded Character Set'. Was this an intentional change? The acronym remains UCS, not UCCS. Why? Inquiring minds want to know. Mcswell (talk) 03:05, 23 April 2021 (UTC)

How about Chinese and Japanese?

Can you add a comment on Chinese and Japanese (and some other languages, such as those written in hieroglyphs), which can be written not only horizontally in either direction but also vertically, top to bottom?

Thanks

Unicode and ISO 10646 distinctions and discussion of the character repertoire

I added a paragraph about the differences between Unicode and ISO 10646. I think the article could use more elaboration on these distinctions, to help drive home the particular innovations of Unicode.

I've also been working on a table that summarizes the characters of the UCS (as of 5.0). My thinking is that this table could serve as a point of departure for links to other articles (or sections of this article) that discuss the various scripts and other character blocks in more detail. Wikipedia already has individual articles covering most of the scripts of the UCS (the article could also use a short discussion of how the UCS uses the term script). Also, the phonetic blocks could link to articles on the IPA and other relevant topics.

I've also been drafting sections that discuss the other character blocks: symbols; unified punctuation; unified diacritics; Unihan and CJK supporting characters; compatibility characters; control and formatting characters (such as glyph variant selectors, bidi characters, joiners, non-joiners and language tag characters); surrogates; and private-use code points. Compatibility characters are an especially complicated topic that could use some elaboration. The various symbol blocks are also very specialized, and some discussion of how they're used would be helpful. To me, this is the kind of information a general audience would expect from an encyclopedia article on the UCS, in addition to the topics already covered. It might also help more technical readers. Many basic concepts of UCS and Unicode seem to escape the people who implement text systems supporting them.

I'll likely post something to this discussion page before posting it to the article. I'm still working on the formatting (I'm not that familiar with Wikimedia's table markup, so it's in plain old HTML table markup). Indexheavy 09:37, 19 April 2007 (UTC)

I now see that some of what I propose is handled in a separate article: Mapping of Unicode characters. Perhaps that article could be summarized in a section of this article. The summary table I'm preparing might fit better in that article. Indexheavy 15:10, 19 April 2007 (UTC)

I added the summary/categorized table of the UCS as I said I would, in the mapping article. Anyone else is welcome to jump in on these tasks. --Indexheavy 01:20, 25 April 2007 (UTC)

this has nothing to do with the content of the text

Tried for a full five minutes to find an actual character map, to look up the Alt code for the plus/minus sign. Couldn't link. Did get extensive, verbose, and redundant information on the history of, and subtle differences between, the various UTF and ISO standards. Fascinating... but should we make these pages a QuikFix InfoBooth, or a "Jolly good read, wot?!"? I'm not doing a project; I just needed a detail, and we should diversify into linked media to demonstrate the explanations and classifications given by the parent article. —Preceding unsigned comment added by 124.185.183.250 (talk • contribs) 00:51, 25 May 2007

We should make this page a "Jolly good read, wot?!", not a QuikFix InfoBooth. Wikipedia is not a complete exposition of all possible details. Perhaps this page should link to the Character Names Index page on the unicode.org Web site, but it shouldn't duplicate any of the character tables. Guy Harris (talk) 19:00, 12 November 2010 (UTC)

ISO/IEC 10646 vs. ISO/IEC 646?

It appears that the ISO/IEC standard number 10646 was deliberately chosen to recall ISO/IEC 646, to which the UCS is arguably a successor. Is this encyclopedic enough to bother looking for a good citation? Sw2k7 (talk) 06:43, 8 September 2008 (UTC)

It has the merit of being true. I don't know if we can find a citation for it however. -- Evertype·✆ 16:59, 8 September 2008 (UTC)

If the email interview at [1] was ever published somewhere, I think that would be a citable source. Google search is your friend... --Alvestrand (talk) 05:08, 9 September 2008 (UTC)

Actually, Hugh McGregor Ross is my friend and I remember him telling me this as well... just didn't know that it'd been published anywhere. I'd consider Bob's interview with Hugh to be "citable" however. -- Evertype·✆ 07:10, 9 September 2008 (UTC)

Correction?

The article's comparison of Unicode to UCS states:

Unicode provides: exclusively 16-bit code;

Is this strictly true? What about UTF-8 when encoding, say, 7-bit ASCII text? G. Robert Shiplett 17:29, 5 March 2011 (UTC) — Preceding unsigned comment added by Grshiplett (talk • contribs)
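On the "exclusively 16-bit code" question: the short Python sketch below, which uses only the standard library, suggests why the answer is no for current versions of Unicode. In UTF-8, 7-bit ASCII text takes one byte per character. In UTF-16, code points above U+FFFF need two 16-bit units, known as a surrogate pair. `utf16_units` is a hypothetical helper written for illustration, and the sketch checks it against Python's built-in codec.

```python
from __future__ import annotations


def utf16_units(cp: int) -> list[int]:
    """Split a code point into UTF-16 code units (one unit or a surrogate pair)."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp <= 0xFFFF:
        return [cp]  # Basic Multilingual Plane: a single 16-bit unit
    v = cp - 0x10000  # 20 bits remain after removing the BMP offset
    return [0xD800 + (v >> 10),    # high surrogate carries the top 10 bits
            0xDC00 + (v & 0x3FF)]  # low surrogate carries the bottom 10 bits


for ch in ["A", "\u20ac", "\U0001F600"]:
    units = utf16_units(ord(ch))
    # Cross-check the hand-computed units against Python's own UTF-16 codec.
    assert b"".join(u.to_bytes(2, "big") for u in units) == ch.encode("utf-16-be")
    print(f"U+{ord(ch):04X}:", [f"{u:04X}" for u in units],
          "UTF-16 unit(s);", len(ch.encode("utf-8")), "UTF-8 byte(s)")
```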

A correction: the article reports that the PRC mandated the use of GB18030 in 2000, but the main article on GB18030 says the requirement became obligatory only after 2005, although Microsoft implemented it for their 2000 release. (Jagan605 (talk) 09:31, 5 June 2016 (UTC))

Where are the UCS abstract codes

"...The UCS contains nearly one hundred thousand abstract characters..."

As the first paragraph of the article says, the UCS contains about a hundred thousand abstract characters. So where are they? Why are they not mentioned in the main article?

Alijamal14 (talk) 23:35, 1 January 2012 (UTC)

U+

Where does the U+ notation come from? Is there a primary source reference for it? Gah4 (talk) 04:32, 6 October 2016 (UTC)

It is defined in the Unicode Standard. See The Unicode Standard, Version 9.0.0 Appendix A (Notational Conventions). BabelStone (talk) 11:49, 6 October 2016 (UTC)
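As I understand the convention in the Unicode Standard, a code point is written as "U+" followed by four to six uppercase hexadecimal digits, padded with zeros to at least four digits. Here is a minimal Python sketch that formats and parses this notation under that assumption. `to_u_plus` and `from_u_plus` are hypothetical names, and the parser is lenient about extra leading zeros.

```python
import re

# Assumed convention: "U+" followed by four to six uppercase hex digits.
_U_PLUS = re.compile(r"U\+([0-9A-F]{4,6})")
MAX_CODE_POINT = 0x10FFFF


def to_u_plus(cp: int) -> str:
    """Format a code point as U+XXXX, zero-padding to at least four digits."""
    if not 0 <= cp <= MAX_CODE_POINT:
        raise ValueError(f"outside the codespace: {cp:#x}")
    return f"U+{cp:04X}"


def from_u_plus(text: str) -> int:
    """Parse U+XXXX notation back into an integer code point."""
    m = _U_PLUS.fullmatch(text)
    if m is None:
        raise ValueError(f"not in U+ notation: {text!r}")
    cp = int(m.group(1), 16)
    if cp > MAX_CODE_POINT:
        raise ValueError(f"outside the codespace: {text}")
    return cp


print(to_u_plus(0x41), to_u_plus(0x1F600), from_u_plus("U+20AC"))
```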

History section is entirely wrong!

See these links: https://pathguy.com/lists/idn/idn.2001/msg02752.html https://pathguy.com/lists/idn/idn.2001/msg02734.html from those who were there at the time. The links say:

Draft ISO 10646 pre-Unicode-merger had a canonical 4-octet encoding form + four "compaction methods": 1, 2 or 3 octets per character or a mixed-byte ("compaction method 5") encoding. That last was not, repeat not, UTF-1! It used control characters (from the C1 range) to announce the particular number of octets used in the sequence that followed.
UTF-1 was devised only after the ISO10646/Unicode merger; it was the one bridge kept with the old 10646's C0/C1 avoidance, by mapping the canonical 4-octet space into multibytes that avoid C0 and C1.
The merger with the then 16-bit (2-octet) Unicode kept only compaction method 2 (as UCS-2) and the canonical 4-octet form (UCS-4). The single-, triple- and mixed-octet compaction forms (1, 3 and 5) were dropped. — Preceding unsigned comment added by 85.65.95.79 (talk) 20:58, 1 November 2017 (UTC)
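According to the quoted message, two forms survived the merger, and both can be sketched in a few lines of Python. UCS-4 writes every code point as four octets. UCS-2 writes each one as two octets, which limits it to the Basic Multilingual Plane. The sketch assumes big-endian byte order and ignores the historical compaction methods entirely. `ucs4` and `ucs2` are hypothetical helper names.

```python
def ucs4(text: str) -> bytes:
    """Canonical 4-octet form: each code point as a 32-bit big-endian integer."""
    return b"".join(ord(ch).to_bytes(4, "big") for ch in text)


def ucs2(text: str) -> bytes:
    """2-octet form: only code points up to U+FFFF (the BMP) fit."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            raise ValueError(f"U+{cp:04X} is outside the BMP; UCS-2 cannot encode it")
        out += cp.to_bytes(2, "big")
    return bytes(out)


print(ucs4("A\u20ac").hex(" "))  # 00 00 00 41 00 00 20 ac
print(ucs2("A\u20ac").hex(" "))  # 00 41 20 ac
```

For BMP text without surrogate code points, `ucs2` produces the same bytes as UTF-16BE. Beyond the BMP, UTF-16 falls back on surrogate pairs, whereas this UCS-2 sketch simply refuses.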

Codespace

The term codespace is used only once and without any introduction. I presume that simply linking to the Wikipedia article Codespace would be sufficient, but I am unsure. Anyway, I think that someone reading this article would not know what a codespace is. --153.96.175.11 (talk) 16:55, 10 March 2022 (UTC)
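For what it's worth, the codespace in question is the range of code points U+0000 through U+10FFFF, which is divided into 17 planes of 65,536 code points each. Here is a small Python sketch of that arithmetic. `plane_of` is a hypothetical helper name.

```python
CODESPACE_SIZE = 0x10FFFF + 1  # U+0000 through U+10FFFF inclusive
PLANE_SIZE = 0x10000           # 65,536 code points per plane


def plane_of(cp: int) -> int:
    """Return the plane number (0-16) of a code point."""
    if not 0 <= cp < CODESPACE_SIZE:
        raise ValueError(f"outside the codespace: {cp:#x}")
    return cp // PLANE_SIZE


print(CODESPACE_SIZE, CODESPACE_SIZE // PLANE_SIZE)  # 1114112 17
print(plane_of(0x41), plane_of(0x1F600), plane_of(0x10FFFF))  # 0 1 16
```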

Merge suggestion

In June, I added a merge suggestion template, linking to an already existing discussion at Talk:Universal Character Set characters#Universal Coded Character Set vs. Universal Character Set, started by Mcswell in 2021. But nobody reacted, either there or here. Mcswell, should we just go ahead and do it together? ◅ Sebastian 13:46, 17 October 2023 (UTC)

I replied here and opposed the proposed merge. Cheers, CWC 05:13, 17 January 2024 (UTC)
