Talk:IETF language tag

Latest comment: 9 months ago by DougEwell in topic Complaints about Arabic

Mistakes

As it stands today, the article contains many mistakes. Here is a summary, written by Doug Ewell:

A few examples:
  • "It is written that subtags are separated from each other by a hyphen. This is not true for the given examples in cases where the subtags are empty. The example is en not en-----."
    This statement presupposes that tags can contain "empty subtags," which do not in fact exist. There is only one subtag in the tag "en".
  • "The IETF only derives their subtags from ISO standards, they are therefore not ISO conform."
    The RFC 4646 system does not claim to be "ISO-conformant," and explicitly seeks to mitigate some of the instability of the ISO standards, so this statement is a red herring. Country-code TLDs are also not ISO-conformant since they use ".uk" instead of ".gb", and use other extensions to ISO 3166 such as ".ac".
  • "It also reserves some tag parts that currently do not exist."
    Section 2.2.1, point 4 reads:
    "4. All four-character language subtags are reserved for possible future standardization."
    "At the same time ISO 15924 is a four-character subtag, all ready."
    This is a non sequitur. There is no syntactical or namespace collision between language subtags and script subtags, just as there is none between 2-letter language subtags and region subtags.
  • "Since the ISO 3166-1 alpha-2 can change from time to time there is ambiguity in the use. E.g. CS could refer to Serbia and Montenegro or to Czechoslovakia. Section 2.2.4. point 3 C&D solve this. For ambiguous ISO 3166-1 codes the UN M.49 code shall be used."
    This is oversimplified to the point of inaccuracy. While ISO code elements may be ambiguous for the reason given, RFC 4646 does not assign a UN M.49-based subtag in place of *each* of the "ambiguous codes," only the more recently assigned. Going forward, if ISO 3166/MA reassigned "CS" to yet another country, that country would get a UN-based subtag but the existing "CS" subtag would not be changed.
    I'll be making the necessary factual corrections to the article, but others are of course free to jump in and do it first. I thought it would be good to let the list know that these misconceptions exist and may be widespread, because of the wide use of Wikipedia.

--The above unsigned comment was added by 63.252.121.133 at 15:40 UTC on 19 November 2006.
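
To illustrate the points above about hyphen-separated subtags, here is a minimal sketch of how a tag splits into subtags and how a subtag's position and length, not its spelling alone, decide its role. The function name `classify_subtags` is hypothetical, and the rules are deliberately simplified assumptions: only the `language[-script][-region]` shape is handled, and extlang, variant, extension, private-use, and grandfathered forms are rejected rather than parsed.

```python
def classify_subtags(tag):
    """Split a language tag on hyphens and label each subtag.

    Simplified illustration: supports only language[-script][-region].
    """
    subtags = tag.split("-")
    # There is no such thing as an empty subtag, so "en-----" is rejected.
    if any(s == "" for s in subtags):
        raise ValueError(f"empty subtag in {tag!r}")

    labels = [("language", subtags[0].lower())]
    rest = subtags[1:]

    # A four-letter subtag directly after the language is a script subtag.
    # It cannot collide with a language subtag because it is never first.
    if rest and len(rest[0]) == 4 and rest[0].isalpha():
        labels.append(("script", rest.pop(0).title()))

    # A two-letter subtag (ISO 3166-1) or three-digit subtag (UN M.49)
    # in the next position is a region subtag.
    if rest and ((len(rest[0]) == 2 and rest[0].isalpha())
                 or (len(rest[0]) == 3 and rest[0].isdigit())):
        labels.append(("region", rest.pop(0).upper()))

    if rest:
        raise ValueError(f"subtags not supported by this sketch: {rest}")
    return labels
```

With this sketch, `classify_subtags("en")` yields a single language subtag, while `classify_subtags("en-----")` raises an error instead of reporting "empty subtags".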

I've rewritten most of the article. I think my rewrite addresses all your points, but if you see any further problems, just go ahead and edit the article. --Zundark 17:48, 11 March 2007 (UTC)Reply
I've just reformatted this for minimal MediaWiki syntax and talk-page conventions (avoiding preformatted paragraphs and broken paragraphs) for easier attribution. I also added a title to this discussion.
Note that Doug Ewell is one of the authors of the revision of RFC 4646, now published as RFC 5646 in September 2009. He is also an active member of the Unicode working groups.
I was not an author of either RFC 4646 or RFC 5646. I was the WG editor for RFC 4645 and RFC 5645, which defined the initial Registry contents as specified by 4646 and 5646 respectively. My activity on the Unicode mailing list (which has since ended) is not relevant. There is no conflict of interest in my making corrections to this page, and I plan to make more. --Doug Ewell 19:07, 2 January 2014 (UTC)Reply
When posting discussions, please use minimal formatting and insert a section title, so that the discussion does not take up two full screens. And don't forget to sign your message, even if you don't have a Wiki account here. verdy_p (talk) 13:11, 1 November 2009 (UTC)Reply

Updating examples

This article uses two-letter ISO 639-1 tags as examples throughout, though current standards favor three-letter ISO 639-3 tags. Most examples should be changed to show current best practice. Bcharles (talk) 03:02, 8 March 2011 (UTC)Reply

This article is about BCP 47, which doesn't offer any such choice - you have to use the tag specified in the registry. --Zundark (talk) 08:30, 8 March 2011 (UTC)Reply

Missing Content

I'm curious whether anyone knows of a reasonably comprehensive list of commonly used IETF language tags. My sense is that because there is a theoretically endless number of possible combinations, no one wants to publish an exact list, and that's why usually only examples are provided. But I do think there is a set of most commonly used tags that would be practical to have listed for those doing everyday i18n work, and that would be a useful addition to this article. Does anyone have a good reference for such a list, or also think this would be useful? (Note this is obviously an unofficial edit, just convenient to stay logged into my staff acct!) Sbouterse (WMF) (talk) 22:43, 23 September 2011 (UTC)Reply

CLDR has a lot of this sort of thing. Try http://www.unicode.org/cldr/charts/latest/supplemental/index.html . Doug Ewell (talk) 02:46, 15 January 2014 (UTC)Reply

Corrections

I've made some factual corrections to this article, specifically with regard to misuse of well-defined BCP 47 terms like "redundant" and "deprecated" and "standard track." More such changes will be forthcoming. Doug Ewell (talk) 02:48, 15 January 2014 (UTC)Reply

"* up to three optional extended language subtags composed of three letters each, separated by hyphens; (There is currently no extended language subtag registered in the Language Subtag Registry without an equivalent and preferred primary language subtag. This component of language tags is preserved for backwards compatibility and to allow for future parts of ISO 639.)"
Are you sure? Cantonese Chinese requires an extlang subtag, represented by "zh-yue"[1] (language-extlang). There's no Cantonese distinction in ISO 639. --Ebarcell (talk) 10:50, 2 September 2014 (UTC)Reply

ISO 639-3 does distinguish between Chinese dialects, including Mandarin, Cantonese, Wu, and many others. Cantonese can be represented by either "yue" or "zh-yue" in BCP 47. Doug Ewell (talk) 13:08, 2 September 2014 (UTC)Reply
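
As a rough sketch of the equivalence described above, the following rewrites the language-extlang form to the preferred primary language subtag. The table `EXTLANG_PREFERRED` is a small hand-written excerpt assumed for illustration, not the actual Language Subtag Registry, and the function name `prefer_primary_language` is hypothetical.

```python
# Hand-picked excerpt for illustration only. The authoritative data is the
# IANA Language Subtag Registry, where each extlang record carries a Prefix
# and a Preferred-Value.
EXTLANG_PREFERRED = {
    ("zh", "yue"): "yue",  # Cantonese
    ("zh", "cmn"): "cmn",  # Mandarin
    ("zh", "wuu"): "wuu",  # Wu
}


def prefer_primary_language(tag):
    """Rewrite 'language-extlang[-...]' to the preferred primary language form."""
    subtags = tag.split("-")
    if len(subtags) >= 2:
        key = (subtags[0].lower(), subtags[1].lower())
        if key in EXTLANG_PREFERRED:
            # Drop the prefix and keep any remaining subtags unchanged.
            return "-".join([EXTLANG_PREFERRED[key]] + subtags[2:])
    return tag
```

Under these assumptions, both `"yue"` and `"zh-yue"` end up as `"yue"`, and tags without a listed extlang pass through unchanged.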

References

  1. ^ "Language tags in HTML and XML".

External links modified

Hello fellow Wikipedians,

I have just modified one external link on IETF language tag. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:

When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.

This message was posted before February 2018. After February 2018, "External links modified" talk page sections are no longer generated or monitored by InternetArchiveBot. No special action is required regarding these talk page notices, other than regular verification using the archive tool instructions below. Editors have permission to delete these "External links modified" talk page sections if they want to de-clutter talk pages, but see the RfC before doing mass systematic removals. This message is updated dynamically through the template {{source check}} (last update: 5 June 2024).

  • If you have discovered URLs which were erroneously considered dead by the bot, you can report them with this tool.
  • If you found an error with any archives or the URLs themselves, you can fix them with this tool.

Cheers.—InternetArchiveBot (Report bug) 03:52, 10 November 2017 (UTC)Reply

Underscore

What about underscore instead of hyphen? — Preceding unsigned comment added by 109.40.67.87 (talk) 09:23, 27 November 2018 (UTC)Reply

BCP 47 language tags consist of one or more subtags separated by hyphens. Some projects that use BCP 47 extend the syntax by allowing (or requiring) underscores instead of hyphens. This is especially common in applications such as CLDR which use language tags as locale identifiers. Doug Ewell (talk) 05:36, 18 December 2018 (UTC)Reply
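
A minimal sketch of how an application might accept the underscore form and convert it to hyphens is shown below. The function name `to_bcp47_separators` is hypothetical, and the sketch only swaps separators: it does not validate the result, and an underscore-separated locale identifier may contain elements that are not valid in a BCP 47 tag.

```python
def to_bcp47_separators(identifier):
    """Replace underscores with hyphens so the result uses BCP 47 separators.

    This does not check that the result is a well-formed language tag.
    """
    return identifier.replace("_", "-")
```

For example, `to_bcp47_separators("zh_Hant_TW")` gives `"zh-Hant-TW"`, and a tag that already uses hyphens is returned unchanged.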

Simplified lead section

Since July 2018, the lead section of this article had been marked as needing a rewrite; it was indeed very hard to understand, especially for non-technical readers. To fix this (and make the article more accessible), I’ve taken the liberty to simplify and shorten the lead. For illustration, I’ve also added an example with extension U, and one with a variant tag. — Sascha (talk) 08:09, 11 January 2019 (UTC)Reply

Misleading content

The following content, found in the section on ISO 639-5 and ISO 639-2, is misleading:

"In contrast, the classification of individual languages within their macrolanguage is standardized, in both ISO 639-3 and the Language Subtag Registry." It is misleading because ISO 639-3 doesn't make a linguistic judgement on the inclusion or exclusion of languages in a macrolanguage. Macrolanguages are only in the ISO 639-3 standard for compatibility with ISO 639-2. A simple reading of the current text suggests that if a new language is added to ISO 639-3, it might also be added as a child of a macrolanguage tag. This is simply not true. So I think this sentence should be updated to read more clearly. Hugh Paterson III (talk) 14:53, 27 July 2020 (UTC)Reply

The statement is correct: the relationship between macrolanguages and their encompassed languages is well defined, unlike the situation with ISO 639-5 collection codes. That statement is unrelated to whether encompassed languages can be added or removed.
That said, ISO 639-3 does, in fact, add new encompassed languages within an existing macrolanguage. For example, in 2022 Las Delicias Zapotec [zcd] was added to 639-3 and the existing macrolanguage Zapotec [zap] was updated to encompass Las Delicias Zapotec. So that is simply true. Doug Ewell (talk) 18:30, 11 January 2024 (UTC)Reply

Complaints about Arabic

The following sentence will soon be updated, clarified, and/or moved to a more appropriate section of the article:

“In its accordance with ISO 639-3, however, it does not provide codes for distinguishing between Arabic-based scripts, and maintains two duplicate codes for Punjabi, as well as a number of dubious or non-existent language distinctions made by its parents standard.”

These are criticisms of the ISO 639-3 and ISO 15924 RA processes, which belong (if anywhere) on the pages for those standards, not on this page. At a minimum, this questionable content does not belong in the introduction, as though it defined the entire nature of BCP 47. Doug Ewell (talk) 18:38, 11 January 2024 (UTC)Reply

A new subsection “Adherence to core standards” has been added, to acknowledge that disagreements with the RA processes exist while clarifying that BCP 47 follows the standards and does not supersede or contradict them. The material in the introduction has accordingly been removed. Doug Ewell (talk) 04:27, 29 January 2024 (UTC)Reply