
Below is old, archived discussion regarding Wikipedia:Manual of Style (Arabic)

Starting the project

As many of you know, there is no one standard way of transliterating from Arabic to Roman letters. So "Mohammed", "Mohammad", "Mohamed", and "Muhammad" are all quasi-correct ways of spelling the prophet's name in English language texts. This can be quite a hassle on Wikipedia.

It seems to me that three things are needed.

  1. We need people to provide the names in Arabic for Arab figures. For instance, Abu Sayyaf (organization), Jalal al-Din Muhammad Rumi (poet), and Taif (city) are lacking their names written out in Arabic. If I can get some volunteers, I'll try to collect a list of articles that need this.
  2. We need to see if we can agree on some sort of standard spelling for Arabic articles on Wikipedia. For instance the "El" in Mohamed ElBaradei and the "al" in Mohammed Atta al Sayed are spelled the same in Arabic letters, but are transliterated differently in English. Is there a reason for this? (Both people are Egyptian.) If we can agree on a standard, then we can move articles to standard names (with redirects, of course).
  3. We need to make sure we have proper redirects for alternate spellings for Arabic articles. For instance, if someone looks up Muhammad Atta, this needs to redirect to Mohammed Atta al Sayed. This can be confusing. Doing a little Googling, I found that the most common spellings for Mohammed are "Mohammed", "Mohammad", "Muhammad", and "Mohamed". (This applies to the Prophet as well as other people with this name.) The common spellings for Abdullah are "Abdullah", "Abdallah", and "Abdulla". Etc.
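The redirect bookkeeping in point 3 could be sketched roughly as follows. This is only an illustration: the `VARIANTS` table and the `redirect_titles` helper are hypothetical names, and the table holds just the spellings quoted above.

```python
from itertools import product

# Hypothetical table of interchangeable spellings, limited to those listed above.
VARIANTS = {
    "Mohammed": ["Mohammed", "Mohammad", "Muhammad", "Mohamed"],
    "Abdullah": ["Abdullah", "Abdallah", "Abdulla"],
}


def redirect_titles(canonical: str) -> list[str]:
    """List alternate spellings of a title that should redirect to it."""
    words = canonical.split()
    # Each word contributes either its known variants or just itself.
    choices = [VARIANTS.get(word, [word]) for word in words]
    titles = [" ".join(combo) for combo in product(*choices)]
    # The canonical title itself is the target, not a redirect.
    return [title for title in titles if title != canonical]


print(redirect_titles("Mohammed Atta al Sayed"))
```

Shortened forms such as "Muhammad Atta" are not generated by this sketch and would still need to be added by hand.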

If I can get some volunteers, I'll try to organize a project. Who's in? – Quadell (talk) (sleuth) 15:24, July 14, 2005 (UTC)

  1. Me. – Quadell (talk) (sleuth) 15:24, July 14, 2005 (UTC)
  2. I was going to start making the Wikipedia:Naming conventions (Arabic) policy, so I think I have a lot to contribute and many ideas that could help. 500LL 21:11, July 14, 2005 (UTC)
  3. I have a basic knowledge of Arabic and am familiar with the alphabet. Would like to be involved. - ulayiti (talk) 17:04, 28 July 2005 (UTC)
  4. I'll help. --Yodakii 05:45, 4 September 2005 (UTC)

First decision: should there be consistent spelling in article names?

Should we pick a single spelling for "Mohammed", for instance? There may be one figure named Mohammed whose name is most often spelled "Mohammed", and another person named Mohammed whose name is most commonly spelled "Muhamed". Should we pick a spelling as standard, and make sure all articles use the standard format?

Three options:

  1. We pick a spelling as standard, and make sure all articles use the standard format.
  2. We use whichever spelling is most common for the specific person.
  3. We use a standard spelling for article names unless there's a very strong preference for a non-standard spelling for that particular person.

I'm leaning toward option 3. Anyone else? – Quadell (talk) (sleuth) 17:03, July 20, 2005 (UTC)

Agreed. Though I think it should be a priority to retain the spelling used by the person themselves in case it's known and noticeably differs from the standard (such as Mohamed ElBaradei). - ulayiti (talk) 17:04, 28 July 2005 (UTC)
ulayiti's suggestion sounds good. --Yodakii 05:49, 4 September 2005 (UTC)
I agree. I support the 3rd choice. --LakeHMM 05:00, 27 December 2005 (UTC)

ال (al) & Variants

I'd like to try and get a consensus going on how to render the definite article, especially in names. Should it be al Afghani, al-Afghani, Al-Afghani, Al Afghani, Alafghani...? There are three issues involved:

  1. Include or do not include the definite article?
  2. Dash or no dash?
  3. Capitalize or do not capitalize?

And as far as alphabetization (in both lists and categories), it seems to me that the definite article is superfluous and we should take the first letter after it as the starting point for alphabetization. Otherwise, we get a long, long list of people under "a" for no good reason. Thoughts? --Skoosh 14:08, 24 July 2005 (UTC)

It seems to be semi-standard to use Xxxx al-Yyyy as names (e.g. Jamal al-Fadl.) I like this standard myself. But there are exceptions. Mohamed ElBaradei is almost never written as "Mohamed al-Baradei". Should we do so to keep the standard? And Ramzi Binalshibh is a much more common spelling than "Ramzi bin al-Shibh".
As for alphabetization, I agree that the al- shouldn't count. (Neither, I think, should "bin".) But then there's the question: should Arabic names be alphabetized by first name or by tribe name? Is Jamal al-Fadl listed under J or F? – Quadell (talk) (sleuth) 18:19, July 24, 2005 (UTC)
In Arabic texts, words are often listed including the definite article but alphabetized (abjadized?) ignoring it. In any case long lists starting with "a" should be avoided. As to how to render it, I don't know of any existing standard. --Yodakii 06:25, 4 September 2005 (UTC)
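Listing a term with its article while sorting as if the article were absent can be sketched like this. The set of article spellings in the pattern is an assumption made for the example, not an agreed list, and `sort_key` is a hypothetical helper.

```python
import re

# Assumed spellings of the definite article, followed by a hyphen or a space.
_ARTICLE = re.compile(r"^(?:al|el|ad|ar|as|ash|at|an|az)[- ]", re.IGNORECASE)


def sort_key(term: str) -> str:
    """Return the term without a leading article, case-folded for sorting."""
    return _ARTICLE.sub("", term).casefold()


terms = ["al-Qaeda", "Basra", "ash-Sharqiyah", "al-Fadl"]
# Entries keep their article in the output; only the ordering ignores it.
print(sorted(terms, key=sort_key))
```

Names written as one word (Binladen, ElBaradei) are deliberately left alone by this pattern, since it only matches an article followed by a separator.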

I see a fourth issue:

4. Should the assimilation be shown by the romanization: (Saddam Hussein) al-Tikrit or at-Tikrit? (Ramzi bin) al-Shibh or ash-Shibh?

(For people who might not know Arabic:) Assimilation of the definite article is mandatory in Arabic before "solar" consonants (i.e. dental and liquid consonants) d t dh th n s sh z r l (all of them whether emphatic or not): د ض ت ط ذ ث ن س ص ش ز ظ ر ل

- Tonymec 22:01, 16 October 2005 (UTC)
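For anyone who wants to see the rule mechanically, here is a minimal sketch that works on romanized text only. The onsets are the ones listed above (d t dh th n s sh z r l); the function name is hypothetical, and a romanized digraph can be ambiguous, so this is not a substitute for checking the Arabic spelling.

```python
# Romanized sun-letter onsets from the list above; digraphs come first so
# that "sh" is matched before "s".
SUN_ONSETS = ("dh", "th", "sh", "d", "t", "n", "s", "z", "r", "l")


def assimilate(name: str) -> str:
    """Rewrite 'al-X...' so the article takes the following sun consonant."""
    prefix, sep, rest = name.partition("-")
    if prefix.lower() != "al" or not sep:
        return name  # no hyphenated article to work on
    lowered = rest.lower()
    for onset in SUN_ONSETS:
        if lowered.startswith(onset):
            # Keep the original case of the initial 'a' / 'A'.
            return prefix[0] + onset + "-" + rest
    return name  # "moon" consonant: the article stays as al-


for example in ("al-Tikrit", "al-Shibh", "al-Qahira"):
    print(example, "->", assimilate(example))
```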

I think it's fairly standard in most academic transcriptions to use a lower-case 'a' followed by a hyphen, with the obvious exception that an uppercase 'a' should be used where required by English orthographical conventions, i.e., at the beginning of a sentence, the first word in an article title, etc.
As regards assimilation, there seems to be near consensus to show the assimilation, as it'll help non-Arabic speakers actually produce the correct pronunciation. Personally, through simple force of habit, I prefer non-assimilation, but I see no strong reason to oppose the alternative if it's generally supported.
See also the discussion below regarding hamzat al-wasl. Palmiro, logged out due to bug 14:31, 17 October 2005 (UTC)
I think it comes down to what is the purpose of transliteration. I think it should be to give non-Arabic speakers an idea how a word is correctly pronounced in Arabic. That would mean showing assimilation. --Yodakii 15:09, 17 October 2005 (UTC)

I'm trying to make this section a little more organized. I've changed the title to "ال (al) & Variants" because this isn't just about alphabetization (which the previous title mentioned), and because, since we haven't reached an official transliteration, the original Arabic being discussed might as well be mentioned. I'm starting a formal-esque vote for this. I looked for a guideline or standard somewhere in the muddled swamp that is Wikipedia's body of policy, but there isn't really much, so, in a compromise with Wikimedia's belief in the evil of voting, this is more than just a yes or no thing. Discussion is encouraged and votes can be changed. Hopefully this idea is acceptable. --LakeHMM 06:24, 27 December 2005 (UTC)

Poll

Below is a list of options. Each issue will have a bullet, and each option will have a bullet within that.

  • Additions: If your preference has yet to be listed, please add another option under the respective issue, or, if the issue isn't addressed, add it. When listing new options or issues, do not include discussion - that should be placed in an unordered sublist of the option. Please place additions at the end of their respective lists or sub-lists.
  • Voting: Under the option you wish to support, add your vote with any comment you would like to accompany it. Please take a look at some of the argument on the subject before voting. Sign all votes.
  • Response & Discussion: Discussion is encouraged!! If you wish to speak on a certain issue or option, do so in an unordered sub-list where appropriate. If you wish to respond to someone else's specific vote, do so directly under their vote, also in an unordered sub-list. Organize further into unordered sub-lists as necessary, as long as it stays within the issue of ال. Sign all discussion.

--LakeHMM 06:24, 27 December 2005 (UTC)


  • Capitalization: Should the first letter of whatever ال is transcribed to be capitalized?
  • Separation: What form of separation, if any, should there be between ال and the rest of a surname?
    • Space
    • Hyphen or Dash (-)
      1. This makes it clear that it isn't just part of the main word. In Arabic, there is no space after ال, so it shouldn't be a space like in the cases of "de" and "von", but there should be some separation. --LakeHMM 06:24, 27 December 2005 (UTC)
    • None
  • Assimilation: How should the assimilation of ال into solar consonants be reflected? For example, let us use X as the consonant with which it is assimilated. Don't take into account capitalization or anything besides the present issue.
    • aX
    • aXX
    • a-X
    • a-XX
    • aX-X
      1. The purpose of transliteration is usually to show pronunciation, which is why one uses Latin characters that sound like the Arabic ones, not that look like them. It should be reflected that, although the article remains somewhat separate, it takes the pronunciation of the following word, so the consonant sound becomes part of both. --LakeHMM 06:24, 27 December 2005 (UTC)
    • al-X (the presence or absence of a dash in this case would depend on the previous issue)
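To make the assimilation options concrete, the sketch below spells each one out for a sample word. It reflects one reading of the option labels (X is the first consonant of the word, and the word itself already begins with X); `render_options` is a hypothetical helper.

```python
def render_options(onset: str, word: str) -> dict[str, str]:
    """Spell out each poll option for a word whose first consonant is `onset`."""
    return {
        "aX": "a" + word,
        "aXX": "a" + onset + word,
        "a-X": "a-" + word,
        "a-XX": "a-" + onset + word,
        "aX-X": "a" + onset + "-" + word,
        "al-X": "al-" + word,
    }


for label, spelling in render_options("sh", "shams").items():
    print(f"{label:5} {spelling}")
```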

Thoughts on transliteration

I've just been redirected to this page from village pump. I hope this doesn't seem like too much of a splurge, but I've been working on a few ideas about Arabic naming conventions already, and these seem to fit well with the discussion above.

I can see the usefulness in recognising four degrees of transcribed writing:

  1. Arabic: the Arabic word written in Arabic script. In some cases we might want to use vowel marks, etc, but, in most cases, the standard would be not to.
  2. Scientific transliteration: there would be value in having a scientific standard for use throughout Wikipedia. ISO 233 could be the standard, but it might be too hefty for our purposes, and a slightly older standard (ISO/R 233 or DIN 31635) might be more applicable.
  3. Conventional transliteration: there would be value in having a less strict transliteration system that would render Arabic into a handy transcription for English speakers. This transliteration should use as few diacritics as possible.
  4. Customary spelling: we all realise that it would be inappropriate to apply a transliteration scheme on a name that already has a customary spelling in English media. In such cases the custom should be followed, but the transliteration given bracketed in the lead paragraph.

An example of these might be:

  1. القاهرة
  2. ISO 233: ʾˈalqaʾhiraẗ, or, less strictly, al-qāhira
  3. al-Qahira
  4. Cairo

Where there is a customary name, articles should be listed there (a page with the conventional transliteration as its title could redirect to the customary title). Where there is no customary name, or there are different customs in use, the conventional transliteration should be used as the article title. The Arabic should be included in the lead paragraph of the article and accompanied by either one of the transliteration schemes, but not both.
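The titling rule in this paragraph can be sketched as a small function. `choose_title` is a hypothetical name, and the case where several competing customs exist is not modelled; there the caller would simply pass no customary name.

```python
from typing import Optional


def choose_title(
    conventional: str, customary: Optional[str] = None
) -> tuple[str, list[str]]:
    """Return the article title and the titles that should redirect to it."""
    if customary and customary != conventional:
        # A customary English name wins; the transliteration becomes a redirect.
        return customary, [conventional]
    return conventional, []


print(choose_title("al-Qahira", customary="Cairo"))
print(choose_title("Bab Sharqi"))
```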

The advantage of agreeing schemes of transliteration first is that we would not be haggling over the transliteration of each name, but applying principles that had been agreed upon beforehand.

As for the definite article, ISO 233 says that it should be ʾˈal and connected directly to the following word without space or hyphen. This is done because the transliteration is trying to reproduce what is written. The conventional transliteration should use a hyphen to separate al- from the following word, and it should assimilate with a following sun letter: ash-shams. This makes it far easier for an English-speaker to read and pronounce.

Personal names should be listed under the personal name; this is the most important convention in Arabic.

I have been working on transliteration systems here. Any thoughts?

--Gareth Hughes 11:52, 1 August 2005 (UTC)

Thank you for your thoughts! Your expertise is very helpful. I have a few questions and comments.
  • I would agree that all articles be named by their customary spelling, if one exists. Otherwise, the article should be named by the conventional transliteration, as you have suggested.
  • I would propose that all articles on Arabic topics list the conventional transliteration and the Arabic script. The scientific transliteration, in my view, is unnecessary except in cases where the pronunciation is important to the topic.
  • What did you mean by "Personal names should be listed under the personal name"?
  • What did you mean by "The conventional transliteration. . . should assimilate with a following sun letter: ash-shams."
Thanks! – Quadell (talk) (sleuth) 13:19, August 1, 2005 (UTC)
Thanks. I think I typed that 'personal name' thing without thinking how it would read! I meant that Saddam Hussein should be listed under Saddam rather than Hussein. This is standard practice in the Arabic-speaking world. The assimilation of the definite article into the beginning of a word that starts with one of the sun-letters is a feature of spoken Arabic, but not the written language. Many transliteration systems show this assimilation in writing. The example I gave was for the Arabic الشمس. Although it is written al-shams, it is always pronounced ash-shams. I think that this feature would help English-speakers pronounce the Arabic in articles more correctly. Such assimilation is used in the DIN German standard, the older ISO/R 233 international standard and by the United Nations Group of Experts on Geographical Names. The current ISO 233 and the Library of Congress transliteration systems insist on al always. --Gareth Hughes 13:44, 1 August 2005 (UTC)
I think that a conventional transliteration system should be used with as few diacritics as possible. However, it should include some diacritics to mark the lengths of vowels, as this is very important in pronunciation. In addition, my old Arabic teacher used this system where the emphatic letters were differentiated from non-emphatic letters by the vowels that followed them: for non-emphatic letters, fathah, kasrah, and dammah would be a, i, and u respectively, but for emphatic letters a, e, and o. I find that useful (even though there's no difference with fathah). What do you think? - ulayiti (talk) 13:53, 1 August 2005 (UTC)
My original proposal was to have two transliteration systems: one strict or scientific, and the other less strict or conventional. However, I found it difficult to decide when a certain one might be used. If an article's lead paragraph uses the less strict version, someone will come along and add all the diacritics we could dream of: it's just simply in the nature of WP contributors. My version of conventional transliteration is quite lossy: it does not distinguish between many of the letters, most notably the emphatics. I still think that it's useful to have a strict standard for some instances (for example, a transliteration directly after an Arabic example), and to have a less strict system for other places (for example, in article titles and in the body of the text where no customary name exists). --Gareth Hughes 14:22, 1 August 2005 (UTC)

Articles needing Arabic script

I have created a list of Articles needing Arabic script. If any of you are familiar enough with the Arabic script, and would like to add these to articles, here's a list for you. – Quadell (talk) (sleuth) 14:34, August 1, 2005 (UTC)

What wonderful sleuthing! Keep them coming, and I'll try and add the Arabic! --Gareth Hughes 14:41, 1 August 2005 (UTC)
I have created the template {{Arabic}} and the associated category for easier tagging of these articles. You're welcome to use that (and please edit it if it's not satisfactory). The template can be seen below:

{{Arabic}} - ulayiti (talk) 14:59, 1 August 2005 (UTC)

Thanks! I'll use that instead. – Quadell (talk) (sleuth) 17:46, August 1, 2005 (UTC)

Family name

I'm glad that this project had a boost, but one thing: In the project page you wrote:
People with Arabic names should be alphabetized by their given (first) name. For instance, Taqi al-Din should be listed under "T" in places where people are listed alphabetically.
I don't understand it, usually in lists of people, people are listed alphabetically by their family name, so why do we have to change that? The only problem is with the "al" and "el" before the family name. In my opinion it should be ignored and Taqi al-Din should be listed under "D" not "T".
Concerning categories, I suggest [[Taqi al-Din|Din, Taqi]] instead of [[Taqi al-Din|al-Din, Taqi]], a mistake that causes most of the people in categories concerning Arab people to be listed under "A" or "E". CG 16:10, August 1, 2005 (UTC)

Well, I would prefer Taqi ad-Din, but there we go. His full name is تقي الدين محمد بن معروف الشامي السعدي. It is most appropriate to file this name under T for تقي. His family name is certainly not ad-Din. --Gareth Hughes 17:05, 1 August 2005 (UTC)
The Arabic naming system is different from the English one. I think the decision on how to alphabetise the names should be made on a case-by-case basis, depending on the usage of the name of the person in question. But 'al-' and 'el-' should obviously not be used for alphabetisation, and instead be treated like the prefix 'de' for Spanish people. The exception here, I think, is when the name is usually spelt as one word, such as Mohamed ElBaradei (to be listed under 'E'). - ulayiti (talk) 17:27, 1 August 2005 (UTC)
Well then, there are lots of borderline cases. Binladen. Binalshibh. Aljazeera. – Quadell (talk) (sleuth) 17:44, August 1, 2005 (UTC)
Taqi al-Din is not necessarily a good example: despite the fact that it consists of two words, it's one name (like Salah al-Din or Abd al-Rahman). As regards the general principle, I'm not at all sure about listing by first name. In Arabic academic works, it's common to list references by surname. Where would you expect to find Nizar al-Qabbani: under N or under Q?
Also, the extra letters proposed for the transliteration system have a nasty habit of not displaying on computers in use in Arab countries. Is there a way around this? Palmiro 18:01, 2 August 2005 (UTC)

We need a guideline on this. Saddam Hussein al-Tikrit is alphabetized under "S". (Right? Not "T"?) Taqi al-Din is under "T". But how about Bandar bin Sultan bin Abdul Aziz al-Saud? Mohamed Atta al-Sayed? Osama bin Laden? – Quadell (talk) (sleuth) 13:36, August 3, 2005 (UTC)

Region names

Regarding literal translation (i.e. writing the name of an Arabic region/city in English words with meaning rather than just using Latin alphabet of the same name); although "المنطقة الشرقية" means "eastern province", it is extremely rare to see something like (Al Mantaka Al Sharqiya). Instead, "Eastern Province" is predominant and used much more often in English texts (more than 99%) AND in more credible sources, e.g. government agencies. In this case, which one should be used?

Moreover, it is extremely rare to see a combination such as "Ash Sharqiyah Province" which is currently used as a title for the article. It is a half-translation name for the Arabic one, making it very awkward. Not to mention that the spelling varies greatly, unlike Riyadh, Jeddah, or Dhahran for example.

See the debate in Talk:Ash Sharqiyah Province

Also note that I'm not asking to rename the Riyadh article to "Gardens" or Abqaiq to "Little Bedbug".

In short, all names of Arabic cities, regions, and -sometimes- people should be standardized to avoid any confusion. -- Eagleamn 06:55, August 2, 2005 (UTC)

Thanks, Eagle. I think you have come up with an incredibly important point. I can see that the article you give as an example would be better served as being:

Eastern Province (Arabic: المنطقة الشرقية al-Manṭaqä aš-Šarqiyyä) is a province of...

However, I think there are some Arabic names that can be easily translated into English that are best left in the original:

Bab Sharqi (Arabic: الباب الشرقي al-Bāb aš-Šarqī, the Eastern Gate) is one of the gates of the Old City of Damascus...

I think the difference is what we were discussing about a customary name above. The first example given was that Cairo should be called such, even though its official name is al-Qahira (I'm not sure how this sits alongside Bombay asking the world to call it Mumbai). We could say that Eastern Province is the customary English name, but that Eastern Gate is not the customary name (even the guide books call it Bab Sharqi).
--Gareth Hughes 11:10, 2 August 2005 (UTC)

Definite article in article titles

A lot of English Wikipedia articles with titles from Arabic words retain the definite article. In some cases, this is because the title is better known in English with the article, for example al-Qaeda and al Jazeera. However, other articles retain the article for no particular reason, for example ash Sharqiyah Province. I would think that the Arabic Wikipedia could tell us something here, just look at ar:عراق to see that the article is not used in a title. --Gareth Hughes 12:14, 2 August 2005 (UTC)

Good point. Feel free to move any articles to more appropriate names. – Quadell (talk) (sleuth) 12:32, August 2, 2005 (UTC)
Note that ar:عراق now redirects to ar:العراق. The policy of the U.S. Board on Geographic Names is to transliterate the article as part of the name if it is ordinarily used in Arabic as part of the name. The exception is if there is a conventional English-language equivalent without the article, thus "Riyadh" and not "Ar Riyāḑ". They are pretty conservative in using conventional names however; thus Basra is still "Al Başrah" ("Al Basrah"). BGN names are the ones used by the CIA in their public-domain maps and World Factbook articles which are of course heavily utilized here. They often appear without diacritics (signs above or below the letter) which is what I would recommend using here in article titles. --Cam 22:52, 15 January 2006 (UTC)

Saudi royal family

As an example, let's look at Bandar bin Sultan, an important figure. What should the article be titled?

I would recommend that we use "XXX bin YYY al-Saud" for all members of the Saudi royal family. What do you think?

This also brings up two related questions.

  1. When is it appropriate to use "bin", and when is it appropriate to use "ibn"?
  2. Is it best to write "Abdul Aziz" or "Abdulaziz" or "Abdelaziz"?

Quadell (talk) (sleuth) 19:40, August 2, 2005 (UTC)

Should Misha'al of Saudi Arabia be "Misha'al bint Fahd bin Mohammed bin Abdul Aziz al Saud"? – Quadell (talk) (sleuth) 22:42, August 3, 2005 (UTC)

    • I whole-heartedly agree. About the `ain, we haven't been, but we should. The ain is the back-tick, right? – Quadell (talk) (sleuth) 11:41, August 4, 2005 (UTC)

How about Mohamed Atta al Sayed? What is the "Atta"? Is it a family name? A second part of the given name? Or what? – Quadell (talk) (sleuth) 12:32, August 4, 2005 (UTC)

I think it's the second part of a given name. Many given names are made up of "Muhammad" followed by another name. In this case I think a dash between the names is appropriate (Mohamed-Atta). --Yodakii 06:40, 4 September 2005 (UTC)

Standard in using Arabic script

Moved here, from Template talk:Arabic.

I think it would be better if we make a standard about adding Arabic script in articles.

I suggest this example:
Cairo (Arabic: القاهرة; transliterated: al-Qāhirah)

  • romanized could be used instead of transliterated.
  • The translation and transliteration are in parentheses and in bold
  • Translation doesn't have to be included if the English and Arabic pronunciation are the same. (e.g.: Osama bin Laden)
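The suggested lead format could be produced by a helper along these lines. `format_lead` is a hypothetical name, the label is a parameter because the choice between "transliterated" and "romanized" is still open, and bold markup is left out of the sketch.

```python
def format_lead(
    title: str, arabic: str, transliteration: str, label: str = "transliterated"
) -> str:
    """Build the opening of a lead sentence in the suggested format."""
    return f"{title} (Arabic: {arabic}; {label}: {transliteration})"


print(format_lead("Cairo", "القاهرة", "al-Qāhirah"))
```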

Please comment on this suggestion, in order to reach a consensus that could be applicable to all Arabic-related articles. CG 11:15, August 4, 2005 (UTC)

I think that's a great idea. – Quadell (talk) (sleuth) 12:30, August 4, 2005 (UTC)
Thank you, but about the transliterated/romanized question, should we stick to one word (I'm for transliterated) or use both equally? And something else: every consensus that we reach through this talk page should be put on the main project page. CG 14:27, August 4, 2005 (UTC)
I'm a fan of "transliterated", rather than "Romanized". And should we link "transliterated" to Arabic transliteration instead of transliteration? – Quadell (talk) (sleuth) 14:50, August 4, 2005 (UTC)
"Translation doesn't have to be included" - should this read transliteration? If so, bear in mind (a) the value of consistency and (b) as English spelling isn't phonetic, even a conventional spelling in English may be pronounced differently by different readers. Palmiro 15:25, 4 August 2005 (UTC)

I just found out that the article Arabic transliteration is currently under construction. I think it would be useful for our policy. CG 14:36, August 4, 2005 (UTC)

Good find. – Quadell (talk) (sleuth) 14:50, August 4, 2005 (UTC)
Can we have a standard for how to deal with personal names? i.e., in biographical articles, should the Arabic and transliteration come before the dates of birth and death? There seems to be a bug that switches the order of everything around when I try to input Arabic letters before dates of birth, cf [1] Palmiro 11:32, 16 August 2005 (UTC)

Main article re-write

I just re-wrote the main article page to be consistent and thorough, and to incorporate what we have agreed on here. Please correct any errors you find in what I've written. Also, I may have erred in some of my transliterations. For the record, according to the standard transliteration described on the main article, how would you transliterate. . .

  • Bandar bin Sultan al-Sa'ud?
  • Saddam Hussein al-Tikrit?
  • Turki bin Faisal al-Sa'ud?
  • Waleed bin Ahmed al-Shehri?
  • Taqi ad-Din?
  • al-Qahira?
  • Muhammad?

Thanks, – Quadell (talk) (sleuth) 14:56, August 4, 2005 (UTC)

The transliteration proposal

I have a few issues with the standard transliteration proposal that's been put on the project page:

  • First of all, the letter ج (jīm) is never pronounced as a g. Egyptian Arabic and some other dialects have a 'g' sound, but it is not written with jīm.
  • ة (tā' marbūtah) should be transliterated 'ah' when it is at the end of a sentence.
    • (Do you really mean "end of a sentence", or do you mean "end of a word"? – Quadell (talk) (sleuth) 16:09, August 4, 2005 (UTC))
  • I think transliterating the long vowels as 'iyy' and 'uww' is a good idea, but it doesn't always work. That's why I propose using ā, ī, and ū for the long vowels in article text. They can't be included in the article names, but they are IMHO the bare minimum of diacritics that should be used. If people don't like to use the macrons, then how about using circumflexes instead: â, î, û? - ulayiti (talk) 15:44, 4 August 2005 (UTC)

Those seem acceptable to me, but I really don't know enough about Arabic to comment intelligently. I've seen ā and ī more often than â or î, so I say we go with the former. By "the long a", are you referring to َى or َي? – Quadell (talk) (sleuth) 16:02, August 4, 2005 (UTC)

I also don't have much knowledge of Arabic, but I think we need to be a little cautious about creating our own transliteration system. Not that that isn't a valid choice...I just think it's worth considering if an already existing system will do the job.

It's also worthwhile being clear about the purpose of the transliteration. It seems to me that it should be there to allow readers who aren't familiar with Arabic script to get a fairly accurate idea of the correct pronunciation.

It follows then that it should do things like show the assimilation of ل (as suggested by Gareth Hughes, i.e. ash-shams vs. al-shams) and distinguish between emphatic and non-emphatic letters.

It also seems logical to me that since this is the English Wikipedia we should use a system that leverages English spelling, i.e. one that uses sh and th. This seems to point to either UNGEGN or ALA-LC if an existing standard is to be chosen. From there it's a matter of whether you prefer dots or cedillas under your letters.

Moilleadóir 16:28, 4 August 2005 (UTC)

I'm not arguing for us to create an entirely new transliteration system, but we can't use any 'scientific' system either due to the technical difficulties. However, if the purpose is to allow readers who aren't familiar with Arabic to get a general idea of the pronunciation, then even trying to show the emphatic letters in some way is a waste of time. Having a dot or a cedilla below a letter doesn't exactly tell you how it's pronounced if you don't know what it means (and most people don't).
The 'scientific' transliteration can be shown separately within each article, but the article title must, in my opinion, give a general impression of the pronunciation, while containing as few confusing diacritics as possible. For that purpose, I feel the difference between the emphatics and their non-emphatic counterparts can safely be ignored. However, indicating whether the vowels are long or short is imperative. - ulayiti (talk) 16:59, 4 August 2005 (UTC)
I suggest the article Arabic transliteration as a display for many transliteration systems that could help this project. CG 19:24, August 4, 2005 (UTC)
That page doesn't really have any useful systems for the 'simple' version of the transliteration - it only lists the 'scientific' ways to transliterate Arabic (I'm partial to the UNGEGN system myself though). - ulayiti (talk) 20:55, 4 August 2005 (UTC)

Sorry ulayiti, I probably wasn't very clear. I assumed in this section we were talking about the 'scientific' transliteration not article titles. There are no significant technical details that prevent the use of any of the transcription systems for text within an article.

My preference for that is tending towards ALA-LC (rather than UNGEGN) with the exception of transcribing the assimilation of al. One reason for this is that due to peculiarities of usage in (I think) Romanian, T and D WITH CEDILLA ŢţḐḑ is often rendered as T/D WITH COMMA BELOW ȚțD̦d̦ which makes it look odd next to the other UNGEGN characters with cedilla ḩşḑţz̧.

Another (minor) technical issue for transliteration is whether to use precomposed forms of the accented characters or combining diacritic marks (not sure if there's a Wikipedia policy on this). This is probably impossible to enforce anyway, but would allow users without the right fonts to see an unaccented version of the text, e.g. ṣād might render as s?a?d or s?a?d. Incidentally, there is no precomposed form for the Z WITH CEDILLA (z̧) used in UNGEGN.
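The difference between precomposed and combining forms can be checked with Python's standard `unicodedata` module. This is just a sketch; the helper name `strip_marks` is made up for the example.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Decompose the text and drop combining marks, leaving base letters."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


sad = "\u1e63\u0101d"  # "ṣād" written with precomposed s-dot-below and a-macron
decomposed = unicodedata.normalize("NFD", sad)

print(len(sad), len(decomposed))  # code point counts before and after decomposing
print(strip_marks(sad))           # the unaccented fallback

# z followed by COMBINING CEDILLA stays two code points even after NFC,
# because Unicode has no precomposed z with cedilla.
z_cedilla = "z\u0327"
print(len(unicodedata.normalize("NFC", z_cedilla)))
```

The same `strip_marks` step is one way to derive a diacritic-free article title from an accented transliteration.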

As far as article titles go, I'm more inclined to go with the simplest form possible (where there isn't already an accepted standard form) including not representing long vowels. My reason for this is the majority of English speakers aren't likely to be using a keyboard that would allow them to input a circumflex, let alone a macron, so making article titles show vowel length (although I can understand why you would want to) would mean every article would require redirects. In the interest of not making life more complicated than it needs to be, I think we should give up on showing vowel length in article titles.

I'm inclined to suggest sticking with just one transliteration rather than having a 'scientific' one and a simple one because this may just confuse things more. I'd prefer to just adopt an existing transliteration (with minor changes like assimilation of al).

The major issue with any system would be how to maintain consistency. Would it be practical to insert a template at the top of every Arabic themed article linking back to this project?

Moilleadóir 03:51, 5 August 2005 (UTC)

Well, I have no problem with ALA-LC either. But if we're not going to have a 'scientific' transliteration at all as you suggest, then it will be difficult for readers to figure out the pronunciation. I like the system proposed by User:Cedar-Guardian in the 'Standard in using Arabic script' discussion above, where there would be both a simple and a scientific transliteration.
if we're not going to have a 'scientific' transliteration at all as you suggest
This is not what I suggested. —Moilleadóir 03:09, 12 August 2005 (UTC)
However, if we choose to use only one transliteration, I think the macron-less keyboards pose no problem, since macrons can't be included in article titles anyway. The article titles should be completely free of diacritics, and the macrons should be included when the word is first mentioned in the article. I've seen this done with Japan-related articles with the {{wrongtitle}} template attached. - ulayiti (talk) 08:18, 5 August 2005 (UTC)

It seems to me that transcribing hamza and `ayn the same is a rather bad idea. Also, tā marbūţah should be transcribed "at" ONLY in a construct or iDāfa phrase. It can be transcribed as "a" or "ah" elsewhere, but one or the other of these two alternatives should be consistently used. -- AnonMoos 13:56, 16 August 2005 (UTC)

Transliteration question

I tried using the standard transliteration on "Abdul-Aziz al-Saud", the founder of Saudi Arabia. His name is, in Arabic: عبدالعزيز آل سعود

By applying the direct substitutions, I get: `bdl`zyz 'al s`wd. I kinda doubt that should be the article title. It looks like if one doesn't know Arabic, one isn't qualified to transliterate, which is unfortunate. – Quadell (talk) (sleuth) 20:28, August 4, 2005 (UTC)
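The letter-by-letter substitution described here can be sketched as a plain lookup table. The table below is hypothetical: its entries (including dropping a bare alif) are assumptions chosen only to reproduce the string quoted in this post, not an official standard.

```python
# Hypothetical consonant-only table; entries are assumptions chosen to
# reproduce the string quoted in the post, not an official standard.
SUBSTITUTIONS = {
    "\u0639": "`",   # ayn
    "\u0628": "b",   # ba
    "\u062F": "d",   # dal
    "\u0627": "",    # bare alif, dropped here (assumption)
    "\u0644": "l",   # lam
    "\u0632": "z",   # zay
    "\u064A": "y",   # ya
    "\u0622": "'a",  # alif with madda
    "\u0633": "s",   # sin
    "\u0648": "w",   # waw
}

def substitute(text):
    """Replace each Arabic letter by its table entry; pass other characters through."""
    return "".join(SUBSTITUTIONS.get(ch, ch) for ch in text)

# The name from the post, spelled out as code points to avoid ambiguity.
NAME = ("\u0639\u0628\u062F\u0627\u0644\u0639\u0632\u064A\u0632"
        " \u0622\u0644 \u0633\u0639\u0648\u062F")

print(substitute(NAME))
```

Because the script does not write short vowels, a table like this can only ever return the consonant skeleton; the vowels have to come from knowledge of the word.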

That's because short vowels aren't usually written down at all in Arabic. By adding the vowels you get `Abd al-`Azyz 'al-Sa`wd (which, according to the transliteration system I'm proposing, would be 'Abd al-'Azīz as-Sa'ūd). It's usually all right to assume that the vowels have been transliterated correctly in the first place, since there are only three vowels - a, i, and u (though in some transliteration systems a or i may be transliterated as e, or u as o). But sometimes you need to know a bit of the language itself. (It's a fascinating language though, I seriously suggest you study it if you're interested.) :) - ulayiti (talk) 20:50, 4 August 2005 (UTC)
You said 'Abd al-'Azīz as-Sa'ūd, with three apostrophes. Did you mean `Abd al-`Azīz as-Sa`ūd, with back-ticks instead? Or do you think `ayn and hamza should both be written as an apostrophe? (I don't mean to nit-pick, but I'm a beginner in this, and easily confused.) – Quadell (talk) (sleuth) 22:19, August 4, 2005 (UTC)
I see no reason to separate 'ain and hamzah in a 'simple' transliteration as it is, since they are often pronounced nearly identically (and certainly sound similar to anyone who's not a native Arabic speaker). - ulayiti (talk) 08:20, 5 August 2005 (UTC)
They are pronounced very differently indeed! I think a non-native speaker is more likely to mix up a hamza with a kaf or a 'ain with a long 'a' than a hamza with a 'ain. If we're going to the trouble of distinguishing less obviously-different (to foreign ears) sounds such as sin and sad, then we should certainly sort out such distinct ones as 'ain and hamza. Palmiro 22:21, 5 August 2005 (UTC)
I agree. If we're going to show pronunciation, let's just do that rather than making arbitrary decisions about what users might or might not find important.
Moilleadóir 02:45, 7 August 2005 (UTC)

I completely agree with the idea that we should distinguish all the letters. I think it's a really good idea in principle, but it's very impractical to do that in article titles. I think the actual technical restrictions that prevented putting special characters in article titles have been lifted by now, but we can't expect people searching for a certain Arabic article to type in characters with macrons or dots below the letters just to find the right article. I'm all for a scientific transliteration within the article, but not in the article title. - ulayiti (talk) 18:45, 7 August 2005 (UTC)

Another question: If an apostrophe is used for ع ('ain), would it be a problem to use it in article titles? --Yodakii 07:20, 4 September 2005 (UTC)

The purpose of transliteration

I'd like to take a step back and look at the purpose for transliterating Arabic here in the first place. We could, as some have suggested, translate every Arabic character into a unique Roman-based-character, but I think that defeats the purpose.

In the English Wikipedia, the goal of every article is to provide information to a reader fluent in the English language. For most readers, the text "ʾˈalqaʾhiraẗ" is not as useful as "al-qāhira". The former is more precise, and may be pleasing to a lover of Arabic, but the latter imparts more direct pronunciation knowledge to a person unfamiliar with the transliteration scheme. The only people who would know the difference between d and ḏ (ﺩ and ﺫ), or between ` and ' (`ayn and hamza) probably can already read the Arabic script and don't need a transliteration at all. This shouldn't just be an academic exercise to put Arabic letters into Roman letters; it should be a way of giving useful information to English speakers unfamiliar with Arabic transliteration systems.

To stay true to this purpose, we should only use characters that an English reader would be expected to understand. We can't expect a reader to know the difference between d and ḏ, so both ﺩ and ﺫ should be transliterated as "d". We can't expect an English speaker to be able to pronounce `ayn (a "laryngeal voiced fricative" not used in any Indo-European language) or to usefully distinguish that sound from other sounds, so we should either use ' or leave `ayn untransliterated. But an English speaker will know what ā means, and would likely guess that an apostrophe signifies a gap in sound.

So that's why I think we should keep it simple, with as few diacriticals as possible. There are still details to work out (What do we do for `ayn? How do we show the ta' marbuta?), but I hope we can agree on the basic goals for a transliteration scheme. – Quadell (talk) (sleuth) 21:37, August 8, 2005 (UTC)

I agree completely. The point should definitely be to educate rather than to be excessively precise about things that nobody will understand anyway. - ulayiti (talk) 23:54, 8 August 2005 (UTC)
I disagree completely. For a start you have deliberately chosen the most extreme example and did not include the unicode template. The ALA-LC transcription would be almost identical to your 'simple' one - al-qāhirah - and is in fact the one used in the Cairo article. It doesn't help your credibility to resort to this kind of nonsense.
I didn't pick and choose the example. I used the example that someone else used earlier on this page. Let's please keep this discourse civil, and not resort to personal accusations. – Quadell (talk) (sleuth) 13:06, August 9, 2005 (UTC)
I apologize for suggesting this was a deliberate choice. Do you agree that it is an extreme case, given that ISO 233 uses more diacritics than any other system? —Moilleadóir 14:43, 9 August 2005 (UTC)
Perhaps. I guess my point was just that ISO 233 is a poor choice. – Quadell (talk) (sleuth) 15:44, August 9, 2005 (UTC)
The right question to ask is whether a particular transcription makes a word any less intelligible to the average reader, not whether it is suited only to the needs of the lowest common denominator. For instance, whether a user knows the significance of the dot in ḍ is irrelevant because even if they don't they aren't going to suddenly decide to pronounce it like a k.
Following the principle of your argument would also mean that the letter Q should be dispensed with as the average English speaker would not understand what sound it referred to.
You are creating a false dichotomy here and, in my opinion, creating problems for the future. Decisions made on the basis of your personal opinion of what might or might not be important sound distinctions, or on judgments of the understanding of the 'average' reader will be open to challenge in the future. Following a standard (or even a rigorously defined variation of a standard) is a much safer course.
Moilleadóir 09:16, 9 August 2005 (UTC)
I'm all for following a rigorously defined variation of a standard. I think we may agree more than you think. – Quadell (talk) (sleuth) 13:06, August 9, 2005 (UTC)
Wikipedia is an encyclopaedia, not a dictionary. To start with, I've never seen any encyclopaedia write Arabic words with such precise transliterations as you're suggesting. The news media doesn't write them like that. The people who have Arabic names themselves definitely don't. The precise transliterations are only used in teaching Arabic, and this is not what Wikipedia does, or should do. We have thus no reason to provide those transliterations, and asking the community to do so is a case of instruction creep if there ever was one.
It's true that dots below the letters won't change the entire pronunciation for people who don't know Arabic, but they will certainly confuse them enough to not even try to figure out the correct pronunciation. If they see a lot of weird letters, they'll just think it's difficult to pronounce (which it isn't) and leave it. The average English speaker knows how Q is pronounced. They realise that an apostrophe means a glottal stop. And as Quadell already said, people who know Arabic won't need the transliteration anyway as we can read the pronunciation straight from the Arabic script. So why should we intentionally make it more difficult for those who don't?
It's not about any 'personal opinion', it's about a logical view of what is practical and feasible. And by the way, al-qāhirah is exactly the way it would be rendered in the 'simple' transliteration we're proposing. - ulayiti (talk) 11:32, 9 August 2005 (UTC)
    1. My reply was to Quadell
    2. If accurate representations of Arabic are only to be used for teaching Arabic, then why include Arabic script at all?
    3. Creating your own standard as opposed to following an established one (which is what you're suggesting) could also be considered instruction creep
    4. English speakers know that Q is pronounced /kw/, not /q/
    5. Saying that English speakers understand q but are confused by ḍ is still just an opinion
    6. If putting a dot below ḍ will confuse people enough to not even try to figure out the correct pronunciation then writing it d will just ensure that it is impossible for them to figure out the correct pronunciation
    7. The world is not strictly divided into those with no knowledge or interest in Arabic and those who can read Arabic script. Why not cater for those in between rather than 'intentionally make it more difficult for' them?
    8. If we're to use the news media as the standard for Wikipedia articles then I say 'abandon all hope!' The other night on the ABC the text Abu Bakr appeared on the screen, but what was said was A-boo B'kaa (' represents a schwa).
Moilleadóir 14:46, 9 August 2005 (UTC)

I propose we use the ISO 233-2 standard, which is a simplified version of the ISO 233 standard. (The ISO 233-2 standard is basically the same as the ISO 233 standard, but without as many diacritics.)

Arabic ا ب ت ث ج ح خ د ذ ر ز س ش ص ض ط ع غ ف ق ك ل م ن ه ة و ي
ISO 233 ʾ b t ǧ d r z s š ʿ ġ f q k l m n h w y
ISO 233-2 ' b t th j h kh d dh r z s sh s d t ' gh f q k l m n h w y

It's a standard, it's simple, and it's (almost) usable without diacritics. Is this an acceptable solution? – Quadell (talk) (sleuth) 15:44, August 9, 2005 (UTC)

Do you have a reference/link for ISO 233-2 so it can be added to Arabic transliteration? As far as utility goes, I think the fact that it includes ẗ means that it isn't really all that simple. —Moilleadóir 03:23, 10 August 2005 (UTC)
It doesn't appear to be on the web, but my library has a copy. It seems to just be titled ISO 233-2 (1993). ISO also calls it a simplified version of ISO 233 (1984). – Quadell (talk) (sleuth) 11:17, August 10, 2005 (UTC)
Could you add it then? —Moilleadóir 03:12, 12 August 2005 (UTC)

Isn't the purpose of transliteration to give a non-Arabic speaker an idea of how the word is pronounced in standard Arabic? --Yodakii 07:50, 4 September 2005 (UTC)

17th letter of the Arabic alphabet (ظ)

I strongly object to the romanisation of "ظ" as "z". I believe it is highly inaccurate. I believe the same about the lack of differentiation between the 3rd and the 16th letters, the 12th and the 14th. etc. If we're going to do dumb things like that, we should also romanise "ث" as "s", "ذ" as "z", and "غ" as "g" while we're at it... --Node 03:51, 9 August 2005 (UTC)

How do you propose we romanize ظ? If you look at Arabic transliteration, just about every system uses z for the character. – Quadell (talk) (sleuth) 15:47, August 9, 2005 (UTC)
If you accept the romanisation of ذ as dh, then some variant of that is logical for ظ. It's the emphatic (velarised) form of ذ, not of ز.
Can I again point out that I'm unable to enjoy this debate in its full glory, as here in the Arab world the average computer won't show most of the characters in set one above (ISO 233)? This at least is to me a strong argument in favour of the second. Whatever system we choose must be robust enough to display on most users' computers. Palmiro 17:19, 9 August 2005 (UTC)
Then maybe we should be using Qalam, which is the only representation guaranteed to work on all systems. The only problem with it is that it can never use Western capitalization.
[Actually SATTS would also work, but it is very ugly and is really meant for a very strict transliteration which I don't think anyone here is proposing.]
I'm not sure where the line is drawn these days for Wikipedia though. Once SAMPA was preferred over IPA, but now it seems to be assumed that people interested in phonetic/phonemic representations will set up their computers to read IPA.
To get some empirical evidence, can you tell us whether you can read any of these letters:
A/I/U with macron (plain) āīū (numeric entities) āīū
Yes
A/I/U with circumflex (plain) âîû (numeric) âîû
Yes
ALA-LC diacritics (plain) ḥṣḍṭẓʻ (numeric) ḥṣḍṭẓʻ
No
UNGEGN diacritics (plain) ḩşḑţz̧ʻ (numeric) ḩşḑţz̧ʻ
Only the second, fourth and fifth.
ISO 233/DIN-31635 diacritics (plain) ˈˌṯǧḥḫẖḏšṣḍṭẓʿġḡẗʾỳ (numeric) ˈˌṯǧḥḫẖḏšṣḍṭẓʿġḡẗʾỳ
This comes out as šṣḍṭẓʿġḡẗʾỳ.
Could you also tell us what browser & version you are using?
IE 6.0.26 on Windows. Palmiro 13:57, 13 August 2005 (UTC)
Moilleadóir 03:15, 10 August 2005 (UTC)
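The "plain" versus "numeric" pairs in the test above are two encodings of the same characters. As a rough sketch (the function name `to_numeric_entities` is made up for illustration), the numeric form can be generated and decoded again with Python's standard `html` module:

```python
import html

def to_numeric_entities(text):
    """Write every non-ASCII character as a decimal numeric character reference."""
    return "".join(f"&#{ord(ch)};" if ord(ch) > 127 else ch for ch in text)

sample = "\u0101\u012B\u016B"  # a, i, u with macron

encoded = to_numeric_entities(sample)
print(encoded)

# Decoding the references gives back the original characters.
print(html.unescape(encoded) == sample)
```

Either form asks the browser to draw the same code point, so whether a character displays depends on the installed fonts rather than on which form is used in the page source.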
I also don't think "ظ" should be transliterated to "z". In standard Arabic, "ظ" is pronounced as "th" the same way it is pronounced in "this" and "either". It is only pronounced "z" in a few dialects. I think "th" is more appropriate. --Yodakii 08:10, 4 September 2005 (UTC)

Moving ahead

I'm glad we have such a great range of expertise working on this project, and I think we have some good ideas, but I'm worried we'll get bogged down in the details and not move ahead. The condition of the Arabic articles on Wikipedia is so poor and inconsistent at the moment that I think any standard would be a blessing, even if it isn't perfect. I'd like to get a finalized v1.0 of the standard so we can start applying it to articles. But there are a few details to work out first, and I'd like to get some opinions.

  1. For long vowels, such as a long u, should we use u, ū, û, or uu? (The ISO 233-2 standard doesn't seem to cover vowels, unfortunately, since they're usually not written in Arabic.)
  2. For ta' marbūta, should we use ẗ (the ISO 233-2 standard), or should we use h or t where appropriate instead (as it might be less confusing, and it may show up better in some browsers)?
  3. Should 'alif lām assimilate with a following sun letter, such as ash-shams or ad-Din, or should it always be written as "al-"?
  4. Should Arabic names always be alphabetized by family name, always by given name, or should it be a hodgepodge that depends on custom?

Thanks for your input, – Quadell (talk) (sleuth) 14:35, August 12, 2005 (UTC)
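Question 3 can be made concrete with a small sketch of what assimilation would do to a romanized word. It assumes a simple digraph-based romanization (th, dh, sh) and an assumed inventory of sun letters; the function name `assimilate_article` is hypothetical.

```python
# Sun letters in a simple digraph-based romanization (an assumed inventory;
# the emphatic letters share the spellings s, d, t, z in such a scheme).
SUN_DIGRAPHS = ("th", "dh", "sh")
SUN_SINGLES = ("t", "d", "r", "z", "s", "n", "l")

def assimilate_article(word):
    """Turn 'al-X' into the assimilated form when X begins with a sun letter."""
    if not word.lower().startswith("al-"):
        return word
    rest = word[3:]
    head = rest.lower()
    for onset in SUN_DIGRAPHS + SUN_SINGLES:  # digraphs are tried first
        if head.startswith(onset):
            # Keep the original first letter, replace the lam with the onset.
            return word[0] + onset + "-" + rest
    return word  # moon letter: the article stays "al-"

for example in ("al-shams", "al-Din", "al-qamar"):
    print(example, "->", assimilate_article(example))
```

A romanization without diacritics cannot tell a true digraph from two separate letters (s followed by h, say), so a rule like this is only as reliable as the spelling it is given.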

    1. An accented character is better in my view, and certainly long vowels should be distinguished from short ones.
    2. Should ة not simply be omitted unless as part of a construct? If included, it should be included as "h"; this is the least surprising for the English speaker. There's an argument for transliterating it as "t" where it follows alif, in line with common pronunciation (e.g. حياة as hayāt) - I'm not particularly committed to either view on that.
    3. I prefer non-assimilation.
    4. Tricky. I would argue by family name in modern cases where there is one, otherwise by the first component in the commonly used name, e.g. ibn Khaldun under i, Mu'awiya under m. Palmiro 10:51, 16 August 2005 (UTC)

There still seems to be a basic issue not worked out, unless I'm missing something. Is there going to be only one standard, or two (ambiguous/simplified, non-ambiguous)? With only one standard, I would strongly object if this were an ambiguous standard. With two standards, the ambiguous one can be used in titles and other normal use, and the non-ambiguous form appears in parens afterwards. (I do not buy at all the assertion that people who want to know the non-ambiguous form should read the Arabic. Plenty of people might be interested in the actual pronunciation, etc. without being able to read Arabic.)

ISO 233-2 above is an ambiguous standard; if we agree on this, then IMO we must also give a non-ambiguous version at the beginning of each title.
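The ambiguity can be counted directly from the row posted earlier in the thread. The sketch below lists the Latin values as they appear in that row; the letter names are informal labels added here, and tā' marbūṭa is left out because the posted row has one value fewer than the Arabic row.

```python
from collections import defaultdict

# (letter name, Latin value) pairs as posted in the thread; the names are
# informal labels added for this sketch.
POSTED_ROW = [
    ("alif", "'"), ("ba", "b"), ("ta", "t"), ("tha", "th"), ("jim", "j"),
    ("ha (pharyngeal)", "h"), ("kha", "kh"), ("dal", "d"), ("dhal", "dh"),
    ("ra", "r"), ("zay", "z"), ("sin", "s"), ("shin", "sh"), ("sad", "s"),
    ("dad", "d"), ("ta (emphatic)", "t"), ("ayn", "'"), ("ghayn", "gh"),
    ("fa", "f"), ("qaf", "q"), ("kaf", "k"), ("lam", "l"), ("mim", "m"),
    ("nun", "n"), ("ha", "h"), ("waw", "w"), ("ya", "y"),
]

def collisions(pairs):
    """Group letter names by Latin value and keep values shared by several letters."""
    groups = defaultdict(list)
    for name, latin in pairs:
        groups[latin].append(name)
    return {latin: names for latin, names in groups.items() if len(names) > 1}

for latin, names in collisions(POSTED_ROW).items():
    print(repr(latin), "<-", ", ".join(names))
```

Each shared value is a pair of letters that a reader cannot recover from the simplified spelling alone.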

IMO:

  1. Even the ambiguous standard should distinguish long vowels. I have no problems with doubled vowels, and they are by far the most foolproof (always available, hard not to interpret as long). Otherwise, probably macrons.
  2. For ta-marbuuTa, let's just omit it except when it's pronounced as /t/ (e.g. in an iDaafa).
  3. I prefer assimilation, since it indicates the pronunciation.
  4. I think I agree with Palmiro about alphabetizing.

However, there's no necessary reason to adopt an ambiguous standard. As long as we stay within Unicode Latin-1 and Latin-Extended-A we should be OK, as the vast majority of fonts will have these chars. We could indicate emphatic chars like this: ďťšž or ďťśź; or use acute accents above, specified as combining chars d́t́śź; or dots below, specified as combining chars ḍṭṣẓ; or a symbol afterwards, e.g. d°t°s°z°.

Consider:

aš-šubť w-až-žuhr

or

aś-śubť w-aź-źuhr

or

aś-śubt́ w-aź-źuhr

or

as°-s°ubt° w-az°-z°uhr

or

aṣ-ṣubṭ w-aẓ-ẓuhr

There is also the obvious ħ in Latin-Extended-A.

Benwing 18:56, 16 August 2005 (UTC)
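Whether a candidate notation stays within Latin-1 and Latin Extended-A can be checked by code point, since together those blocks cover U+0000 to U+017F. This is a minimal sketch; the function name is made up for illustration.

```python
def outside_latin_1_and_extended_a(text):
    """Return the characters of text above U+017F, the end of Latin Extended-A."""
    return [ch for ch in text if ord(ch) > 0x017F]

candidates = {
    "carons": "\u010F\u0165\u0161\u017E",                    # d, t, s, z with caron
    "degree sign": "d\u00B0t\u00B0s\u00B0z\u00B0",           # letter + degree sign
    "h with stroke": "\u0127",
    "precomposed dots below": "\u1E0D\u1E6D\u1E63\u1E93",
    "combining dots below": "d\u0323t\u0323s\u0323z\u0323",  # letter + U+0323
}

for label, text in candidates.items():
    print(label, len(outside_latin_1_and_extended_a(text)))
```

Note that combining marks sit in their own block (U+0300 to U+036F), so the acute-above and dot-below options rely on characters outside the two blocks even when the base letters are plain ASCII.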


I agree with those who have said that we need a standard that clearly differentiates the actual Arabic letters we're representing. Maybe not everyone wants to know exactly how to pronounce every word, but we should encourage respect for the actual words and allow those interested in the correct pronunciation to have it. I don't think it's too hard for someone to disregard diacritics and read ahead anyway. Even the Arabic script, unless it's fully vocalized, does not provide an unambiguous pronunciation, so a clear transliteration is relevant for both Arabic speakers and non-Arabic speakers.

I advocate writing words approximately as their pronunciation would be indicated in fully vocalized texts, minus case endings, thus facilitating both pronunciation for the non-expert and fidelity to the Arabic speaker. This would include:

  • Assimilating sun letters (fully vocalized texts indicate the assimilation, leaving the lām without a diacritic (thus mute) and giving the sun letter a shadda).
  • Differentiating all "emphatic" and "nonemphatic" letters, which are not two versions of the same letter but entirely different letters.
  • Differentiating hamza and `ayn (their pronunciation is very different). `Ayn could be written as ` or as ʿ, but not as ' if hamza is written this way.
  • Differentiating short and long vowels and sticking with only three (preferably a, i, u, ā, ī, ū, and not o, e, etc.).
  • Possibly representing hamzat waṣl, although this may be sufficiently indicated by an initial vowel without hamza or two initial consonants, as both presuppose the existence of a hamzat waṣl.
  • Tā' Marbūṭa is still the trickiest to keep faithful yet user-friendly: in an 'iḍāfa it should always be written but otherwise -a or -ah might suffice.

I personally like macrons for long vowels and dots below for emphatic letters (although my Linux machine for some reason has a hard time putting the dots under the right letter) simply because this is the standard in most academic publications I've come across and will therefore be recognized.

Finally, I don't have a problem with titles without diacritic marks for readers' convenience--titles are for looking up, the article itself is for looking at.

Jbenhill 08:41, 1 September 2005 (UTC)
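A diacritic-free title can be derived mechanically from the fuller transliteration used in the article body. The sketch below uses Python's standard `unicodedata` module; folding the hamza and `ayn signs to a plain apostrophe is an assumption made for illustration, not something agreed in this discussion, and the function name `plain_title` is hypothetical.

```python
import unicodedata

# Modifier letters sometimes used for hamza and ayn; mapping all of them to a
# plain apostrophe is an assumption of this sketch, not a settled convention.
APOSTROPHE_LIKE = {"\u02BE": "'", "\u02BF": "'", "\u02BB": "'"}

def plain_title(text):
    """Strip combining marks (macrons, dots below) and fold apostrophe-like signs."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = [ch for ch in decomposed if not unicodedata.combining(ch)]
    return "".join(APOSTROPHE_LIKE.get(ch, ch) for ch in kept)

print(plain_title("al-Q\u0101hirah"))                   # al-Qahirah with macron
print(plain_title("T\u0101\u02BE Marb\u016B\u1E6Da"))   # Ta' Marbuta with diacritics
```

Going the other way is not possible, which is why the full form would still need to be given in the article itself.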