Transliteration standards

edit

It has become an increasingly important issue to determine:

  • where to provide transliteration for words
  • which words to transliterate
  • how to make it easier for a user to understand and use these transliterations wisely
  • how often to transliterate

---

If one is truly doing transliteration, the only consideration is that the original Bengali writing must be recoverable. Mukerjee 19:37, 23 May 2006 (UTC)

That may not work because, at least in practice, Bangla spelling and pronunciation do not correspond one-to-one. So, vowels of theoretically different lengths will inevitably end up mapped to the same transliteration symbol. This has been discussed in the past, and I myself have been swayed from opposing many-to-one mapping to accepting it as and when necessary, under the influence of our linguist Wikipedians. -- urnonav 00:45, 26 May 2006 (UTC)

Transliteration scheme for the Bengali language

edit

I have found a few sites with transliteration schemes for Bengali/Bangla. Here they are for open discussion.

The overall attempt is OK. অ is missing, although it is being transliterated as "au". I am uncertain whether capitalisation is better than doubling the small letter, such as using "I" instead of "ii" to refer to ঈ. I also find the transliterations of ঙ, ং and ঞ difficult to use and interpret.
In general, I like this scheme. It looks sensible to me. I disagree with the transliteration of the ঙ sound. This is a case of standard Bengali versus Bengali in the eastern zones of Bangladesh (consisting of the divisions of Barisal, Chittagong, Dhaka and Sylhet, henceforth referred to as East Bengal). In "East Bengal", the natives usually cannot pronounce ঙ and instead replace it with the sound ঙ্গ (ng-G, pronounced separately as two sounds). Both sounds need transliteration because both appear in "standard Bengali". Examples: বাঙালী (bāngāli) is "standard"; বাঙ্গালী (bāngGāli) is used in parts of Bangladesh, including Dhaka.

So, can we start taking the best of all worlds and making our own scheme? -- Urnonav 09:02, 7 Mar 2005 (UTC)

One question, does the Bangla Academy have anything on this? I am not sure, but there might be some sort of a standard. In that case, we can follow that without any problem. --Ragib 07:10, 9 Mar 2005 (UTC)
I have no idea! In any case, the first transliteration is archaic, possibly made even before my birth; so, I'd expect quite a few transliteration schemes to be "inspired" by it. As far as I know, Bangla Academy only works on the standardisation of Bangla; I would not expect them to have created schemes for transliteration, and I am having trouble finding their publications! Here's a partial scheme from the Samsad English-Bengali Dictionary by Sailendra Biswas (Oh boy! So much for good transliteration!!!) Judging by this, there may be a scheme in one of Bangla Academy's dictionaries. Please note that I will continue to stress the use of "obvious" transliterations. I would not expect a non-Bengali speaker to guess that a means অ and ā means আ. It should, possibly, be something that you can just read "like English" and pronounce the way the word should be pronounced in Bangla. -- Urnonav 08:42, 9 Mar 2005 (UTC)
In the first one, I don't believe অ is being transliterated as "au"; I think it is ঔ that is transliterated as "au". In general I prefer to stay away from capital letters when possible, just because I don't think they look as nice. I personally would prefer "ii", or actually "ī", instead. I am undecided on the issue of transliterating words versus writing easily pronounceable ones, but if we do the latter, then it may be difficult to standardize, and it's not really transliterating, is it? In any case, it then doesn't matter much if we use "n" for both ণ and ন, as both would be pronounced the same by an English speaker, right? — Knowledge Seeker 04:59, 13 Mar 2005 (UTC)
I think ঔ is transliterated "ou". For example, Aurum would possibly be written as অরাম and not ঔরাম in Bangla. As for capitals versus double letters, I am indifferent, although I would prefer not to use ī just because you cannot type it directly. When I said easier to pronounce, I was referring to intuition, not necessarily ease of pronunciation. I am not sure I am explaining myself well. The reader should not have to look up a transliteration table every time s/he reads a transliterated word. It should be fairly simple to guess a very good approximation. ন and ণ, in my opinion, are pronounced very similarly, if not identically, even by Bengali speakers; so I am not sure we should differentiate their transliteration, although, as Knowledge Seeker hinted, it might be a good idea to distinguish them for the sake of perfection. অ, ব (as in স্ব) and স, ষ, শ have been haunting me forever. Also, is it better to go for a pronunciation-based transliteration or a spelling-based one? I would opt for pronunciation-based because I think that is fundamentally what transliteration is all about! -- Urnonav 07:30, 13 Mar 2005 (UTC)
I agree that it should be. I know you mentioned the difficulty of typing ī, but I feel that is not a big concern, since one can just click on the special character below the edit box, right? That's what I've been doing. If we allow ourselves to use those, that opens up some more possibilities. I would suggest using ô for অ to differentiate it from ও, which could be "o". ট could be ţ, although the equivalent character for ড is not present in the set below. — Knowledge Seeker 07:34, 17 Mar 2005 (UTC)


Proposed transliteration scheme

edit

The final transliteration scheme is on the page Bengali script.

This is a work in progress; feel free to edit this table. — Knowledge Seeker 07:38, 17 Mar 2005 (UTC)

অ ô | আ a or ā | ই i | ঈ ī | উ u | ঊ ū
এ e or æ | ঐ oi | ও o | ঔ ou
ং ņ | ঃ : | ঁ (chǒndrobindu) ~ (ã, õ, ű)
ক k | খ kh | গ g | ঘ gh | ঙ ņ
চ ch | ছ chh | জ j | ঝ jh | ঞ ñ
ট ţ | ঠ ţh | ড (ড়) đ (ŗ) | ঢ (ঢ়) đh (ŗh) | ণ ň
ত t | থ th | দ d | ধ dh | ন n
প p | ফ ph | ব b | ভ bh | ম m
য ĵ | য় y | র r | ল l
শ ş or sh | ষ ş or sh | স s, sh or ş | হ h

Please offer any comments on changes to a letter, ideas for ones I couldn't come up with, or if you think this whole scheme is misguided. — Knowledge Seeker 08:18, 17 Mar 2005 (UTC)
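Mukerjee's recoverability requirement at the top of this page amounts to asking that the table be one-to-one. As a rough sketch (Python; it assumes the first option wherever a cell offers alternatives and covers only the letters listed in the table), the check can be automated by grouping letters by their Latin form:

```python
from collections import defaultdict

# The proposed table as data. Where a cell offers alternatives
# ("a or ā", "ş or sh") this sketch ASSUMES the first option.
PROPOSED = {
    "অ": "ô", "আ": "a", "ই": "i", "ঈ": "ī", "উ": "u", "ঊ": "ū",
    "এ": "e", "ঐ": "oi", "ও": "o", "ঔ": "ou",
    "ং": "ņ", "ঃ": ":", "ঁ": "~",
    "ক": "k", "খ": "kh", "গ": "g", "ঘ": "gh", "ঙ": "ņ",
    "চ": "ch", "ছ": "chh", "জ": "j", "ঝ": "jh", "ঞ": "ñ",
    "ট": "ţ", "ঠ": "ţh", "ড": "đ", "ড়": "ŗ", "ঢ": "đh", "ঢ়": "ŗh", "ণ": "ň",
    "ত": "t", "থ": "th", "দ": "d", "ধ": "dh", "ন": "n",
    "প": "p", "ফ": "ph", "ব": "b", "ভ": "bh", "ম": "m",
    "য": "ĵ", "য়": "y", "র": "r", "ল": "l",
    "শ": "ş", "ষ": "ş", "স": "s", "হ": "h",
}


def collisions(scheme: dict[str, str]) -> dict[str, list[str]]:
    """Return every Latin form shared by more than one Bengali letter.

    The scheme is recoverable (one-to-one) only if this returns {}.
    """
    by_latin = defaultdict(list)
    for letter, latin in scheme.items():
        by_latin[latin].append(letter)
    return {latin: letters for latin, letters in by_latin.items() if len(letters) > 1}


if __name__ == "__main__":
    print(collisions(PROPOSED))
```

With those assumptions it reports two shared forms, ņ (ং, ঙ) and ş (শ, ষ): the kind of many-to-one mapping discussed above. Choosing differently in the "or" cells changes the result.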

Discussion on proposed transliteration scheme

edit
First of all: excellent work! I will comment on the following:
  • স transliterated as s: it should vary depending on the pronunciation used, e.g. স্মৃতি (sriti) and সময় (śomôy)
  • Should we ignore "linked" (jukto) silent ম and ব, such as in স্মৃতি and স্বাধীন? I would say we could either put the mute letter in brackets, as in (s(m)riti) and (s(b)adhin), or ignore them completely to avoid confusion!
  • A similar case arises with ঞ and ঁ, which are practically silent. I agree with ঁ above but not with ঞ, for which I would replace ñ with (ñ).
  • For ং and ঙ I suggest ng, but in cases where the n and g sounds are separate, we could use the umlaut, as in বঙ্গ (bôngġô). That's not an umlaut, but interestingly Wikipedia's special characters don't include g with two dots!
  • ঃ could possibly be transliterated with a ":" to show a little break in the sound?
  • ণ could be transliterated with ň?
  • ড could be a problem, but until we have a choice we can use the ugly capitalisations or go with đ

That should be it for today. Any comments, corrections, improvements, agreements? -- Urnonav 19:19, 17 Mar 2005 (UTC)

  • I agree about স
  • Either way; I am torn about the jukto characters
  • I don't think a g with a diaeresis (umlaut) is normally used; I don't know of any context offhand that uses it, which is probably why it's not one of the Wikipedia characters. The single dot might work
  • : and ň are fine; I personally would prefer đ, as I really do think the capitals look ugly (and really don't offer a clue that they should be pronounced differently)
  • We should probably have this information in an article somewhere—somewhere we can say "for information on pronunciation, see ——"

Knowledge Seeker 23:04, 19 Mar 2005 (UTC)

For এ, I proposed e or æ, as the pronunciation varies. What about for অ? For instance, in গরম, the first vowel is pronounced approximately like "paw" whereas the second is more like ও. How to transliterate? Something like gôrōm? Everything I think of is rather arbitrary. — Knowledge Seeker 07:08, 20 Mar 2005 (UTC)


  • গরম would be gôrom according to what seems "intuitive" to me, especially since we are using o for ও already
  • if ড is đ, ঢ could be đh?
  • using ড=đ and ঢ=đh fails to give us anything directly for ড় and ঢ়; so we could resort to rh for both or ř for ড় and ŗ for ঢ়
  • e and æ for এ sound fine to me
  • ষ, শ and স could be transliterated as sh where they are pronounced the same, I think, because if they sound the same, their transliteration should be the same (it's also easier to type, and I am an ease-of-typing freak, but that's a different issue!)
  • we could think of replacing ĵ with z for য although the pronunciation of য is not exactly like z in English

One point that I would like to bring up at this stage is the inconvenience of typing. Say we are writing an article with a Bangla title such as পদ্মা; if we have to repeatedly transliterate it as pôdda throughout the article, it would obviously be a painful process. So, we could use a simplified transliteration based solely on characters that can be typed on a US keyboard layout. In the first line we could mention pôdda, but then continue with podda throughout the rest of the article. Pôdda was a bad example; other words involving lots of "special" characters would be very painful to type again and again. (My web browser doesn't support this "click and appear", for example!) -- Urnonav 10:43, 20 Mar 2005 (UTC)
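A simplified keyboard form can be derived mechanically from the full transliteration. The sketch below is one assumed approach in Python: Unicode NFD decomposition splits letters such as ô and ţ into a base letter plus a combining accent, and the accents are then dropped. đ and æ have no such decomposition, so the hypothetical FALLBACK table maps them by hand.

```python
import unicodedata

# Letters from the proposed table that Unicode does NOT decompose into
# base letter + accent, so they need a hand-picked ASCII fallback (assumed).
FALLBACK = {"đ": "d", "Đ": "D", "æ": "ae", "Æ": "AE"}


def to_keyboard_form(text: str) -> str:
    """Reduce a transliteration to characters found on a US keyboard.

    Lossy by design: ô and o both become o, so the full form should
    still be given once (e.g. at the first mention in an article).
    """
    text = "".join(FALLBACK.get(ch, ch) for ch in text)
    # NFD turns "ô" into "o" + U+0302 COMBINING CIRCUMFLEX ACCENT, etc.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop every combining mark, keeping the base letters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

For example, to_keyboard_form("pôdda") gives "podda" and to_keyboard_form("kajī nojrul islām") gives "kaji nojrul islam". The conversion only goes one way, which is why the full form still has to appear once.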

  • I think ŗ for ড় and ŗh for ঢ় would be logical
  • I really lean to ş over sh for those three—I think mainly out of tradition, and I'm a creature of habit. But since Bengali names and such are usually spelled with an s and not sh (at least in my experience, like "Somit") I think this would be less confusing to those who are vaguely familiar with Indian or Bengali names/words and want to learn more
  • I would definitely prefer ĵ rather than z. Is it pronounced closer to a z in Bangladesh? In Kolkata at least, it sounds similar, if not identical, to জ
  • Other bullet points I agree with you
  • I (naively) didn't realize that not all browsers would have the palette of characters. I suppose it's JavaScript-dependent. Copying and pasting throughout an article isn't too difficult; even if one just uses a simpler transliteration later on, I'm sure someone else (like me) could come along and copy and paste the more complex transliterations. — Knowledge Seeker 03:50, 21 Mar 2005 (UTC)
By the way, I have begun work on Bengali grammar. I've been adding material in bits and pieces, and so far I haven't linked any articles to it. I'll do that soon. Once our transliteration scheme is formalized, I'll link to it from there. — Knowledge Seeker 06:42, 21 Mar 2005 (UTC)
য and জ are pronounced identically in Bangladesh, at least in the part of Bangladesh I am from. People in the East (the geographic East Bengal) could possibly pronounce য like z. I am the wrong person to ask about that, although from what I recall they do; in any case, it's not "standard pronunciation", so we can skip it. So, ĵ is good to go!
The name thing is a good point you bring up. There are a lot of people whose names would not follow the scheme we are creating. For example, রবীন্দ্রনাথ ঠাকুর would be "robindronath ţhakur" with our convention, but I think he himself spelled it Rabindranath Tagore, so there is little we can do about that, although we must still include our scheme's transliteration of his Bangla name. The same goes for কাজী নজরুল ইসলাম, which would be kajī nojrul islām according to our convention. So, our convention should state that if a spelling is already in use for the name of a person, place or city, we should go with that, even if it contradicts the scheme!
I will continue to argue for sh in general words, because it's a sound already in English use. We might need a third person's opinion on this.
As for the JavaScript issue, I use Opera, which is ideal for standards compliance, but the problem is that when I click on the special characters, it just puts them in a box above the edit box and I have to copy and paste. It's a painful process, although Firefox seems to deal with it better. Congratulations on working on Bengali grammar; you are making me ashamed of myself as a Bengali!!! -- Urnonav 19:57, 24 Mar 2005 (UTC)
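The rule for names with an established spelling could be applied as a simple override lookup. In this Python sketch the ESTABLISHED table and the display_name helper are hypothetical; the idea is that an accepted spelling wins, but the scheme's transliteration is still shown next to it, as suggested above.

```python
# Hypothetical table of names that already have an accepted English spelling.
ESTABLISHED = {
    "রবীন্দ্রনাথ ঠাকুর": "Rabindranath Tagore",
    "কাজী নজরুল ইসলাম": "Kazi Nazrul Islam",
}


def display_name(bengali: str, scheme_form: str) -> str:
    """Prefer an established spelling, but keep the scheme form alongside it.

    `scheme_form` is the transliteration produced by our own convention.
    """
    established = ESTABLISHED.get(bengali)
    if established is None:
        # No accepted spelling exists, so the scheme form is used on its own.
        return scheme_form
    return f"{established} ({scheme_form})"
```

So display_name("রবীন্দ্রনাথ ঠাকুর", "robindronath ţhakur") yields "Rabindranath Tagore (robindronath ţhakur)", while a word with no entry, such as গরম, simply keeps its scheme form "gôrom".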
Very well, I am not entirely convinced, but we can use sh. I'll fix the transliterations when I work on the grammar page. Yes, I use Firefox which also is standards-compliant as far as I know; I don't know how the JavaScript is written but in Firefox it works well. Well then, I'll add the transliterations and you can add the Bengali, which is difficult for me to type (I've just been copying-and-pasting the Unicode references). And don't be ashamed—it is far easier for a non-native speaker (like me) to explain the mechanics of a language than for a native speaker to do so. — Knowledge Seeker 23:29, 26 Mar 2005 (UTC)


I made some final edits to the table to include matters we had agreed upon:

  • ş for ষ and শ and also স but depending on pronunciation; and
  • use of ŗ and ŗh for ড় and ঢ় respectively.

That leaves us with a more or less complete transliteration scheme. We need to put this up somewhere and possibly leave a link on pages where we use the scheme. Could we use a template for this? I am not sure about how Wikipedia handles such things but I will try to check it out.

-- Urnonav 16:26, 26 Apr 2005 (UTC)

Actually, didn't you want to use "sh"? It's fine with me if you do. Yes, I've been thinking about where we could put this. Maybe the best place would be the Bengali script page, or a separate Bengali transliteration page? That could even be developed further into an article discussing the difficulties of Bengali transliteration at some point, should we choose to. If we're just leaving links on Bengali-related pages, I don't know if a template is necessary; how do you envision it being used? Incidentally, I think Bengali script needs an overhaul at some point; I just haven't gotten around to it. — Knowledge Seeker 16:37, 26 Apr 2005 (UTC)


Yes, I was insisting on the use of "sh" because it just seemed to me that it would be intuitive to English-speaking readers, but then I said "ah, what the heck!" In any case, it's one of those issues where we need the opinions of others. From what I recall, both of us actually used "sh" in a few places, so we might need to change either the table or the transliterations we did elsewhere! As for putting the table on the script page, that's fine with me. I would still like to check how Wikipedia prefers it done.
I believe we should extend the table, or write up an extension, to include "juktak-khor" (conjuncts). One is discussed above, বঙ্গ, but there are also a few others, such as ক্ষ and হ্ম, that involve distinct pronunciations otherwise missing from the individual letters.

-- Urnonav 16:43, 26 Apr 2005 (UTC)
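Extending the table to juktakkhor means the conjuncts have to be recognised before the single letters they are built from; otherwise বঙ্গ would be read one code point at a time as ঙ, virama, গ. A common way to do this is greedy longest-match tokenisation. The Python sketch below assumes a hypothetical set of three conjuncts and stops at splitting the text, leaving the transliteration of each unit to the table.

```python
def split_units(text: str, conjuncts: set[str]) -> list[str]:
    """Split Bengali text into units, preferring the longest known conjunct.

    Anything not covered by `conjuncts` falls back to single code points;
    vowel signs and the inherent vowel are left for a later step.
    """
    longest = max((len(c) for c in conjuncts), default=1)
    units: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then progressively shorter ones.
        for size in range(min(longest, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in conjuncts:
                units.append(piece)
                i += size
                break
    return units


# Hypothetical starter set: consonant + virama + consonant sequences.
JUKTAKKHOR = {"ঙ্গ", "ক্ষ", "হ্ম"}
```

For instance, split_units("বঙ্গ", JUKTAKKHOR) keeps ঙ্গ together as a single unit after ব, so a conjunct entry in the table (such as the ngġ suggested for বঙ্গ) can be applied to it as a whole.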