Thoughts on ICANN's IDN TLD Evaluation Deployment in the Root Zone
I just wanted to take a moment to look at the effort underway by ICANN regarding the program that enables the routine introduction of TLDs (Top Level Domains) with IDN (Internationalized Domain Name) labels.
I find this effort to be both ground-breaking and monumental in its overall concept. What this means is that this program will pave the way for domain names to be internationalized, containing non-ASCII character sets. To me this speaks of the fact that domain names will contain character sets that may not be recognizable to other users on the net. From a pure social networking perspective, this can lead to a much more localized experience for Internet users; an experience that can possibly help foster cultural heritage.
True, the effort underway today is to test how DNS (Domain Name System) will accommodate such a change. The test will be to use the TLD of “.test” and localize “.test” in eleven different languages to see what effects this may have on the whole DNS structure. Through the use of “scripts”, words will be translated to their respective languages. One of the purposes of this test is to develop the process for quickly removing such IDN-based TLDs should the DNS structure become unstable. The scope of the immediate task at hand is well defined, manageable and will utilize a non-production DNS structure.
However, when examining the overall goal, think how monumental this task can be! Think of languages in general. How many languages will this effort eventually be able to support? Doing a quick scan of languages, I found one page that lists the “official” languages of India. As you can see, the list is as follows:
- Assamese, Bengali, Bodo, Dogri, Gujarati, Hindi, Kannada, Kashmiri, Konkani, Maithili, Malayalam, Meitei, Marathi, Nepali, Oriya, Punjabi, Sanskrit, Santhali, Sindhi, Tamil, Telugu and Urdu.
However, another page lists the languages of India that are spoken by more than 1 million people. Those languages include:
- Assamese, Awadhi, Bagri, Bengali, Bhili, Bhojpuri, Chhattisgarhi, Deccan, Dogri-Kangri, Garhwali, Gujarati, Haryanvi, Hindi, Ho, Kanauji, Kannada, Kashmiri, Khandesi, Konkani, Konkani(Gaonese), Kumaoni, Kurux, Lamani, Magahi, Maithili, Malayalam, Malvi, Marathi, Marwari, Meithei, Mundari, Nepali, Nimadi, Oriya, Punjabi, Sadri, Santhali, Sindhi, Tamil, Telugu, Tulu, Urdu.
Hmmmmm…… So how many other languages of India are there that are spoken by fewer than 1 million people? Who will decide which language, or languages, to include for a specific country?
What about language nuances? While vacationing in Austria we ran across a German-Austrian dictionary. While to some that may appear perfectly fine, the astounding thing is that both countries speak German! Although the differences are small, it was amazing to see that the two countries use German differently enough to warrant a German-Austrian dictionary.
Getting back to IDNs, it will be interesting to see what challenges arise when dealing with languages whose character sets contain non-ASCII characters.
One avenue to help maintain a stable DNS structure is to handle IDN-based TLDs through browsers, utilizing scripts to translate the information into the structure we use today, instead of having DNS handle the translation. But getting back to the sheer complexity of this simple statement, think of languages that are written from right to left. Think of languages that have difficulty with any character set translation. One item that came to mind was languages that use clicking sounds within their vocabularies, such as the ever popular . Imagine talking with someone over the phone and telling them to write down this URL! As stated in the draft proposal titled "IDN Application Evaluation Facilities", “IDNA currently requires that a string of characters in a script written right-to-left neither begins nor ends with a ‘combining mark’. (A string of left-to-right characters may not begin with a combining mark either, but it may end with one.) The clearest example of resulting difficulty that has thus far been noted is with Dhivehi, the official language of Maldives. This is written in the Thaana script (in the Unicode range U+0780...U+07BF), which requires the addition of a combining mark to every base character. A vowel following a consonant is indicated with a combining mark, and special combinations are used to indicate consonants and double vowels in syllable final position.”
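To make the two ideas above a little more concrete, here is a minimal Python sketch. It is only an illustration, not how ICANN or any browser actually does it. It assumes Python's built-in "idna" codec (which follows the older IDNA 2003 rules) as a stand-in for the client-side translation, and it uses the standard `unicodedata` module to spot combining marks. The function names and the sample labels are my own, chosen purely for illustration.

```python
import unicodedata


def to_ascii_label(label: str) -> str:
    """Translate one Unicode label into the ASCII form that DNS stores.

    Assumption: Python's built-in "idna" codec (IDNA 2003) is used here
    as a stand-in for whatever a browser would really do.
    """
    return label.encode("idna").decode("ascii")


def is_combining_mark(ch: str) -> bool:
    """True if the character is a Unicode mark (general category M*)."""
    return unicodedata.category(ch).startswith("M")


def mark_positions(label: str) -> dict:
    """Report whether a label begins or ends with a combining mark."""
    return {
        "begins_with_mark": is_combining_mark(label[0]),
        "ends_with_mark": is_combining_mark(label[-1]),
    }


# A left-to-right example: the German word for "books".
print(to_ascii_label("bücher"))

# The word "Dhivehi" written in Thaana. Each consonant carries a vowel
# sign, so the string necessarily ends with a combining mark.
dhivehi = "\u078b\u07a8\u0788\u07ac\u0780\u07a8"
print(mark_positions(dhivehi))
```

The first call shows the kind of ASCII string that would actually travel through DNS. The second shows why the rule quoted above is a problem for Thaana: the last character of the word is a vowel sign, which Unicode classifies as a combining mark.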
Also, what about the length of the word? Right now the longest TLDs are six characters in length, those being “.museum” and “.travel”. While this has been extended to support larger words, we may find that with language localization, especially if official country names are used, “… stored strings of up to the maximum of 63 characters require evaluation”. Can anyone recite the longest word in the English language? Does anyone KNOW the longest word in the English language? Well, here it is: all 1185 characters! Based on the context of this discussion I found it humorously ironic that the first message on that page is, “The correct title of this article is too long. Article title lengths must be less than 256 characters because of technical restrictions.” Hmmmmm... A shadow of things to come, perhaps?
The other monumental challenge I see is proper translation. First of all, type out a single paragraph and find a site that will do a free, on-the-spot translation for you. Next, take the translated text and translate it back. Do you find the exact same paragraph/context that you originally typed? The other aspect is how organizations deliberately misspell words to appear “edgier”. Might this signal a move away from the deliberately misspelled words back to properly spelled words? If so, imagine what impact this might have within the domainer’s world!
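The length question gets more interesting once you remember that the 63-character limit applies to the stored ASCII string, not to the word the user sees. Here is a rough Python sketch of that check. It assumes the stored form is the "xn--" prefix followed by the Punycode encoding of the label, and it skips the normalization steps a real IDNA implementation would perform. The names `MAX_LABEL_LENGTH`, `stored_form` and `fits_in_dns_label` are hypothetical, made up for this example.

```python
MAX_LABEL_LENGTH = 63  # DNS limit on the stored form of a single label


def stored_form(label: str) -> str:
    """Return the ASCII string that would be stored in DNS for a label.

    Simplifying assumption: no normalization or validity checks, just
    the "xn--" prefix plus Python's built-in "punycode" codec.
    """
    if label.isascii():
        return label  # plain ASCII labels are stored unchanged
    return "xn--" + label.encode("punycode").decode("ascii")


def fits_in_dns_label(label: str) -> bool:
    """True if the stored form stays within the 63-character limit."""
    return len(stored_form(label)) <= MAX_LABEL_LENGTH


for word in ["museum", "bücher", "ü" * 63]:
    print(len(word), len(stored_form(word)), fits_in_dns_label(word))
```

Notice that the stored form of a non-ASCII word is longer than the word itself, so a name can run out of room well before the user has typed 63 characters.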
Yes, while this concept will start with the use of the mere word, “.test”, I feel the challenges and implications are nothing short of ground-breaking.
Check back as I will have another posting on the practical implications from the user’s perspective; a perspective from non-English speaking countries.
What are your thoughts on this topic? Please chime in as this is an open community. Discussion is healthy and we want to hear your opinions.