MULTILING CORPORATION NEWS (FEBRUARY 2005)
THE TRANSLATION TIMES


Localizing for the Indian Subcontinent

Fortis Tip: Translate Faster with Keystroke Shortcuts

DTP Tip: Using Microsoft Word to Convert "Garbage" Text

LOCALIZING FOR THE INDIAN SUBCONTINENT


The Indian subcontinent, consisting of India, Pakistan, Sri Lanka, Bangladesh, Bhutan, the Maldives, and Nepal, contains approximately one quarter of the world’s population and is a growing market for goods and services, as well as one of the premier outsourcing locations in the world. Although English language competency is quite good among educated 'elites', the majority of the population speaks languages other than English and cannot be reached through English or any other European language. While most people in the subcontinent speak one of about fifteen principal languages, schools in India alone teach in over fifty languages, and newspapers are published in approximately ninety. Tapping the huge potential market of the subcontinent requires careful planning and technical skill.

Languages and writing systems

The languages of the Indian subcontinent fall mainly into two families: Indo-European languages such as Hindi, Urdu and Bengali, and Dravidian languages such as Kannada and Tamil. The Indo-European languages are distantly related to European languages such as English, German and French, although the languages of Europe and India are thought to have separated thousands of years ago. The Dravidian languages are historically unrelated to the Indo-European languages and are very different grammatically. All of the languages, however, share certain features due to contact with each other and a common history of literature and philosophy dating back thousands of years.

The primary writing systems of the subcontinent are the scripts descended from the ancient Brahmi script, including Devanagari (used for modern Hindi and the ancient Sanskrit language), and the Arabic script, although there are many individual variations. Unfortunately, not all of the scripts of the subcontinent are currently supported by major operating systems and applications. Although Unicode fully supports the languages, text rendering systems based on Unicode often lag behind, meaning that Unicode text representation alone is not enough to work with these languages. In addition, the Indian Script Code for Information Interchange (ISCII), which differs dramatically from Unicode in its design and implementation, is still widely used for languages of India, and conversion between Unicode and ISCII may be difficult. Although both Apple and Microsoft have recently improved support for languages of the subcontinent in their operating systems, not all applications are able to take advantage of this support.

If you intend to work extensively with languages of the Indian subcontinent, expect to invest in higher-priced versions of common tools such as QuarkXPress that are specifically engineered to work with the languages you need. You will also need to work with your localization provider to determine the proper encoding (Unicode, ISCII, or a proprietary encoding) for your projects and to ensure that this encoding is used throughout all stages of the project. If you are developing web content, you may need to provide (or license) custom fonts so that website visitors can view your site. Provide clear instructions for downloading and installing these fonts, and present those instructions as graphics so that users can read them before the fonts are installed.

In general, expect to see the best support in computer systems for Devanagari-script languages like Hindi, with declining support for other scripts. If you are developing content for Pakistani Urdu speakers, be aware that the Arabic nastaliq writing system used for Urdu and other Arabic-script languages in the region differs substantially from the naskh variation used for standard Arabic: it contains additional characters, and words are written on a slant above the baseline rather than along it. Because of the unique typographic requirements of nastaliq, Pakistan still has a thriving commercial calligraphy industry that prepares newspapers and magazines for publication; computer support for nastaliq tends to be problematic, and text is often prepared in the naskh style on computers and then hand lettered for publication.

Localization and internationalization issues

Internationalization for Indian languages can be tricky since the typographic requirements for most of the languages in the region are far more complex than those of Roman or East Asian text. Even though the scripts of the region have fewer characters than East Asian scripts, display of the characters is often highly contextual (one character can have many forms, depending on its position), and different operating systems and applications handle text display in contradictory ways. Text encoded in ISCII will not display properly on Unicode systems, and vice versa, and integration of legacy data can often pose substantial difficulties.
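
To make the point about contextual display concrete, here is a minimal Python sketch that lists the Unicode code points stored for a single Devanagari syllable. The sample syllable and the helper name code_points are illustrative choices, not taken from any particular product. It shows only how the text is stored; how the syllable is drawn still depends on the rendering engine and font.

```python
import unicodedata

def code_points(text):
    """Return (code point, Unicode name) pairs for each stored character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

# One Devanagari syllable as a reader sees it, stored as four code points:
# KA + VIRAMA + SSA + VOWEL SIGN I
syllable = "\u0915\u094D\u0937\u093F"

for cp, name in code_points(syllable):
    print(cp, name)

# The string length counts code points, not the units a reader perceives.
print(len(syllable))
```

The vowel sign is stored after the consonants but is drawn to their left, and the virama causes the two consonants to merge into a conjunct form. A system without proper rendering support may show these four pieces separately or in the wrong order.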

An additional problem for some regional languages is that only the very latest versions of Windows XP and Mac OS X 10.3 support proper character rendering. Many users will have older versions of these operating systems that will not display text properly, with problems ranging from improper letter spacing to complete garbling of text. The less common a particular language is, the more likely it is that you will run into problems with display. These technical limitations have served as a steep barrier to localization in the region, but recent improvements in Windows Uniscribe rendering capabilities and Apple’s support for Indian languages in its own AAT fonts are lowering this barrier.

Where possible, applications should use Unicode for internal data representation, but should make provision for integration with other tools that represent data using the ISCII standard.
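
One way to follow this advice is to keep all text as Unicode strings inside the application and convert only at the boundaries where data enters or leaves. The Python sketch below illustrates that pattern under stated assumptions: the names LEGACY_DECODERS, register_decoder and to_unicode are hypothetical, and Python's standard library does not ship an ISCII codec, so an ISCII converter would have to come from a separate tool and be registered by the caller.

```python
from typing import Callable, Dict

# Hypothetical registry mapping a legacy encoding name to a function
# that turns raw bytes into a Python str (which is always Unicode).
LEGACY_DECODERS: Dict[str, Callable[[bytes], str]] = {}

def register_decoder(name: str, decoder: Callable[[bytes], str]) -> None:
    """Register a converter for an encoding Python does not know natively."""
    LEGACY_DECODERS[name.lower()] = decoder

def to_unicode(raw: bytes, encoding: str) -> str:
    """Convert incoming bytes to Unicode at the application boundary."""
    key = encoding.lower()
    if key in LEGACY_DECODERS:
        return LEGACY_DECODERS[key](raw)
    # Fall back to Python's built-in codecs; unknown names raise LookupError.
    return raw.decode(encoding)
```

With this arrangement, the rest of the application never needs to know which encoding a file arrived in, and support for another legacy encoding can be added by registering one more converter.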

Cultural issues

The culture of the Indian subcontinent has had a continuous recorded history going further back than recorded European culture. This common cultural heritage unites the various ethnic groups of the region to some extent, but local variation is considerable. Some technical products may require relatively little adaptation, but advertising designed with a European or East Asian audience in mind will likely require substantial modification for the region. Be especially cautious in depiction of the human body and animals since the Hindu majority in the region has strong norms regarding how animals and humans may be depicted. In addition, depictions of food and people eating will need to be carefully considered since the eating of meat is taboo for many Hindus and members of many other religious groups in the region.

If you are localizing culturally sensitive or advertising materials, it is best to engage with Indian public relations firms that have experience in international business, and to allow extra time for localization problems to be investigated and resolved.

Legal issues

The recent arrest of an Indian eBay executive (because India’s eBay affiliate inadvertently carried a link to content deemed pornographic) shows how seriously laws regarding content are taken in the region. India’s legal system, although based on that of Great Britain, is very different from European models, and local legal assistance should be sought prior to entering into formal business in India. In Pakistan and Bangladesh, Islamic law has had a major impact, and legal requirements of Islamic law should be investigated prior to market entry.

Conclusion

The Indian subcontinent represents a challenging yet exciting opportunity for companies expanding their international business. In some ways, it is one of the most difficult regions of the world for localization, but with its growing wealth and stature on the world stage, the region cannot be ignored. Successfully engaging with the region can give Western companies an early lead in the growing market and enable them to capitalize on their investment in the long term.


FORTIS TIP: TRANSLATE FASTER WITH KEYSTROKE SHORTCUTS


Use these convenient keystroke shortcuts to help you translate faster in Fortis:


"Ctrl" + "O" Open Language Pair
"Ctrl" + "P" Print File
"Alt" + "F4" Save File and Exit
"Ctrl" + "F" Find
"Shift" + "F3" Toggle Letter Case
"Ctrl" + "S" Save File
"Ctrl" + "Z" Undo
"Ctrl" + "G" Go to
"Alt" + "B", "E" Replace Translation
"Alt" + "Ins" Mark Segment as Translated, Go to Next Translatable Segment, Load Fuzzy Matching
"Ctrl" + "Alt" + "+" Go to Next Translatable Segment
"Ctrl" + "Alt" + "-" Go to Previous Translatable Segment
"Alt" + "Enter" Load Fuzzy Matching
"Alt" + "5" View Fuzzy Network Window
"Alt" + "N" Continue Alignment
"Alt" + "P" Toggle Tag Protection
"Alt" + "T" Replace with First Term from Dictionary
"Alt" + "K", "A"-"Z" Replace with Selected (A-Z) Term from Dictionary
"Alt" + "G", "A"-"Z" Insert Selected (A-Z) Term from Dictionary


DTP TIP: USING MICROSOFT WORD TO CONVERT "GARBAGE" TEXT


A common problem in localization is receiving text in an encoding other than the one you need. When this happens, the text appears "garbled", with problems ranging from a few incorrect characters to text that appears as pure gibberish. While the use of Unicode is reducing this problem, you may be able to use Microsoft Word 2003 or later (Windows version only) to rescue "garbage" text. To do this successfully, you must know which encoding was used for the original text (or be prepared to try multiple encodings until you find one that works), and carry out the following steps:

1. If the file is not already in plain text format, open the file in Microsoft Word and save it as plain text (.txt).
2. Set Microsoft Word to display text conversion options when opening text files: click ‘Options’ in the Tools menu, select the ‘General’ tab, and check the ‘Confirm conversion at Open’ box. Click OK. (Note: this needs to be done only once per machine.)
3. Select File > Open in Word, select ‘Text Files’ in the Files of Type drop-down menu, choose the file, and click Open.
4. The ‘Convert file from’ window will open. Select ‘Encoded Text’ and then click OK.
5. Click ‘Other encoding’ and then select the encoding used for the original file.
6. The file will now open with the proper characters. If it does not, repeat steps 3 through 5, trying other possible encodings for the source text.

While this process will not rescue all "garbage" text, it will solve many text encoding issues, and can save retyping or the development of custom encoding converters for text in legacy encodings.
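
For readers who prefer a script to the Word dialog, the same trial-and-error idea can be sketched in a few lines of Python. This is a minimal illustration, not part of Word or Fortis: the function names and the default list of candidate encodings are assumptions chosen for the example, and a decode that succeeds without error is not guaranteed to be the right one, so the results still need to be checked by eye.

```python
def decode_candidates(raw, encodings=("utf-8", "cp1252", "cp1251")):
    """Try each candidate encoding; keep the ones that decode without error."""
    results = {}
    for enc in encodings:
        try:
            results[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            continue  # this encoding cannot represent the bytes; try the next
    return results

def repair_mojibake(garbled, wrong="cp1252", right="utf-8"):
    """Undo text that was decoded with the wrong encoding.

    Re-encode with the encoding that was wrongly applied, then decode
    with the encoding the text was originally written in.
    """
    return garbled.encode(wrong).decode(right)

# Bytes from a Cyrillic file saved in Windows-1251 (illustrative sample).
raw = "Привет".encode("cp1251")
for enc, text in decode_candidates(raw).items():
    print(enc, text)

# UTF-8 text that was mistakenly opened as Windows-1252.
print(repair_mojibake("cafÃ©"))
```

The first helper mirrors step 6 above: it works through a list of possible encodings and leaves the choice of the correct result to a human reviewer. The second covers the common case where the garbled text has already been saved and the original bytes are no longer at hand.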


Copyright © 1994-2007 MultiLing Corporation. All Rights Reserved.
MultiLing®, Fortis®, and Semantis® are registered trademarks of MultiLing Corporation.
All other trademarks and service marks are the property of their respective owners.