The Indian subcontinent, consisting of India, Pakistan,
Sri Lanka, Bangladesh, Bhutan, the Maldives, and
Nepal, contains approximately one quarter of the
world’s population and is a growing market
for goods and services, as well as one of the premier
outsourcing locations in the world. Although English
language competency is quite good among educated
'elites', the majority of the population speaks
languages other than English, and cannot be reached
through English or any other European language.
While the majority of people in the subcontinent
speak about fifteen principal languages, in India
alone, schools teach in over fifty languages, and
newspapers are published in approximately ninety
languages. Tapping the huge potential market of
the subcontinent requires careful planning and technical
skill.
Languages and writing systems
The languages of the Indian subcontinent fall mainly
into two families: Indo-European languages such
as Hindi, Urdu and Bengali, and Dravidian languages
such as Kannada and Tamil. The Indo-European languages
are distantly related to European languages such
as English, German and French, although the languages
of Europe and India are thought to have separated
thousands of years ago. The Dravidian languages
are historically unrelated to the Indo-European
languages and are very different grammatically.
All of the languages, however, share certain features
due to contact with each other and a common history
of literature and philosophy dating back thousands
of years.
The primary writing systems of the subcontinent
are derived from the ancient Brahmi script, the
Devanagari script (used for modern Hindi and the
ancient Sanskrit language), and the Arabic script,
although there are many individual variations. Unfortunately,
not all of the scripts of the subcontinent are currently
supported by major operating systems and applications.
Although Unicode fully supports the languages, text
rendering systems based on Unicode often lag behind,
meaning that Unicode text representation is not
enough to work with these languages. In addition,
the Indian Script Code for Information Interchange
(ISCII), which differs dramatically from Unicode
in its design and implementation, is still widely
used for languages of India, and conversion between
Unicode and ISCII may be difficult. Although both
Apple and Microsoft have recently improved support
for languages of the subcontinent in their operating
systems, not all applications are able to take advantage
of this support.
If you intend to work extensively with languages
of the Indian subcontinent, expect to invest in
higher-priced versions of common tools such as QuarkXPress
that are specifically engineered to work
with the languages you need. You will also need
to work with your localization provider to determine
the proper encoding (Unicode, ISCII, or other proprietary
encodings) for your projects and to ensure that
these encodings are used throughout all stages of
the project. If you are developing web content,
you may need to provide (or license) custom fonts
so that website visitors can view your site, along
with clear instructions for downloading and installing
these fonts. Present the instructions as graphic
representations of text so that users do not need
to have the fonts already installed in order to
read how to install them.
In general, expect to see the best support in computer
systems for Devanagari-script languages like Hindi,
with declining support for other scripts. If you
are developing content for Pakistani Urdu speakers,
be aware that the Arabic nastaliq writing system
used for Urdu and other Arabic-script languages
in the region differs substantially from the naskh
variation used for standard Arabic: it contains
additional characters, and words are written on a
slant above the baseline rather than along the
baseline. Because of the unique typographic requirements
of nastaliq, Pakistan still has a thriving commercial
calligraphy industry that prepares newspapers and
magazines for publication; computer support for
nastaliq tends to be problematic, and text is often
prepared in the naskh style on computers and then
hand-lettered for publication.
Localization and internationalization issues
Internationalization for Indian languages can be
tricky since the typographic requirements for most
of the languages in the region are far more complex
than Roman or East Asian text. Even though the scripts
of the region have fewer characters than East Asian
languages, display of the characters is often highly
contextual (one character can have many forms, depending
on its position), and different operating systems
and applications handle text display in contradictory
ways. Text encoded in ISCII will not display properly
on Unicode systems, and vice versa, and integration
of legacy data can often pose substantial difficulties.
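As a rough illustration of why display is contextual, the short Python sketch below lists the Unicode code points stored for one Hindi word. The sample word and the helper name `describe` are assumptions chosen for this example and are not part of any standard tool. Note that the stored order is logical rather than visual: the vowel sign I is stored after its consonant although it is drawn before it, and each virama asks the rendering system to fuse two consonants into a conjunct.

```python
import unicodedata

# Sample word (an assumption for illustration): Hindi "kshatriya",
# written with escapes so the source file stays pure ASCII.
WORD = "\u0915\u094d\u0937\u0924\u094d\u0930\u093f\u092f"


def describe(text):
    """Return (code point, Unicode name) pairs in stored (logical) order."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


if __name__ == "__main__":
    # Print one line per stored code point.
    for code_point, name in describe(WORD):
        print(code_point, name)
```

The eight stored code points are normally drawn as only three visible clusters. Producing those clusters is the job of the rendering system, not of the encoding, which is why correctly encoded text can still display badly.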
An additional problem for some regional languages
is that only the very latest versions of Windows
XP and Mac OS X 10.3 support proper character
rendering. Also, many users will have older versions
of these operating systems that will not display
text properly, with problems ranging from improper
letter spacing to complete garbling of text. The
less common a particular language is, the more likely
it is that you will run into problems with display.
These technical limitations have served as a steep
barrier to localization in the region, but recent
improvements in Windows Uniscribe rendering capabilities
and Apple’s support for Indian languages in
its own AAT fonts are lowering this barrier.
Where possible, applications should use Unicode
for internal data representation, but should make
provision for integration with other tools that
represent data using the ISCII standard.
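One way to picture this advice is a thin conversion layer at the edges of an application, sketched below in Python. Everything here is an assumption made for illustration: the class name `TextGateway`, its method names, and especially the two-entry `TOY_TABLE`, whose byte values are invented and are not real ISCII values. Python's standard codecs do not appear to include ISCII, so a real project would need to plug in a converter from its own tooling or localization provider.

```python
from typing import Callable, Dict, Tuple

Decoder = Callable[[bytes], str]
Encoder = Callable[[str], bytes]


class TextGateway:
    """Converts at the edges so the rest of the application sees only Unicode."""

    def __init__(self) -> None:
        self._custom: Dict[str, Tuple[Decoder, Encoder]] = {}

    def register(self, name: str, decode: Decoder, encode: Encoder) -> None:
        """Plug in a converter for an encoding with no built-in codec."""
        self._custom[name.lower()] = (decode, encode)

    def to_internal(self, raw: bytes, encoding: str) -> str:
        """Decode incoming bytes into Unicode text."""
        if encoding.lower() in self._custom:
            return self._custom[encoding.lower()][0](raw)
        return raw.decode(encoding)  # built-in codecs such as "utf-8"

    def from_internal(self, text: str, encoding: str) -> bytes:
        """Encode Unicode text for a tool that expects another encoding."""
        if encoding.lower() in self._custom:
            return self._custom[encoding.lower()][1](text)
        return text.encode(encoding)


# HYPOTHETICAL two-entry table standing in for a real legacy converter.
# These byte values are invented for the demo and are NOT real ISCII values.
TOY_TABLE = {0x01: "\u0915", 0x02: "\u0916"}  # DEVANAGARI KA, KHA
TOY_REVERSE = {char: byte for byte, char in TOY_TABLE.items()}


def toy_decode(raw: bytes) -> str:
    return "".join(TOY_TABLE[b] for b in raw)


def toy_encode(text: str) -> bytes:
    return bytes(TOY_REVERSE[ch] for ch in text)


gateway = TextGateway()
gateway.register("toy-legacy", toy_decode, toy_encode)

internal = gateway.to_internal(b"\x01\x02", "toy-legacy")  # legacy bytes in
exported = gateway.from_internal(internal, "utf-8")        # Unicode bytes out
```

Keeping the conversion in one place means that the encoding decision agreed for a project is enforced at a single boundary instead of being scattered through the code.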
Cultural issues
The culture of the Indian subcontinent has had
a continuous recorded history going further back
than recorded European culture. This common cultural
heritage unites the various ethnic groups of the
region to some extent, but local variation is considerable.
Some technical products may require relatively little
adaptation, but advertising designed with a European
or East Asian audience in mind will likely require
substantial modification for the region. Be especially
cautious in depiction of the human body and animals
since the Hindu majority in the region has strong
norms regarding how animals and humans may be depicted.
In addition, depictions of food and people eating
will need to be carefully considered since the eating
of meat is taboo for many Hindus and members of several
other religious groups in the region.
If you are localizing culturally sensitive or advertising
materials, it is best to engage Indian public
relations firms that have experience in international
business, and to allow extra time for localization
problems to be investigated and resolved.
Legal issues
The recent arrest of an Indian eBay executive (because
India’s eBay affiliate inadvertently carried
a link to content deemed pornographic) shows how
seriously laws regarding content are taken in the
region. India’s legal system, although based
on that of Great Britain, is very different from
European models, and local legal assistance should
be sought prior to entering into formal business
in India. In Pakistan and Bangladesh, Islamic law
has had major impact, and legal requirements of
Islamic law should be investigated prior to market
entry.
Conclusion
The Indian subcontinent represents a challenging
yet exciting opportunity for companies expanding
their international business. In some ways, it represents
one of the most difficult regions of the world for
localization, but with its growing wealth and stature
on the world stage, the region cannot be ignored.
Successfully engaging with the region can give western
companies an early lead in the growing market and
enable them to capitalize on their investment in
the long term.
A common problem in localization is receiving text
in an encoding other than the one you need. When
this happens, the text appears "garbled", with problems
ranging from a few incorrect characters to pure
gibberish. While the use of Unicode is reducing this
problem, you may be able to use Microsoft Word 2003
or later (Windows version only) to rescue "garbage"
text.
To successfully do this, you must know which encoding
was used for the original text (or be prepared to
try multiple encodings until you find one that works),
and carry out the following steps:
1. If the file is not already in plain text format,
open the file in Microsoft Word and save it as plain
text (.txt)
2. Set Microsoft Word to display text conversion
options when opening text files by clicking ‘Options’
in the Tools menu, selecting the ‘General’
tab and checking the ‘Confirm conversion at
Open’ box. Click OK. (Note: this needs to
be done only once per machine).
3. Select File > Open in Word, select ‘Text Files’
in the Files of Type drop-down menu, choose the file,
and click Open.
4. The Convert File window will open. Select
‘Encoded Text’ and then click OK.
5. Click ‘Other encoding’ and then select
the encoding used for the original file.
6. The file will now open with the proper characters.
If it does not, then repeat steps 3 through 5 trying
other possible encodings for the source text.
While this process will not rescue all "garbage"
text, it will solve many text encoding issues
and can save retyping or developing custom encoding
converters for text in legacy encodings.
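If Word is not available, the same trial-and-error approach can be sketched in a few lines of Python. This is a minimal sketch and not a replacement for a proper converter: the function names `rescue_text` and `script_share`, the candidate list, and the sample bytes are all assumptions made for this example. A decode that succeeds is not necessarily correct, so the sketch ranks the attempts by how many characters fall in the expected script (here the Devanagari block, U+0900 to U+097F).

```python
def rescue_text(raw, candidates):
    """Return (encoding, text) for every candidate that decodes without error."""
    decoded = []
    for encoding in candidates:
        try:
            decoded.append((encoding, raw.decode(encoding)))
        except (UnicodeDecodeError, LookupError):
            continue  # wrong or unknown encoding: try the next one
    return decoded


def script_share(text, first=0x0900, last=0x097F):
    """Fraction of characters inside the expected Unicode range."""
    if not text:
        return 0.0
    return sum(first <= ord(ch) <= last for ch in text) / len(text)


# Sample input (an assumption): the Hindi word "Hindi" stored as UTF-8 bytes.
raw = "\u0939\u093f\u0928\u094d\u0926\u0940".encode("utf-8")

attempts = rescue_text(raw, ["ascii", "utf-8", "latin-1"])
best_encoding, best_text = max(attempts, key=lambda pair: script_share(pair[1]))
```

In this sample the `latin-1` attempt also decodes without error but yields gibberish, which is why the result should still be checked by someone who can read the language.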