Nepal is a fascinating linguistic mosaic! Once communities settled there, steep valleys, towering mountains, and thick forests left them largely isolated from one another. This isolation led to something remarkable: the evolution of many distinct languages.
Figure 1: The Linguistic Map of Nepal.
Here's something interesting: the 2001 census reported 92 languages, but the 2021 census identified 124 distinct mother tongues [1]. That's a significant jump! The increase is largely due to reclassification: varieties previously treated as dialects of a single language are now counted separately.
Note: If we follow Trosterud's suggestion that languages with more than 16,000 speakers should be written, we'd expect all languages down to and including Dhimal to be written. That amounts to 28 languages – just under one-third of the total, which aligns with the proportion of written languages globally.
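As a quick check of the arithmetic in the note above, here is a minimal sketch that applies the 16,000-speaker threshold and works out what share 28 languages represents. The speaker figures are the ones quoted in this article; the names `WRITTEN_THRESHOLD` and `meets_threshold` are illustrative assumptions, not part of Trosterud's work.

```python
# Threshold attributed to Trosterud in the note above.
WRITTEN_THRESHOLD = 16_000


def meets_threshold(speakers: int, threshold: int = WRITTEN_THRESHOLD) -> bool:
    """Return True if a language has more speakers than the threshold."""
    return speakers > threshold


# Speaker figures quoted elsewhere in this article.
speakers = {
    "Nepali": 21_000_000,
    "Newari": 880_000,
    "Limbu": 410_000,
    "Lepcha": 66_730,
}
expected_written = [name for name, n in speakers.items() if meets_threshold(n)]

# Share of the total represented by 28 languages, under each census count.
share_2001 = 28 / 92    # roughly 0.30
share_2021 = 28 / 124   # roughly 0.23
print(expected_written, round(share_2001, 3), round(share_2021, 3))
```

On these numbers, "just under one-third" matches the 2001 count of 92 languages; measured against the 124 mother tongues of 2021, the share is closer to 23%.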
Nepali Writing Systems
Four Nepalese languages have a particularly significant tradition of written use. Let's explore each one:
1. Nepali
Nepali (historically known as Khas, Parbatiya, and Gorkhali) had 21 million speakers in 2021 [1]. It's been written in the Devanagari script – the same beautiful script used across northern India, particularly for Hindi – for approximately 300 years.
2. Newari
Spoken by 880,000 people in 2021 [1] and known as Nepal Bhasa within its linguistic community, Newari has been written for over a thousand years using various scripts. That's quite a literary heritage!
3. Limbu
With 410,000 speakers in 2021 [1], Limbu uses a traditional script called Sirijanga, which likely originated from Lepcha writing. The script is believed to have been created in the 9th century, revived in the 17th century by Te-ongsi Sirijonga, and revived again in 1925 when it was formally named "Sirijanga" [5].
4. Lepcha
Also known as Rong, with 66,730 speakers across Nepal, Sikkim, and other parts of India, Lepcha is written in a script derived from the Tibetan script [6]. Tradition suggests this script was developed in the 17th or 18th century.
Important context: Ethnologue reports only limited literacy for Newari and Limbu, which isn't surprising since these languages were suppressed by successive Nepalese governments from the late 18th century until 1990 [7]. Whilst Limbu and Lepcha writing was probably only ever used for special cultural and religious texts, Newar writing was used for a wide range of purposes until the Gorkhas overthrew the Newar regime in the mid-18th century.
Tip: Cross-border languages, particularly Maithili and Bhojpuri, also have their own mature literature and may be written in their own distinctive scripts – for Maithili the script is known as Mithilakshar or Tirhuta, for Bhojpuri it's Kaithi.
The Digital Age: Encoding Scripts
Here's where things get really interesting from a technical perspective!
Indic writing, including Devanagari and Bengali, has been printed in movable type since around 1800, with the type evolving and being simplified over the centuries. When computers came along for writing and publishing, the encoding of Devanagari and other Indic scripts was undertaken in India, leading to the Indian Script Code for Information Interchange – ISCII.
Initially, Devanagari was proposed for inclusion in ISO 8859, the then-established character encoding standard, as part 12. However, this work was abandoned with the expectation of adopting ISCII's codes into ISO 8859. But here's the twist: ISO 8859 was in turn superseded by Unicode, which included a code block for Devanagari and other major Indic scripts from the start.
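To make the idea of a code block concrete, here is a minimal sketch using Python's standard `unicodedata` module. It lists the code points behind the word "Nepali" written in Devanagari and checks that they all fall within the Devanagari block (U+0900 to U+097F). The sample word and variable names are illustrative choices.

```python
import unicodedata

word = "\u0928\u0947\u092a\u093e\u0932\u0940"  # "Nepali" written in Devanagari

for ch in word:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")

# The Devanagari block occupies U+0900 to U+097F.
in_block = all(0x0900 <= ord(ch) <= 0x097F for ch in word)

# An ISO 8859 part stores one byte per character; UTF-8 needs three bytes
# for each code point in this range.
utf8_length = len(word.encode("utf-8"))
print(in_block, utf8_length)
```

The word is stored as six code points (three consonants, each followed by a vowel sign), which UTF-8 serialises as 18 bytes.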
Technical note: One significant difference between ISCII and Unicode is that in ISCII all the scripts of India were unified within a single table (with different scripts selected by choosing an appropriate font), whereas in Unicode they were disunified into separate code blocks.
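The disunification can be seen directly in the code charts. Each major Indic script has its own 128-code-point block in Unicode, but the blocks share a parallel layout, so the same offset picks out the corresponding letter in each script. The sketch below assumes nothing beyond the published block start points; `BLOCK_START`, `KA_OFFSET`, and `letter_at` are illustrative names.

```python
import unicodedata

# Start of each 128-code-point block for nine major Indic scripts.
BLOCK_START = {
    "DEVANAGARI": 0x0900,
    "BENGALI": 0x0980,
    "GURMUKHI": 0x0A00,
    "GUJARATI": 0x0A80,
    "ORIYA": 0x0B00,
    "TAMIL": 0x0B80,
    "TELUGU": 0x0C00,
    "KANNADA": 0x0C80,
    "MALAYALAM": 0x0D00,
}

KA_OFFSET = 0x15  # position of the letter KA within each block


def letter_at(script: str, offset: int) -> str:
    """Return the character at a given offset inside a script's block."""
    return chr(BLOCK_START[script] + offset)


ka_names = {s: unicodedata.name(letter_at(s, KA_OFFSET)) for s in BLOCK_START}
for script, name in ka_names.items():
    print(f"{script:<11} U+{ord(letter_at(script, KA_OFFSET)):04X}  {name}")
```

In a unified table, a single code would stand for KA and the font would decide which script appears; here each script's KA is a distinct code point, so the script is recorded in the text itself.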
Limbu and Lepcha Enter the Digital World
Limbu's Journey to Unicode
The encoding of Limbu was added to the Unicode Standard in April 2003 with the release of version 4.0 [5]. Limbu was introduced to the standardisation process by McGowan and Everson in 1999, and a proposal was written jointly by Boyd Michailovsky and Michael Everson in 2002.
Michailovsky is a linguist who's done considerable field research amongst the Limbu in Nepal, learning about their writing in context. The proposal appeals both to examples of writing and to the phonology of the spoken language.
Note: Even so, there have been some discussions since then about missing characters, and in 2011 Pandey proposed two additional composite characters, though there's a case for introducing the virama instead.
Lepcha's Path to Standardisation
The encoding of the Lepcha script was initiated by Michael Everson and others within the Unicode Technical Committee in 2003, and formally proposed in 2005. It was finally added to the Unicode Standard in April 2008 with the release of version 5.1 [6].
The Everson proposal drew primarily on two academic texts from the late 19th century and several texts from the 1970s, with copious samples of writing from these texts included in the appendices. The proposal also credits two experts who were consulted: a linguist in Leiden in the Netherlands, and a typographer with Xenotype in the US.
Important: Whilst the writing of Lepcha and Limbu has followed a normal path to standardisation (an introduction of the script to the standardisation community, followed by a full proposal, and then agreement within the ISO and Unicode committees), Newar writing hasn't had such a smooth passage.
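The outcome of standardisation for Limbu and Lepcha can be inspected from any Unicode-aware language. The sketch below walks the two blocks as given in the Unicode code charts (Limbu at U+1900 to U+194F, Lepcha at U+1C00 to U+1C4F) and collects the assigned characters. The helper name `assigned_characters` is an illustrative assumption, and the counts printed depend on the Unicode version bundled with the Python interpreter in use.

```python
import unicodedata

# Block ranges as published in the Unicode code charts.
BLOCKS = {
    "LIMBU": range(0x1900, 0x1950),   # U+1900..U+194F
    "LEPCHA": range(0x1C00, 0x1C50),  # U+1C00..U+1C4F
}


def assigned_characters(block: range) -> dict:
    """Map each assigned code point in the block to its character name."""
    names = {}
    for cp in block:
        name = unicodedata.name(chr(cp), None)  # None for unassigned points
        if name is not None:
            names[cp] = name
    return names


limbu = assigned_characters(BLOCKS["LIMBU"])
lepcha = assigned_characters(BLOCKS["LEPCHA"])

print("Unicode data version:", unicodedata.unidata_version)
print(len(limbu), "Limbu characters, e.g.", limbu[0x1901])
print(len(lepcha), "Lepcha characters, e.g.", lepcha[0x1C00])
```

Because characters can be added to a block in later versions, comparing the counts across interpreters is one way to see whether later proposals for missing characters have been taken up.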
Other Languages and Their Encoding
Field linguists aiming to document the languages that they study, and members of the Summer Institute of Linguistics (SIL), have for many years improvised a means of writing languages, usually based on Devanagari. Michael Noonan has given a very thorough analysis of some of these, relating the choices made to the underlying phonologies of the languages.
Figure 2: The Sikkim Herald in 11 languages (from Mark Turin).
The Maithili Story
When the Indian constitution first scheduled its official languages, Maithili was viewed as a dialect of Hindi – a view that was vigorously contested! This eventually led to the inclusion of Maithili as a distinct scheduled language in 2004 [8], though it's still written in Devanagari.
The traditional Maithili script, Mithilakshar/Tirhuta, has been treated as an exotic style reserved for wedding invitations and similar occasions, though there has been discussion as to whether it could be unified with Bangla or with Devanagari.
In 2008, a Unicode-compliant Mithilakshar font called Janaki was produced in Nepal, mapped to the Devanagari code block. This implicitly assumed unification with Devanagari, with the advantage that existing documents encoded in Devanagari could be rendered in Mithilakshar by a simple change of font.
Then in 2011, Pandey proposed a separate encoding of Tirhuta, arguing briefly (and inadequately, some would say) that it couldn't be unified with Bengali, but not discussing the situation with respect to Devanagari.
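The practical difference between the two approaches lies in what happens to existing text. With font-level unification, the stored code points stay in the Devanagari block and only the font changes. With a separate encoding, existing documents have to be transcoded. Tirhuta has since been given its own block in Unicode (U+11480 to U+114DF), which recent Python versions know about, so the second case can be sketched. The name-based mapping below is a simplifying assumption made purely for illustration, not a method taken from any proposal; a real conversion would need care with conjuncts and characters that have no counterpart.

```python
import unicodedata


def devanagari_to_tirhuta(text: str) -> str:
    """Transcode by character name, leaving unmatched characters unchanged."""
    out = []
    for ch in text:
        name = unicodedata.name(ch, "")
        if name.startswith("DEVANAGARI "):
            try:
                ch = unicodedata.lookup(name.replace("DEVANAGARI", "TIRHUTA", 1))
            except KeyError:
                pass  # no same-named Tirhuta character; keep the original
        out.append(ch)
    return "".join(out)


maithili = "\u092e\u0948\u0925\u093f\u0932\u0940"  # "Maithili" in Devanagari
tirhuta = devanagari_to_tirhuta(maithili)

print([f"U+{ord(c):04X}" for c in maithili])
print([f"U+{ord(c):05X}" for c in tirhuta])
```

Under the font-swap approach neither list would change; under separate encoding every stored character moves to a new code point, which is the cost that has to be weighed against having the script identified in the text itself.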
Languages Yet to Be Written
A large proportion of Nepal's languages aren't yet written, though linguists and anthropologists have written fragments of many languages using extensions of Devanagari. Some language activists have created their own distinctive writing systems, and proposals for some of these have reached the stage of discussion towards standardisation.
These languages include:
- Sunawar
- Bantawa
- Gurung
- Magar
- Dhimal
Interesting fact: Much of the drive for the writing of several of these languages seems to come from Sikkim, where they're also spoken. The official newspaper The Sikkim Herald is published in 11 languages with distinctive scripts and typography!
The Standardisation Challenge
All of the scripts or writing styles for these languages are seen as candidates for separate standardisation, apart from Magar. Magar speakers claim to write their language in Brahmi (which they call Akkha), and thus Pandey concludes:
"Until additional research provides information that clearly differentiates it from Brahmi, Magar Akkha should be considered a variant of the latter and unified with it."
In the discussion about the Tikamuli writing for Sunawar, Pandey notes:
"It has no genetic relationship to other writing systems, although it has similarities to the Limbu (Sirijonga) and Lepcha (Rong) scripts."
Note: What's meant by a "genetic relationship" isn't entirely clear: there will certainly have been contact between the linguistic groups, with the diffusion of influences that such contact brings. What's not being considered for these languages, apart from Magar, is unification with any other Unicode code blocks.