Wikipedias and the Widening e-Gap

During the first decade of the 21st century, new internet-based tools were developed that every language now needs to acquire in order to be seen as worthy of the name, among them its own Wikipedia and computer keyboard layout. A keyboard layout with a given language’s standardized set of graphemes (that is, letters) – which constitutes a part of the Universal Character Set – allows for the versatile and flexible written use of that language in cyberspace and across a variety of computer file and word-processing formats. The Wikipedia of a language, in turn, is an indicator of the sociocultural ‘fitness’ of this language and its speech community. It signals that there is a wealth of information available in this language, so that it makes good sense to access this information and to contribute to the store of knowledge recorded in the lect of a given speech community. The fitter a language is in this respect, the more capital it attracts, too, because if there are enough well-off literate people with much leisure on their hands who choose to communicate through and tinker for free in a language, they may also want to read advertisements and watch commercials in this language. If the activists and their followers are emotionally invested in their lect, recently elevated or in the process of being elevated to the rank of a language, it is possible that they will more eagerly buy products and services advertised in this language than in the official language of a state.

A variation on the Wikipedia logo




Economic capital converging with its social counterpart on the platform of an emergent language is hard to overlook. The situation may quickly spawn a political dimension that authorities and politicians in a state may wish to suppress, use for their parties’ ends, or choose to remain neutral toward. In modern Central Europe, where languages are commonly put to the laborious task of creating nations, building and legitimizing nation-states, and keeping ethnolinguistically defined ‘aliens’ away from the national body politic, rarely does a politician have the chance to stay unperturbed by a new ‘language question.’ Hence, developments in the case of small and unofficial languages are quite unpredictable. First of all, they depend on a group of dedicated enthusiasts who manage to attract the attention and support of the target speech community for transforming a lect into a language. But this is not enough, especially if the linguistic entrepreneurs fail to convince the powers that be not to take a dim view of the project. Hence, some linguistic (or more aptly, language-building) projects fail, while others, of which nothing was heard a couple of years ago, unexpectedly shoot to (at times, short-lived) prominence.

To mention only some examples, the newly codified (or revived, thanks to ‘cyberspace codification’) non-state or politically neglected languages that enjoy their own Wikipedias include the following: Alemannic in Austria, France and Switzerland; Aromanian and Ladino in the Balkans and Turkey; Limburgish and West Flemish in Belgium and the Netherlands; Walloon in Belgium and France; Esperanto, Romani and Yiddish in Central Europe; Võro in Estonia; (Northern) Sami in Finland, Norway and Sweden; Franco-Provençal in France, Italy and Switzerland; Bavarian, Low Saxon (Low German), Low Sorbian, Palatinate German, Ripuarian and Upper Sorbian in Germany; Pontic in Greece and around the Black Sea; Emilian-Romagnol, Friulian, Ligurian, Neapolitan, Piedmontese, Sicilian and Venetian in Italy (alongside their cognate Corsican in France’s Corsica); Lombard in Italy and Switzerland; Latgalian in Latvia; Samogitian in Lithuania; Luxembourgish in Luxembourg; Gagauzian and Moldovan in Moldova and Transnistria; Dutch Low Saxon, North Frisian, Saterland Frisian and West Frisian in the Netherlands and Germany; Zeelandic in the Netherlands; Kashubian and Silesian in Poland; Romansh in Switzerland; Crimean Tatar in Ukraine; and Rusyn in Ukraine and Serbia.

Many of the aforementioned languages have no official status or are not even recognized in the states where they happen to be in use. However, there are some liminal cases, too. Luxembourg, which is a founding member state of the European Union, recognized its local Germanic lect of Luxembourgish as the state’s national language in 1984. However, Luxembourgish was not made into an official EU language, and the administration of Luxembourg and its educational system are overwhelmingly run through the media of German and French. The story is similar in the case of Romansh, which is one of Switzerland’s official languages, but does not enjoy significant official use outside a single canton, let alone beyond the country. Gagauzian is co-official in Moldova’s autonomous region of Gagauzia, as Crimean Tatar was in Ukraine’s autonomous Crimea until the Russian annexation of the peninsula in 2014. (The status of the Crimean Tatars and their language in today’s Crimea is still unclear.) Also, quite a few of the languages enumerated above are now protected – and thus granted a highly varying degree of recognition – under the provisions of the European Charter for Regional or Minority Languages, for instance, Kashubian in Poland, Low German (Low Saxon) in Germany or Vlach (Aromanian) in Serbia.

The story of Moldovan and its Wikipedia is quite instructive in showing that it is people who create and decide what a language is. It is they alone who produce, reshape and destroy the class of artifacts known as ‘languages’ (Einzelsprachen). Moldovan was the official language of the nation-state of Moldova between 1994 and 2013. Being for all practical purposes identical with Romanian, it was eventually renamed ‘Romanian.’ The Moldovan-language Wikipedia was written in Cyrillic, because this had been the language’s script until 1989; the change of the alphabet – from Cyrillic to Latin – had been decreed two years before the dissolution of the Soviet Union, of which Moldova (Moldavia) had constituted a part. The Cyrillic-based Moldovan Wikipedia was closed down in 2006; however, many of its contributors disagree with this decision, and though inactive, the resource still remains online. Likewise, the Cyrillic-based Moldovan language continues in official use in the de facto polity of Transnistria, which broke away from Moldova in 1990-92 and became a Russian protectorate. What is more, Moldovan written in Cyrillic continues to enjoy legal protection in Ukraine.

Political support for the successful recognition of a language is as much needed in cyberspace as on the ground in the ‘physical’ world. Although Moldova pulled the rug of official recognition from under the Moldovan language’s feet, the internet community supporting this language, with the advantage of the overbearing weight of Russia behind this project, contributes to the preservation of (especially Cyrillic-based) Moldovan as a language. The fate of the Montenegrin Wikipedia contrasts starkly with that of its Moldovan counterpart. In 2007 Montenegrin was officially excluded from the former Serbo-Croatian linguistic commonality and made into the official language of the newly independent nation-state of Montenegro. But in cyberspace the pressure of Croatian, Serbian and Serbo-Croatian opponents of this new post-Serbo-Croatian language was so overwhelming that the request for a Montenegrin Wikipedia was rejected. Between 2006 and 2008, the planned Montenegrin Wikipedia was run independently as the Crnogorska Enciklopedija (Montenegrin Encyclopedia), but eventually it fell victim to hackers and trolls (mainly from Serbia) who conducted a sustained barrage of cyber-attacks against the encyclopedia’s site.

On the other hand, although Serbo-Croatian is no longer an official language anywhere in the world, it still survives on the internet with a biscriptural (Cyrillic and Latin script-based) Wikipedia under its belt. The language’s home country of Yugoslavia and its nation of Yugoslavs have been gone for two decades now. The monumental dictionary of the Serbo-Croatian language, commenced in 1959 in two script versions (that is, in Latin and Cyrillic letters, respectively), was discontinued in Croatia, but the publication of the Cyrillic variant of this multivolume reference work continues in Serbia, tellingly, under the unchanged title. This is so despite the fact that nowadays Croatian and Serbian, respectively, are official in the two countries, rather than Serbo-Croatian.

Belarusian uniquely sports two Wikipedias, one in the pre-Soviet, early 20th-century national spelling and the other in Soviet (Russifying) orthography. This phenomenon is a reflection of a bitter ideological quarrel between those who would like Belarus to become an ethnolinguistic nation-state, as is the political standard in Central Europe, and their opponents, who are satisfied with Belarus as it is today, complete with Soviet-style symbolism and a Russo-Belarusian bilingualism that clearly prioritizes the Russian language at the expense of Belarusian. A somewhat similar story has unfolded in officially monolingual Norway in the case of the Norwegian language, because it consists of two equal varieties, Bokmål and Nynorsk. Hence, two different Wikipedias had to be created for Norwegian-speakers, one in the Bokmål variety and another in the Nynorsk one. Both references are quite substantial, the former ranking as the 18th largest Wikipedia and the latter as the 46th among the world’s almost 300 extant Wikipedias. If the contributors to the two Norwegian Wikipedias were to join forces, their combined ‘pan-Norwegian’ Wikipedia would rank as the 16th largest in the world. Given that there are about 5 million Norwegian-speakers, they – among ethnic groups enjoying their own nation-state – seem to enjoy the highest number of Wikipedia articles per unit of population anywhere on the globe, namely 54,000 Wikipedia entries per 1 million speakers.

The existence of Wikipedias and other electronic and internet resources in the world’s languages maps quite faithfully the division of the world between the rich and politically powerful North (aka the West) and the poor and marginalized South (aka the Third World, or the developing, undeveloped or underdeveloped world). In the North even relatively small speech communities of several tens or hundreds of thousands of speakers can afford to create and maintain a vibrant publishing industry and a network of educational and cultural institutions, alongside a pronounced ethnolinguistic presence on the internet. Furthermore, groups of enthusiasts and scholars of some languages with defunct or no speech communities are sizeable and well-off enough to run Wikipedias and publishing industries in such languages. The former type, that of so-called ‘dead’ languages, includes Anglo-Saxon, Gothic, Latin, Old Church Slavonic and Syriac, while the latter type, that of constructed languages, is represented by Basic English, Esperanto, Ido, Interlingua, Novial and Volapük.

On the other hand, in the South, languages with tens or hundreds of millions of speakers support very modest publishing industries and have just tiny, if any, internet references at their disposal. The Hindi-language Wikipedia, with over 100,000 articles, is roughly equal in size to its Greek counterpart, though the latter caters to 12 million Greek-speakers and the former to 260 million Hindi-speakers. This means that there are 434 articles in the Hindi Wikipedia per one million Hindi-speakers, or 124 times fewer than in the case of Norwegian-speakers and their two Wikipedias. This is the actual and measurable size of the gap in economic and political power between languages in the North and the South. Obviously, the aforementioned gap exists between speech communities, not languages per se. The comparison of the cyberspace ‘fitness’ of the languages is just an indicator of the huge disparity in economic and political power between speech communities, or put more simply, between human groups. This disparity explains why speakers of as many as six of India’s official languages do not have Wikipedias in their own languages at all. This is the sad case of Bodo (with 1.4 million speakers), Dogri (2.3 million), Konkani (7.6 million), Maithili (32 million), Odiya (33 million) and Santali (6.5 million). At over 80 million, the speakers of these six Indian official languages are equal in number to the entire population of Germany, the EU’s most populous member state.
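The arithmetic behind this comparison can be retraced in a few lines. The sketch below is purely illustrative: it takes the per-million densities and speaker numbers quoted in the text at face value rather than re-measuring them, and the function and variable names are hypothetical.

```python
# Figures as quoted in the essay; treated here as given, not re-measured.
NORWEGIAN_ARTICLES_PER_MILLION = 54_000  # Bokmål and Nynorsk Wikipedias combined
HINDI_ARTICLES_PER_MILLION = 434

# Speakers (in millions) of the six official languages of India
# that the essay lists as having no Wikipedia of their own.
SPEAKERS_WITHOUT_WIKIPEDIA = {
    "Bodo": 1.4,
    "Dogri": 2.3,
    "Konkani": 7.6,
    "Maithili": 32.0,
    "Odiya": 33.0,
    "Santali": 6.5,
}


def density_ratio(richer: float, poorer: float) -> float:
    """Return how many times more articles per million the first community has."""
    return richer / poorer


def total_speakers(speakers_by_language: dict[str, float]) -> float:
    """Return the combined number of speakers, in millions."""
    return sum(speakers_by_language.values())


if __name__ == "__main__":
    ratio = density_ratio(NORWEGIAN_ARTICLES_PER_MILLION, HINDI_ARTICLES_PER_MILLION)
    print(f"Norwegian vs Hindi density: about {ratio:.0f} times")
    total = total_speakers(SPEAKERS_WITHOUT_WIKIPEDIA)
    print(f"Speakers without a Wikipedia: about {total:.1f} million")
```

Run as a script, this reproduces the ratio of roughly 124 and a combined total of just under 83 million speakers, consistent with the ‘over 80 million’ given in the text.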

It is popularly accepted that nowadays there are over 7,100 languages in the world. Europe’s languages constitute a mere 4 per cent of the grand total. But when it comes to publishing and internet resources, the vast majority of the facilities are available only in European languages. Among the 14 Wikipedias with more than one million entries, just three are in non-European languages, namely Vietnamese and the two Philippine languages of Cebuano and Waray-Waray. Among the 42 Wikipedias with more than 100,000 and fewer than 1 million articles, 28 are in European and 14 in non-European languages. All the non-European languages featured in this and the former group of Wikipedias are native exclusively to Asia. Tellingly, not a single African language is represented in either of the two groups, though Africa and Asia are each home to roughly a third of all the globe’s languages.

The distinction of the largest Wikipedia in an African language belongs to Malagasy, which is official in Madagascar with its 23 million inhabitants. The Malagasy Wikipedia of 47,000 entries ranks as the 73rd largest, but, quite poignantly, it is still smaller than the Welsh or Albanian Wikipedia. And ironically, in its origin, Malagasy is an Asian language introduced to Madagascar around the seventh century CE. The title of Africa’s second largest Wikipedia, ranking as 81st (with 32,000 entries) among the world’s Wikipedias, belongs to Afrikaans, which, similarly to Malagasy, was brought to Africa from outside the continent, in this case to southern Africa by Dutch settlers from Europe in the 17th century. Hence, if we take indigenous African origin into consideration, the largest African Wikipedia is only the Yoruba one (87th; 30,000 entries). In size this Yoruba Wikipedia is almost equal to the 88th Wikipedia, written in the aforementioned tiny European lect of West Frisian. Barely half a million people speak the latter, while Yoruba-speakers number 20 million. The e-gap between the two groups can be conveniently summarized through the following statistical ratio: there are 60,000 Wikipedia articles per one million Frisian-speakers and a mere 1,500 per one million Yoruba-speakers. The latter speech community has 40 times fewer such articles per unit of population than the former.
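The ratio can be derived directly from article and speaker counts. The following is a minimal sketch under stated assumptions: the Yoruba figures are those quoted above, whereas the West Frisian inputs are round numbers assumed for illustration (the text says only that the two Wikipedias are almost equal in size and that barely half a million people speak West Frisian). The helper name is hypothetical.

```python
def articles_per_million(articles: int, speakers: int) -> float:
    """Return the number of Wikipedia articles per one million speakers."""
    return articles * 1_000_000 / speakers


# Yoruba: 30,000 entries and 20 million speakers, as quoted in the text.
yoruba_density = articles_per_million(30_000, 20_000_000)

# West Frisian: ASSUMED round figures (about 30,000 entries, 500,000 speakers),
# since the text gives only approximate values for this Wikipedia.
frisian_density = articles_per_million(30_000, 500_000)

# How many times denser the West Frisian coverage is than the Yoruba one.
gap = frisian_density / yoruba_density

if __name__ == "__main__":
    print(f"Yoruba: {yoruba_density:,.0f} articles per million speakers")
    print(f"West Frisian: {frisian_density:,.0f} articles per million speakers")
    print(f"Gap: {gap:.0f} times")
```

With these inputs the sketch yields 1,500 and 60,000 articles per million speakers, respectively, and a gap of 40 times, matching the figures in the text.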

Bearing in mind the rapid economic and political ascendancy of China at the beginning of the 21st century, which coincides with the emergence of the internet and Wikipedia, one would think that the Chinese Wikipedia should have been one of the most extensive by now. However, with its 779,000 entries it is only the world’s 15th Wikipedia, ranking below the Japanese and Portuguese Wikipedias (with 919,000 and 833,000 entries, respectively) and above the Ukrainian and Catalan Wikipedias (515,000 and 437,000 entries). I infer that the relatively middling size of the Chinese Wikipedia, despite all the unprecedented technological and economic development in the country, may be a function of China’s closely and repressively guarded lack of freedom of speech. Beijing is the world’s leader in controlling and censoring the internet, and the main exporter of internet blocking and censoring technology. Quite a dubious distinction.

In the context of puny or non-existent internet and electronic resources for speakers of non-European languages, the small non-official and often unrecognized languages of Central Europe do not appear so small or insignificant after all. What counts is not the number of speakers alone, but the economic and political clout a given speech community can muster. While in Europe it may make good economic and political sense to produce linguistically customized software and publications for a speech community of 20,000 or 50,000 people, the same is not true of millions-strong speech communities elsewhere. This striking inequality also fuels a high degree of multilingualism outside Europe, as speakers of the vast majority of non-European languages may gain access to secondary and university-level education solely through European – or formerly, colonial and imperial – languages. And to add insult to injury, over 80 per cent of the content on the internet is in European languages, English being responsible for the lion’s share of 55 per cent. But this multilingualism is radically unequal in its character, channeling the brightest and most active speakers of non-European languages toward monolingualism in a European language. This is one of the main features of today’s cultural and economic imperialism. The already fully Anglophone or Francophone children of professionally successful speakers of non-European languages rarely take the trouble, or even feel a need, to acquire and develop their command of the languages of their grandparents, let alone to write and read in them.

Strangely, Europe’s largest stateless ethnolinguistic group, the 12 million Roma (‘Gypsies’ is a largely derogatory exonym, nowadays best avoided), have not managed to codify (let alone standardize) their language of Romani. The Romani Wikipedia is a makeshift affair, much smaller, less coherent and less dynamic than the aforementioned Wikipedias produced by and catering to speech communities that amount to a mere several tens or few hundreds of thousands of members. At a mere 550 articles, the Romani Wikipedia is one of the smallest and most neglected Wikipedias. Ranking as 241st, the Romani Wikipedia is almost equal in size to its immediate follower, the Old Church Slavonic Wikipedia. The profound difference between these two is that nowadays no one speaks Old Church Slavonic as her native language. Per unit of population, there are as few as 46 Wikipedia entries per one million Romani-speakers, or more than nine times fewer than the 434 per one million recorded for Hindi-speakers in India. This piece of statistical data speaks volumes about the marginalization of the Roma and their Romani language in a Europe that likes to hail itself as the ‘cradle of democracy.’
October 2014, Dùn Dé / Dundee