It’s no secret that different languages are different. One particularly well-known dimension along which languages differ is their morphological structure, which refers to how words are formed in a language.
Some languages have pretty simple words, meaning there are relatively few morphemes, or units of meaning, per word. For example, in English, the phrase “the dog” consists of two words, each composed of only one morpheme.
But other languages (or other words in English) have words containing many more morphemes. An English example is the word “dehumidification”, which contains at least four morphemes: de- + humid + -ify + -ation. Some of these are root morphemes that can appear on their own (e.g. “humid”). Others can only attach to an existing word to change its meaning or grammatical category (e.g. “-ify”).
For a linguist (or perhaps for any citizen of the world), one question that might arise is: why are languages so different?
Words on the move
No language is a static entity. Languages are constantly acquiring new words, removing old words from their vocabulary, inventing new expressions and repurposing old ones for new meanings, and more. And despite the eternal complaints of stodgy prescriptivists, the slang of today is the grammar of tomorrow.
One way to think about this change over time is as a form of evolution. Just as different species adapt and evolve to the demands of their environment, languages adapt and evolve to fit the needs of their speakers. Certain features of a language are preserved, while others are left behind.
Dingemanse et al. (2015) argue that the main pressures influencing language change are communicative utility and learnability. That is, roughly:
- Communicative utility: How effectively the language allows you to communicate what you want to communicate.
- Learnability: How easy the language is to learn.
Previously, I discussed some of the language factors that affect a language’s communicative utility and learnability. For example, languages with simpler morphological structure are probably easier to learn than languages with very complex morphological structure.
Importantly, however, these pressures might manifest in different ways across the world. Different language communities have different sorts of needs, so communicative utility and learnability will each be expressed and weighted differently, according to the needs of the speakers.
This really isn’t so different from how we think about fitness in biological evolution. Broadly, fitness refers to the contribution of one organism’s genotype to the gene pool of the following generation. But fitness manifests in wildly different ways across different species, different habitats, and different biological communities; a feature that is highly advantageous in one species might be a disastrous mutation in another.
Now we’re in a position to reframe our earlier question more precisely: which variables, if any, shape the features or communicative pressures a language emphasizes? In particular, how do features of the language community predict features of the language?
A Niche for Every Language
The idea that social variables influence language change is called the linguistic niche hypothesis. One such social variable is the proportion of non-native speakers who speak a language, sometimes called the L2 population (Lupyan and Dale, 2010). The argument goes something like this:
- An important feature of a language is that it is learnable.
- Non-native speakers must learn a language later in life, which is generally agreed to be more difficult than learning a language in early childhood.
- Therefore: if social features (such as the number of non-native speakers) can affect language features (such as learnability), then as the number of non-native speakers increases, a language should adapt over time to become more learnable.
There are a few ways one could test this hypothesis, and they mostly depend on how one operationalizes the learnability of a language. Here, we’ll focus on learnability as a function of morphological complexity. That is, languages with simpler morphology should be easier to learn than languages with more complex morphology.
In an analysis of 2,236 languages from a number of different language families, Lupyan and Dale (2010) found that languages with more non-native speakers (as estimated by the authors) tended to have simpler morphological systems. For example, Figure 1 below shows that as the number of non-native speakers increased, verbs tended to have fewer conjugations. In English, for instance, the past tense of “run” is “ran” for I/you/he/they/we, whereas Spanish uses a different form for most of those subjects (corrí, corriste, corrió, corrimos, corrieron):
This hypothesis has been corroborated by a few other studies. Notably, Bentz and Winter (2013) looked at 226 languages across the world and found that languages with more non-native speakers have fewer nominal cases. Nominal case is marking on a noun that signals its grammatical role in the sentence, such as whether it is the subject or the object of a verb. Some languages have over 20 nominal cases, and some have none at all. Crucially, Bentz and Winter (2013) found that the size of a language’s L2 population was a key predictor of how many nominal cases it had.
The hypothesis is also supported by analytical work from John McWhorter (2001). McWhorter argued that creole languages, i.e. languages that arise from pidgins (simplified contact varieties that combine elements from multiple languages), are systematically less grammatically complex than older languages. McWhorter (2007) also argues that, left alone, languages naturally tend towards complexity and irregularity; on this view, simplification is driven by large numbers of adult learners, i.e. non-native speakers.
Of course, it’s difficult to determine the direction of causality. It’s possible, for example, that language complexity affects the number of non-native speakers; that is, non-native speakers gravitate toward languages that are easier to learn. But as Bentz and Winter (2013) point out, this “reverse hypothesis” is unlikely: historically, speakers have rarely had a choice in which languages they learn, since that “choice” has been heavily restricted by imperialism as well as by economic and political factors. It’s still possible, however, that a third, confounding variable is at work, and we just haven’t identified it yet.
Hopefully, this short essay has made a few things clear. First, languages differ across the world and across time. Second, just as species evolve to fit the needs of their environment, languages evolve to fit the needs of the communities in which they are spoken. And finally, one crucial variable that predicts language structure (in particular, language complexity) is the number of non-native speakers who speak that language.
So why should the average non-linguist care?
Well, pretty much everyone speaks at least one language, and over half the world’s population speaks two or more. But despite using language every day, we rarely think about why the language we speak is the way it is – and why other languages are different. The linguistic niche hypothesis sheds light on why these differences arise, and offers a convincing explanation for why certain languages are more difficult to learn than others.
Dingemanse, M., Blasi, D. E., Lupyan, G., Christiansen, M. H., & Monaghan, P. (2015). Arbitrariness, Iconicity, and Systematicity in Language. Trends in Cognitive Sciences, 19(10), 603–615. https://doi.org/10.1016/j.tics.2015.07.013
Lupyan, G., & Dale, R. (2010). Language structure is partly determined by social structure. PLoS ONE, 5(1), e8559. https://doi.org/10.1371/journal.pone.0008559
Bentz, C., & Winter, B. (2013). Languages with More Second Language Learners Tend to Lose Nominal Case. Language Dynamics and Change, 3(1), 1–27. https://doi.org/10.1163/22105832-13030105
McWhorter, J. (2001). The world’s simplest grammars are creole grammars. Linguistic Typology, 5(2), 125–166.
McWhorter, J. (2007). Language interrupted: Signs of non-native acquisition in standard language grammars. Oxford University Press.
Two other prominent dimensions are phonemic inventory and syntax. Phonemic inventory refers, roughly, to the set of distinct sounds a language uses to form words. Syntax refers to how words are combined into phrases and sentences, including the order in which they appear. For example, English has what’s called SVO word order, meaning Subject-Verb-Object (e.g. “John kicked the ball”).
The question of what constitutes a “word” is a surprisingly deep one, and difficult to answer thoroughly. One common definition is that a “word” is something which can be uttered on its own to produce meaning. This is distinguished from a “morpheme”, which can be a word but can also be part of a word (e.g. the plural “-s”). Apart from the philosophical difficulty of pinning down precisely what “meaning” entails, we also face the problem of explaining how people learn what words are through spoken interaction with others. In speech, the gaps between syllables of separate words are not necessarily longer than the gaps between syllables within the same word.
This is a hot research topic, hence the “probably”. Early arguments about language complexity grew out of fundamentally racist and imperialist sentiments, i.e. the belief that languages whose morphological systems resembled those of widely used European languages were “the best”. This sort of misguided thinking eventually petered out in most academic circles, but it was unfortunately followed by a blanket insistence that all languages are “equally complex” or “equally difficult to learn”, despite languages being obviously very different. This is partly a PR problem. Calling something “complex” or “simple” carries all sorts of unfortunate or unintended connotations, so people avoided the terms altogether. The important thing to realize, however, is that difference does not imply superiority or inferiority. Languages are simply different in many ways, which is hardly a controversial point. Those differences need not reflect differences in the intelligence or abilities of their speakers. Rather, they reflect an incredibly complicated history involving the degree of contact with other language communities, the geographical regions in which the language is spoken, the size of the language community, and so on.
Here, it’s important to note that the authors operationalized the number of non-native speakers as a function of: 1) population measures; 2) geographic spread; and 3) degree of linguistic contact.
Importantly, Bentz and Winter (2013) used languages for which non-native speaker data was actually available, unlike Lupyan and Dale (2010), who estimated it.