Discovering systematicity (Arbitrariness in language, pt. 3)

Language is mostly arbitrary, but there are patterns of systematicity both within and across languages. As discussed previously, arbitrariness and systematicity seem to play unique roles in improving both the learnability and communicative utility of a language.

So how can we, as researchers, quantify the degree of arbitrariness and systematicity in a language? And how can we discover these trends automatically?

Quantifying arbitrariness and systematicity

What would it mean to quantify arbitrariness in a language? Recall that arbitrariness refers to the principle of language whereby the form of a word has no relationship to its meaning; that is, given some wordform, it’s impossible to reliably predict its meaning.

But sometimes the form of a word does give an indication of its meaning. Even disregarding morphological systematicity[1], we see indicators of sub-morphemic (below the level of morphemes) meaning in language, often statistical in nature. A well-documented case of sub-morphemic meaning is phonaesthemes, in which certain sounds tend to be associated with words of particular meanings (see Figure 1 below).

Figure 1: Certain word onsets appear in words associated with particular meanings.

If we want to quantify the amount of arbitrariness or systematicity in a language, we need a good way to operationalize what it means for a word to be arbitrary. One operationalization is to say:

A word is arbitrary if its meaning cannot be reliably predicted from its form. A word is systematic if its meaning can be more reliably predicted from its form.

Comparing word forms

The forms of words are relatively easy to compare; one way to do this is by using the Levenshtein Distance (LD).

LD measures how similar two words are by how many “edits” (deletions, insertions, or substitutions) need to be made to one word to produce the other word. For example, LD(test, test) is 0, since the words are the same.

But LD(test, tent) is 1, since the “s” in test must be replaced by an “n”. Obviously Levenshtein distance doesn’t capture whether two words are similar in meaning (“dog” is more similar in meaning to “canine” than it is to “bog”), but that’s not the point. We just need a good way to measure how similar the forms of two words are – and, recall, forms of words are supposed to be arbitrary.
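To make the edit-counting concrete, here is a minimal Python sketch of Levenshtein distance using the standard dynamic-programming approach. The function name `levenshtein` is my own choice, and this is only an illustration, not the implementation used in any of the papers discussed here.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    # Distances from the empty prefix of a to every prefix of b.
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        # Turning the first i characters of a into "" takes i deletions.
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(
                prev[j] + 1,         # delete char_a
                curr[j - 1] + 1,     # insert char_b
                prev[j - 1] + cost,  # substitute (or keep, if equal)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("test", "test"))   # 0
print(levenshtein("test", "tent"))   # 1
print(levenshtein("dog", "bog"))     # 1
print(levenshtein("dog", "canine"))  # 6
```

The last two lines show the point made above: by this measure "dog" is much closer in form to "bog" than to "canine", regardless of meaning.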

Comparing word meanings

But what about meaning?

“Meaning” is a tricky concept. Linguists and philosophers have puzzled over the problem of defining the “meaning” of a word for a long time. The problem here is that we need a good way to define meaning mathematically[2]. And ideally, we need a way to do this automatically – doing it by hand is intractable (there are a lot of words!) and introduces human bias into the equation.

One way to do this is to operationalize the meaning of a word using the contexts in which it appears (e.g. which words it co-occurs with). This is based on the distributional hypothesis, which I think is nicely summarized by a quote from J. R. Firth (Firth, 1957):

“You shall know a word by the company it keeps”.

You might expect words with similar meanings to appear in similar contexts. This can be operationalized by representing each word as a vector counting the number of times it co-occurs with every other word in the language, such as:

Apple = {eat: 500, grow: 200, …}[3]

If words with similar meanings appear in similar contexts, their vectors should be similar, such as:

Orange = {eat: 300, grow: 250…}

Packages such as word2vec allow researchers to produce these vectors automatically. All sorts of cool things have been done with this representation of a word’s meaning, including analogies (see Figure 2 below), which simply involve adding and subtracting vectors to “fill in the blank” (e.g. man is to king as woman is to ___)[4].

Figure 2: Taken from

Using vector representations allows you to compute the similarity in meaning between two words by comparing their distance in vector-space; similar words should be located near each other.
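As a rough sketch of what "comparing vectors" can look like, the Python below computes cosine similarity between sparse co-occurrence vectors stored as dictionaries. Cosine similarity is one common choice of measure, and I am assuming it here for illustration; the function name `cosine_similarity` and the `hammer` vector are hypothetical, and all counts are made up (the apple and orange counts are the toy numbers from above).

```python
import math


def cosine_similarity(u: dict, v: dict) -> float:
    """Cosine of the angle between two sparse count vectors.
    Context words missing from a dictionary count as 0."""
    shared = u.keys() & v.keys()
    dot = sum(u[k] * v[k] for k in shared)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an all-zero vector has no direction to compare
    return dot / (norm_u * norm_v)


# Toy co-occurrence counts (fabricated for this example).
apple = {"eat": 500, "grow": 200}
orange = {"eat": 300, "grow": 250}
hammer = {"eat": 2, "grow": 1, "hit": 400}  # hypothetical contrast word

print(cosine_similarity(apple, orange))  # close to 1: similar contexts
print(cosine_similarity(apple, hammer))  # close to 0: different contexts
```

A value near 1 means the two words appear in very similar contexts; a value near 0 means they share almost none.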

Putting it all together

Padraic Monaghan and others (Monaghan et al, 2014) developed a method to find which words in a language are the most arbitrary. Specifically, they looked at the correlation between the forms of words and their meanings (using a vector representation of each word[5]):

Similarity(w1, w2) ~ LD(w1, w2)

By comparing every pair of words this way, you can generate a global correlation coefficient for a language denoting its arbitrariness; let’s call this r_global. If r_global is very close to 0, that suggests the language is very arbitrary; that is, there’s almost no correlation between the forms of words in the language and their meanings. But the further r_global is from 0, the more a language’s word-forms and word-meanings are correlated.
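Here is a small Python sketch of the global correlation under some simplifying assumptions of my own: form distance is Levenshtein distance, meaning distance is Euclidean distance between vectors, and the correlation is a plain Pearson r over all word pairs. Because I use a distance on both sides, a positive r means that words with similar forms tend to have similar meanings (pairing a similarity with a distance would flip the sign). The four-word lexicon and its two-dimensional "meaning" vectors are invented, and the helper names are hypothetical.

```python
import math
from functools import lru_cache
from itertools import combinations


@lru_cache(maxsize=None)
def levenshtein(a: str, b: str) -> int:
    """Recursive Levenshtein distance (fine for short words)."""
    if not a or not b:
        return len(a) + len(b)
    return min(
        levenshtein(a[1:], b) + 1,                    # deletion
        levenshtein(a, b[1:]) + 1,                    # insertion
        levenshtein(a[1:], b[1:]) + (a[0] != b[0]),   # substitution/match
    )


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def global_correlation(lexicon: dict) -> float:
    """Correlate form distance with meaning distance over all word pairs."""
    pairs = list(combinations(lexicon, 2))
    form = [levenshtein(w1, w2) for w1, w2 in pairs]
    meaning = [math.dist(lexicon[w1], lexicon[w2]) for w1, w2 in pairs]
    return pearson(form, meaning)


# Invented toy lexicon: word -> made-up 2-d "meaning" vector.
toy_lexicon = {
    "glow": (1.0, 0.1),
    "gleam": (0.9, 0.2),
    "snore": (0.1, 1.0),
    "sniff": (0.2, 0.9),
}

r_toy = global_correlation(toy_lexicon)
print(r_toy)  # strongly positive: this toy lexicon is built to be systematic
```

Real lexicons are far larger and far noisier than this, so the correlation you would expect there is much closer to 0.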

Theoretically, this method could be used to compare the arbitrariness of different languages[6], but Monaghan et al (2014) were interested in looking at how arbitrary or systematic individual words were. To do this, they used a technique called leave-one-out regression. Basically, they ran the correlation above (meaning ~ form) a bunch of times, leaving out information about each word in the language on each iteration.

For example, on the first iteration, they might omit “apple” from the data, and thus produce r_no_apple. On the second iteration, they might omit “orange” from the data, and thus produce r_no_orange. They can then compare these values to r_global. If, say, r_no_apple is higher than r_global, that suggests that “apple” is arbitrary; that is, removing “apple” from the data resulted in a more positive or systematic correlation between form and meaning. If, say, r_no_orange is lower than r_global, that suggests that “orange” is systematic; that is, removing “orange” from the data resulted in a weaker correlation between form and meaning.
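The leave-one-out idea can be sketched in a few lines of Python. This is my own simplified reading of the procedure as described above, not the authors' code: I drop every pair that contains a given word, recompute a Pearson correlation, and score the word as the global correlation minus the leave-one-out correlation, so positive scores mean "more systematic". The table `toy_pairs` is hypothetical: the form distances are hand-computed Levenshtein distances, and the meaning distances are invented, with "glue" deliberately given a gl- form but a meaning far from "glow" and "gleam".

```python
import math


def pearson(points) -> float:
    """Pearson correlation of a list of (x, y) tuples."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    var_y = sum((y - mean_y) ** 2 for _, y in points)
    return cov / math.sqrt(var_x * var_y)


def leave_one_out(pair_distances: dict):
    """Return (r_global, scores), where scores[word] is
    r_global minus the correlation computed without that word."""
    words = sorted({w for pair in pair_distances for w in pair})
    r_global = pearson(list(pair_distances.values()))
    scores = {}
    for word in words:
        # Keep only the pairs that do not involve the left-out word.
        kept = [d for pair, d in pair_distances.items() if word not in pair]
        scores[word] = r_global - pearson(kept)
    return r_global, scores


# (word1, word2) -> (form distance, invented meaning distance)
toy_pairs = {
    ("glow", "gleam"): (3, 0.14),
    ("glow", "snore"): (4, 1.27),
    ("glow", "sniff"): (5, 1.13),
    ("gleam", "snore"): (5, 1.13),
    ("gleam", "sniff"): (5, 0.99),
    ("snore", "sniff"): (3, 0.14),
    ("glue", "glow"): (2, 1.20),
    ("glue", "gleam"): (3, 1.10),
    ("glue", "snore"): (4, 0.20),
    ("glue", "sniff"): (5, 0.30),
}

r_global, scores = leave_one_out(toy_pairs)
print(r_global)
for word, score in sorted(scores.items(), key=lambda item: item[1]):
    print(word, round(score, 3))  # most arbitrary word is printed first
```

In this toy data "glue" gets the most negative score, because removing it makes the remaining form-meaning correlation much stronger. With only five words the other scores are noisy, which is one reason a method like this needs a large vocabulary.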

Using this method, Monaghan et al (2014) produced values for each word in English denoting how systematic or arbitrary it was. This allowed the researchers to look for patterns in the data – what sort of words are more arbitrary, and which are more systematic?

One particularly interesting finding was an inverse correlation between the systematicity of a word and its age of acquisition (see Figure 3 below). That is, words with a stronger relationship between their form and meaning tend to be learned by younger children; as the average age at which a word is learned increases, the word tends to be more arbitrary.

Figure 3: Taken from Monaghan et al (2014). More systematic words are learned earlier in life.

The Takeaway

So what’s the point?

For me, there are two interesting and important takeaways from the paper. First of all, Monaghan et al (2014) describe an automated method for discovering the degree of systematicity in a language, as well as characterizing the systematicity of individual words. This is powerful, because it allows linguists to look for patterns in language that they couldn’t before.

Second – and perhaps of more interest to a general audience – they found that the more “arbitrary” a word is (the more unrelated its form and meaning are), the later in life it is learned. This connects with Gasser’s earlier finding (Gasser, 2004), discussed in a previous post, that iconicity is more beneficial in small languages, while arbitrariness is more beneficial in larger languages[7].

This is evidence that language learners (e.g. children) manage to exploit both arbitrariness and systematicity, and that, as suggested by some (Dingemanse et al, 2015), both features are important for shaping a language’s learnability.

Link to paper.


Firth, J. R. (1957). Papers in Linguistics. Oxford: Oxford University Press.

Monaghan, P., Shillcock, R. C., Christiansen, M. H., & Kirby, S. (2014). How arbitrary is language? Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, 369(1651), 20130299.

Dautriche, I., Mahowald, K., Gibson, E., & Piantadosi, S. T. (2016). Wordform similarity increases with semantic similarity: An analysis of 100 languages. Cognitive Science, 1–21.

Gasser, M. (2004). The origins of arbitrariness in language. Proceedings of the 26th Annual Conference of the Cognitive Science Society, 4–7. Retrieved from

Dingemanse, M., Blasi, D. E., Lupyan, G., Christiansen, M. H., & Monaghan, P. (2015). Arbitrariness, Iconicity, and Systematicity in Language. Trends in Cognitive Sciences, 19(10), 603–615.


[1] In English, suffixes often tell you something about both the grammatical category of a word, and its meaning (especially if you know the root word). For example, adding –ify to a noun or adjective, X, generally produces a verb meaning “to make something X-like” (e.g. “humidify”); adding the suffix –ful to a verb generally produces an adjective with a related meaning (e.g. “useful”, “forgetful”). These suffixes are highly regular and are also productive, meaning you can use them to produce new words on the fly, and listeners will generally catch the drift – even if the new word isn’t a canonically accepted English word (e.g. “blue-ify”). The reason I’m disregarding morphological systematicity is that morphemes (such as suffixes) are already well-accepted as markers for meaning. And so even if we accept that –ify is a meaningful morpheme, there’s no reason why that particular suffix would have that particular meaning.

[2] Philosophers in the tradition of logical semantics and analytic philosophy have been interested in this problem for a long time. One of the main approaches to meaning was to describe nouns as denoting referents in the world, and sentences denoting some sort of truth value about those referents.

[3] Note that these numbers are entirely fabricated for the purpose of this example.

[4] A lot more could, and should, be said about word2vec. Technically, word2vec isn’t count-based, but rather uses an unsupervised-learning method to learn the vectors for each word, based on its ability to predict a word given the context (or the context, given a word).

[5] Monaghan et al (2014) also used a semantic representation from WordNet, but I won’t get into that here.

[6] And something similar has been used, in fact, in Dautriche et al (2016).

[7] As measured by a neural network’s ability to learn artificial languages of those structures.
