Arbitrariness in Language, Pt. 1

Pretty much since its inception, one of the core principles of linguistics has been that language is arbitrary (De Saussure, 1916; Hockett, 1960). That is, there’s no apparent relationship between a sign and what it signifies; nothing inherent about the word “dog” suggests that it must refer to the DOG concept[1].

But this hasn’t stopped people from attempting to create languages that were more systematic. Arbitrariness was seen by some as a sign of the “degenerate” nature of language (Eco, 1993), and had deep religious implications; they believed that long ago, when mankind was first created, our language was not arbitrary at all, and this allowed a perfect communion with God. But after the Fall of Man (and subsequent Tower of Babel incident), the link between sign and signified was severed, and arbitrariness was introduced[2].

Others, such as John Wilkins, sought mathematical perfection in language. They wanted to construct a communication system in which the meaning of any word could be perfectly predicted by its form (Borges, 1988). For example, Wilkins divided the world into forty broad classes, which were “further divided into differences, which was then subdivided into species”. The syllable de denoted an element; adding a “b” (deb) denoted the element fire; and adding an “a” (deba) denoted a flame. One had only to memorize the meaning of each component, as well as the standard algorithm for putting them together, to generate (and infer) the entire vocabulary.
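
To make the idea concrete, here is a minimal Python sketch of how a word in such a scheme could be decoded by reading it prefix by prefix. Only the three entries (de, deb, deba) come from the example above; the TAXONOMY table and the decode function are hypothetical names invented for illustration, not a reconstruction of Wilkins’s actual system.

```python
# Toy fragment of a Wilkins-style taxonomy. Only these three entries come
# from the example in the text; the lookup scheme itself is an assumption.
TAXONOMY = {
    "de": "element",
    "deb": "fire (an element)",
    "deba": "flame (a part of fire)",
}


def decode(word):
    """Return the chain of (prefix, meaning) pairs from the broad class
    down to the full word, reading one extra letter at a time."""
    chain = []
    for end in range(2, len(word) + 1):
        prefix = word[:end]
        if prefix not in TAXONOMY:
            raise KeyError(f"no category for prefix {prefix!r}")
        chain.append((prefix, TAXONOMY[prefix]))
    return chain


for prefix, meaning in decode("deba"):
    print(prefix, "->", meaning)
```

The point of the sketch is that the form of the word carries its own classification: each added letter narrows the meaning of the prefix before it.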

Needless to say, all of these attempts failed (though they did yield fascinating studies in constructed languages). Something is just difficult, or perhaps plain impossible, about creating a perfectly iconic and logical language. There are cases of non-arbitrariness in language (e.g. onomatopoeia[3]), but for the most part, the meaning of a word has nothing to do with the word itself[4].

So with that said, as scientists, the question we must ask ourselves is: what specific advantage does arbitrariness offer?

The Arbitrariness Advantage

Mike Gasser considered this question from the perspectives of language learning and language evolution (Gasser, 2004). Let’s say you’re trying to design a language, and you want it to fulfill two main objectives:

  1. The language should be easy to learn and remember.
  2. The language should be able to express all of the concepts (i.e. “meanings”) you need to express.

(Obviously, languages aren’t actually “designed” by any particular person; they change naturally over time, and these changes are driven by a variety of factors. But some researchers (Dingemanse et al., 2015) have argued that chief among these factors are transmission biases, namely learnability and communicative utility, which roughly correspond to (1) and (2) above. In this way, just as biological organisms are subject to the pressures of natural selection, languages are subject to these transmission biases, which constrain languages to fit their communicative niche[5].)

Anyway, in terms of learnability, Gasser argues, it’s not obvious that arbitrariness wins out. If you have 20 concepts you want to express, it seems like it’d be easier if the ways to express them were aligned in some systematic way (e.g., they were iconic). Consider this from a mathematical perspective; if form reliably predicts meaning, all a language learner would have to memorize is the relationship between form and meaning (e.g. a linear correlation). But if form and meaning have nothing to do with each other, a language learner has to memorize each form-meaning pair separately.

Figure 1: Taken from Gasser’s 2004 paper. Iconic languages have a perfect correlation between form and meaning; arbitrary languages are, well, arbitrary.
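
The learnability argument can be sketched in a few lines of Python. This is a toy illustration, not Gasser’s model: meanings and forms are reduced to single numbers, the “iconic” language is assumed to follow the made-up rule form = 2 * meaning + 1, and the “arbitrary” language reuses the same forms in a shuffled order. All names here (fit_line, held_out_error, the seed) are hypothetical. The learner sees five words, fits a line, and tries to guess the forms of the remaining fifteen.

```python
import random


def fit_line(pairs):
    """Least-squares slope and intercept for (meaning, form) pairs."""
    n = len(pairs)
    mean_x = sum(m for m, _ in pairs) / n
    mean_y = sum(f for _, f in pairs) / n
    var_x = sum((m - mean_x) ** 2 for m, _ in pairs)
    cov = sum((m - mean_x) * (f - mean_y) for m, f in pairs)
    slope = cov / var_x
    return slope, mean_y - slope * mean_x


meanings = list(range(20))

# Iconic toy language: form is a fixed linear function of meaning (assumed rule).
iconic = {m: 2.0 * m + 1.0 for m in meanings}

# Arbitrary toy language: the same forms, assigned to meanings at random.
rng = random.Random(0)
shuffled_forms = list(iconic.values())
rng.shuffle(shuffled_forms)
arbitrary = dict(zip(meanings, shuffled_forms))


def held_out_error(language, n_train=5):
    """Fit a line on the first n_train words, then return the mean absolute
    error when predicting the forms of the words the learner never saw."""
    train = [(m, language[m]) for m in meanings[:n_train]]
    slope, intercept = fit_line(train)
    test = meanings[n_train:]
    return sum(abs(slope * m + intercept - language[m]) for m in test) / len(test)


print("iconic:   ", held_out_error(iconic))
print("arbitrary:", held_out_error(arbitrary))
```

In the iconic language, five examples are enough to recover the rule and predict every unseen word; in the arbitrary language, the fitted line is useless and each remaining pair would have to be memorized on its own.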

So where does arbitrariness come in? Well, as Gasser points out, as the size of a language increases, it becomes more probable that two form-meaning pairs will overlap. And with an iconic language – where particular meanings must always be paired with particular forms – there are even fewer degrees of freedom for keeping them apart. From an information-theoretic perspective, this is highly problematic, since the signals for two different things should be maximally distinct. Bühler (1990) summarizes the problem as follows: a language with only iconic words could never meet all our communicative needs, because “the possible form-meaning correspondences are more constrained for iconic words than for arbitrary ones” (Dingemanse et al., 2015).

Figure 2: Also taken from Gasser’s paper. Iconic languages, due to their perfect correlation, are more restricted, and have fewer degrees of freedom in the ways a meaning can be represented.
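
Here is a rough Python sketch of the crowding problem, under some strong simplifying assumptions of my own: meanings and forms are single numbers between 0 and 1, an iconic form is simply a copy of its meaning, and an arbitrary language is free to spread its forms evenly across the available space. This is not Gasser’s model, and the names min_gap and compare are hypothetical. The quantity being measured is the smallest distance between any two forms, i.e. how close the two most confusable words are.

```python
import random


def min_gap(forms):
    """Smallest distance between any two forms (the most confusable pair)."""
    ordered = sorted(forms)
    return min(b - a for a, b in zip(ordered, ordered[1:]))


def compare(n_words, seed=0):
    """Return (iconic_gap, arbitrary_gap) for a toy language of n_words."""
    rng = random.Random(seed)
    # Meanings land wherever the world puts them (assumed uniform here).
    meanings = [rng.random() for _ in range(n_words)]
    # Iconic: the form must mirror the meaning, so it inherits any crowding.
    iconic_forms = list(meanings)
    # Arbitrary: forms can be spaced evenly, then assigned in any order.
    arbitrary_forms = [i / (n_words - 1) for i in range(n_words)]
    rng.shuffle(arbitrary_forms)
    return min_gap(iconic_forms), min_gap(arbitrary_forms)


for n in (10, 100, 1000):
    iconic_gap, arbitrary_gap = compare(n)
    print(f"{n:5d} words  iconic gap {iconic_gap:.6f}  arbitrary gap {arbitrary_gap:.6f}")
```

Evenly spaced forms are the best case for distinctiveness, so the arbitrary language can never do worse here; the iconic language is stuck with whatever near-collisions exist among its meanings, and those become more likely as the vocabulary grows.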

This suggests that as a language grows, arbitrariness actually beats iconicity in terms of communicative utility, and possibly in terms of learnability as well – after all, it’s hard to learn a language if you keep getting words confused, and iconicity can lead to increased confusion. Gasser demonstrates this theory with a computational model. As seen below, iconicity beats arbitrariness for small languages, but for larger languages (here, >100 items), arbitrariness becomes a useful tool.

Figure 3: Also from Gasser’s paper. A small iconic language beats out a small arbitrary language in terms of learnability, but this effect reverses as the language size increases, and over time.

The Takeaway

Upon reflection, it becomes fairly obvious to many people that language is mostly arbitrary. The principle of arbitrariness is at the heart of linguistics, and has even been termed a “design” feature of human language, setting it apart from other animal communication systems. But until recently, it was never fully understood why language was so arbitrary. And some, like John Wilkins, thought that language would be better off if it were much more systematic.

But Gasser’s 2004 study, along with other work (Dingemanse et al., 2015), suggests that arbitrariness performs a very useful function in language. Namely, as a language grows in size, arbitrariness gives speakers more freedom to refer to something with a variety of words, instead of being limited to something “inherently” linked to that concept.

A perfectly iconic language might be akin to a language composed entirely of onomatopoeia – and while such a language might be fun to speak, the concepts it could express (its communicative utility) would be limited to those expressible via iconic relationships.

Next time, we’ll discuss the problem of systematicity in language. That is, given that arbitrariness seems so useful, why are there pockets of iconicity and systematic relationships in language?

[1] One way we can intuitively verify this is by observing the number of very different words for DOG in different languages. If something about the form of a word predicts its meaning, those words should all be somewhat similar.
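
A quick, informal way to try this check is to compute the edit distance between the words for DOG in a handful of languages. The sketch below is only an illustration: the word list is a small hand-picked sample (with Japanese and Russian given in romanized form and German lowercased), spelling is a crude stand-in for pronunciation, and the function names are my own.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def normalized(a, b):
    """Edit distance scaled to [0, 1] by the length of the longer word."""
    return edit_distance(a, b) / max(len(a), len(b))


# Hand-picked sample; romanized and lowercased for comparison.
DOG_WORDS = {
    "English": "dog",
    "French": "chien",
    "Spanish": "perro",
    "German": "hund",
    "Japanese": "inu",
    "Russian": "sobaka",
}

for (lang_a, word_a), (lang_b, word_b) in combinations(DOG_WORDS.items(), 2):
    print(f"{lang_a:8s} {lang_b:8s} {normalized(word_a, word_b):.2f}")
```

A value near 1 means two words share almost nothing; if form predicted meaning, we would expect these values to sit much closer to 0.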

[2] Interestingly, it may actually be the case that early communication systems were less arbitrary than present-day languages. One hypothesis for the origin of language is that humans originally communicated primarily through iconic gestures, and that gradually the means of communication involved more and more vocal articulations in conjunction with these gestures, until eventually the vocal articulations carried the weight of meaning (thereby freeing up our hands to do other tasks). As we used language to describe more and more parts of the world, it was forced to become more arbitrary – the reason why will become clearer in the rest of this blog post. Of course, the “gesture-first” model of language development is just one hypothesis among many; it makes for a good story, and it’s more plausible than the Garden of Eden version, but it’s certainly not “proven” in any way.

[3] Here, you might point out that different languages often have different conventions for expressing onomatopoeia. This is certainly true, though all of these conventions still resemble the actual sound in some way; they just highlight a different aspect of the sound (often due to language-specific conventions).

[4] As the saying goes, “the word is not the thing”. (I believe this is attributed to Alfred Korzybski?)

[5] I’ve taken a few freedoms with the comparison here, but I don’t think it’s fundamentally incorrect. If you’re a language evolution scholar reading this and you disagree, please let me know, and I’ll update my analogy.


■Hockett, C. (2004). The origin of speech. Science (New York, N.Y.), 303(5662), 1316–1319.

■De Saussure, F., Baskin, W., & Meisel, P. (2011). Course in general linguistics. Columbia University Press.

■Dingemanse, M., Blasi, D. E., Lupyan, G., Christiansen, M. H., & Monaghan, P. (2015). Arbitrariness, Iconicity, and Systematicity in Language. Trends in Cognitive Sciences, 19(10), 603–615.

■Gasser, M. (2004). The origins of arbitrariness in language. Proceedings of the 26th Annual Conference of the Cognitive Science Society, 4–7. Retrieved from

■Borges, J. L. (1988). The Analytical Language of John Wilkins. Other Inquisitions 1937-1952, (1910), 101–105.

■Bühler, K. (1990) Theory of Language: The Representational Function of Language, John Benjamins.

■Eco, U. (1995). The search for the perfect language. Wiley-Blackwell.

