It’s no secret that different languages are different. One particularly well-known dimension along which languages differ is morphological structure, that is, how words are formed in a language.
Language is mostly arbitrary, but there are patterns of systematicity both within and across languages. As discussed previously, arbitrariness and systematicity seem to play distinct roles in improving both the learnability and the communicative utility of a language.
So how can we, as researchers, quantify the degree of arbitrariness and systematicity in a language? And how can we discover these trends automatically?
Previously, we established that arbitrariness is an essential part of language. It allows for greater communicative utility, and probably learnability as well – two of the main transmission biases that were hypothesized to affect the evolution of a language.
But then how do we account for the fact that there is non-arbitrariness in language?
Pretty much since its inception, one of the core principles of linguistics has been that language is arbitrary (De Saussure, 1916; Hockett, 1960). That is, there’s no apparent relationship between a sign and what it signifies; nothing inherent about the word “dog” suggests that it must refer to the DOG concept.