Ambiguity pervades language. Speakers can use it strategically, but it is also part of what makes language so challenging for machines to understand – and in some cases, it even leads to miscommunication between people, particularly in written communication.
A recurring question in both scientific and public discourse is whether any given property of an organism is innate or learned. This debate, usually framed in terms of Nature vs. Nurture, often centers on properties of human behavior and cognition: intelligence, language, morality, mathematics, and so on. But while this dichotomous framing may seem obvious to us now, when did the question first arise? And is it really the best way to investigate these properties?
Bias is real – and often harmful. It’s been shown to manifest in hiring decisions, in the training of machine learning algorithms, and most recently, in language itself. Three computer scientists analyzed the co-occurrence patterns of words in naturally occurring texts (obtained from Google News), and found that these patterns seem to reflect implicit human biases.
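To make "co-occurrence patterns" concrete, here is a minimal sketch of the general idea. It is not the method used in the study: it simply counts which words appear near each other and compares the resulting count vectors with cosine similarity. The toy corpus, the window size, and the function names (`cooccurrence_vectors`, `cosine`, `association`) are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens of it."""
    vectors = {}
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            context = tokens[lo:i] + tokens[i + 1:hi]
            vectors.setdefault(word, Counter()).update(context)
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def association(vectors, target, attr_a, attr_b):
    """Positive if `target` shares more context with `attr_a` than with `attr_b`."""
    t = vectors[target]
    return cosine(t, vectors[attr_a]) - cosine(t, vectors[attr_b])


# Hypothetical toy corpus, invented for this sketch (not data from the study).
corpus = [
    "the rose smelled lovely",
    "a lovely rose bloomed",
    "the wasp sting was nasty",
    "a nasty wasp attacked",
]
vectors = cooccurrence_vectors(corpus)
print(association(vectors, "rose", "lovely", "nasty"))
print(association(vectors, "wasp", "lovely", "nasty"))
```

In this toy corpus, "rose" leans toward "lovely" and "wasp" toward "nasty" purely because of the company each word keeps. The same logic, applied to a large real-world corpus, is how associations present in the text can end up encoded in a model trained on it.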
Language is mostly arbitrary, but there are patterns of systematicity both within and across languages. As discussed previously, arbitrariness and systematicity seem to play distinct roles in improving both the learnability and communicative utility of a language.
So how can we, as researchers, quantify the degree of arbitrariness and systematicity in a language? And how can we discover these trends automatically?
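One way to approach the first question, sketched here under stated assumptions, is to ask whether words that are similar in form also tend to be similar in meaning. The sketch below computes the correlation between pairwise form distances (edit distance between spellings) and pairwise meaning distances (Euclidean distance between meaning vectors). The function names, the toy lexicon, and its two-dimensional "meaning" vectors are hypothetical and chosen only for illustration; in practice the vectors would come from some model of word meaning, and the score would need a significance test.

```python
from itertools import combinations
from math import sqrt


def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming, row by row)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either list has no variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def systematicity(lexicon):
    """Correlate form distance with meaning distance over all word pairs."""
    form_d, meaning_d = [], []
    for (w1, m1), (w2, m2) in combinations(lexicon.items(), 2):
        form_d.append(edit_distance(w1, w2))
        meaning_d.append(euclidean(m1, m2))
    return pearson(form_d, meaning_d)


# Hypothetical toy lexicon: the meaning vectors are made up for this sketch.
toy_lexicon = {
    "glow": (1.0, 0.0),
    "gleam": (0.9, 0.1),
    "glint": (0.8, 0.2),
    "dog": (0.0, 1.0),
    "cat": (0.1, 0.9),
}
print(systematicity(toy_lexicon))
```

A score near zero would suggest that form tells you little about meaning (arbitrariness), while a clearly positive score would suggest systematicity. Because the word pairs are not independent of one another, a permutation test (shuffling which meaning goes with which form and recomputing the score) is a more appropriate check than a standard significance test for correlation.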
Previously, we established that arbitrariness is an essential part of language. It allows for greater communicative utility, and probably learnability as well – two of the main transmission biases that were hypothesized to affect the evolution of a language.
But then how do we account for the fact that there is non-arbitrariness in language?