Bias is real, and often harmful. It has been shown to manifest in hiring decisions, in the training of machine learning algorithms, and, most recently, in language itself. Three computer scientists analyzed the co-occurrence patterns of words in naturally occurring text (obtained from Google News) and found that these patterns appear to reflect implicit human biases.
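To make the idea concrete, here is a minimal sketch of how an association test over word vectors can surface this kind of bias. The tiny hand-made 3-d vectors, the word list, and the `association` helper are all illustrative stand-ins, not the study's actual embeddings or code; real analyses use vectors trained on large corpora.

```python
import numpy as np

def cosine(u, v):
    # cosine similarity: how closely two word vectors point in the same direction
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def association(w, A, B):
    # mean similarity of word w to attribute set A minus its mean similarity to set B;
    # positive means w "leans" toward A, negative toward B
    return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])

# toy 3-d "embeddings" standing in for vectors trained on a news corpus
vecs = {
    "engineer": np.array([0.9, 0.1, 0.0]),
    "nurse":    np.array([0.1, 0.9, 0.0]),
    "he":       np.array([1.0, 0.0, 0.1]),
    "she":      np.array([0.0, 1.0, 0.1]),
}

male, female = [vecs["he"]], [vecs["she"]]
print(association(vecs["engineer"], male, female) > 0)  # leans toward "he" in this toy data
print(association(vecs["nurse"], male, female) < 0)     # leans toward "she"
```

In the toy data the associations are baked in by construction; the striking finding of the study was that the same kind of asymmetry falls out of vectors learned purely from co-occurrence statistics in ordinary text.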
Language is mostly arbitrary, but there are patterns of systematicity both within and across languages. As discussed previously, arbitrariness and systematicity seem to play unique roles in improving both the learnability and communicative utility of a language.
So how can we, as researchers, quantify the degree of arbitrariness and systematicity in a language? And how can we discover these trends automatically?
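One common operationalization is to ask whether words with similar forms tend to have similar meanings: compute pairwise form distances (e.g., edit distance) and pairwise meaning distances, and correlate the two. Below is a minimal sketch under that assumption; the four-word lexicon and its 2-d "meaning vectors" are hypothetical toy data, and real studies use much larger lexicons and permutation tests rather than a raw correlation.

```python
import itertools
import numpy as np

def edit_distance(a, b):
    # standard Levenshtein distance via dynamic programming
    d = np.arange(len(b) + 1)
    for i, ca in enumerate(a, 1):
        prev, d[0] = d[0], i
        for j, cb in enumerate(b, 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (ca != cb))
    return int(d[-1])

# hypothetical lexicon: word forms paired with toy meaning vectors
lexicon = {
    "glimmer": np.array([1.0, 0.2]),
    "glitter": np.array([0.9, 0.3]),
    "shimmer": np.array([0.8, 0.4]),
    "dog":     np.array([0.0, 1.0]),
}

pairs = list(itertools.combinations(lexicon, 2))
form = [edit_distance(a, b) for a, b in pairs]
meaning = [float(np.linalg.norm(lexicon[a] - lexicon[b])) for a, b in pairs]

# positive correlation = similar forms tend to have similar meanings (systematicity)
r = np.corrcoef(form, meaning)[0, 1]
print(f"form-meaning correlation: {r:.2f}")
```

A fully arbitrary lexicon would show a correlation near zero; any reliable positive correlation is evidence of systematicity, which is exactly the kind of trend such methods can discover automatically across a whole vocabulary.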
Previously, we established that arbitrariness is an essential part of language: it allows for greater communicative utility, and probably greater learnability as well, two of the main transmission biases hypothesized to shape the evolution of a language.
But then how do we account for the fact that there is non-arbitrariness in language?
Human language is a strange phenomenon. Somehow, we’re able to convey complex ideas through a fuzzy communicative channel. Even disregarding the remarkable machinery involved in transforming sound waves into neural signals, how does meaning emerge from those signals? And how do we talk about abstract concepts like “Justice”, “Truth”, or even “Concepts” themselves?