What we talk about when we talk about bias in A.I.

Popular culture often depicts intelligent machines as coldly rational, capable of making “objective” decisions that humans can’t. More recently, however, there’s been increased attention to the presence of bias in supposedly objective systems, from image recognition to models of human language. Often, these biases instantiate actual human prejudices, as described in Cathy O’Neil’s Weapons of Math Destruction; for example, statistical models engineered to predict recidivism rates include information that would never be allowed in a courtroom, and perpetuate cross-generational cycles of incarceration.

In the midst of this media attention, many people are justifiably angry, and demanding solutions. But successfully addressing these issues requires understanding what exactly we mean when we use the term “bias”, as well as how these biases arise.

Talking about bias

I’d argue that there are at least two broad buckets that these examples fall into––and understanding which bucket we’re dealing with is important for building a solution.

The first kind of bias is a form of selection bias: machine learning systems are trained on unbalanced datasets, i.e. data that isn’t truly representative of the population it’s meant to represent. To give a concrete example that I’ve written about before, many language interfaces (such as Amazon Echo, Siri, etc.) have a tough time understanding foreign-accented speech, or even native English speakers whose speech falls outside the realm of “General American English”, or GAE (e.g. speakers of African-American Vernacular English, or AAVE). This is not because these accents are inherently more difficult to understand. Most speech recognition systems learn from examples (e.g. the Wall Street Journal corpus), and unfortunately, most of the examples that they see come from speakers of GAE. This means that the system’s speech recognition ability will be limited to certain kinds of speakers, excluding many others from using the product successfully.
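One way to make this kind of imbalance visible is to report, for each speaker group, both its share of the data and how well the system does on it. The snippet below is only a minimal sketch: the function name `group_report`, the group labels, and the records are all made up for illustration and do not come from any real corpus or product.

```python
from collections import defaultdict

def group_report(records):
    """Return each group's share of the data and the system's accuracy on it.

    `records` is a list of (group, was_recognized_correctly) pairs.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for group, is_correct in records:
        totals[group] += 1
        correct[group] += int(is_correct)
    n = len(records)
    return {
        group: {"share": totals[group] / n, "accuracy": correct[group] / totals[group]}
        for group in totals
    }

# Hypothetical evaluation records (invented numbers, for illustration only).
records = (
    [("GAE", True)] * 9 + [("GAE", False)] * 1
    + [("AAVE", True)] * 3 + [("AAVE", False)] * 2
)

report = group_report(records)
for group, stats in report.items():
    print(group, stats)
```

A single overall accuracy figure would hide the gap between groups, which is why the sketch breaks the numbers out per group.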

Another kind of bias is what I’ll call human-generated bias: these are cases in which human prejudices, whether implicit or explicit, manifest in the datasets that are used to train machine learning systems. For example, humans sometimes use language in ways that perpetuate subtle implicit biases. Because of this, a system trained on naturally-occurring examples of human language (e.g. news articles) will acquire similar biases, such as associating female names with female-dominated occupations (Garg et al., 2018). (Interestingly, another recent paper (Caliskan et al., 2017) found that these biases are significantly correlated with the effect sizes from a well-known experimental measure called the Implicit Association Test. This suggests not only that machines learn bias from human language use, but also raises the possibility that humans can learn these biases purely through exposure to language.)
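To give a rough sense of what “associating” means here, the sketch below compares how close a word’s vector is to a few female-associated words versus a few male-associated words, using cosine similarity. This is a simplified illustration, not the exact measure used in the papers cited above; the function names and the tiny 3-dimensional vectors are invented for the example (real embeddings are learned from text and have hundreds of dimensions).

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def gender_association(word_vec, female_vecs, male_vecs):
    """Mean similarity to female words minus mean similarity to male words.

    Positive values mean the word sits closer to the female words.
    """
    female = np.mean([cosine(word_vec, f) for f in female_vecs])
    male = np.mean([cosine(word_vec, m) for m in male_vecs])
    return float(female - male)

# Toy, hand-made vectors (assumptions for illustration only).
toy = {
    "she": np.array([-1.0, 0.0, 0.1]),
    "woman": np.array([-0.9, 0.1, 0.2]),
    "he": np.array([1.0, 0.0, 0.1]),
    "man": np.array([0.9, 0.1, 0.2]),
    "nurse": np.array([-0.4, 0.8, 0.3]),
    "engineer": np.array([0.5, 0.7, 0.3]),
}

female_vecs = [toy["she"], toy["woman"]]
male_vecs = [toy["he"], toy["man"]]

nurse_score = gender_association(toy["nurse"], female_vecs, male_vecs)
engineer_score = gender_association(toy["engineer"], female_vecs, male_vecs)
print("nurse:", nurse_score, "engineer:", engineer_score)
```

In these toy vectors the score for “nurse” comes out positive and the score for “engineer” negative, which is the kind of pattern the studies above report in embeddings trained on real text.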

You might be wondering why I bothered to distinguish between these kinds of bias. While there are probably more category distinctions that could be drawn, there are well-documented examples from both categories, and in my opinion, they demand slightly different solutions.

A path towards unlearning bias

In the case of selection bias, the problem could be addressed most directly by diversifying the training examples that these machine learning systems are exposed to. This requires a concerted effort on the part of researchers to collect data from a wider range of populations (e.g. speech data from speakers with accents other than “General American English”). And, of course, these efforts should be tempered with appropriate care, as some of the populations that have been neglected by previous research are also historically disenfranchised groups: two facts which, I believe, are not entirely uncorrelated. More broadly, increasing diversity within the research community could help draw attention to these problems before they occur at all; simply having other voices in the room is a major step towards ensuring that all voices can be heard, so to speak.

The case of human-generated bias is trickier because a real solution probably requires addressing the problem of bias in humans themselves. Humans are skilled at learning associations, and a bias is ultimately just a harmful, often culturally-propagated, association. This strikes me as less of a data science or machine learning problem, and more of a social problem: the best way to avoid human-generated biases in systems that learn from humans is to eradicate those biases in ourselves. This, unfortunately, is not a new problem, and likely not one that technology can solve. That said, there are researchers working on ways to de-bias machine learning systems themselves. The methodology discussed in Bolukbasi et al. (2016) first attempts to identify words that carry a gender bias (e.g. words which should be gender-neutral, like “nurse”, but are closer in vector-space to words like “woman” than to words like “man”). It then neutralizes this bias by modifying the vector of the offending word (e.g. “nurse”) so as to remove the biased gender association, all while trying to preserve its meaning along other dimensions (e.g. its relationship to other words, like “patient”). The point is simply that there are ways to address the biases that a system learns. Of course, it is difficult to systematically neutralize the bias of a word while preserving its meaning in 300-dimensional space, and without accidentally introducing unintended biases to the system, but it’s a start.
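The “neutralize” idea can be sketched in a few lines: estimate a gender direction, then subtract from a word’s vector its component along that direction. What follows is a simplified illustration under stated assumptions, not the full procedure from the paper: the gender direction is taken from a single hypothetical word pair, the vectors are made-up 3-dimensional toys, and the steps for identifying which words to neutralize and for re-normalizing the result are left out.

```python
import numpy as np

def unit(v):
    """Scale a vector to length 1."""
    return v / np.linalg.norm(v)

def neutralize(word_vec, gender_direction):
    """Remove the component of word_vec that lies along gender_direction."""
    g = unit(gender_direction)
    projection = np.dot(word_vec, g) * g
    return word_vec - projection

# Toy, hand-made vectors (assumptions for illustration only).
man = np.array([1.0, 0.0, 0.2])
woman = np.array([-1.0, 0.0, 0.2])
nurse = np.array([-0.4, 0.8, 0.3])

# Simplification: one word pair defines the gender direction.
gender_direction = woman - man

debiased_nurse = neutralize(nurse, gender_direction)

# After neutralizing, the vector has no component along the gender direction,
# while its other coordinates are left unchanged.
print("before:", np.dot(nurse, unit(gender_direction)))
print("after: ", np.dot(debiased_nurse, unit(gender_direction)))
```

Because only the component along one direction is removed, the word’s position along every other direction is untouched, which is the sense in which the approach tries to preserve the rest of the word’s meaning.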

The Takeaway

In order to combat bias in machine learning systems, we need to understand how and when it arises. It’s worth pointing out that the presence of this bias is (as far as I know) not intentional on the part of a system’s designers––some of the public discourse I’ve noticed reflects confusion around this notion of intentionality––but that also doesn’t mean we shouldn’t hold the designers, and ultimately ourselves, accountable. After all, we are the ones responsible for the creation of the biased datasets in the first place. The bias in these machine learning systems is a reflection of our own. It’s an unpleasant fact to consider, but a biased machine learning system is simply a mirror, held in such a way as to make our prejudices clear.

Something I haven’t addressed in this post, but which is perhaps even more important, is the false objectivity that people tend to ascribe to statistical models. For a very thorough account of this problem as it pertains to the perpetuation of human prejudice, I recommend Weapons of Math Destruction by Cathy O’Neil.


Bolukbasi, T., Chang, K. W., Zou, J. Y., Saligrama, V., & Kalai, A. T. (2016). Man is to computer programmer as woman is to homemaker? debiasing word embeddings. In Advances in Neural Information Processing Systems (pp. 4349-4357).

Caliskan, A., Bryson, J. J., & Narayanan, A. (2017). Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334), 183-186.

Garg, N., Schiebinger, L., Jurafsky, D., & Zou, J. (2018). Word embeddings quantify 100 years of gender and ethnic stereotypes. Proceedings of the National Academy of Sciences, 115(16), E3635-E3644.
