Gender is now recognized as an important social issue. Politicians, the media, and laypeople alike are discussing and debating topics like the gender wage gap, workplace sexual harassment, and institutionalized prejudice.
Another area where gender crops up is education. Teachers are constantly evaluated, whether by their supervisors or by sites like RateMyProfessor.com. These evaluations matter greatly for a teacher’s livelihood, particularly for teachers trying to earn tenure. Of course, supervisors need some metric by which to evaluate teachers, and student evaluations seem, at first glance, like a useful source.
But does gender play a hidden role in these evaluations?
Gender Skews Perspective
Researcher Ben Schmidt scraped ~14 million reviews from RateMyProfessor.com and analyzed which words were used most frequently to describe male vs. female teachers, building an interactive tool for visualizing these word frequencies by gender.
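Schmidt’s core measure is straightforward: for each gender, count how often a word appears per million words of review text. Here is a minimal sketch of that normalization in Python, using invented review snippets in place of the real corpus:

```python
from collections import Counter

def per_million(reviews, word):
    """Occurrences of `word` per million words across a list of review strings."""
    tokens = [t.lower().strip(".,!?") for r in reviews for t in r.split()]
    counts = Counter(tokens)
    return counts[word] * 1_000_000 / len(tokens)

# Toy corpora standing in for reviews of male- vs. female-named instructors.
male_reviews = ["He is so smart and funny", "Smart lecturer, brilliant exams"]
female_reviews = ["She is so sweet and helpful", "Sweet teacher, very caring"]

print(per_million(male_reviews, "smart"))
print(per_million(female_reviews, "sweet"))
```

The real analysis operates on ~14 million reviews, of course, but the normalization is the same: raw counts divided by corpus size, scaled to a per-million-word rate so corpora of different sizes are comparable.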
Below is a comparison of the word frequencies for “smart” vs. “sweet”. The x-axis represents the number of times that word was used per million words of text:
More students described male teachers as “smart” than female teachers, and more students described female teachers as “sweet” than male teachers. A naysayer, however, might have at least two objections to concluding anything about gender bias from this:
- As Schmidt himself notes, this is not a peer-reviewed scientific study. There are various confounds he doesn’t control for, such as the gender of the reviewer, or whether particular reviewers are more likely to use particular words.
- Maybe – and this is a big maybe – the data doesn’t reveal a bias, but an underlying “truth” about teaching styles and ability.
If (2) were true, we’d expect the gender of a teacher to reliably predict how students evaluate them; that is, maybe female teachers really are “sweeter” (and male teachers are “smarter”). But if (2) is mistaken – and a bias is actually at play – we’d expect students’ evaluations to reflect their perception of a teacher’s gender, rather than the teacher’s actual gender.
Fortunately, this question is ripe for scientific inquiry, and MacNell et al (2014) investigated exactly this. Two instructors (one male, one female) taught an online course, each under two different names for the different conditions: one male, one female. In experimental terminology, this was a 2×2 design, with actual gender crossed with perceived gender.
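Laid out explicitly, the design has four cells. The labels below are my own illustration of the structure, not the study’s materials:

```python
# 2x2 factorial design: each instructor teaches under both a male-
# and a female-presenting identity, giving one section per cell.
conditions = [
    {"actual": "male",   "perceived": "male"},    # male instructor, own identity
    {"actual": "male",   "perceived": "female"},  # male instructor, female identity
    {"actual": "female", "perceived": "female"},  # female instructor, own identity
    {"actual": "female", "perceived": "male"},    # female instructor, male identity
]

# Every combination of the two factors appears exactly once, so the
# effect of actual gender can be separated from that of perceived gender.
assert {(c["actual"], c["perceived"]) for c in conditions} == {
    (a, p) for a in ("male", "female") for p in ("male", "female")
}
```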
The instructors taught the course as normal, and received ratings at the end. Before I give away the findings, recall the predictions from the two hypotheses:
- Gender difference hypothesis: If male teachers perform in a fundamentally different way than female teachers, actual gender should be the strongest predictor of student ratings.
- Gender bias hypothesis: If male teachers perform in a fundamentally equivalent way to female teachers, but biases about gender and ability skew students’ perspective, perceived gender should be the strongest predictor of student ratings.
The results are pretty clear. Instructors perceived as male received higher ratings on professionalism, promptness, fairness, and respectfulness, and teachers perceived as female received lower ratings overall. Crucially, actual gender did not predict a teacher’s ratings; in fact, the same instructor received different ratings depending on the perceived-gender condition.
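The logic of that comparison can be sketched with invented numbers. The ratings below are made up to mirror the direction of the published pattern, not taken from the study; the point is the grouping, not the values:

```python
from statistics import mean

# One invented mean rating per design cell: (actual gender, perceived gender).
ratings = {
    ("male",   "male"):   4.2,
    ("male",   "female"): 3.6,
    ("female", "male"):   4.1,
    ("female", "female"): 3.5,
}

def mean_by(factor):
    """Average ratings grouped by one factor: 0 = actual gender, 1 = perceived."""
    groups = {}
    for cell, r in ratings.items():
        groups.setdefault(cell[factor], []).append(r)
    return {level: mean(scores) for level, scores in groups.items()}

print(mean_by(0))  # grouped by actual gender: the means barely differ
print(mean_by(1))  # grouped by perceived gender: a clear gap appears
```

Grouping by actual gender washes out, while grouping by perceived gender separates the means — the signature of the gender bias hypothesis rather than the gender difference hypothesis.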
The findings in MacNell et al (2014) suggest that biases about gender do in fact play a role in a teacher’s evaluations. Students seem to rate male and female teachers according to different qualities and standards.
Of course, this has broader implications for how we consider teacher evaluations. While it doesn’t mean that all male teachers are overrated and all female teachers are underrated, it does suggest that gender plays a significant role in influencing a teacher’s rating. In other words, gender plays a kind of distorting role, shifting a rating one way or the other (all else being equal).
So what can we do? Unfortunately, systemic biases are particularly difficult to address with top-down policy solutions, because they require a societal-level shift in perspective. On the other hand, simply being aware of the problem is part of the solution: if we know that female teachers tend to receive lower scores on certain qualities as a function of societal gender bias, we can take these scores with a grain of salt. Perhaps the best action we can take on an individual level, then, is to be aware, to make sure that others are aware, and, importantly, to speak up when we notice or suspect this kind of gender bias coming into play.
(I should add that I was exposed to these studies in a class taught by Esther Walker last quarter at UCSD, for which I was a TA; there are also numerous studies about the role of gendered language in creating or perpetuating a gender bias, which I hope to address in future posts.)
MacNell, L., Driscoll, A., & Hunt, A. (2014). What’s in a name: Exposing gender bias in student ratings of teaching. Innovative Higher Education, 40(4), 291–303.
 The use of data-driven metrics is discussed at length in Cathy O’Neil’s book Weapons of Math Destruction, with regard to the “value added” model of teacher evaluation. As with many of the other data-driven models she describes, this one was well-intentioned, but its misapplication led to many great teachers losing their jobs.
 The visualizations also include a breakdown by field. The ones I looked at didn’t appear to show a significant interaction between field and gender, but it could be interesting to explore further; one might expect certain heavily gendered fields (e.g. Computer Science) to show a stronger effect. Also see Schmidt’s caveats in his post (http://benschmidt.org/2015/02/06/rate-my-professor/); as he notes, this is by no means a peer-reviewed study. It’s simply an interesting exercise in data visualization, albeit one with some possibly illuminating results.