(Note: This work was conducted with Robert Loughnan of the UCSD Cognitive Science Department.)
The role of the news media is ostensibly to inform. In order to do this, however, the media must present information in a relatively unbiased way. If citizens obtain information about the world primarily through the media, and the media presents this information through a biased lens, the public acquires an analogous bias. So how successful is the media in presenting an unbiased perspective?
One way in which bias might surface is in the language a news outlet uses to describe different topics, such as those with political affiliations. Such topics include presidential candidates (Trump vs. Clinton), party names (Democrat vs. Republican), or even politically charged policies (Obamacare, DACA, etc.). Different news outlets might discuss these topics using more positive or more negative language, thus demonstrating some form of bias.
At least one previous study suggests that citizens already assume the media is biased, particularly television news (Kiousis, 2001). I suspect that a more recent poll, in this era of “Fake News”, would elicit even more polarizing opinions on the media. Unfortunately, surprisingly little work has investigated the actual presence of bias in the news. (For a visually appealing exception, see here).
We attempted to address this gap by examining differences in the way that news outlets described politically affiliated topics: specifically, how positively or negatively these topics were described.
Analyzing News Bias
We collected over 200k tweets from the Twitter feeds of 8 major news outlets over the past 4 years (Breitbart News, CBS, Fox News, NBC, NY Times, Washington Post, Vox, and Wall Street Journal). We then tagged each tweet as liberal, conservative, or neutral, depending on the words appearing in it (e.g. a tweet containing “Trump” was tagged conservative; a tweet containing “Clinton” was tagged liberal; a tweet containing both was ignored, for now). Each tweet was then assigned a sentiment score using the Stanford CoreNLP sentiment analysis tool (Manning et al., 2014); for more information on the tool, see https://nlp.stanford.edu/sentiment/code.html. The short version is that an algorithm processes the sentence and generates a probability distribution over the possible sentiment labels (ranging from Very Negative to Very Positive, here translated to a range of [-1, 1]).
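As a minimal sketch of this pipeline, the tagging and scoring steps might look like the following. The keyword sets here are illustrative stand-ins, not the full lists we used, and `sentiment_score` simply collapses a CoreNLP-style five-label probability distribution into a single expected value on [-1, 1]:

```python
# Illustrative keyword sets -- the real lists were larger.
LIBERAL_KEYWORDS = {"clinton", "democrat", "obamacare"}
CONSERVATIVE_KEYWORDS = {"trump", "republican", "gop"}

def tag_tweet(text):
    """Tag a tweet as 'liberal', 'conservative', 'neutral', or None (ambiguous)."""
    words = set(text.lower().split())
    lib = bool(words & LIBERAL_KEYWORDS)
    con = bool(words & CONSERVATIVE_KEYWORDS)
    if lib and con:
        return None          # mentions both sides: ignored, for now
    if lib:
        return "liberal"
    if con:
        return "conservative"
    return "neutral"

# CoreNLP emits a probability distribution over five sentiment labels;
# we map Very Negative ... Very Positive onto [-1, 1] and take the expectation.
LABEL_VALUES = [-1.0, -0.5, 0.0, 0.5, 1.0]

def sentiment_score(label_probs):
    """Collapse a five-label distribution to a single score in [-1, 1]."""
    return sum(p * v for p, v in zip(label_probs, LABEL_VALUES))
```

A tweet whose distribution leans toward the negative labels thus receives a score below zero, which is how the scores in the analyses below should be read.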
The News is Negative
Our first finding, when looking at sentiment across outlets (regardless of political topic), was that the overwhelming majority of tweets were categorized as negative (i.e. scores between -1 and 0). This isn’t wholly surprising, but it does suggest the sentiment analysis tool is capturing what we’d intuitively expect.
The News is Biased (Slightly)
We then asked whether the political affiliation (i.e. liberal vs. conservative) of a tweet predicted its sentiment score, and whether outlets varied in the direction of this effect. If all outlets are unbiased, then tweets about liberal and conservative topics should be equivalently scored (whether negative or positive); if outlets are biased, then there might be 1) overall differences in which topics are more negative or positive; and 2) differences across outlets in this bias.
Among the outlets we surveyed, we found evidence for a slight liberal bias, except in Breitbart News and the Wall Street Journal.
The News Has Gotten More Biased
Finally, we asked whether the degree of bias has changed in the last four years, and whether outlets have changed in the direction of their bias. (Note that for this visualization and analysis, we had to omit both Vox and Breitbart News due to insufficient data.) Specifically, we investigated whether the degree of liberal bias (difference in sentiment about liberal vs. conservative tweets) changed over time, and by outlet.
We found that five of the remaining six outlets became steadily more liberally biased (i.e. more positive / less negative about liberal topics than about conservative topics), while one became slightly more conservatively biased.
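The trend analysis can be sketched as computing the liberal-conservative sentiment gap per year and then fitting a slope to it; a positive slope indicates a growing liberal bias. The grouping and data below are illustrative, not our actual pipeline:

```python
def yearly_bias(tweets):
    """tweets: iterable of (year, tag, score) -> {year: sentiment gap}."""
    by_year = {}
    for year, tag, score in tweets:
        by_year.setdefault(year, {"liberal": [], "conservative": []})
        if tag in ("liberal", "conservative"):
            by_year[year][tag].append(score)
    out = {}
    for year, d in sorted(by_year.items()):
        if d["liberal"] and d["conservative"]:
            out[year] = (sum(d["liberal"]) / len(d["liberal"])
                         - sum(d["conservative"]) / len(d["conservative"]))
    return out

def trend_slope(yearly):
    """Ordinary least-squares slope of the bias measure over years."""
    xs, ys = zip(*sorted(yearly.items()))
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))
```

A slope of zero would mean the outlet’s bias held steady over the period; the five outlets above would show positive slopes under this measure.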
By quantifying “bias” as the difference in sentiment in tweets about liberal vs. conservative topics, we found evidence for slight bias across the news outlets we surveyed. While this bias was significant in all analyses (p<.0001), the effect sizes were quite small in most cases, suggesting that if the bias is real, it’s marginal.
On the other hand, there are significant limitations to this analysis. First, as mentioned in the footnote, the method we used to tag tweets as liberal or conservative was very crude, conflating tweets about Trump with tweets about Jeb Bush and the GOP more generally. Thus, party-internal struggles (e.g. Trump vs. Cruz, Clinton vs. Sanders) might ultimately result in canceling out actual bias. In the future, we plan to implement a more topic-specific analysis.
Another limitation lies with the sentiment analysis tool itself. While the Stanford CoreNLP software is one of the most sophisticated sentiment analysis tools available, it fails to fully account for two sources of complexity: 1) grammar; and 2) context. For example, the sentence “Trump destroys the opposition” might be tagged as negative given the multiple negative words (destroys, opposition), when in fact it is a statement about Trump’s victory. Even more problematically, intuition suggests that humans might interpret the same headline differently when it’s posted by different sources: the sentence “Trump kills assault rifle ban” might read as more positive or more negative depending on the outlet it’s from. If this is true, it suggests readers interpret a headline not only on the basis of its content, but also on assumptions about the bias of its source, which makes identifying bias in headlines a somewhat circular problem! Experimental work should test whether this is the case, and attempt to quantify whether it poses a problem for sentiment analysis systems.
The larger question looming over this study, and all like it, is what should be done. If news outlets really are biased, should efforts be made to curb this bias, or does this violate the fundamental freedom of the press? Alternatively, should news outlets be assigned a “bias score” indicating their bias on any given topic? How would this score be agreed upon and generated? I’m certainly not a policy-maker, so my suggestions are primarily in the domain of future research. To that end, I suggest: 1) further analyses correcting for the limitations described above, to better quantify the existence of attitudinal bias in outlets; and 2) experimental work attempting to quantify the impact of this bias on the readers of an outlet.
Kiousis, S. (2001). Public trust or mistrust? Perceptions of media credibility in the information age. Mass Communication & Society, 4(4), 381-403.
Manning, C., Surdeanu, M., Bauer, J., Finkel, J., Bethard, S., & McClosky, D. (2014). The Stanford CoreNLP natural language processing toolkit. In Proceedings of 52nd annual meeting of the association for computational linguistics: system demonstrations (pp. 55-60).
 This annotation process, of course, was quite crude. Besides throwing out a whole lot of data (any tweets that don’t have the keywords), it conflates tweets about potentially opposed topics (e.g. tweets about both Trump and Jeb Bush would be tagged as conservative). A more fine-grained, topic-sensitive analysis would likely yield more accurate and interpretable results.