This post is a departure from my usual format. Instead of walking through a theoretical topic or a recent academic paper, it is intended as a gentle introduction to using Latent Semantic Analysis (LSA) to categorize documents. It's essentially an extension of the existing tutorial in sklearn, found here. I'll be using nlp_utilities for the walkthrough. … Continue reading [Tutorial] Document classification with Latent Semantic Analysis (LSA)
Popular culture often depicts intelligent machines as coldly rational, capable of making “objective” decisions that humans can’t. More recently, however, there’s been increased attention to the presence of bias in supposedly objective systems, from image recognition to models of human language. Often, these biases instantiate actual human prejudices, as described in Cathy O’Neil’s Weapons of Math … Continue reading What we talk about when we talk about bias in A.I.
Ambiguity pervades language. Speakers can use this ambiguity strategically, but it’s also what makes language so challenging for machines to understand – and in some cases, it even leads to miscommunication between people, particularly in written communication. During in-person interactions, ambiguity is more easily avoided. If a speaker says of a recently released … Continue reading That’s not what I meant!
A recurring question in both scientific and public discourse is whether any given property of an organism is innate or learned. This debate, usually framed in terms of Nature vs. Nurture, often centers on properties of human behavior and cognition: intelligence, language, morality, mathematics, and so on. But while this dichotomous framing perhaps seems obvious … Continue reading What is ‘innateness’, anyway?
Language is full of ambiguity. This fuzziness is often cited as a sign of imperfection, leading some to try to develop more precise languages of their own. But ambiguity actually serves a purpose, and is frequently exploited in human interactions. Part of recognizing the utility of ambiguity requires understanding that language is more than just … Continue reading “You Got Heat?”: Indirect speech acts in The Wire
Anyone who’s ever used Siri has likely experienced the frustration of not being understood. There’s a fundamental – almost existential – panic that surfaces when someone else doesn’t know what you’re saying. It’s an even bigger problem for people with accents other than “General American English”. Voice interfaces struggle with accents, from regional American accents … Continue reading Accents and Speech Recognition
Pretty much since its inception, one of the core principles of linguistics has been that language is arbitrary (De Saussure, 1916; Hockett, 1960). That is, there’s no apparent relationship between a sign and what it signifies; nothing inherent about the word “dog” suggests that it must refer to the DOG concept. But this hasn’t stopped … Continue reading Arbitrariness in Language, Pt. 1