[Tutorial] Document classification with Latent Semantic Analysis (LSA)

This post is a departure from my usual format. Instead of walking through a theoretical topic or a recent academic paper, it is intended as a gentle introduction to using Latent Semantic Analysis (LSA) to categorize documents. It's essentially an extension of the existing tutorial in sklearn. I'll be using nlp_utilities for the walkthrough.

Discovering phonaesthemes (Arbitrariness in language, pt. 4)

Although the relationship between sound and meaning in language is mostly arbitrary, there exist pockets of so-called systematicity: clusters in which particular forms recur with particular meanings. One example of systematicity is the existence of phonaesthemes. Phonaesthemes are recurring patterns of sound and meaning that occur below the morphemic level, which is traditionally considered the

The Rhythm of Conversation (pt. 2)

People take turns talking during conversation. As discussed previously, the timing of this turn-taking process is remarkably fast, and happens largely beyond our conscious awareness. This raises the obvious question: how do speakers manage to transition between turns so quickly, and so successfully?
Why Conversation is Remarkable
Conversations, for the most part, do not follow

The Rhythm of Conversation (pt. 1)

Many things in our lives have rhythms: music, poetry, the pace at which we walk, and even the rate at which we talk. One of the marvels of everyday conversation – overlooked, perhaps, because it seems so obvious and so easy – is turn-taking. That is, when one speaker finishes talking, someone else usually starts

That’s not what I meant!

Ambiguity pervades language. Speakers can use this ambiguity strategically, but it's also what makes language so challenging for machines to understand – and in some cases, it even leads to miscommunication between people, particularly in written communication. During in-person interactions, ambiguity is more easily avoided. If a speaker says of a recently released

What is ‘innateness’, anyway?

A recurring question in both scientific and public discourse is whether any given property of an organism is innate or learned. This debate, usually framed in terms of Nature vs. Nurture, often centers on properties of human behavior and cognition: intelligence, language, morality, mathematics, and so on. But while this dichotomous framing perhaps seems obvious