Why Distant Reading Works – Literary Mathematics

NLH Special Issue: Culture, Theory, Data

Very excited to see this essay finally published. (Open access version here.) It tackles a question I pass through too quickly in the book: Why does distant reading work?

The question is a strange one, and willfully contrarian. It’s in the nature of humanists — and of English professors in particular and perhaps of people in general — to emphasize the negative, to worry about everything that can and most likely will go wrong. But what has always fascinated me about quantitative approaches to language is simply the fact that they work at all. Like… how? It just seems so counterintuitive that there could be any meaningful relationship between a bunch of word counts, on the one hand, and anything we might call “meaning” or “the past” on the other.

I remember, as a graduate student, first learning about authorship studies that used statistics to determine who wrote anonymous works. I remember feeling scandalized. We had been reading Derrida and other literary theorists. I had studied rhetoric in my master’s program, and I took it for granted that texts were a kind of front, a show, a frame, determined as much or more by genre and audience than by anything else. I took it for granted that authors could play with language and pose as each other effortlessly. It’s not so much that people use language as that language produces the conditions of subjectivity. (Or so I had been admonished in a marginal note on one of my seminar papers during my first semester in grad school.) The notion that simply counting words could tell you the author, like some kind of subconscious linguistic fingerprint, seemed to fly in the face of everything I’d been taught to consider thoughtful, sophisticated, and knowledgeable opinion about how texts and language exist in the world.

Sitting in the seminar room, surrounded by peers, I remember my blood pressure rising and my face starting to flush. “OK, fine. But does this actually work? How? Why?” I asked the class. No one responded for a solid beat. No one could explain how or why. It just kind of does, even though it shouldn’t, and it’s weird that it does… hmm… let’s think about how this informs our reading of Barthes… After a few awkward moments, the conversation moved on.

There was something really important about language that I didn’t know and that none of my professors could teach me, because they didn’t know it either.

Years later, when I started doing the work myself, I was amazed at how easy it all was. Everything just worked. Words just fell into place. Moby-Dick is about whales. Pride and Prejudice is about Elizabeth and Darcy. Newspaper articles are about whatever has been happening that reporters thought was worth reporting. Sermons are about God and sin and stuff like that. Recipes are about ingredients. Wikipedia articles that use the word shenandoah are almost exclusively about places near the Shenandoah Valley. Newspaper stories about Hillary Clinton usually mention emails. Tweets are about all kinds of stuff, but tweets by any one person or group of people are different from tweets by others, always. Taking all this in aggregate and using a few simple statistics, it’s easy to draw highly detailed and highly accurate surveys of historical discourse. The method is extraordinarily powerful in ways that literary critics tend to miss because they get caught up in their own discipline-specific concerns and questions. Distant reading just works.

That’s not to suggest, of course, that the scholarship itself is always successful. Carpentry “works” too, but my kitchen cabinets don’t close tightly, all the same. (Not a carpenter, this guy.) There are any number of things that can go wrong with distant reading, and there’s no shortage of commentary meant to caution you about those things. But I’d never read or heard of any commentary that even tried to answer my question.

And so, as I did to my peers in that graduate seminar so many years ago, I ask readers the same question: “Why does any of this work at all?”

For more, see Michael Gavin, “Why Distant Reading Works,” New Literary History 53-54, no. 4, 1 (Autumn 2022 / Winter 2023): 613-633. DOI:10.1353/nlh.2022.a898323

A preprint of the article is available on Humanities Commons: https://hcommons.org/deposits/item/hc:56213/