From Stylometry to Phylogeny: Towards a Large-Scale History of Genres

Oleg Sobchuk (Max Planck Institute for the Science of Human History)

Phylogenetic trees and networks are crucial for evolutionary biology and increasingly important for research on cultural evolution: the evolution of languages, folklore, or musical traditions. Can we apply phylogenies to a rather different subject – modern literature? Imagine a “tree of literature” whose branches are thematic clusters, such as genres. This talk presents work in progress towards this goal.
How does one construct a phylogeny? In cultural evolution, the usual approach involves manually coding certain traits of cultural items (e.g., the presence or absence of particular events in a folktale) and then measuring the similarity between the resulting lists of traits. This approach would not work for modern literature: the corpus is far larger, which makes manual coding extremely laborious. Instead, one could measure the similarity between books algorithmically – borrowing insights and tools from computational stylometry. But which algorithms are best at capturing the thematic (not stylistic!) similarity between books? To determine this, we have run an experimental study; this talk will present its preliminary results.
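To make the idea of algorithmic similarity concrete, here is a minimal sketch of one possible pipeline. It is not the method evaluated in the study. It assumes a simple stand-in for each step: relative word frequencies as the text representation, cosine distance as the similarity measure, and average-linkage clustering as a rough substitute for proper phylogenetic inference. The toy corpus and all function names are hypothetical, and word frequencies may well capture style rather than theme, which is exactly the question the study addresses.

```python
from collections import Counter
from itertools import combinations
import math


def word_frequencies(text):
    """Relative frequency of each lowercase word in a text."""
    words = text.lower().split()
    total = len(words)
    return {w: c / total for w, c in Counter(words).items()}


def cosine_distance(p, q):
    """1 minus the cosine similarity of two sparse frequency vectors."""
    dot = sum(p[w] * q[w] for w in p.keys() & q.keys())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return 1.0 - dot / (norm_p * norm_q)


def build_tree(texts):
    """Average-linkage agglomerative clustering over a {title: text} dict.

    Returns a nested tuple of titles, e.g. (("a", "b"), ("c", "d")).
    This is a clustering dendrogram, not a true phylogenetic inference.
    """
    profiles = {title: word_frequencies(body) for title, body in texts.items()}
    # Each cluster is a pair: (subtree, list of member titles).
    clusters = [(title, [title]) for title in texts]

    def dist(a, b):
        # Mean pairwise distance between the members of two clusters.
        pairs = [(x, y) for x in a[1] for y in b[1]]
        total = sum(cosine_distance(profiles[x], profiles[y]) for x, y in pairs)
        return total / len(pairs)

    while len(clusters) > 1:
        # Merge the two closest clusters into one new subtree.
        a, b = min(combinations(clusters, 2), key=lambda ab: dist(*ab))
        clusters = [c for c in clusters if c is not a and c is not b]
        clusters.append(((a[0], b[0]), a[1] + b[1]))
    return clusters[0][0]


# Hypothetical toy corpus: two "detective" snippets and two "romance" snippets.
toy_corpus = {
    "detective_a": "the detective found the clue and solved the murder",
    "detective_b": "the inspector found a clue about the murder",
    "romance_a": "she loved him and her heart longed for love",
    "romance_b": "her heart ached because she loved him",
}
tree = build_tree(toy_corpus)
print(tree)
```

On this toy corpus the two detective snippets end up on one branch and the two romance snippets on the other, but only because they share vocabulary. Swapping `word_frequencies` or `cosine_distance` for other representations and measures is the kind of comparison the experimental study is about.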