4th March 2022

How Can Stylometry Identify New Genres? Metadata for the Evaluation of Stylometric Clusterings

José Calvo Tello (Göttingen State and University Library)

Most computational approaches to genre analysis have applied supervised machine learning techniques. These require a closed, clearly defined set of labels: a complete genre palette. The researcher can therefore only inspect categories that are already given and cannot detect genres that have been overlooked until now. What if there are subgenres that have not yet been detected? Stylometric clustering can propose such groupings, but if the goal is to find a new category, how can the resulting clusters be evaluated against existing knowledge? To evaluate these clusters, I use the manually annotated literary metadata of the Corpus of Novels of the Spanish Silver Age (CoNSSA), which contains 358 Spanish novels (1880-1939). This collection includes qualitative information about the plot, protagonist, setting, or narrator of each novel, obtained by reading either the entire novel or summaries of it. These literary metadata can be understood as an intermediate layer of data between the linguistic information and the genre labels, and can therefore support the identification of hypothetical genres that have remained hidden until now.

