Stylometry and Versification

Petr Plecháč
Institute of Czech Literature, Czech Academy of Sciences / Institute of the Czech National Corpus, Charles University in Prague

Contemporary stylometry has developed highly accurate and sophisticated methods of authorship attribution. The logic behind them is to identify the author by measuring the degree of stylistic similarity between the text in question and texts written by the candidate authors. Various style markers are taken into account for this purpose: frequencies of words, parts of speech, character n-grams, collocations, and so on. One important aspect of style, in one important form of literature, nevertheless seems to be completely disregarded: versification.

The talk will present an ongoing project investigating whether characteristics such as the frequencies of stress patterns, the frequencies of rhyme types, etc. may be useful in authorship attribution. Pilot experiments comparing several classification methods (the Delta family, SVM, random forest), evaluated on Czech, German, Spanish, and English poetry, will be presented.
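
To make the idea concrete, the following is a minimal sketch of how a Delta-style comparison could be applied to versification features. It is not the project's actual pipeline: the feature set (relative frequencies of three stress patterns plus the share of masculine rhymes), the author labels, and all numbers are invented for illustration, and the helper names `standardize`, `delta`, and `attribute` are hypothetical. The distance used is the classic Burrows's Delta, i.e. the mean absolute difference of z-scored feature values.

```python
from statistics import mean, pstdev

def standardize(reference_rows, rows):
    """Z-score each feature using means and standard deviations
    estimated from the reference (candidate-author) rows only."""
    cols = list(zip(*reference_rows))
    mus = [mean(c) for c in cols]
    # Fall back to 1.0 when a feature is constant, to avoid division by zero.
    sds = [pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]

def delta(a, b):
    """Burrows's Delta: mean absolute difference of z-scores."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def attribute(candidates, disputed):
    """Return the candidate with the smallest Delta to the disputed text,
    together with all distances."""
    names = list(candidates)
    reference = [candidates[n] for n in names]
    z = standardize(reference, reference + [disputed])
    target = z[-1]
    dists = {n: delta(v, target) for n, v in zip(names, z)}
    return min(dists, key=dists.get), dists

# Invented toy profiles: [stress pattern 1, stress pattern 2,
# stress pattern 3, share of masculine rhymes].
candidates = {
    "author_A": [0.50, 0.30, 0.20, 0.70],
    "author_B": [0.35, 0.40, 0.25, 0.40],
    "author_C": [0.20, 0.45, 0.35, 0.55],
}
disputed = [0.48, 0.32, 0.20, 0.68]

best, dists = attribute(candidates, disputed)
print(best)
for name, d in sorted(dists.items(), key=lambda kv: kv[1]):
    print(f"{name}: {d:.3f}")
```

In this toy setting the disputed profile lies closest to `author_A` on every feature, so that candidate is returned. With real data, each author would be represented by many text samples and the same feature vectors could equally be passed to an SVM or a random forest classifier.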