An Unorthodox Approach for Style Change Detection

Author: Graner, Lukas; Ranly, Paul
Type: Conference Paper
Abstract: The PAN shared tasks include tasks from a variety of disciplines, including authorship analysis, and are held annually. Participants are able to compete with each other by proposing approaches for a task, which are then compared and evaluated on a test dataset with predefined performance metrics. So far, the test datasets have traditionally been withheld, so that participants may only train and optimize approaches on the training and validation sets. In this year’s Style Change Detection task, the objective of which is to locate author changes in multi-author text documents, PAN has also published the test set built from publicly available Q&A platform posts, albeit without ground truth labels. In this paper, we show that the ground truth of the test set can be recovered almost entirely by querying search engines with paragraph excerpts from the test set, crawling the query results and parsing author information of corresponding posts. We point out that this allows others to secretly tailor their approaches to the recovered test labels and thus gain an unfair advantage. Furthermore, as part of an in-depth data analysis, we address a variety of issues and finally suggest improvements for future Style Change Detection tasks.
Conference: Conference and Labs of the Evaluation Forum 2022