FraunhoferSIT at GermEval 2019: Can Machines Distinguish Between Offensive Language and Hate Speech? Towards a Fine-Grained Classification

Author: Vogel, Inna; Regev, Roey
Type: Conference Paper, Electronic Publication
Abstract: In this paper, we describe the Fraunhofer-SIT submission for the “GermEval 2019 – Shared Task on the Identification of Offensive Language”. We participated in two subtasks: Task 1 is a binary classification of German tweets as offensive or not offensive, and Task 2 is a fine-grained classification that distinguishes three subcategories of offensive language. Our best model is an SVM classifier based on TF-IDF character n-gram features. Our submitted runs in the shared task are Fraunhofer-SIT coarse [1-3].txt for Task 1 and FraunhoferSIT fine [1-3].txt for Task 2. Our final system reaches a macro-average F1-score of 0.70 for the binary classification and an F1-score of 0.46 for the fine-grained classification. These results show that the problem of automatically distinguishing between offensive language and “Hate Speech” is far from solved.
Conference: Conference on Natural Language Processing (KONVENS) <15, 2019, Erlangen>
Reference: Gesellschaft für Sprachtechnologie & Computerlinguistik -GSCL-: 15th Conference on Natural Language Processing, KONVENS 2019. Proceedings. Online resource: October 9-11, 2019, Erlangen. Erlangen, 2019, pp. 377-381
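
Illustration: The abstract mentions TF-IDF character n-gram features and a macro-average F1-score. The sketch below shows, in plain Python, one way these two ingredients could be computed. It is not the authors' implementation: the function names, the n-gram range (2 to 4), the smoothed IDF formula, and the example labels are all assumptions made for illustration. The SVM itself is omitted, and the shared task's official scorer may define macro-F1 differently.

```python
import math
from collections import Counter


def char_ngrams(text, n_min=2, n_max=4):
    """Return all character n-grams of `text` for n in [n_min, n_max].

    The range 2..4 is an illustrative assumption, not taken from the paper.
    """
    return [
        text[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(text) - n + 1)
    ]


def tfidf_vectors(docs, n_min=2, n_max=4):
    """Map each document to a sparse {n-gram: weight} dictionary.

    Weight = raw term frequency * smoothed IDF, where
    IDF = log((1 + N) / (1 + df)) + 1 (one common variant, assumed here).
    """
    counts = [Counter(char_ngrams(d, n_min, n_max)) for d in docs]
    doc_freq = Counter(gram for c in counts for gram in c)
    n_docs = len(docs)
    return [
        {
            gram: tf * (math.log((1 + n_docs) / (1 + doc_freq[gram])) + 1.0)
            for gram, tf in c.items()
        }
        for c in counts
    ]


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

For example, `macro_f1(["A", "A", "B", "B"], ["A", "B", "B", "B"])` averages a per-class F1 of 2/3 for the hypothetical label "A" and 4/5 for "B". Because every class counts equally regardless of its size, a rare subcategory that is classified poorly pulls the score down noticeably, which is one plausible reason fine-grained scores tend to be lower than binary ones.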