|Terhörst, Philipp; Kolf, Jan Niklas; Huber, Marco; Kirchbuchner, Florian; Damer, Naser; Morales, Aythami; Fierrez, Julian; Kuijper, Arjan
|Face recognition (FR) systems have a growing effect on critical decision-making processes. Recent works have shown that FR solutions show strong performance differences based on the user's demographics. However, to enable a trustworthy FR technology, it is essential to know the influence of an extended range of facial attributes on FR beyond demographics. Therefore, in this work, we analyse FR bias over a wide range of attributes. We investigate the influence of 47 attributes on the verification performance of two popular FR models. The experiments were performed on the publicly available MAAD-Face attribute database with over 120M high-quality attribute annotations. To prevent misleading statements about biased performances, we introduced control group based validity values to decide if unbalanced test data causes the performance differences. The results demonstrate that also many non-demographic attributes strongly affect the recognition performance, such as accessories, hair-styles and colors, face shapes, or facial anomalies. The observations of this work show the strong need for further advances in making FR system more robust, explainable, and fair. Moreover, our findings might help to a better understanding of how FR networks work, to enhance the robustness of these networks, and to develop more generalized bias-mitigating face recognition solutions.