New research project from ATHENE: Disinformation and Corona (DisCo)
Better recognition of fake news about the pandemic
Disinformation and fake news about the corona pandemic are circulating around the world. Especially on social media platforms such as Facebook, Twitter and YouTube, people share news that has not been checked, so it spreads rapidly. "We are not only fighting a pandemic, but also an infodemic," said Tedros Adhanom Ghebreyesus, WHO Director-General. Researchers at Fraunhofer SIT want to meet this challenge in the DisCo project, part of the ATHENE research area Secure Digital Transformation in Health Care (SeDiTraH).
In doing so, they address aspects of "text forensics" and "multimedia forensics".
Investigations into text forensics
Fake news about the new coronavirus is currently spreading even faster than the virus itself. It is almost impossible to check the truth of all news manually. Large amounts of data are needed to train machine learning methods that are supposed to automatically recognise fake news. Text forensics experts from the project team are therefore creating a dataset of news texts that contains both fake and legitimate news. The corpus is intended both for the team's own research and for the wider research community, for scientific use in the fight against false news.
In addition, the project team is developing a demonstrator that shows how the technology can help journalists, for example, to automatically recognise text passages worth checking and to highlight them in the text. It is often important to first get an overview of the topic of the article. For this reason, the tool should first summarise the news text. For this, a "Text Summarisation" algorithm is used, which extracts the most important or relevant information within the original content and then condenses it into a shorter version. In addition, the area of "Check-Worthiness" detection will be explored, i.e. machine learning techniques from the field of Natural Language Processing (NLP) that predict which sentences should be prioritised for fact checking. Consequently, the demonstrator should not only be able to summarise news texts, but also to highlight relevant statements in the text. The aim is to explore how technologies can help facilitate the process of detecting fake statements in news texts.
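To make the two steps more concrete, the following minimal Python sketch shows one possible way to combine extractive summarisation with a check-worthiness flag. It is not the DisCo demonstrator: the project describes machine learning techniques, whereas this sketch swaps in a simple word-frequency score for summarisation and a hand-written cue list for check-worthiness. All names (`summarise`, `check_worthiness`, `highlight`, `CLAIM_CUES`, `STOPWORDS`) and the threshold are assumptions made for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (assumption); a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "that",
             "it", "for", "on", "with", "as", "by", "was", "be", "this", "i"}

# Hypothetical cue words that often signal a factual, checkable claim.
CLAIM_CUES = re.compile(
    r"\b(cure[sd]?|prevents?|causes?|proves?|study|studies|percent)\b",
    re.IGNORECASE,
)


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lower-case alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]


def summarise(text, max_sentences=2):
    """Extractive summary: keep the sentences whose words are most frequent overall."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])  # restore original sentence order
    return [sentences[i] for i in keep]


def check_worthiness(sentence):
    """Heuristic score: one point for any digit, one point per claim cue word."""
    score = 1 if re.search(r"\d", sentence) else 0
    return score + len(CLAIM_CUES.findall(sentence))


def highlight(text, threshold=1):
    """Pair each sentence with a flag saying whether it should be fact-checked first."""
    return [(s, check_worthiness(s) >= threshold) for s in split_sentences(text)]


if __name__ == "__main__":
    sample = ("Drinking hot water cures the virus in 90 percent of cases. "
              "I think the weather is nice today. "
              "A study proves that masks reduce transmission.")
    print(summarise(sample, max_sentences=1))
    for sentence, flagged in highlight(sample):
        print("CHECK" if flagged else "     ", sentence)
```

A trained NLP model would replace both scoring functions in practice; the surrounding flow (split into sentences, summarise, then flag sentences for checking) is the part this sketch is meant to illustrate.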
Investigations into multimedia forensics
Image and video material also plays a role in many examples of fake news: often, older image material is taken out of its context and reused (unchanged) in the context of the corona pandemic, e.g. through false information in the image caption or in the accompanying article. Occasionally, images are also deliberately retouched so that they appear to substantiate a false message or to influence readers.
A useful method to detect both types of manipulation is the so-called reverse image search: some of the well-known search engines (e.g. Google, Bing, Yandex, Baidu) and specialised providers (e.g. TinEye) make it possible to search the internet for other sources of the same image, using not only keywords but also images as input. The authenticity of a questionable image in a message can thus be checked by comparing it with alternative, trustworthy sources of the same image, where these are available. Retouched images or images taken out of their original context can be detected in this way.
One aim of DisCo is to assess and compare the (re-)recognition rates of these image search engines. To this end, a corpus of typical sample images from corona fake news is first created. A challenge for image search is that images reused in fake news are often additionally modified, e.g. by scaling, cropping, montage with other images or the addition of captions, emojis, etc. The influence of these modifications on the (re-)recognition rates of the comparison algorithms will be investigated. In addition, useful pre-processing steps that can increase the recognition rate when users examine suspicious images will be derived.
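How such modifications affect re-recognition can be illustrated with a small experiment. The Python sketch below is an assumption-laden stand-in: the comparison algorithms of the search engines are not public, so it uses a simple average hash (a common perceptual hashing technique) on a synthetic grayscale image and measures how many modified variants are still matched. The function names, the Hamming-distance threshold `max_distance` and the test image are all hypothetical.

```python
def average_hash(pixels, hash_size=8):
    """Average hash of a grayscale image given as a 2D list of values 0..255.

    Assumes the image is at least hash_size pixels wide and high.
    """
    height, width = len(pixels), len(pixels[0])
    cells = []
    for r in range(hash_size):
        r0, r1 = r * height // hash_size, (r + 1) * height // hash_size
        for c in range(hash_size):
            c0, c1 = c * width // hash_size, (c + 1) * width // hash_size
            block = [pixels[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            cells.append(sum(block) / len(block))  # mean brightness of the cell
    mean = sum(cells) / len(cells)
    return [1 if value > mean else 0 for value in cells]


def hamming(a, b):
    """Number of positions at which two equally long bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


def scale_half(pixels):
    """Downscale by keeping every second pixel in both directions."""
    return [row[::2] for row in pixels[::2]]


def crop(pixels, margin):
    """Remove `margin` pixels from every side (margin must be > 0)."""
    return [row[margin:-margin] for row in pixels[margin:-margin]]


def add_caption(pixels, bar_height):
    """Overwrite the bottom rows with a white bar, mimicking an added caption."""
    width = len(pixels[0])
    kept = [row[:] for row in pixels[:-bar_height]]
    return kept + [[255] * width for _ in range(bar_height)]


def recognition_rate(original, variants, max_distance=10):
    """Share of variants whose hash lies within max_distance bits of the original."""
    reference = average_hash(original)
    hits = sum(1 for v in variants if hamming(reference, average_hash(v)) <= max_distance)
    return hits / len(variants)


if __name__ == "__main__":
    # Synthetic 64x64 diagonal gradient standing in for a real photo.
    image = [[2 * x + 2 * y for x in range(64)] for y in range(64)]
    variants = [scale_half(image), crop(image, 4), add_caption(image, 8)]
    print("re-recognition rate:", recognition_rate(image, variants))
```

The same loop could be run against a real image corpus and real search engines by replacing `average_hash` with a query to the engine under test; heavier edits such as montage would be expected to lower the rate, which is the kind of effect the project sets out to measure.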
Prof. Martin Steinebach
Tel.: +49 6151 869-349