Multimodal Automated Fact-Checking: A Survey

Speaker: Mubashara Akhtar, King's College London

Abstract: Misinformation is often conveyed across multiple modalities, e.g. a miscaptioned image or a manipulated video. Multimodal misinformation is perceived as more credible by humans and spreads faster than its text-only counterpart. While a growing body of research investigates automated fact-checking (AFC), previous surveys have mostly focused on text. In this survey, we conceptualise a framework for AFC that includes subtasks unique to multimodal misinformation. Furthermore, we discuss related terms used in different communities and map them to our framework. We focus on four modalities prevalent in real-world fact-checking: text, image, audio, and video. We survey benchmarks and models, and discuss limitations and promising directions for future research.

Bio: Mubashara is a final-year PhD student at King's College London and a student researcher at Google DeepMind. Previously, she interned at Google DeepMind in Zurich, and before that she was a visiting researcher with Andreas Vlachos at the Cambridge NLP Group. She is also part of the organising committee of the FEVER workshop and co-leads the responsible AI committee of Croissant, an MLCommons project on ML dataset documentation. Her research is on knowledge-grounded NLP, focusing on reasoning over text, tables, and images as modalities for automated fact-checking and question answering. She is also interested in related problems such as numerical reasoning with large language models (LLMs).