Humorous Caption Generation with Reinforcement Learning

Speaker: Shuyang (Kevin) Sun (University of Oxford)

Abstract: Humour, a fundamental form of human expression, can be triggered through different modalities, including visual stimuli and language. In this project, we show that by exploiting the recent CLIP model for image-text interaction and a Language Model (LM) for text generation, our model can generate high-quality funny captions from an input image. Since there is no existing metric for evaluating how funny the generated image-text pairs are, we also build a model to measure their funniness, and design a Reinforcement Learning (RL) algorithm to ensure that the generated image-text pairs are funny. We also show that our method can be applied to understand the composition of humour by visualizing the intermediate features obtained in our model.

About Shuyang (Kevin) Sun: Shuyang (Kevin) Sun is a DPhil (PhD) candidate in the Torr Vision Group at the University of Oxford, supervised by Professor Philip Torr and Professor Victor Prisacariu. He received his B.Eng. degree from Wuhan University in 2016 and his M.Phil. degree from the University of Sydney in 2019. During his DPhil studies, he has been a research intern at Google Research, Intel, and ByteDance. His current focus is on building a comprehensive visual system with unified perception.