Deepmime: Gesture Similarity Retrieval in Large Video Collections

Ms Mahnaz Parian-Scherb, PhD candidate at the University of Basel, Switzerland, and the University of Mons, Belgium, will present the final IMCC seminar of MT20.

Abstract: Analyzing and understanding gestures plays a key role in the study of communication. Investigating the co-occurrence of gestures and speech is currently a labor-intensive task in linguistics, where computer vision methods can help. In real-world datasets, particularly news footage and talk shows, videos are often multi-person and multi-angle, which poses significant challenges for computer vision methods for gesture recognition and retrieval.

In this talk, I will introduce the Deepmime project, a gesture recognition and retrieval system developed to improve the reliability and efficiency of search in human-centric videos. The core of the system uses deep learning methods to extract spatio-temporal features that represent hand gestures in a compact form suitable for similarity search. I will explain the concepts behind gesture recognition and retrieval and the computer vision techniques applied in this field. I will also discuss the deep learning methods used in Deepmime, such as spatio-temporal feature extraction, attention-aware gesture recognition, and person re-identification, and their role in improving gesture retrieval and mitigating background noise and occlusion in multi-person, multi-angle scenarios.
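To give a flavor of the similarity-search step described above, here is a minimal sketch: once gestures are encoded as compact feature vectors, retrieval amounts to ranking a gallery of embeddings by their similarity to a query embedding. The vectors, function names, and cosine-similarity metric below are illustrative assumptions for exposition, not taken from the Deepmime implementation.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query, gallery, k=2):
    """Return indices of the k gallery embeddings most similar to the query."""
    ranked = sorted(range(len(gallery)),
                    key=lambda i: cosine_similarity(query, gallery[i]),
                    reverse=True)
    return ranked[:k]

# Toy 3-d embeddings standing in for learned spatio-temporal gesture descriptors.
gallery = [
    [1.0, 0.0, 0.0],   # gesture A
    [0.9, 0.1, 0.0],   # gesture A, slight variation
    [0.0, 1.0, 0.0],   # gesture B
]
query = [0.95, 0.05, 0.0]
print(retrieve(query, gallery))  # the two A-like gestures rank first: [0, 1]
```

In a real system the gallery would hold thousands of learned embeddings and an approximate nearest-neighbor index would replace the exhaustive scan, but the ranking principle is the same.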