Into the prosodic dimension: Finding meaning in the non-lexical aspects of speech

Wednesday 24 April, 5:30pm

All Souls College, Hovenden Room, and Zoom

Speaker: Dr Catherine Lai, University of Edinburgh

Abstract: With recent advances in machine learning, automated dialogue systems have become better able to produce coherent language-based interactions. However, most work on automated spoken language understanding still uses only text transcriptions, i.e., just the lexical content of speech. This ignores the fact that the way we speak can change how our words are interpreted. In particular, speech prosody (e.g., the pitch, energy, and timing characteristics of speech) can be used to signal speaker intent in spoken dialogues. In fact, prosodic features can help automatic detection of both dialogue structure and speaker affect/states. In this talk, I will discuss recent work on how we can combine non-lexical and lexical aspects of speech to improve speech understanding and generation, and how new approaches to self-supervised learning from speech might help us make the most of the true richness of speech.

Bio: Dr Catherine Lai is a Lecturer in Speech and Language Technology, based in the Centre for Speech Technology Research at the University of Edinburgh. Her research focuses on speech prosody and how varying the way we speak can change our understanding of dialogue from both a recognition and generation perspective. She works on this from a speech technology/machine learning perspective, as well as a linguistic perspective, drawing on work in semantics, pragmatics and sociolinguistics.