[PhD defence] 24/10/2024 - Lucas Druart: "Towards a Contextual and Structured Understanding of Task-Oriented Dialogic Speech" (UPR LIA)

Research news 16 October 2024

Mr Lucas DRUART will publicly defend his thesis entitled "Towards a Contextual and Structured Understanding of Task-Oriented Dialogic Speech" on Thursday 24 October 2024 at 3pm.

Date and place

Oral defence scheduled for Thursday 24 October 2024 at 15:00
Venue: 74 Rue Louis Pasteur, 84029 Avignon
Thesis room

Discipline

Computer Science

Laboratory

UPR 4128 LIA - Avignon Computing Laboratory

Composition of the jury

Mr Yannick ESTÈVE Avignon University Thesis supervisor
Mr Valentin VIELZEUF Orange Thesis co-supervisor
Mr Frédéric BÉCHET Aix Marseille University Examiner
Ms Dilek HAKKANI-TÜR University of Illinois Urbana-Champaign Examiner
Mr François PORTET Grenoble Alpes University Rapporteur
Ms Sophie ROSSET Interdisciplinary Digital Sciences Laboratory Rapporteur
Mr Renato DE MORI McGill University Guest

Summary

Precise understanding of user requests is essential to ensure smooth interaction with Task-Oriented Dialogue (TOD) systems. Traditionally, these systems adopt cascade approaches that combine Automatic Speech Recognition (ASR) with Natural Language Understanding (NLU). However, they still struggle to map complex user requests correctly onto their internal representations. Recent work points to two ways of improving them. On the one hand, end-to-end approaches have improved the performance of Spoken Language Understanding (SLU) systems: they provide more accurate and robust predictions by exploiting joint optimisation and paralinguistic information. On the other hand, textual datasets offer structured semantic representations, which appear better suited to representing complex user requests. This thesis explores these two directions for a contextual and structured understanding of task-oriented dialogic speech. We first conduct a preliminary study devoted to SLU in the context of TOD. We designed a cascade approach to perform spoken Dialogue State Tracking (DST) on MultiWOZ. Our approach ranked first in the Speech Aware Dialogue System Technology Challenge thanks to automatic transcript correction and data augmentation. Next, we proposed a new method for performing fully neural spoken DST on MultiWOZ and SpokenWOZ. Our approach merges a latent representation of the textual context with a latent representation of the last speech turns in order to condition the dialogue state decoder. Although it benefits from joint optimisation, particularly in purely audio contexts, it struggles to propagate dialogue context correctly. Finally, in response to the difference in semantic representations between textual and spoken TOD datasets, we introduced the ReMEDIATES dataset, constructed by semi-automatically augmenting the MEDIA dataset with semantic trees. The associated benchmark makes it possible to evaluate semantic parsing models for spoken dialogues with contextual and structured representations, which opens up prospects for future challenges.

Keywords: speech understanding, task-oriented dialogue, neural end-to-end

Associated keywords
thesis defence