[PhD defence] 9/09/2024 - Gaëlle Laperrière: "Understanding speech in a multilingual context" (UPR LIA)
Gaëlle Laperrière will defend her thesis on 9 September 2024 on the topic "Understanding speech in a multilingual context".
Date and place
Oral defence scheduled for Monday 9 September 2024 at 15:00
Venue: 339 Chemin des Meinajaries, Centre d'Enseignement et de Recherche en Informatique, 84911 Avignon
Room: Ada Lovelace Amphitheatre
Discipline
Computer Science
Laboratory
UPR 4128 LIA - Avignon Computing Laboratory
Composition of the jury
Mr Yannick ESTÈVE | Avignon University | Thesis supervisor |
Mr Benoit FAVRE | Aix-Marseille University | Rapporteur |
Mr Alexandre ALLAUZEN | University of Paris Dauphine-PSL | Rapporteur |
Mr Fabrice LEFÈVRE | Avignon University | Examiner |
Mr Marco DINARELLI | National Centre for Scientific Research | Examiner |
Ms Nathalie CAMELIN | Le Mans University | Examiner |
Mr Philippe LANGLAIS | University of Montreal | Examiner |
Ms Sahar GHANNAY | Paris-Saclay University | Thesis co-supervisor |
Mr Bassam JABAIAN | Avignon University | Guest |
Summary
This thesis falls within the framework of Deep Learning applied to Automatic Speech Understanding. Its main objective is to leverage existing semantically annotated speech data in well-resourced languages in order to build high-performing understanding systems for languages with fewer such annotations. Recent years have seen considerable progress in automatic speech translation, thanks to new approaches that bring the audio and textual modalities closer together, the latter relying on vast quantities of data. Combining speech understanding with translation from a natural source language to a conceptual target language, we consider the SAMU-XLSR speech encoder, whose semantically enriched encoding is language-agnostic. We show the positive impact of this type of encoder in an end-to-end neural speech understanding model and study its linguistic and semantic encoding capabilities in detail. This study continues by specialising the enrichment of this encoder, with the aim of orienting its encoding towards the semantic domain of the French MEDIA, Italian PortMEDIA and Tunisian TARIC-SLU datasets. A double specialisation is proposed in order to preserve the encoder's ability to generate certain semantic abstractions while limiting the loss of its cross-lingual capabilities during the standard fine-tuning of the model on the final task. Our contributions have advanced the state of the art in cross-lingual and cross-domain portability on the MEDIA, PortMEDIA and TARIC-SLU datasets. The SpeechBrain toolkit was instrumental in implementing our experiments; we contributed to this open-source project by including a complete recipe for the MEDIA dataset in its official distribution.
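As a purely illustrative companion to the summary, the minimal sketch below shows how a multilingual self-supervised speech encoder of the XLS-R family (the kind of backbone SAMU-XLSR builds on) can be used to turn an utterance into a single pooled representation. This is not the SAMU-XLSR model, the double specialisation, or the SpeechBrain MEDIA recipe from the thesis: the checkpoint name, the use of the Hugging Face transformers library, and the mean-pooling step are assumptions made only for the example.

```python
import torch
from transformers import AutoFeatureExtractor, Wav2Vec2Model

# Publicly available multilingual XLS-R checkpoint; illustrative choice only,
# not the encoder actually used in the thesis.
model_name = "facebook/wav2vec2-xls-r-300m"
feature_extractor = AutoFeatureExtractor.from_pretrained(model_name)
encoder = Wav2Vec2Model.from_pretrained(model_name)
encoder.eval()

# One second of dummy 16 kHz audio standing in for a spoken utterance.
waveform = torch.randn(16000)
inputs = feature_extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    frames = encoder(**inputs).last_hidden_state  # shape: (1, n_frames, 1024)

# Mean-pool the frame-level vectors into one utterance-level embedding,
# a simple stand-in for the sentence-level semantic representation that
# a SAMU-XLSR-style encoder is trained to produce.
utterance_embedding = frames.mean(dim=1)          # shape: (1, 1024)
print(utterance_embedding.shape)
```

In an end-to-end speech understanding system, frame-level representations like these would feed a downstream decoder that predicts semantic concepts; the pooling shown here is only a compact way to visualise the utterance-level view.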
Keywords: Automatic Speech Understanding, Deep Learning, Multilingualism, Semantic Concept Extraction, Speech Representations, Cross-Lingual Portability
Updated on 6 September 2024