[PhD defence] 9/09/2024 - Gaëlle Laperrière: "Understanding speech in a multilingual context" (UPR LIA)

Research news 29 August 2024

Gaëlle Laperrière will defend her thesis, "Understanding speech in a multilingual context", on 9 September 2024.

Date and place

Oral defence scheduled for Monday 9 September 2024 at 15:00
Venue: 339 Chemin des Meinajaries, Centre d'Enseignement et de Recherche en Informatique, 84911, Avignon
Room: Ada Lovelace Amphitheatre

Discipline

Computer Science

Laboratory

UPR 4128 LIA - Avignon Computing Laboratory

Composition of the jury

Mr Yannick ESTÈVE, Avignon University, Thesis supervisor
Mr Benoit FAVRE, Aix-Marseille University, Rapporteur
Mr Alexandre ALLAUZEN, University of Paris Dauphine-PSL, Rapporteur
Mr Fabrice LEFÈVRE, Avignon University, Examiner
Mr Marco DINARELLI, National Centre for Scientific Research, Examiner
Ms Nathalie CAMELIN, Le Mans University, Examiner
Mr Philippe LANGLAIS, University of Montreal, Examiner
Ms Sahar GHANNAY, Paris-Saclay University, Thesis co-supervisor
Bassam JABAIAN, Avignon University, Guest

Summary

This thesis applies Deep Learning to Automatic Speech Understanding. Its main objective is to take advantage of existing data in languages that have rich semantic annotation of speech in order to develop high-performance understanding systems for languages with less annotation. The last few years have seen considerable progress in automatic speech translation, thanks to new approaches that bring the audio and textual modalities together, the textual modality relying on vast quantities of data. Combining speech understanding with translation from a natural source language to a conceptual target language, we consider the SAMU-XLSR speech encoder, whose semantically enriched encoding is language-agnostic. We show the positive impact of this type of encoder in an end-to-end neural speech understanding model and study its linguistic and semantic encoding capabilities in detail. The study then specialises the enrichment of this encoder, with the aim of orienting its encoding towards the semantic domain of the French MEDIA, Italian PortMEDIA and Tunisian TARIC-SLU datasets. A double specialisation is proposed in order to preserve the encoder's ability to generate certain semantic abstractions while limiting the loss of its cross-lingual capabilities during the standard fine-tuning of the model on the final task. Our contributions have advanced the state of the art in cross-lingual and cross-domain portability for the MEDIA, PortMEDIA and TARIC-SLU datasets. The SpeechBrain project was instrumental in implementing our experiments. We contributed to this open-source project by including a complete recipe for the MEDIA dataset in its official distribution.

Keywords: Automatic Speech Understanding, Deep Learning, Multilingualism, Semantic Concept Extraction, Speech Representations, Cross-Lingual Portability

Associated keywords
thesis defence