[PhD defence] 15/09/2025 - Yanis Labrak: "Language Models at the Crossroads of Text and Speech for Health Applications" (UPR LIA)
Yanis LABRAK will publicly defend his thesis entitled "Language Models at the Crossroads of Text and Speech for Health Applications", co-directed by Mickaël ROUVIER and Richard DUFOUR, on Monday 15 September 2025.
Date and place
Oral defence scheduled for Monday 15 September 2025 at 2pm
Venue: CERI, 339 Chem. des Meinajaries, 84000 Avignon, France
Room: Amphithéâtre Blaise
Discipline
Computer Science
Laboratory
UPR 4128 LIA - Avignon Computing Laboratory
Composition of the jury
Ms Asma BEN ABACHA | Microsoft Health AI | Examiner |
Ms Elena V. EPURE | Deezer Research | Examiner |
Mr Laurent BESACIER | Naver Labs Europe | Examiner |
Mr Mickaël ROUVIER | LIA, Avignon University | Thesis co-director |
Mr Richard DUFOUR | LS2N, Nantes University | Thesis co-director |
Mr Pierre ZWEIGENBAUM | LISN, Paris-Saclay University | Reviewer |
Mr Philippe LANGLAIS | DIRO, University of Montreal | Reviewer |
Mr Julien NAVE | Zenidoc | Guest |
Summary
The medical field presents unique challenges for language processing: specialised terminology, strict data regulations, and critical information needs. As language models are increasingly used to assist healthcare professionals in their daily work, adapting them to specific application domains has become necessary to make them accessible to a wider audience, across languages and domains, while reducing the computational cost of their use.
At the same time, traditional approaches to medical speech processing rely on cascaded systems that convert speech to text, apply natural language processing (NLP), and sometimes regenerate speech. Although practical, these systems often lose paralinguistic features essential for clinical communication and suffer from error propagation between processing stages. Recent advances in the quantisation of self-supervised speech representations have created new possibilities for integrating speech representations into other systems without intermediate conversion to text, potentially preserving more communicative nuance.
In this thesis, I examine, among other things, how speech capabilities can be integrated into text-based language models pre-trained on health-related knowledge, exploiting their acquired medical knowledge while enabling direct speech processing without intermediate steps. Analysis of the alignment between speech and text representations at different levels of abstraction has revealed more effective methods for cross-modal knowledge transfer, thereby enabling learning under limited training data, a crucial consideration given the data constraints of the healthcare domain.
Keywords: Speech Processing, Domain Adaptation, Cross-modal Transfer, Health Domain Adaptation, Language Models, Large Language Models (LLMs)
Updated on 4 September 2025