[Thesis defense] May 2, 2023 - Sondes Abderrazek: "Speech intelligibility assessment by deep learning: towards more interpretability in clinical phonetics", LIA

Research news, 25 April 2023

Date and place

2 May 2023 at 2pm
Avignon University, Jean-Henri Fabre Campus, CERI, Ada lecture theatre


Discipline: Computer Science


Laboratory: Avignon Computer Science Laboratory (LIA)


Supervision: Prof. Corinne Fredouille (thesis director)

Composition of the jury

  • Mr Jean HENNEBERT (Reviewer)
  • Mr Damien LOLIVE (Reviewer)
  • Ms Isabel TRANCOSO (Examiner)
  • Ms Virginie WOISARD (Examiner)
  • Mr Anthony LARCHER (Examiner)
  • Mr Jean-François BONASTRE (Examiner)
  • Ms Corinne FREDOUILLE (Thesis director)

Summary of the thesis

Speech intelligibility is an essential component of effective communication. It can be defined as the degree to which a speaker's message can be understood by a listener. Intelligibility can be impaired by speech disorders, potentially reducing the quality of life of the individuals affected. In head and neck cancer, speech can be affected by the presence of tumours in the speech production apparatus. However, the main cause of impairment is usually the treatment of the tumour, which may involve surgery, radiotherapy, chemotherapy, or a combination of these. In such cases, assessing speech quality is crucial for evaluating the patient's communication deficit and developing targeted treatment plans. In clinical practice, perceptual measures are the reference standard for assessing speech disorders. Although widely used, these measures have several limitations, the most important being their subjectivity. For this reason, automatic assessment of speech disorders has been regarded as a promising alternative to perceptual measures since the 1990s.
In this thesis, we explore the potential of deep learning techniques to assess speech disorders while addressing the limitations of existing assessment tools. In this sensitive clinical context, where the stakes are high and trust is paramount, we consider explainability and interpretability to be mandatory rather than optional features of such tools. We propose a three-step methodology based on deep learning and dedicated to the interpretable assessment of intelligibility in the context of speech disorders. In the first step, we address a major problem of current automatic tools for assessing impaired speech, namely the limited understanding of the relationship between the speech impairment and the resulting assessment score. To this end, we implement a deep learning model, trained on healthy speech, for an intermediate task: French phoneme classification.
This methodological choice serves two purposes. The first is to exploit the phoneme-level knowledge provided by the classification task to address the problem mentioned above. The second relates to the use of healthy (normal) speech, which makes it possible to overcome the very limited amount of available pathological data while meeting the large data requirements of deep learning. In the second step, the main objective is to develop an interpretable solution that can be accepted in clinical practice. To this end, we investigate the ability of the phoneme classification model to generate relevant knowledge about the characteristics of the targeted speech disorders. We thus propose an original, general analytical framework, called Neuro-based Concept Detector (NCD), specifically designed to interpret the deep representations of a model. Applied to the classification model from the first step, this framework reveals a representation of the acoustic and articulatory characteristics of healthy speech in terms of phonetic features, whose alterations are easy to interpret in the case of speech disorders. Finally, the third step is devoted to predicting a final score that evaluates the intelligibility of an individual's speech. This step builds on the different levels of representation provided by the two previous steps, so that the predicted intelligibility score can be related to the degree of speech impairment at the phoneme and phonetic-feature levels. The overall methodology thus gives clinicians an interpretation of the assessment score grounded in phonetics. The promising results obtained on a population of patients with head and neck cancer suggest that such a methodology could be used to monitor the progress of therapy or to develop tailored rehabilitation protocols, improving patients' ability to communicate effectively and, consequently, their quality of life.
Validating this methodology in clinical practice is one of the many future directions opened up by this thesis.

Related keywords
thesis defence