[Thesis defence] 2 May 2023 - Sondes Abderrazek: "Speech intelligibility assessment using deep learning: towards greater interpretability in clinical phonetics", LIA
Date and place
2 May at 2pm
Avignon University, Jean-Henri Fabre Campus, CERI, Amphi Ada
Discipline
Computer Science
Laboratory
Avignon Computer Science Laboratory (LIA)
Supervision
Prof. Corinne Fredouille (thesis supervisor)
Composition of the jury
- Mr Jean HENNEBERT (Rapporteur)
- Mr Damien LOLIVE (Rapporteur)
- Mrs Isabel TRANCOSO (Examiner)
- Mrs Virginie WOISARD (Examiner)
- Mr Anthony LARCHER (Examiner)
- Mr Jean-François BONASTRE (Examiner)
- Mrs Corinne FREDOUILLE (Thesis supervisor)
Summary of the thesis
Speech intelligibility is an essential component of effective communication. It can be defined as the degree to which a speaker's message can be understood by a listener. This ability can be impaired by speech disorders, potentially reducing quality of life. In the case of head and neck cancer, speech can be affected by the presence of tumours in the speech-producing apparatus. However, the main cause is usually the treatment of the tumour, which involves surgery, radiotherapy, chemotherapy or a combination of these. In such cases, speech quality assessment is crucial for evaluating patients' communication deficits and developing targeted treatment plans. In clinical practice, perceptual measures are considered a standard for the assessment of speech disorders. Although widely used, these measures have several limitations, the most important of which is their subjectivity. As a result, automatic speech assessment has been emerging since the 1990s as a promising alternative to perceptual measures.
In this thesis, we explore the potential of deep learning techniques to assess speech disorders while addressing the limitations of existing assessment tools. In this sensitive clinical context, where the stakes are high and trust is paramount, we consider the explainability and interpretability of these tools to be mandatory rather than optional. We propose a three-stage methodology based on deep learning and dedicated to the interpretable assessment of intelligibility in the context of speech disorders. In the first stage, we address a major problem in current automatic tools for assessing impaired speech, namely the limited knowledge about the relationship between speech impairment and the resulting assessment score. To this end, we build a deep learning model, trained on healthy speech and dedicated to an intermediate task: classifying French phonemes.
This methodological choice has two purposes. The first is to take advantage of the phoneme-level knowledge provided by the classification task to address the major problem mentioned above. The second relates to the use of healthy (normal) speech, which compensates for the very limited amount of pathological data available while meeting deep learning's need for large amounts of data. In the second stage, the main objective is to develop an interpretable solution, with a view to its acceptance in clinical practice. With this in mind, we study the ability of the phoneme classification model to produce relevant knowledge linked to the characteristics of the targeted speech disorders. We thus propose a general and original analytical framework, called Neuro-based Concept Detector (NCD), specially designed to interpret the deep representations of a model. Within the classification model resulting from the first stage, this framework highlights a representation of the acoustic and articulatory characteristics of healthy speech in terms of phonetic features, which can easily be interpreted in terms of alterations in the case of speech disorders. Finally, the third stage is devoted to predicting a final score assessing the intelligibility of an individual's speech. This stage builds on the different levels of representation provided by the previous two stages, so that the predicted intelligibility score can be related to the degree of speech impairment at the phoneme and phonetic feature levels. The overall methodology thus provides clinicians with an interpretation of the assessment score grounded in phonetics.
The promising results obtained on a population of patients with head and neck cancer suggest the potential of such a methodology for monitoring the progress of therapy or developing tailored rehabilitation protocols that would improve patients' ability to communicate effectively and, consequently, their quality of life. Validating this methodology in clinical practice is one of the many avenues for future work opened by this thesis.
Updated on 25 April 2023