[Dissertation defence] 11/12/23, Anaïs Chanclu: "Recognising people by their voices: Definition of a scientific framework to guarantee the reliability of the results of a voice comparison in the forensic context" (LIA)

Research news 27 November 2023

Title of the thesis

Recognising people by their voices: Defining a scientific framework to ensure the reliability of voice comparison results in forensic contexts

Date and place

11 December 2023, 2.30pm
Avignon University, Hannah Arendt campus, thesis room

Discipline

Computer Science

Laboratory

Avignon Computing Laboratory (LIA)

Supervision

  • Jean-François Bonastre

Composition of the jury

  • Ms Martine Adda-Decker
  • Mr Julien Pinquier
  • Ms Christine Meunier
  • Mr Jean-François Bonastre

Summary of the thesis

During a police investigation or criminal trial, voice recordings are sometimes collected for comparison with the voices of suspects. Very often these recordings, known as traces, come from telephone taps, calls to the emergency services or voicemail messages. The recordings of suspects, known as comparison pieces, are generally made by the police, in particular through dedicated voice sampling. Because traces and comparison pieces are not recorded under the same conditions, and because the conditions under which a trace was recorded are often poorly known or entirely unknown, the variability between the recordings to be compared cannot be quantified. Many factors come into play, relating to the audio files being compared, the linguistic content, the environment and the speaker(s).

Voice comparison practices have evolved over time, but they have never rested on a scientific framework of the kind recommended by the Frye and Daubert standards. As a result, the reliability of forensic voice analysis has been called into question (Trayvon Martin case), and questionable practices (Élodie Kulik case) have led to miscarriages of justice. Today, the Service national de police scientifique (SNPS) and the Institut de recherche criminelle de la Gendarmerie nationale (IRCGN) have established quality protocols to ensure that their expert reports are grounded in the scientific literature. The aim of this thesis is to define a scientific framework within which the reliability of voice comparison results is known. To this end, we address three points: the influence of certain factors on the performance of a voice comparison, human perception of a speaker's identity, and voice characterisation.

The first point we address is the influence of certain factors on the performance of a voice comparison. We study each factor individually and then in combination with one other factor. The results show that some factors have a greater influence on performance than others. However, variability also operates at the speaker level: the factors studied do not affect performance in the same way for all speakers.

Secondly, we study the human perception of speakers. We set up a perceptual experiment in which listeners group recordings by speaker. To evaluate this task, we defined a measure of clustering purity. We also compared the listeners' results with those of an automatic voice comparison. The results showed disparities in speaker clustering, linked in particular to the listeners' mother tongue. The automatic approach obtained better results than the listeners.
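This summary does not specify the clustering purity measure defined in the thesis. As a rough illustration only, the sketch below computes the standard textbook purity score: each cluster is credited with its most frequent true speaker, and the credited recordings are divided by the total. The function name, the speaker labels and the listener groupings are hypothetical and chosen purely for the example.

```python
from collections import Counter, defaultdict


def clustering_purity(true_speakers, assigned_clusters):
    """Standard clustering purity (an assumption, not the thesis's own measure).

    true_speakers: the real speaker label of each recording.
    assigned_clusters: the group a listener (or a system) put each recording in.
    Returns a value in (0, 1]; 1.0 means every group contains a single speaker.
    """
    if len(true_speakers) != len(assigned_clusters):
        raise ValueError("both sequences must have the same length")
    if not true_speakers:
        raise ValueError("at least one recording is required")

    # Collect the true speaker labels that ended up in each cluster.
    members = defaultdict(list)
    for speaker, cluster in zip(true_speakers, assigned_clusters):
        members[cluster].append(speaker)

    # Credit each cluster with the count of its most frequent speaker.
    majority_total = sum(
        Counter(speakers).most_common(1)[0][1] for speakers in members.values()
    )
    return majority_total / len(true_speakers)


# Hypothetical example: six recordings from three speakers (A, B, C),
# grouped by one listener into three clusters.
truth = ["A", "A", "A", "B", "B", "C"]
listener_groups = [1, 1, 2, 2, 2, 3]
purity = clustering_purity(truth, listener_groups)  # 5 of 6 recordings credited
```

Standard purity rewards over-splitting (one recording per cluster always scores 1.0), which is one reason a dedicated measure may be preferable for a perceptual grouping task.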

Finally, we turn to voice characterisation. We developed a new system for detecting phonation type, first on pre-pausal vowels and then on all voiced phonemes. This system uses PASE+ to extract multiple parameters and a multi-layer perceptron (MLP) for classification. We compared it with a more conventional system based on Mel-Frequency Cepstral Coefficients (MFCC) and a support vector machine (SVM) classifier. The results show that the new system outperforms the conventional one. Generalisation to all voiced phonemes showed that female speakers tended to have a modal voice and male speakers a non-modal voice.

Overall, this thesis has shown that voice comparison is a complex field and that its results can be influenced by many factors. Standardising voice comparison practices requires in-depth knowledge of these factors and of how they interact. In this thesis, however, only a handful of factors were studied. Further research in this direction is therefore needed in order to standardise voice comparison practices and guarantee reliable results.

Associated keywords
thesis defence