[Thesis defence] 26 April 2023 - Paul-Gauthier Noé: "Representation of evidence for privacy: Bayesian inference, compositional evidence and calibration", LIA

Research news 20 April 2023

Date and place

26 April at 2.30pm

Avignon University, Jean-Henri Fabre Campus, CERI, Amphi Ada Lovelace

Discipline

Computer Science

Laboratory

Avignon Computer Laboratory (LIA)

Supervision

Jean-François Bonastre (Supervisor)

Driss Matrouf (Co-supervisor)

Composition of the jury

  • Frédéric Bimbot (rapporteur)
  • Daniel Ramos (rapporteur)
  • Isabel Trancoso (examiner)
  • David Lovell (reviewer)
  • Junichi Yamagishi (reviewer)
  • Corinne Fredouille (examiner)
  • Pierre-Michel Bousquet (examiner)
  • Jean-François Bonastre (thesis supervisor)
  • Driss Matrouf (thesis co-supervisor)

Summary of the thesis

Privacy in multimedia technologies generally involves concealing an individual's identity. This thesis, however, focuses on so-called attribute-oriented privacy: the aim is to conceal information about a single attribute of the individual, such as gender, nationality or health status, while preserving the individual's other attributes and characteristics. When the attribute to be concealed takes exactly one value from a finite set of possible values, an attacker's knowledge of the attribute is represented by a discrete probability distribution over that set. Bayesian inference describes how prior knowledge, i.e. knowledge held before any data are observed, is transformed by a likelihood function into posterior knowledge.
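
As an illustration of this update, the following Python sketch computes a posterior over a finite set of attribute values by multiplying a prior by a likelihood vector and renormalising. The function name `bayes_update` and all numbers are hypothetical, chosen only for illustration.

```python
import numpy as np

def bayes_update(prior, likelihood):
    """Posterior over a finite set of attribute values: prior times likelihood, renormalised."""
    prior = np.asarray(prior, dtype=float)
    likelihood = np.asarray(likelihood, dtype=float)
    unnormalised = prior * likelihood
    return unnormalised / unnormalised.sum()

# Hypothetical example: three possible attribute values, uniform prior.
prior = np.array([1 / 3, 1 / 3, 1 / 3])
likelihood = np.array([0.6, 0.3, 0.1])  # p(observation | value), illustrative numbers
posterior = bayes_update(prior, likelihood)
```

With a uniform prior the posterior is simply the normalised likelihood; a non-uniform prior shifts it toward the a priori more probable values.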

In the binary case, i.e. when the attribute has only two possible values, the likelihood function can be summarised by the logarithm of the ratio of the two likelihoods, the log-likelihood ratio (LRV). In Bayesian inference the LRV is known as the weight of evidence: its sign indicates which hypothesis (or attribute value) an observation supports, and its magnitude indicates how strongly. Bayes' formula can then be written so that the posterior log-odds equal the sum of the LRV and the prior log-odds. In this form, the contribution of the observation and the contribution of prior knowledge to the posterior probabilities are separated.
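
A minimal numerical check of the log-odds form of Bayes' formula, assuming natural logarithms and made-up likelihood and prior values; `posterior_log_odds` and `sigmoid` are illustrative helpers, not functions from the thesis.

```python
import math

def posterior_log_odds(lrv, prior_h1):
    """Binary Bayes' rule in log-odds form: posterior log-odds = LRV + prior log-odds."""
    prior_log_odds = math.log(prior_h1 / (1.0 - prior_h1))
    return lrv + prior_log_odds

def sigmoid(x):
    """Convert log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical numbers: p(x|H1) = 0.3, p(x|H2) = 0.1, prior P(H1) = 0.2
lik_h1, lik_h2, prior_h1 = 0.3, 0.1, 0.2
lrv = math.log(lik_h1 / lik_h2)  # natural-log likelihood ratio
post_h1 = sigmoid(posterior_log_odds(lrv, prior_h1))

# Same result from the direct form of Bayes' formula.
direct = lik_h1 * prior_h1 / (lik_h1 * prior_h1 + lik_h2 * (1 - prior_h1))
```

The LRV term depends only on the observation and the prior log-odds term only on prior knowledge, which is the separation described above.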

In this thesis, it is proposed to represent the attribute information revealed by a datum by a likelihood function. In the binary case, the LRV expresses the likelihood function intuitively. However, this form of Bayes' formula does not generalise directly to more than two hypotheses or attribute values. The thesis therefore proposes to treat probability distributions and likelihood functions as compositional data, so that Bayes' formula can again be written as a sum of the contribution of the data and that of the prior knowledge. Compositional data live on the simplex, on which a Euclidean vector space structure, known as the Aitchison geometry, can be defined. In the coordinate system given by the isometric log-ratio (ilr) transformation, Bayesian inference is a translation of the prior distribution by the likelihood function. In this space, the likelihood function, called the Isometric-Log-Ratio-Likelihood (ILRV), is considered the multidimensional, multi-hypothesis generalisation of the LRV. The norm of the ILRV is the strength of the evidence: it measures the distance between the prior and the posterior distributions and can be seen as a measure of the information revealed by the data.
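
The sketch below illustrates, under stated assumptions, how Bayes' rule becomes a translation in ilr coordinates. It uses one common Helmert-type orthonormal basis (an assumption: any orthonormal basis of the subspace orthogonal to the all-ones vector works, giving rotated coordinates); the helper names `helmert_basis` and `ilr` and the numbers are hypothetical.

```python
import numpy as np

def helmert_basis(d):
    """Orthonormal d x (d-1) basis of the subspace orthogonal to (1, ..., 1)."""
    v = np.zeros((d, d - 1))
    for i in range(1, d):
        v[:i, i - 1] = 1.0 / i
        v[i, i - 1] = -1.0
        v[:, i - 1] *= np.sqrt(i / (i + 1))  # normalise the column to unit length
    return v

def ilr(x, basis):
    """ilr coordinates of a positive vector; invariant to rescaling x."""
    return basis.T @ np.log(np.asarray(x, dtype=float))

basis = helmert_basis(3)
prior = np.array([0.5, 0.3, 0.2])
likelihood = np.array([0.1, 0.4, 0.2])  # hypothetical p(observation | value)

posterior = prior * likelihood
posterior /= posterior.sum()

ilrv = ilr(likelihood, basis)  # ilr coordinates of the likelihood (ILRV)
# Bayesian inference as a translation: ilr(posterior) = ilr(prior) + ILRV
translated = ilr(prior, basis) + ilrv
# Strength of evidence: norm of the ILRV = distance between prior and posterior
strength = np.linalg.norm(ilrv)
```

Because the basis columns sum to zero, rescaling the likelihood vector (or dropping the posterior's normalising constant) leaves the ilr coordinates unchanged, which is why the translation holds exactly.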

Claude Shannon's notion of perfect secrecy can be applied to privacy. Perfect secrecy corresponds to the situation where the attacker's posterior distribution equals their prior distribution, so that the data have provided no information to the attacker. Perfect secrecy is achieved when the LRV is zero in the binary case and, by extension, when the ILRV is the zero vector in non-binary cases.
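
A short check of the perfect-secrecy condition, reusing a hypothetical Helmert-type ilr basis: a likelihood that is identical for every attribute value has a zero ILRV and leaves the prior unchanged. The basis helper and all numbers are illustrative assumptions.

```python
import numpy as np

def helmert_basis(d):
    """Orthonormal d x (d-1) basis of the subspace orthogonal to (1, ..., 1)."""
    v = np.zeros((d, d - 1))
    for i in range(1, d):
        v[:i, i - 1] = 1.0 / i
        v[i, i - 1] = -1.0
        v[:, i - 1] *= np.sqrt(i / (i + 1))
    return v

basis = helmert_basis(3)
prior = np.array([0.7, 0.2, 0.1])
flat_likelihood = np.full(3, 0.05)  # same likelihood for every attribute value

ilrv = basis.T @ np.log(flat_likelihood)  # zero vector: basis columns sum to zero
posterior = prior * flat_likelihood
posterior /= posterior.sum()  # equals the prior: nothing was revealed

binary_lrv = np.log(0.05 / 0.05)  # binary case: LRV = 0
```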

For ILRVs to correctly represent the information revealed by the data, they must be calibrated. Calibration is usually defined for probabilities but also applies to likelihoods. Two properties of calibrated LRVs are well known: idempotence (the LRV computed from a calibrated LRV is that same LRV) and the resulting constraint on normally distributed calibrated LRVs, whose variance must equal twice the absolute value of their mean (with natural logarithms). This thesis generalises these properties to ILRVs for multi-hypothesis applications.
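
The binary properties can be checked numerically. This sketch assumes, for illustration, that the LRV is normally distributed with mean `mu` under one hypothesis and `-mu` under the other, with a shared variance; it recomputes the log-likelihood ratio of an observed LRV value and shows that it returns the input (idempotence) when the variance is `2 * mu`, but not otherwise. The value of `mu` and the helper names are arbitrary.

```python
import numpy as np

def gaussian_logpdf(x, mean, var):
    """Log-density of a univariate normal distribution."""
    return -0.5 * np.log(2 * np.pi * var) - (x - mean) ** 2 / (2 * var)

def lrv_of_lrv(s, mu, var):
    """LRV of an observed LRV value s, assuming s ~ N(mu, var) under H1 and N(-mu, var) under H2."""
    return gaussian_logpdf(s, mu, var) - gaussian_logpdf(s, -mu, var)

mu = 2.0  # hypothetical mean of the LRV under H1
s = np.linspace(-10.0, 10.0, 201)

calibrated = lrv_of_lrv(s, mu, 2.0 * mu)  # variance = 2 * mu: idempotent
miscalibrated = lrv_of_lrv(s, mu, 1.0)    # violates the constraint
```

Analytically, the difference of the two log-densities is `2 * mu * s / var`, which equals `s` exactly when `var = 2 * mu`.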

Based on these properties and the compositional nature of likelihood functions, a new discriminant analysis is proposed. First presented for binary applications, it maps the input feature vectors into a space where the discriminant component is a calibrated LRV. The transformation is learned with a normalizing flow, i.e. a composition of invertible neural network transformations.

The thesis proposes to use this discriminant analysis for attribute-oriented privacy. Because the transformation is invertible, the LRV can be set to zero before the data are mapped back into the feature space, in line with the notion of perfect secrecy. The approach is evaluated for concealing speaker gender in speaker representations extracted by deep neural networks. The protected representations are then tested on an automatic speaker verification task and on a voice conversion task.
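
The following is a linear stand-in for this idea, not the method of the thesis: instead of a learned normalizing flow, it assumes a hypothetical two-class Gaussian model with shared covariance, under which the LRV is exactly an affine function of the features. An invertible affine map places the LRV in the first coordinate, that coordinate is set to zero, and the data are mapped back. All names and parameter values are illustrative.

```python
import numpy as np

# Hypothetical toy model: two attribute values with Gaussian features and shared covariance.
mean_a = np.array([1.0, 0.5, 0.0])
mean_b = np.array([-1.0, 0.0, 0.5])
cov = np.array([[1.0, 0.3, 0.0],
                [0.3, 1.0, 0.2],
                [0.0, 0.2, 1.0]])
cov_inv = np.linalg.inv(cov)

# Under this model, LRV(x) = log p(x|a) - log p(x|b) = w @ x + b0 exactly.
w = cov_inv @ (mean_a - mean_b)
b0 = -0.5 * (mean_a + mean_b) @ cov_inv @ (mean_a - mean_b)

# Invertible affine map: first row w, remaining rows an orthonormal basis of w's complement.
_, _, vt = np.linalg.svd(w[None, :])
A = np.vstack([w, vt[1:]])
c = np.concatenate([[b0], np.zeros(len(w) - 1)])

def embed(x):
    """Feature space -> space whose coordinate 0 is the LRV."""
    return x @ A.T + c

def unembed(z):
    """Inverse of embed."""
    return np.linalg.solve(A, (z - c).T).T

def protect(x):
    """Set the LRV to zero, then map back to the feature space."""
    z = embed(x)
    z[:, 0] = 0.0
    return unembed(z)

rng = np.random.default_rng(0)
x = rng.multivariate_normal(mean_a, cov, size=5)
x_protected = protect(x)
```

With a normalizing flow, `embed` and `unembed` would be the learned forward and inverse transformations; the affine version is used here only because its inverse and its LRV are available in closed form. The coordinates orthogonal to `w` are left untouched, which is the sense in which other information is preserved.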

Since the Aitchison geometry allows the properties of the LRV to be generalised to the ILRV, the discriminant analysis proposed for the binary case generalises readily to non-binary cases. As in the binary case, this approach, which we propose and call Compositional Discriminant Analysis, maps the data into a space where the discriminant dimensions form a calibrated likelihood function expressed as an ILRV.
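
A linear sketch of the multi-class case under the same kind of toy assumption (three attribute values, Gaussian classes with a shared covariance matrix, all values hypothetical): the ILRV of the class likelihoods is then an affine function of the features, so an invertible affine map can place it in the first K-1 coordinates, where it is set to the zero vector. This is not Compositional Discriminant Analysis itself, which learns the transformation with a normalizing flow; the Helmert-type ilr basis is also an assumption.

```python
import numpy as np

def helmert_basis(d):
    """Orthonormal d x (d-1) basis of the subspace orthogonal to (1, ..., 1)."""
    v = np.zeros((d, d - 1))
    for i in range(1, d):
        v[:i, i - 1] = 1.0 / i
        v[i, i - 1] = -1.0
        v[:, i - 1] *= np.sqrt(i / (i + 1))
    return v

# Hypothetical toy model: K = 3 attribute values, d = 4 features, shared covariance.
means = np.array([[1.0, 0.0, 0.0, 0.5],
                  [0.0, 1.0, 0.0, -0.5],
                  [0.0, 0.0, 1.0, 0.0]])
cov = np.eye(4) + 0.2 * np.ones((4, 4))
cov_inv = np.linalg.inv(cov)
V = helmert_basis(len(means))

# Class log-likelihoods are W @ x + m plus a term shared by all classes,
# which the ilr projection removes because the columns of V sum to zero.
W = means @ cov_inv
m = -0.5 * np.einsum("kd,de,ke->k", means, cov_inv, means)
M = V.T @ W          # ILRV(x) = M @ x + c_ilrv
c_ilrv = V.T @ m
k1 = M.shape[0]      # K - 1 discriminant dimensions

# Invertible affine map: ILRV rows, then an orthonormal basis of their complement.
_, _, vt = np.linalg.svd(M)
A = np.vstack([M, vt[k1:]])
offset = np.concatenate([c_ilrv, np.zeros(A.shape[0] - k1)])

def ilrv(x):
    """ILRV of each row of x under the toy model."""
    return x @ M.T + c_ilrv

def protect(x):
    """Set the ILRV to the zero vector, then map back to the feature space."""
    z = x @ A.T + offset
    z[:, :k1] = 0.0
    return np.linalg.solve(A, (z - offset).T).T

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 4))
x_protected = protect(x)
```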

Using a normalizing flow to learn a calibration transformation for LRVs is also possible; this idea is briefly discussed at the end of the thesis.

Although the work in this thesis is mainly presented in the context of personal data security, the concepts addressed open up research directions in the fields of probability and likelihood calibration and in machine learning, in particular for learning interpretable representations of information.

Related keywords
thesis defence