[Thesis defense] 26 April 2023 - Paul-Gauthier Noé: "Representation of evidence for privacy: Bayesian inference, compositional evidence and calibration", LIA

Research news 20 April 2023

Date and place

26 April at 2.30pm

Avignon University, Jean-Henri Fabre Campus, CERI, Amphi Ada Lovelace

Discipline

Computer Science

Laboratory

Laboratoire Informatique d'Avignon (LIA)

Supervision

Jean-François Bonastre (Director)

Driss Matrouf (Co-supervisor)

Composition of the jury

  • Frédéric Bimbot (rapporteur)
  • Daniel Ramos (rapporteur)
  • Isabel Trancoso (examiner)
  • David Lovell (examiner)
  • Junichi Yamagishi (examiner)
  • Corinne Fredouille (examiner)
  • Pierre-Michel Bousquet (examiner)
  • Jean-François Bonastre (thesis director)
  • Driss Matrouf (thesis co-supervisor)

Summary of the thesis

Privacy in multimedia technologies is usually about hiding an individual's identity. This thesis, however, focuses on so-called attribute-oriented privacy. The aim is to conceal information about a single attribute of the individual, such as gender, nationality or health status, while preserving the individual's other attributes or characteristics. When the attribute to be concealed takes exactly one value from a finite set of possible values, an attacker's knowledge of the attribute is represented by a discrete probability distribution over that set. Bayesian inference describes how a priori knowledge, i.e. knowledge held before observing any data, is transformed into a posteriori knowledge by a likelihood function.

In the binary case, i.e. when the set of possible values for the attribute contains only two elements, the likelihood function can be summarized by the log-ratio of the two likelihoods, i.e. the log-likelihood ratio (LRV). The LRV is known in Bayesian inference as the weight of evidence: it indicates which hypothesis (or attribute value) an observation supports, and how strongly. Bayes' formula then gives the log-ratio of the a posteriori probabilities as the sum of the LRV and the log-ratio of the a priori probabilities. In this way, the contribution of the observation and the contribution of the a priori knowledge are separated in the calculation of the a posteriori probabilities.
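The additive form of Bayes' formula can be checked numerically. The following is only a minimal sketch: the function names and the numbers are illustrative assumptions and do not come from the thesis.

```python
import math

def log_odds(p):
    """Log-ratio log(p / (1 - p)) of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def sigmoid(x):
    """Inverse of log_odds: maps a log-ratio back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def posterior_from_lrv(prior_h1, lrv):
    """P(H1 | x) from the prior P(H1) and the log-likelihood ratio (LRV).

    Bayes' formula in log-ratio form:
    a posteriori log-ratio = LRV + a priori log-ratio.
    """
    return sigmoid(lrv + log_odds(prior_h1))

# Illustrative values (assumptions, not taken from the thesis).
lik_h1, lik_h2 = 0.8, 0.2           # p(x | H1), p(x | H2)
prior_h1 = 0.3                      # P(H1)
lrv = math.log(lik_h1 / lik_h2)     # positive: the observation supports H1

# Additive form versus the usual product form of Bayes' formula.
post_additive = posterior_from_lrv(prior_h1, lrv)
post_direct = lik_h1 * prior_h1 / (lik_h1 * prior_h1 + lik_h2 * (1.0 - prior_h1))
```

Both computations give the same a posteriori probability, and an LRV of zero leaves the a priori probability unchanged.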

In this thesis, it is proposed that the attribute information revealed by a datum is represented by a likelihood function. In the binary case, the LRV expresses the likelihood function intuitively. However, this way of writing Bayes' formula does not generalize directly to cases with more than two possible hypotheses, or attribute values. This thesis therefore proposes to treat probability distributions and likelihood functions as compositional data. Bayes' formula can thus be rewritten as the sum of the contribution of the data and the contribution of the a priori knowledge. Compositional data live on the simplex, on which a Euclidean vector space structure, known as Aitchison geometry, can be defined. With the coordinate system defined by the isometric-log-ratio approach, Bayesian inference is the translation of the a priori distribution by the likelihood function. In this space, the likelihood function, called Isometric-Log-Ratio-Likelihood (ILRV), is regarded as the multidimensional, multi-hypothesis generalization of the LRV. The norm of the ILRV is the strength of the evidence: it measures the distance between the a priori distribution and the a posteriori distribution, which can be seen as a measure of the information revealed by the data.
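The translation property can be illustrated with a small numerical sketch. The orthonormal (Helmert-type) basis used here is one possible choice of isometric-log-ratio coordinates, assumed for illustration; the function names and the example distributions are hypothetical and are not taken from the thesis.

```python
import numpy as np

def closure(x):
    """Rescale a positive vector so that its components sum to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum()

def helmert_basis(n):
    """Orthonormal basis of {v in R^n : sum(v) = 0}, as rows, shape (n-1, n)."""
    basis = np.zeros((n - 1, n))
    for i in range(1, n):
        basis[i - 1, :i] = 1.0 / i
        basis[i - 1, i] = -1.0
        basis[i - 1] *= np.sqrt(i / (i + 1.0))
    return basis

def ilr(x, basis):
    """Isometric-log-ratio coordinates of a positive vector x."""
    log_x = np.log(np.asarray(x, dtype=float))
    clr = log_x - log_x.mean()        # centred log-ratio, invariant to scaling
    return basis @ clr

def ilr_inv(z, basis):
    """Map ILR coordinates back to a probability distribution."""
    return closure(np.exp(basis.T @ z))

# Illustrative three-hypothesis example (values are assumptions).
prior = np.array([0.5, 0.3, 0.2])
likelihood = np.array([0.1, 0.6, 0.3])    # p(x | H_i), defined up to a scale
posterior = closure(prior * likelihood)   # usual form of Bayes' formula

basis = helmert_basis(3)
ilrv = ilr(likelihood, basis)             # likelihood function in ILR coordinates
posterior_by_translation = ilr_inv(ilr(prior, basis) + ilrv, basis)
evidence_strength = np.linalg.norm(ilrv)
```

In these coordinates the a posteriori distribution is the a priori distribution shifted by `ilrv`, and `evidence_strength` equals the distance between the two. With two hypotheses and this basis, the single coordinate reduces to the LRV divided by the square root of 2.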

The notion of perfect secrecy introduced by Claude Shannon can be applied to privacy. Perfect secrecy corresponds to the situation where the attacker's a posteriori distribution is equal to their a priori distribution. In that case, the data provide no information to the attacker. Perfect secrecy is achieved when the LRV is zero for binary cases and, by extension, when the ILRV is equal to the zero vector for non-binary cases.

For ILRVs to correctly represent the information revealed by the data, they must be calibrated. The concept of calibration is usually applied to probabilities but can also be applied to likelihoods. The idempotency of calibrated LRVs, and the constraint it imposes on the parameters of normally distributed LRVs, are well-known properties. In this thesis, these properties are generalized to ILRVs for multi-hypothesis applications.
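The constraint on normally distributed LRVs can be made explicit with a short derivation. The notation below is an assumption introduced for this sketch: \ell denotes the LRV, taken to be Gaussian with a common variance \sigma^2 under both hypotheses.

```latex
% Assumption: \ell \mid H_1 \sim \mathcal{N}(\mu_1, \sigma^2), \quad
%             \ell \mid H_2 \sim \mathcal{N}(\mu_2, \sigma^2).
\begin{align*}
\log \frac{p(\ell \mid H_1)}{p(\ell \mid H_2)}
  &= \frac{(\ell - \mu_2)^2 - (\ell - \mu_1)^2}{2\sigma^2}
   = \frac{\mu_1 - \mu_2}{\sigma^2}
     \left( \ell - \frac{\mu_1 + \mu_2}{2} \right).
\end{align*}
% Idempotency (the LRV of the LRV is the LRV) requires this to equal \ell
% for every \ell:
\begin{align*}
\frac{\mu_1 - \mu_2}{\sigma^2} = 1
\quad \text{and} \quad
\mu_1 + \mu_2 = 0
\quad \Longleftrightarrow \quad
\mu_1 = \frac{\sigma^2}{2}, \qquad \mu_2 = -\frac{\sigma^2}{2}.
\end{align*}
```

Under these assumptions, the distribution of a calibrated Gaussian LRV is described by a single parameter. Unequal variances would add a term in \ell^2 that cannot satisfy the idempotency condition.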

Based on these properties and the compositional nature of likelihood functions, a new discriminant analysis is proposed. First presented for binary applications, the discriminant analysis maps the input feature vectors into a space where the discriminant component is a calibrated LRV. The transformation is learned with a normalizing flow, which is a cascade of invertible artificial neural networks.

In this thesis, we propose to use this discriminant analysis for attribute-oriented privacy. Since the transformation is invertible, the LRV can be set to zero before mapping the data back into the feature space, thus respecting the idea of perfect secrecy. This approach is tested on concealing speaker gender in speaker representations derived from deep artificial neural networks. Once protected, these representations are evaluated on an automatic speaker verification task and on a voice conversion task.
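The protection step (transform, set the LRV to zero, invert) can be sketched on a toy problem. The example below does not use a normalizing flow: it substitutes a hand-built invertible affine map, which is only valid under the stated assumption of two Gaussian classes with a shared covariance. All names and values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model (assumption): x | H1 ~ N(m1, cov) and x | H2 ~ N(m2, cov).
m1 = np.array([1.0, 0.5, -0.5])
m2 = np.array([-1.0, 0.0, 0.5])
cov = np.array([[1.0, 0.3, 0.0],
                [0.3, 1.0, 0.2],
                [0.0, 0.2, 1.0]])

# With a shared covariance, the LRV is affine in x: lrv(x) = w @ x + b.
cov_inv = np.linalg.inv(cov)
w = cov_inv @ (m1 - m2)
b = -0.5 * (m1 @ cov_inv @ m1 - m2 @ cov_inv @ m2)

# Residual directions: an orthonormal basis of the complement of (m1 - m2),
# so the residual coordinates have the same mean under both hypotheses.
_, _, vt = np.linalg.svd((m1 - m2)[None, :])
A = np.vstack([w, vt[1:]])            # invertible since w @ (m1 - m2) > 0
c = np.concatenate([[b], np.zeros(len(w) - 1)])

def lrv(x):
    """Log-likelihood ratio of x under the toy Gaussian model."""
    return w @ x + b

def forward(x):
    """Invertible map: first coordinate is the LRV, the rest is the residual."""
    return A @ x + c

def inverse(z):
    """Map transformed coordinates back to the feature space."""
    return np.linalg.solve(A, z - c)

def protect(x):
    """Set the LRV coordinate to zero and return to the feature space."""
    z = forward(x)
    z[0] = 0.0
    return inverse(z)

x = rng.multivariate_normal(m1, cov)  # a sample from class H1
x_protected = protect(x)
```

After protection, the LRV of `x_protected` is zero under the toy model, while the residual coordinates are unchanged. In the approach described above, the affine map would be replaced by the learned normalizing flow.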

As the properties of the LRV can be generalized to the ILRV thanks to the Aitchison geometry, the discriminant analysis proposed in the binary case can easily be generalized to the non-binary case. As in the binary case, this approach, which we call Compositional Discriminant Analysis, maps the data into a space where the discriminant dimensions form a calibrated likelihood function expressed by the ILRV.

A normalizing flow could also be used to learn an LRV calibration transformation. This idea is briefly discussed at the end of the thesis.

Although the work in this thesis is mainly presented in the context of personal data security, the concepts discussed open up research directions in the fields of probability and likelihood calibration and in machine learning, in particular for learning interpretable representations of information.

Related keywords
thesis defence