[Thesis defence] 19/12/2025 – Ahmed NJIFENJOU: "Open-domain conversational agents based on transformer-based language models: towards multilingualism and personality" (UPR LIA)
Mr Ahmed NJIFENJOU will publicly defend his thesis entitled "Open-domain conversational agents based on transformer-based language models: towards multilingualism and personality", supervised by Mr Fabrice LEFEVRE, on Friday 19 December 2025.
Date and place
Oral defence scheduled for Friday 19 December 2025 at 2pm
Location: 339 Chemin des Meinajaries, 84911 Avignon
Room: CERI lecture theatre
Discipline
Computer Science
Laboratory
UPR 4128 LIA - Avignon Computing Laboratory
Composition of the jury
| Name | Institution | Role |
|---|---|---|
| Mr Fabrice LEFEVRE | Avignon University | Thesis supervisor |
| Ms Lina Maria ROJAS-BARAHONA | Orange Innovation | Rapporteur |
| Mr Didier SCHWAB | Grenoble Alpes University | Rapporteur |
| Ms Sophie ROSSET | Paris-Saclay University | Examiner |
| Mr David TRAUM | University of Southern California | Examiner |
| Mr Bassam JABAIAN | Avignon University | Thesis co-supervisor |
Summary
Open-domain dialogue systems (ODS) are conversational AI (CAI) agents designed to interact with humans in a natural and open-ended manner. The proliferation of CAI systems such as ChatGPT has transformed user expectations: beyond mere syntactic quality, users now expect agents that demonstrate contextual understanding and cultural sensitivity, maintain a distinct personality, remain factual, and exhibit other human-like skills. Despite remarkable advances, the development of ODS still faces several major limitations, including a strong linguistic bias in favour of English and Chinese, as well as the "open-domain paradox" (ODP), whereby the very definition of the concept restricts the actual diversity and openness of existing datasets and of the models derived from them. This thesis addresses these challenges by exploring multilingual and personality-centred strategies to build controllable and culturally adaptive ODS with Transformer-based large language models (LLMs). The contributions of this research are structured around the following complementary areas.

First, we study multilingual portability using Machine Translation (MT)-based approaches, comparing two configurations: "Train on Target", which translates the source data into the target language in order to fine-tune models there, and "Test on Source", which applies translation at inference time so that source-language models can be reused. Our results show that multilingual models such as BLOOM exhibit a certain robustness to translation artefacts, although their performance remains below that of models trained in the source language.

Next, to overcome the limitations of MT, we introduce the MOUD dataset, a corpus of synthetic, multilingual and culturally nuanced dialogues generated with instruction-tuned LLMs. MOUD turns the data-collection instructions given to human annotators into structured prompts that force the integration of language-specific knowledge, such as named entities and related elements of folk psychology, thereby enriching linguistic diversity and mitigating the ODP.

We then explore the modelling of personality and human-like conversational traits in dialogue agents. We developed a structured "role-play prompting" approach to simulate human behaviour in open-domain conversation and proposed a new vector representation of personality based on the OCEAN model: Openness, Conscientiousness, Extraversion, Agreeableness and Neuroticism. These approaches allow precise control over dialogue style and personality expression. Empirical evaluations show significant improvements in agents' perceived humanness, consistency and engagement in the discussion, while highlighting a dependence on the emergent capabilities specific to each model.

To mitigate this dependence and leverage the MOUD dataset, we explore architectural solutions for cross-lingual transfer. As a starting point, we adapt adapter-based methods to the ODS task, and we propose a new architecture, Sem2Seq-ns, a hybrid approach combining language-independent semantic representations of the input with neural specialisation within the Transformer layers that generate the outputs. Although the full results are still pending, we expect performance gains, especially for low-resource languages not seen during fine-tuning.
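To make the two MT-based portability configurations from the summary concrete, here is a minimal Python sketch of the "Train on Target" and "Test on Source" pipelines. The `translate`, `generate` and `fine_tune` helpers are hypothetical placeholders, not functions from the thesis: any MT system and any dialogue LLM could fill these roles.

```python
# Minimal sketch of the two MT-based portability configurations.
# translate(), generate() and the fine_tune callable are hypothetical
# placeholders: plug in any MT system and any dialogue LLM.

def translate(text: str, src: str, tgt: str) -> str:
    """Hypothetical machine-translation call."""
    raise NotImplementedError

def generate(model, prompt: str) -> str:
    """Hypothetical dialogue-model generation call."""
    raise NotImplementedError

def train_on_target(source_dialogues, fine_tune, src="en", tgt="fr"):
    # "Train on Target": translate the source-language training data into
    # the target language, then fine-tune a model directly in that language.
    translated = [translate(d, src, tgt) for d in source_dialogues]
    return fine_tune(translated)

def test_on_source(model, user_turn: str, src="en", tgt="fr"):
    # "Test on Source": keep the source-language model and translate at
    # inference time, target -> source for the input and source -> target
    # for the model's reply.
    reply_in_src = generate(model, translate(user_turn, tgt, src))
    return translate(reply_in_src, src, tgt)
```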
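The MOUD generation idea, turning human data-collection instructions into structured prompts that force culture-specific content, can be illustrated as follows. The field names and wording are assumptions for illustration, not the actual MOUD prompt format.

```python
# Hedged sketch of MOUD-style synthetic dialogue generation: a structured
# prompt forces language- and culture-specific content (illustrative only).

import json

def build_moud_prompt(language: str, topic: str, entities: list[str]) -> str:
    spec = {
        "language": language,
        "topic": topic,
        # Named entities the generated dialogue must mention, to anchor it
        # in the target culture rather than in translated English content.
        "required_named_entities": entities,
        "turns": 8,
        "style": "casual open-domain chit-chat between two speakers",
    }
    return (
        "Generate a dialogue following this specification. "
        "Respond only with the dialogue, one line per turn.\n"
        + json.dumps(spec, ensure_ascii=False, indent=2)
    )

prompt = build_moud_prompt("fr", "cinema", ["Festival de Cannes", "Omar Sy"])
# `prompt` would then be sent to an instruction-tuned LLM.
print(prompt)
```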
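For the personality contribution, one plausible reading of the OCEAN-based vector representation is a five-dimensional trait vector verbalised into a role-play system prompt. The [0, 1] scale and the trait-to-text mapping below are assumptions made for illustration; the thesis's actual encoding may differ.

```python
from dataclasses import dataclass

@dataclass
class OceanProfile:
    # Each trait scored in [0, 1]; the scale is an illustrative assumption.
    openness: float
    conscientiousness: float
    extraversion: float
    agreeableness: float
    neuroticism: float

    def as_vector(self) -> list[float]:
        return [self.openness, self.conscientiousness, self.extraversion,
                self.agreeableness, self.neuroticism]

    def to_prompt_fragment(self) -> str:
        # One illustrative way to condition an LLM on the profile:
        # verbalise each trait level inside the system prompt.
        def level(x):
            return "low" if x < 0.33 else "moderate" if x < 0.67 else "high"
        traits = ["openness", "conscientiousness", "extraversion",
                  "agreeableness", "neuroticism"]
        return ", ".join(f"{level(v)} {t}"
                         for t, v in zip(traits, self.as_vector()))

persona = OceanProfile(0.8, 0.4, 0.9, 0.6, 0.2)
system_prompt = (
    "Role-play as a human speaker with the following personality: "
    f"{persona.to_prompt_fragment()}. Stay in character throughout the chat."
)
print(system_prompt)
```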
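Finally, the adapter-based starting point for cross-lingual transfer follows the standard bottleneck-adapter pattern: small per-language modules are trained while the base Transformer stays frozen. The sketch below (assuming PyTorch; hidden and bottleneck sizes are illustrative) shows this general mechanism only; Sem2Seq-ns itself is not reproduced here.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter inserted after a Transformer sub-layer.

    Only these small modules are trained per language while the shared
    base model stays frozen, which is the general idea behind
    adapter-based cross-lingual transfer (dimensions are illustrative).
    """
    def __init__(self, hidden: int = 768, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden, bottleneck)  # project down
        self.up = nn.Linear(bottleneck, hidden)    # project back up
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection: the adapter learns a small correction.
        return x + self.up(self.act(self.down(x)))
```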
Keywords: Open-domain dialogue, Transformers, Open domain, Language models, Multilingualism, Personality