[PhD defence] 26/11/2025 - Andrea Fox : "Reinforcement learning for resource allocation in Edge/Fog computing systems" (UPR LIA)

Research news, 13 November 2025

Mr Andrea FOX will publicly defend his thesis entitled "Reinforcement learning for resource allocation in Edge/Fog computing systems" ("Apprentissage par renforcement pour l'affectation des ressources dans les systèmes Edge/Fog computing"), supervised by Mr Francesco DE PELLEGRINI and Mr Eitan ALTMAN, on Wednesday 26 November 2025 at 13:00.

Date and place

Defence scheduled for Wednesday 26 November 2025 at 1:00 p.m.
Venue: Avignon University, city centre site, Campus Hannah Arendt
Thesis room

Discipline

Computer Science

Laboratory

UPR 4128 LIA - Avignon Computing Laboratory

Composition of the jury

Mr Francesco DE PELLEGRINI, Avignon University, Thesis supervisor
Ms Rosa FIGUEIREDO, Avignon University, Examiner
Mr Nahum SHIMKIN, Technion, Examiner
Mr Eitan ALTMAN, INRIA, Thesis co-director
Mr Bruno GAUJAL, INRIA, Examiner
Mr György DÁN, KTH, Rapporteur
Mr Marcello RESTELLI, Politecnico di Milano, Rapporteur

Summary

This thesis studies how reinforcement learning (RL) can be applied to the design of intelligent decision systems for resource management in edge computing, focusing on algorithms suited to distributed, heterogeneous and resource-limited environments. In the edge-computing paradigm, processing and decision making move closer to the end devices, reducing latency and network load but introducing significant computing, energy and communication constraints. Unlike the cloud, edge nodes are numerous, heterogeneous and resource-limited, which makes centralised optimisation impractical. Efficient orchestration therefore requires adaptive, decentralised control methods able to meet performance and safety constraints in uncertain environments. RL fits naturally into this framework: by learning through interaction, agents adapt to dynamic conditions and optimise long-term objectives even without a complete model of the system.

The thesis is structured in three parts. The first part reviews the foundations of RL, in particular Markov decision processes, constrained RL and multi-agent formulations. It introduces C3-IPPO (Communication-free Constrained Coordination with Independent PPO), a three-timescale Lagrangian algorithm in which each agent learns a constrained local policy while adjusting an internal parameter; coordination emerges implicitly, without explicit communication or reward sharing. Experiments on the Melting Pot benchmark show that C3-IPPO achieves cooperative behaviour comparable to communication-based methods while remaining fully decentralised and scalable. The second and third parts apply these ideas to two classical resource-management problems, task offloading and load balancing, under the constraints of edge computing.

The second part applies RL to task offloading in mobile edge computing.
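C3-IPPO itself is not reproduced here, but the Lagrangian primal-dual mechanism underlying it (and the constrained formulations used later in the thesis) can be sketched in a toy single-agent form: a fast timescale learns values for the penalised reward r - lambda*c, while a slower timescale adjusts the multiplier lambda to keep the average cost at a budget. The environment, names and constants below are illustrative only, not the thesis's actual algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 4, 2
budget = 0.3                      # constraint: average cost must stay below this

Q = np.zeros((n_states, n_actions))   # fast timescale: action values
lam = 0.0                             # slow timescale: Lagrange multiplier

alpha, beta, gamma, eps = 0.1, 0.01, 0.9, 0.1   # note beta << alpha

def step(s, a):
    """Toy environment: action 1 earns more reward but incurs a cost."""
    reward = 1.0 if a == 1 else 0.2
    cost = 1.0 if a == 1 else 0.0
    return reward, cost, rng.integers(n_states)

s = 0
for t in range(20000):
    a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
    r, c, s_next = step(s, a)
    # primal update: learn values of the Lagrangian reward r - lam * c
    target = (r - lam * c) + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])
    # dual update (slower): raise lam when cost exceeds the budget, else lower it
    lam = max(0.0, lam + beta * (c - budget))
    s = s_next

print(round(lam, 2))   # multiplier settles near the level that enforces the budget
```

Running the two updates on separated timescales is what lets the multiplier track the constraint while the value estimates stay approximately converged for the current lambda.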
Devices must decide whether to process data locally or send it to an edge server, balancing latency, energy consumption and information freshness (Age of Information, AoI). An initial study formulates the problem for a single device and introduces Ordered Q-Learning, a lightweight algorithm that exploits the monotone structure of the optimal policy to speed up learning while preserving the convergence guarantees of Q-learning. The framework is then extended to several devices sharing the same server, using a decentralised constrained RL scheme inspired by C3-IPPO.

The third part deals with load balancing under strong system constraints, distinguishing between (i) the capacity limits of communication links and (ii) those of servers. In the first case, safe policies derived from queueing theory are adapted to edge computing and remain stable even under heavy load. In the second, the problem is formulated as a constrained RL task, giving rise to DRCPO (Decomposed Reward Constrained Policy Optimization), an algorithm that learns admission and balancing decisions while guaranteeing that safety constraints are satisfied.

Overall, the thesis shows that combining RL with structural information and constraint mechanisms makes it possible to design adaptive, reliable and decentralised solutions. It also highlights the complementarity between learning approaches and classical analytical methods for efficient resource management in modern distributed systems.
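The structural idea behind Ordered Q-Learning, as summarised above, can be illustrated with a toy offloading model: when the optimal rule is known to be monotone in the AoI, the agent can restrict its greedy policy to threshold form ("offload once the AoI reaches k") instead of searching over arbitrary policies. The environment, costs and projection step below are illustrative only, not the thesis's actual algorithm; for simplicity, transitions are sampled from a generative model rather than a single trajectory.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 5                          # AoI levels 0..N-1
gamma, alpha = 0.95, 0.2
Q = np.zeros((N, 2))           # actions: 0 = keep processing locally, 1 = offload

def step(age, action):
    """Toy dynamics: offloading pays a fixed energy cost and resets the AoI;
    local processing pays a staleness cost that grows with the AoI."""
    if action == 1:
        return -1.0, 0
    return -0.3 * age, min(age + 1, N - 1)

def best_threshold(Q):
    """Project the greedy rule onto monotone threshold policies:
    pick the threshold k whose induced policy has the largest total Q."""
    vals = [sum(Q[s, 1] if s >= k else Q[s, 0] for s in range(N))
            for k in range(N + 1)]
    return int(np.argmax(vals))

# Q-learning on uniformly sampled (state, action) pairs; the toy transitions
# are deterministic, so Q converges to the optimal action values.
for t in range(20000):
    s, a = rng.integers(N), rng.integers(2)
    r, s_next = step(s, a)
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])

print(best_threshold(Q))   # prints 2: offload once the AoI reaches 2
```

Restricting attention to the N+1 threshold policies, rather than the 2^N arbitrary ones, is what makes monotone structure so valuable for speeding up learning.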

Keywords

Edge/fog computing, reinforcement learning, constrained Markov decision processes

Associated keywords
thesis defence