Research Student at ISAE-SUPAERO

Thibault CLARA

PPO-Clip Deep Reinforcement Learning Controller for MAVION VTOL UAV

About Me

I am an MSc student in Systems & Control Aerospace Engineering at ISAE-SUPAERO in Toulouse, France. I hold a BSc in Aerospace Engineering from Delft University of Technology (TU Delft) in Delft, the Netherlands, with a specialization in Artificial Intelligence for Engineering.
I am currently conducting research on VTOL Tail-Sitter Drone Control with Deep Reinforcement Learning, supervised by Professor Philippe Pastor, as part of IONLAB’s efforts.
I was previously supervised by Professor Nguyen Anh K Doan, co-Director of the AI Fluids Lab at TU Delft, on research titled “Prediction of Extreme Events Derived from Latent Space Compression of 2D Kolmogorov Flows”.
Deep Reinforcement Learning (DRL) has shown promising potential for developing controllers capable of handling complex scenarios in the context of Hybrid Aerial Vehicles (HAVs), which combine the dynamics of fixed-wing aircraft and rotorcraft. While DRL offers the possibility of outperforming traditional methods in highly nonlinear systems operating under uncertain or dynamically changing conditions, its performance and robustness in real-world applications remain active areas of research. In this work, the Proximal Policy Optimization (PPO) algorithm, as implemented in the Stable-Baselines3 library, is employed to train unified policy controllers for the take-off, cruise, and landing phases, including the transitions between vertical and horizontal flight. A custom simulation and training environment was developed using the MAVION platform, a Vertical Take-Off and Landing (VTOL) Unmanned Aerial Vehicle (UAV) designed at ISAE-SUPAERO. Separate two-dimensional (2D) trajectory controllers were trained for each phase under symmetric flight assumptions, demonstrating accurate tracking of complete flight profiles with minimal error. In addition, generalization techniques were introduced, enabling the trained policies to reliably track unseen target trajectories within a predefined flight envelope. The performance of the trained controllers was also evaluated under light atmospheric turbulence, showing encouraging results and suggesting potential for robust real-world applications.
Keywords: Proximal Policy Optimization (PPO), Unmanned Aerial Vehicle (UAV), Hybrid Aerial Vehicle (HAV), Vertical Take-Off and Landing (VTOL), Deep Reinforcement Learning (DRL), Trajectory Controller
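The "Clip" in PPO-Clip refers to the clipped surrogate objective that PPO maximizes. The snippet below is a minimal, dependency-free sketch of that objective for a batch of samples, not the training code used in this work. The function name `clipped_surrogate` and the clipping value of 0.2 are assumptions chosen for illustration.

```python
import math


def clipped_surrogate(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean PPO-Clip surrogate objective over a batch (to be maximized).

    logp_new / logp_old: log-probabilities of the taken actions under the
    updated and the data-collecting policy. clip_eps is an assumed value.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s), computed in log space.
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to the interval [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Pessimistic bound: keep the smaller of the two candidate terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


if __name__ == "__main__":
    # A ratio of 2 with a positive advantage is capped at 1 + clip_eps.
    print(clipped_surrogate([math.log(2.0)], [0.0], [1.0]))
```

Taking the minimum removes the incentive to move the new policy far from the old one in a single update: once the ratio leaves the clipping interval in the direction that would increase the objective, the gradient contribution of that sample vanishes.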
Philippe PASTOR, Researcher & Professor at ISAE-SUPAERO in Flight Dynamics & Aircraft Design, PhD in Automatic Control & AI from ISAE-SUPAERO
I spent 2 months developing an understanding of the Proximal Policy Optimization (PPO) algorithm, introduced by OpenAI, as implemented in the Stable-Baselines3 library. I know how to set it up and tune it, and I can help with similar deployment use cases in robotics or autonomous vehicles.
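As a rough illustration of what setting up and tuning PPO involves, the sketch below lists the main hyperparameters exposed by the Stable-Baselines3 `PPO` class. The values shown are, to my knowledge, the library defaults and are only a starting point; they are not the settings used for the MAVION controllers. The names `PPO_KWARGS` and `make_model` are hypothetical, and `env` is assumed to be any Gymnasium-compatible environment.

```python
# Assumed starting values (believed to match the Stable-Baselines3 defaults);
# these are NOT the tuned settings used for the MAVION controllers.
PPO_KWARGS = {
    "learning_rate": 3e-4,  # optimizer step size
    "n_steps": 2048,        # environment steps collected per rollout
    "batch_size": 64,       # minibatch size for each gradient step
    "n_epochs": 10,         # passes over each rollout buffer
    "gamma": 0.99,          # discount factor
    "gae_lambda": 0.95,     # bias/variance trade-off of the advantage estimator
    "clip_range": 0.2,      # PPO-Clip epsilon
    "ent_coef": 0.0,        # entropy bonus weight (exploration)
    "vf_coef": 0.5,         # value-function loss weight
    "max_grad_norm": 0.5,   # gradient clipping threshold
}


def make_model(env):
    """Build a PPO agent for a Gymnasium-compatible environment (hypothetical helper)."""
    # Imported lazily so the hyperparameter table can be inspected
    # on a machine where the library is not installed.
    from stable_baselines3 import PPO

    return PPO("MlpPolicy", env, verbose=1, **PPO_KWARGS)
```

A typical session would then call `model = make_model(env)` followed by `model.learn(total_timesteps=...)`, adjusting one or two entries of `PPO_KWARGS` at a time and comparing the resulting learning curves.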
DRL control algorithm deployed in autonomous UAVs for both civil and defense applications.

Paper

Link

Get in touch.

Institut Supérieur de l'Aéronautique et de l'Espace (ISAE-SUPAERO)

10, Avenue Marc Pélegrin, BP 54032, 31055 Toulouse CEDEX 4 - France
thibault.clara@student.isae-supaero.fr
LinkedIn
