Miquel Gómez Corral

TFM: Reward Machines on RL

Apr 2026

In progress

Master's thesis inspired by 'Reward Machines: Exploiting Reward Function Structure in Reinforcement Learning' (arXiv:2010.03950) and Rodrigo Toro Icarte's repository. It implements tabular Q-Learning in which the Q-table is indexed by (RM_state, env_state) instead of env_state alone. The Reward Machine is a deterministic finite automaton loaded from .txt files that define transitions guarded by propositional conditions (e.g., 'p' = passenger picked up, 'd' = delivered, with negations such as '!p'). It supports CRM (Counterfactual experiences for Reward Machines): when the agent violates a proposition, counterfactual experiences are generated to accelerate learning. Environments: Taxi-v3 (Gymnasium, 500 states), a custom MultiTaxiEnv with 2 passengers and a dynamic state space (up to roughly 10,000 states), and MiniGrid-DoorKey-5x5. The Q-table is stored dynamically, so only visited (RM_state, env_state) pairs are saved. The project benchmarks 5 variants, comparing convergence speed and mean reward, and includes GIF recordings of the learned policies.
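As a rough illustration of the ideas in the description above, here is a minimal sketch of a Q-table keyed by (RM_state, env_state, action), stored sparsely in a dict, with a CRM-style update. It is not the project's code: the names `holds`, `RewardMachine`, `q_update`, and `crm_update`, the in-memory transition format, and the hyperparameters are all assumptions made for this example. The counterfactual step follows the paper's formulation (replay each environment step from every non-terminal RM state), which may differ from how the project triggers it.

```python
def holds(condition, true_props):
    """Evaluate a single literal such as 'p' or '!p' against the set of true propositions."""
    if condition.startswith("!"):
        return condition[1:] not in true_props
    return condition in true_props


class RewardMachine:
    """Toy reward machine (hypothetical format): {u: [(condition, next_u, reward), ...]}."""

    def __init__(self, transitions, initial_state=0):
        self.transitions = transitions
        self.initial_state = initial_state

    def step(self, u, true_props):
        """Return (next_rm_state, reward) for the first transition whose condition holds."""
        for condition, next_u, reward in self.transitions.get(u, []):
            if holds(condition, true_props):
                return next_u, reward
        return u, 0.0  # no transition fires: stay in u with zero reward


def q_update(q, u, s, a, r, next_u, next_s, actions, done, alpha=0.1, gamma=0.9):
    """One tabular Q-learning update; only the written (u, s, a) entry is stored."""
    # q.get(...) reads unseen entries as 0.0 without inserting them into the dict.
    best_next = 0.0 if done else max(q.get((next_u, next_s, b), 0.0) for b in actions)
    old = q.get((u, s, a), 0.0)
    q[(u, s, a)] = old + alpha * (r + gamma * best_next - old)


def crm_update(q, rm, rm_states, terminal, s, a, next_s, true_props, actions):
    """Replay one environment step (s, a, next_s) from every non-terminal RM state."""
    for u_bar in rm_states:
        if u_bar in terminal:
            continue
        next_u_bar, r_bar = rm.step(u_bar, true_props)
        q_update(q, u_bar, s, a, r_bar, next_u_bar, next_s, actions,
                 done=next_u_bar in terminal)


# Taxi-like machine: state 0 waits for pickup ('p'), state 1 waits for delivery ('d').
rm = RewardMachine({0: [("p", 1, 0.0)], 1: [("d", 2, 1.0)]})
q = {}
# A single step in which only 'd' is true updates both RM states 0 and 1.
crm_update(q, rm, rm_states=[0, 1, 2], terminal={2},
           s=7, a=4, next_s=7, true_props={"d"}, actions=range(6))
```

In this sketch the dict grows only when an entry is written, which is one simple way to keep memory proportional to the visited pairs rather than to the full product of RM states and environment states.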

Technologies

Jupyter

Matplotlib

NumPy

Pygame

Python

Reinforcement Learning

https://raw.githubusercontent.com/MiquelGomezCorral/TFM-Reinforcement-Learning-Reward-Machines/main/videos/rm_doorkey.txt_qtable_video.gif

https://raw.githubusercontent.com/MiquelGomezCorral/TFM-Reinforcement-Learning-Reward-Machines/main/videos/rm_taxi.txt_2-p2-normal_qtable_video.gif

https://raw.githubusercontent.com/MiquelGomezCorral/TFM-Reinforcement-Learning-Reward-Machines/main/videos/rm_taxi.txt_2-p2_qtable_video.gif

https://raw.githubusercontent.com/MiquelGomezCorral/TFM-Reinforcement-Learning-Reward-Machines/main/videos/rm_taxi.txt_2_qtable_video.gif

https://raw.githubusercontent.com/MiquelGomezCorral/TFM-Reinforcement-Learning-Reward-Machines/main/videos/rm_taxi.txt_p2-normal_qtable_video.gif

https://raw.githubusercontent.com/MiquelGomezCorral/TFM-Reinforcement-Learning-Reward-Machines/main/videos/rm_taxi.txt_qtable_video.gif

https://raw.githubusercontent.com/MiquelGomezCorral/TFM-Reinforcement-Learning-Reward-Machines/main/videos/rm_taxi_2p.txt-0.gif

https://raw.githubusercontent.com/MiquelGomezCorral/TFM-Reinforcement-Learning-Reward-Machines/main/videos/rm_taxi_2p.txt-1.gif

https://raw.githubusercontent.com/MiquelGomezCorral/TFM-Reinforcement-Learning-Reward-Machines/main/videos/rm_taxi_2p.txt-2-0.gif

https://raw.githubusercontent.com/MiquelGomezCorral/TFM-Reinforcement-Learning-Reward-Machines/main/videos/rm_taxi_2p.txt-2-1.gif

https://raw.githubusercontent.com/MiquelGomezCorral/TFM-Reinforcement-Learning-Reward-Machines/main/videos/rm_taxi_2p.txt-2-2.gif

https://raw.githubusercontent.com/MiquelGomezCorral/TFM-Reinforcement-Learning-Reward-Machines/main/videos/rm_taxi_2p.txt-2-3.gif

https://raw.githubusercontent.com/MiquelGomezCorral/TFM-Reinforcement-Learning-Reward-Machines/main/videos/rm_taxi_2p.txt-2-4.gif

https://raw.githubusercontent.com/MiquelGomezCorral/TFM-Reinforcement-Learning-Reward-Machines/main/videos/rm_taxi_2p.txt-2.gif

https://raw.githubusercontent.com/MiquelGomezCorral/TFM-Reinforcement-Learning-Reward-Machines/main/videos/rm_taxi_2p.txt-3.gif

https://raw.githubusercontent.com/MiquelGomezCorral/TFM-Reinforcement-Learning-Reward-Machines/main/videos/rm_taxi_2p.txt-4.gif

https://raw.githubusercontent.com/MiquelGomezCorral/TFM-Reinforcement-Learning-Reward-Machines/main/videos/rm_taxi_2p.txt-p-0.gif

https://raw.githubusercontent.com/MiquelGomezCorral/TFM-Reinforcement-Learning-Reward-Machines/main/videos/rm_taxi_2p.txt-p-1.gif

https://raw.githubusercontent.com/MiquelGomezCorral/TFM-Reinforcement-Learning-Reward-Machines/main/videos/rm_taxi_2p.txt-p-2.gif

https://raw.githubusercontent.com/MiquelGomezCorral/TFM-Reinforcement-Learning-Reward-Machines/main/videos/rm_taxi_2p.txt-p-3.gif

https://raw.githubusercontent.com/MiquelGomezCorral/TFM-Reinforcement-Learning-Reward-Machines/main/videos/rm_taxi_2p.txt-p-4.gif

https://raw.githubusercontent.com/MiquelGomezCorral/TFM-Reinforcement-Learning-Reward-Machines/main/videos/rm_taxi_v2.txt_qtable_video.gif

https://raw.githubusercontent.com/MiquelGomezCorral/TFM-Reinforcement-Learning-Reward-Machines/main/videos/rm_taxi_v2.txt_qtable_video_static.gif

https://raw.githubusercontent.com/MiquelGomezCorral/TFM-Reinforcement-Learning-Reward-Machines/main/videos/taxi_qtable_video.gif