Master's Thesis (TFM): Reward Machines in RL
Apr 2026
In progress
Master's thesis inspired by 'Reward Machines: Exploiting Reward Function Structure in Reinforcement Learning' (arXiv:2010.03950) and Rodrigo Toro Icarte's repository. It implements tabular Q-learning in which the Q-table is indexed by (RM_state, env_state) pairs instead of env_state alone. The Reward Machine is a deterministic finite-state machine loaded from .txt files that define its transitions through propositional conditions (e.g., 'p' = passenger picked up, 'd' = delivered, with negations such as '!p'). It supports CRM (Counterfactual experiences for Reward Machines): when the agent violates a proposition, counterfactual experiences are generated to accelerate learning. Environments: Taxi-v3 (Gymnasium, 500 states), a custom MultiTaxiEnv with 2 passengers and a dynamic state space (up to ~10,000 states), and MiniGrid-DoorKey-5x5. The Q-table is stored dynamically, so only visited (RM_state, env_state) pairs are saved. Five variants are benchmarked on convergence speed and mean reward. Learned policies can be recorded as GIFs.
AI
Jupyter
Matplotlib
NumPy
Pygame
Python
Reinforcement Learning
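
The core idea in the description (a Q-table keyed by Reward Machine state and environment state, stored sparsely, with counterfactual updates) can be illustrated with a minimal sketch. This is not the thesis code: the transition list, the '&' conjunction syntax, and the names `RM_TRANSITIONS`, `holds`, `rm_step`, `q_update`, and `crm_update` are all hypothetical, and the counterfactual step follows the formulation in the cited paper (replaying each environment step from every non-terminal RM state), which may differ from how the thesis triggers it.

```python
# Hypothetical two-step task: pick up the passenger ('p'), then deliver ('d').
# Each entry is (rm_state, condition, next_rm_state, reward); state 2 is terminal.
RM_TRANSITIONS = [
    (0, "p", 1, 0.0),
    (0, "!p", 0, 0.0),
    (1, "d", 2, 1.0),
    (1, "!d", 1, 0.0),
]
RM_STATES = (0, 1)   # non-terminal RM states
RM_TERMINAL = {2}


def holds(condition, true_props):
    """Evaluate a conjunction such as 'p&!d' against the set of true propositions."""
    for literal in condition.split("&"):
        if literal.startswith("!"):
            if literal[1:] in true_props:
                return False
        elif literal not in true_props:
            return False
    return True


def rm_step(u, true_props):
    """Return (next_rm_state, reward) for RM state u under the observed propositions."""
    for src, cond, dst, reward in RM_TRANSITIONS:
        if src == u and holds(cond, true_props):
            return dst, reward
    return u, 0.0  # no matching transition: stay in place, no reward


# Sparse Q-table keyed by (rm_state, env_state, action).
# Only entries that receive an update are stored; missing entries read as 0.
Q = {}


def q_update(u, s, a, r, u2, s2, done, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning backup on the product state (u, s)."""
    best_next = 0.0 if done else max(Q.get((u2, s2, b), 0.0) for b in actions)
    old = Q.get((u, s, a), 0.0)
    Q[(u, s, a)] = old + alpha * (r + gamma * best_next - old)


def crm_update(s, a, s2, true_props, env_done, actions):
    """Replay one environment transition from every non-terminal RM state."""
    for u in RM_STATES:
        u2, r = rm_step(u, true_props)
        done = env_done or u2 in RM_TERMINAL
        q_update(u, s, a, r, u2, s2, done, actions)


# One real step in which the labelling function reported 'd' (delivered).
crm_update(s=7, a=0, s2=8, true_props={"d"}, env_done=False, actions=[0, 1])
```

In this sketch a single environment step updates the entry for RM state 1 (where delivery earns the reward) as well as the entry for RM state 0 (where it does not), which is the sense in which counterfactual experience lets one transition inform several RM states. Keeping the table as a plain dictionary means its size grows with the pairs actually visited rather than with the full product state space.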