Some white text so the formating works out
This was my first project using reinforcement learning, a type of machine learning which learns through experience. I followed Thomas Simonini's fantastic tutorial to make an agent which learnt to act in OpenAI’s taxi environment.
The solid block is the taxi, and the 4 letters are pickup/dropoff locations. The goal is to pick up passengers from the letter highlighted in blue and drop them off at the letter in red.
The algorithm
In order to do this I used a reinforcement learning technique called Q-learning. It works using the bellman equation.
The Bellman Equation
The Bellman Equation says that the new expected value of the current state-action pair is equal to the old value of the current state-action pair,
plus the sum of the reward gained from the transition to the next state and the difference between the highest state-action pair
of the next state, and the old value of the current state. The sum is then multplied by a discount factor.
Ok that was a lot. If you want to learn more about how this all works read my article!