Arnav Paruthi

OpenAI Taxi

This was my first project using reinforcement learning, a type of machine learning in which an agent learns through experience. I followed Thomas Simonini's fantastic tutorial to build an agent that learnt to act in OpenAI's Taxi environment.

The solid block is the taxi, and the four letters are pickup/dropoff locations. The goal is to pick up the passenger from the letter highlighted in blue and drop them off at the letter in red.

The algorithm

Q-Networks

To do this, I used a reinforcement learning technique called Q-learning, which updates its value estimates using the Bellman equation.

The Bellman Equation

The Q-learning update, which is based on the Bellman equation, says that the new value of the current state-action pair equals its old value plus a learning rate times a correction term. The correction term is the reward gained from the transition to the next state, plus the discount factor times the highest state-action value available in the next state, minus the old value of the current state-action pair:

Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') − Q(s, a)]

Here α is the learning rate, γ is the discount factor, r is the reward, and s' is the next state.
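
As a rough illustration, here is a minimal sketch of one standard tabular Q-learning update in plain Python. It is not the tutorial's code: the function name q_update, the toy 2x2 table, and the values chosen for alpha (learning rate) and gamma (discount factor) are all assumptions made for this example.

```python
# Minimal tabular Q-learning update (illustrative sketch, not the tutorial's code).
# The table layout, alpha, and gamma below are assumed values for demonstration.

def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one Q-learning update in place and return the new Q-value."""
    best_next = max(q_table[next_state])            # highest Q-value in the next state
    td_target = reward + gamma * best_next          # reward plus discounted future value
    td_error = td_target - q_table[state][action]   # gap between target and old estimate
    q_table[state][action] += alpha * td_error      # move the estimate toward the target
    return q_table[state][action]


# Toy table: 2 states x 2 actions, all zeros except one known value.
q_table = [[0.0, 0.0], [0.0, 1.0]]
new_value = q_update(q_table, state=0, action=1, reward=-1.0,
                     next_state=1, alpha=0.5, gamma=0.9)
print(new_value)  # roughly -0.05 for these toy numbers
```

With these toy numbers the target is -1 + 0.9 × 1 = -0.1, so the old estimate of 0 moves halfway toward it. Repeating this update over many transitions is what lets the table's values gradually reflect which actions pay off.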

OK, that was a lot. If you want to learn more about how this all works, read my article!

Article

OpenAI's Taxi Game
