Learning about Reinforcement Learning

It took me more than two months to understand how these 48 lines of code work. It’s a policy-based algorithm, which means that it does not try to predict future rewards and choose actions that would produce the highest rewards. Instead, REINFORCE learns to take an action based on the current state, aiming for an optimal policy. This policy is approximated using a neural network.

The policy learns to balance the pole (starts at 20 sec)

It’s been very fun to learn about neural nets and reinforcement learning algorithms. Code implemented along the way is available at ai-playground.