Fundamentals of Reinforcement Learning
A systematic tour of foundational RL, from k-armed bandits to planning via Markov Decision Processes and TD learning

Reinforcement learning is one of the most exciting branches of modern artificial intelligence.
It came to public consciousness largely because of a brilliant early breakthrough by DeepMind: in 2016, they used reinforcement learning to smash a benchmark thought to be decades away in artificial intelligence - their AlphaGo program beat Lee Sedol, one of the world’s greatest players of the Chinese game of Go.
This was so exceptional because the game tree for Go is so large - the number of legal board positions alone is roughly 1 with 170 zeros after it, far more than a googol! Compare this with chess, which is estimated to have at most around 10^50 possible positions.
Chess fell to computers in 1997, when IBM’s Deep Blue beat the reigning world champion, Garry Kasparov. Deep Blue was the ultimate example of the previous generation of AI - Good Old-Fashioned AI or “GOFAI”. A team of engineers, advised by human grandmasters, hard-coded opening strategies, piece and board valuations and end-game databases into a powerful computer which then crunched the numbers in a relatively brute-force way.
DeepMind’s approach was very different. Instead of humans hard-coding heuristics for how to play a good game of Go, they applied reinforcement learning so that their algorithms could - by playing themselves, and winning or losing millions of times - work out good strategies for themselves.
The result was a game playing algorithm unbounded by the limitations of human knowledge. Go grandmasters to this day are studying its unique and creative moves in its series against Lee Sedol.
Since then, DeepMind have shown how reinforcement learning can be practically applied to real-life problems. A reinforcement learning agent controlling the cooling system for a Google data centre found strategies no human control engineer had thought of, such as exploiting cold winter temperatures to reduce the energy spent on cooling. Another of their agents, applied to an experimental fusion reactor, similarly found superhuman strategies for controlling the highly complex plasma in the reactor.
So, reinforcement learning promises to help solve some of the grand problems of science and engineering, but it has a whole load of more immediately commercial applications too - from the A/B testing of products and website design, to the implementation of recommender systems to learn how to match up a company’s customers with its products, to algorithmic trading, where the objective is to buy or sell stocks to maximise a profit.
This course will explain the fundamentals of this most exciting branch of AI. You will get to grips with the theory underpinning the algorithms, and get hands-on practice implementing them yourself in Python.
By the end of this course, you will have a fundamental grasp of these algorithms.
We’ll focus on “tabular” methods using simple NumPy arrays rather than neural networks, as one often gets the greatest understanding of problems by paring them down to their simplest form and working through each step of an algorithm with pencil and paper.
There is ample opportunity for that in this course, and each section is capped with a coding assignment where you will build the algorithms yourself.
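To give a flavour of what “tabular” means here, the sketch below shows an epsilon-greedy agent on a k-armed bandit, where the whole “model” is a NumPy array holding one value estimate per arm. It is only an illustration and not taken from the course materials: the function name run_bandit, the Gaussian rewards and the parameter values are all assumptions made for the example.

```python
import numpy as np

def run_bandit(true_means, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a k-armed bandit (illustrative sketch).

    Assumption: each arm pays a Gaussian reward with unit variance
    around its entry in true_means.
    """
    rng = np.random.default_rng(seed)
    k = len(true_means)
    q = np.zeros(k)                  # the "table": one value estimate per arm
    counts = np.zeros(k, dtype=int)  # how many times each arm has been pulled

    for _ in range(steps):
        if rng.random() < epsilon:
            arm = int(rng.integers(k))  # explore: pick an arm at random
        else:
            arm = int(np.argmax(q))     # exploit: pick the best arm so far
        reward = rng.normal(true_means[arm], 1.0)
        counts[arm] += 1
        # Incremental sample average: nudge the estimate towards the reward
        q[arm] += (reward - q[arm]) / counts[arm]

    return q, counts

if __name__ == "__main__":
    estimates, pulls = run_bandit([0.2, 0.5, 1.5])
    print("value estimates:", estimates)
    print("pulls per arm:  ", pulls)
```

The update line is small enough to work through by hand for a few pulls, which is the kind of pencil-and-paper exercise described above.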
From there, the world is your oyster! Go solve driverless cars, make bajillions in a hedge fund, or save humanity by solving fusion power!