Asier Serrano personal webpage & blog.

1. Introduction

If we know the transition probabilities for every state and action, that is, how likely each action taken in each state is to lead to each possible next state, then nothing has to be learned about the environment. The problem reduces to searching an already known tree of states and actions for the optimal action in each state.
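One common way to write this search down is the Bellman optimality equation. The notation here is an assumption and does not come from the text above: P(s' | s, a) is the known transition probability, R(s, a, s') the reward, gamma a discount factor between 0 and 1, V^* the optimal value of a state and pi^* the optimal action for it.

```latex
V^{*}(s) = \max_{a} \sum_{s'} P(s' \mid s, a)\,\bigl[ R(s, a, s') + \gamma\, V^{*}(s') \bigr]

\pi^{*}(s) = \arg\max_{a} \sum_{s'} P(s' \mid s, a)\,\bigl[ R(s, a, s') + \gamma\, V^{*}(s') \bigr]
```

Since every term on the right-hand side is known, the optimal action for each state can be computed by planning alone, without interacting with the environment.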

2. Value & policy iteration

See more details here.
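As a rough illustration, value iteration can be sketched in a few lines of Python with NumPy. This is a minimal sketch and not the implementation from the linked material: the function name value_iteration, the array layout P[a, s, s'] and R[s, a], and the two-state toy problem are all assumptions made up for this example.

```python
import numpy as np


def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Sketch of value iteration for a fully known, finite MDP.

    P[a, s, s2] is the (assumed known) probability of moving from
    state s to state s2 under action a; R[s, a] is the expected reward.
    """
    n_states = P.shape[1]
    V = np.zeros(n_states)
    for _ in range(max_iters):
        # Bellman backup: Q[s, a] = R[s, a] + gamma * sum_s2 P[a, s, s2] * V[s2]
        Q = R + gamma * (P @ V).T
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    # Greedy policy with respect to the final value estimate.
    Q = R + gamma * (P @ V).T
    return V, Q.argmax(axis=1)


# Hypothetical toy problem: two states, action 0 stays, action 1 switches state.
P = np.array([
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay in place
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: move to the other state
])
# Reward of 1 only for staying in state 1.
R = np.array([
    [0.0, 0.0],
    [1.0, 0.0],
])

V, policy = value_iteration(P, R, gamma=0.9)
print("values:", V)
print("policy:", policy)
```

In this toy problem the greedy policy moves from state 0 to state 1 and then stays there. Policy iteration reaches the same policy by alternating between evaluating a fixed policy and improving it greedily, rather than applying the max in every sweep.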

3. Useful resources

Model Based Reinforcement Learning: Policy Iteration, Value Iteration, and Dynamic Programming, by Steve Brunton.

Model based RL
