**Summary** Policy iteration is an iterative algorithm for finding the optimal policy in an MDP. It starts with an arbitrary policy and alternates two steps: policy evaluation, which computes the value function of the current policy, and policy improvement, which constructs a new policy that is greedy with respect to that value function. Policy iteration converges to the optimal policy provided the policy evaluation step converges. The algorithm can be sped up by running only a limited number of sweeps of policy evaluation per iteration; in the extreme case of a single sweep, it reduces to value iteration. Hybrid methods combine value iteration and policy iteration, trading off the cost of each iteration against the accuracy of the intermediate value estimates.
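To make the two steps concrete, here is a minimal sketch in Python (NumPy) of both algorithms on a finite MDP with a transition tensor `P[s, a, s']` and expected rewards `R[s, a]`. The array shapes, function names, and the tiny example MDP at the bottom are illustrative assumptions, not something taken from these notes.

```python
import numpy as np


def policy_iteration(P, R, gamma=0.9):
    """Policy iteration: exact policy evaluation + greedy policy improvement.

    P: transition probabilities, shape (S, A, S) with P[s, a, s'] = Pr(s' | s, a)
    R: expected rewards, shape (S, A)
    Returns (policy, V) where policy[s] is the chosen action in state s.
    """
    n_states, n_actions, _ = P.shape
    policy = np.zeros(n_states, dtype=int)        # arbitrary initial policy

    while True:
        # Policy evaluation: solve the linear system V = R_pi + gamma * P_pi V.
        P_pi = P[np.arange(n_states), policy]     # (S, S) transitions under pi
        R_pi = R[np.arange(n_states), policy]     # (S,)  rewards under pi
        V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)

        # Policy improvement: act greedily with respect to the action values.
        Q = R + gamma * P @ V                     # (S, A) action values
        new_policy = Q.argmax(axis=1)

        if np.array_equal(new_policy, policy):    # stable policy => optimal
            return policy, V
        policy = new_policy


def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Value iteration: the special case of one greedy Bellman backup per sweep."""
    V = np.zeros(P.shape[0])
    while True:
        Q = R + gamma * P @ V
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return Q.argmax(axis=1), V_new
        V = V_new


# Tiny 2-state, 2-action MDP (made-up numbers, purely to exercise the code).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.6, 0.4], [0.1, 0.9]]])
R = np.array([[1.0, 0.0],
              [0.5, 2.0]])
print(policy_iteration(P, R))
print(value_iteration(P, R))
```

The only structural difference between the two functions is the evaluation step: policy iteration solves for the current policy's value function exactly before improving, while value iteration performs a single greedy Bellman backup per sweep, which is where the "single iteration of policy evaluation" view of value iteration comes from.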
**Review questions**
1. Which algorithm does not converge to the optimal value function if you only run one iteration of policy evaluation?
2. Which of the following is not an advantage of policy iteration over value iteration?
3. Which of the following is not a step in policy iteration?