The initial policy is $\pi(A) = 1$ and $\pi(B) = 1$. That means that action 1 is taken when in state A, and the same action is taken when in state B as well. Calculate the values $V^{\pi}_2(A)$ and $V^{\pi}_2(B)$ from two iterations of policy evaluation (Bellman equation) after initializing both $V^{\pi}_0(A)$ and $V^{\pi}_0(B)$ to 0.
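The question as quoted does not include the transition probabilities, rewards, or discount factor, so the actual values cannot be worked out from it alone. As a rough sketch of the procedure only, the Python below applies the synchronous Bellman update for a fixed policy twice, starting from all-zero values. Every number in `model` and the value of `gamma` is a hypothetical placeholder, and the function name `evaluate_policy` is likewise just an illustration; substitute the values from the original problem statement.

```python
from typing import Dict, List, Tuple

# One possible outcome of taking an action: (probability, next_state, reward).
Outcome = Tuple[float, str, float]


def evaluate_policy(
    policy: Dict[str, int],
    model: Dict[Tuple[str, int], List[Outcome]],
    gamma: float,
    iterations: int,
) -> Dict[str, float]:
    """Run a fixed number of synchronous policy-evaluation sweeps from V_0 = 0."""
    values = {state: 0.0 for state in policy}
    for _ in range(iterations):
        # Build V_{k+1} entirely from V_k; `values` is only rebound
        # after the whole new dictionary has been computed.
        values = {
            state: sum(
                prob * (reward + gamma * values[next_state])
                for prob, next_state, reward in model[(state, policy[state])]
            )
            for state in policy
        }
    return values


# Policy from the question: action 1 in both states.
policy = {"A": 1, "B": 1}

# HYPOTHETICAL dynamics and discount, NOT taken from the question.
model = {
    ("A", 1): [(0.5, "A", 2.0), (0.5, "B", 0.0)],
    ("B", 1): [(1.0, "A", 4.0)],
}
gamma = 0.5

v2 = evaluate_policy(policy, model, gamma, iterations=2)
print(v2)  # {'A': 2.25, 'B': 4.5} for these placeholder numbers only
```

With both values initialized to 0, the first sweep reduces to the expected immediate reward under the policy; the second sweep adds the discounted first-sweep values of the successor states.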