compute_policy
```r
compute_policy(alpha, transition, observation, reward,
               state_prior = rep(1, dim(observation)[[1]]) / dim(observation)[[1]],
               a_0 = 1)
```
Argument | Description
---|---
alpha | The matrix of alpha vectors returned by `sarsop()`.
transition | Transition matrix, dimension n_s x n_s x n_a (shapes illustrated in the sketch after this table).
observation | Observation matrix, dimension n_s x n_z x n_a.
reward | Reward matrix, dimension n_s x n_a.
state_prior | Initial belief state; optional, defaults to uniform over all states.
a_0 | Previous action; optional, defaults to 1. The belief in the current state depends not only on the observation, but also on the prior belief over states and the action that was then taken.
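For concreteness, here is a minimal sketch of matrices with the shapes described above, for a hypothetical problem with two states, two observations, and two actions. The values, and the convention that the first index of `transition` is the current state and the second the next state, are illustrative assumptions rather than anything specified by the package:

```r
n_s <- 2; n_z <- 2; n_a <- 2

## Transition: n_s x n_s x n_a; transition[i, j, a] assumed to be
## P(next state = j | current state = i, action = a)
transition <- array(0, dim = c(n_s, n_s, n_a))
transition[, , 1] <- matrix(c(0.9, 0.1,
                              0.2, 0.8), nrow = n_s, byrow = TRUE)
transition[, , 2] <- matrix(c(0.5, 0.5,
                              0.5, 0.5), nrow = n_s, byrow = TRUE)

## Observation: n_s x n_z x n_a; observation[j, z, a] assumed to be
## P(observe z | state = j, action = a)
observation <- array(0, dim = c(n_s, n_z, n_a))
observation[, , 1] <- matrix(c(0.8, 0.2,
                               0.3, 0.7), nrow = n_s, byrow = TRUE)
observation[, , 2] <- observation[, , 1]

## Reward: n_s x n_a; reward[i, a] = immediate reward for action a in state i
reward <- matrix(c(1, 0,
                   0, 1), nrow = n_s, byrow = TRUE)
```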
Returns a data frame providing the optimal policy (choice of action) and the corresponding value of that action for each possible belief state.
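A minimal sketch of inspecting that data frame; the assumption that each row pairs an optimal action index with its expected value follows the description above, and the exact column names are not guaranteed here:

```r
pol <- compute_policy(alpha, transition, observation, reward)
head(pol)  # assumed layout: one row per belief state, giving the optimal action and its value
```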
```r
## Not run automatically (takes > 5 s).
## Use the example code to generate matrices for the pomdp problem:
source(system.file("examples/fisheries-ex.R", package = "sarsop"))
alpha <- sarsop(transition, observation, reward, discount, precision = 10)
compute_policy(alpha, transition, observation, reward)
```
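Because `state_prior` and `a_0` let the belief be carried forward through time, one natural pattern is to re-call `compute_policy()` at each step of a simulation. The following is a hedged sketch that assumes the fisheries example matrices above and uses a hand-rolled Bayesian belief update (not a function from the package); the convention that `transition[i, j, a]` is the probability of moving from state i to state j is an assumption:

```r
## Sketch: carry the belief forward across time steps (belief update is hand-rolled).
n_s <- dim(observation)[[1]]
belief <- rep(1, n_s) / n_s   # uniform prior, matching the default state_prior
a_prev <- 1                   # previous action index, matching the default a_0

for (t in 1:5) {
  pol <- compute_policy(alpha, transition, observation, reward,
                        state_prior = belief, a_0 = a_prev)
  ## ... pick the action recommended by `pol` for the observation actually seen ...
  z_t <- 1   # placeholder: index of the observation received this step
  a_t <- 1   # placeholder: index of the action actually taken this step
  ## Bayesian filter: propagate the belief through the transition for a_t,
  ## weight by the likelihood of observing z_t, then renormalize.
  belief <- as.vector(belief %*% transition[, , a_t]) * observation[, z_t, a_t]
  belief <- belief / sum(belief)
  a_prev <- a_t
}
```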