compute_policy
```r
compute_policy(alpha, transition, observation, reward,
               state_prior = rep(1, dim(observation)[[1]]) / dim(observation)[[1]],
               a_0 = 1)
```
Argument | Description
---|---
alpha | The matrix of alpha vectors returned by `sarsop()`.
transition | Transition matrix, dimension n_s x n_s x n_a (shapes illustrated in the sketch after this table).
observation | Observation matrix, dimension n_s x n_z x n_a.
reward | Reward matrix, dimension n_s x n_a.
state_prior | Initial belief state; optional, defaults to uniform over all states.
a_0 | Previous action; optional, defaults to 1. The belief in the current state depends not only on the observation, but also on the prior belief over states and the action that was then taken.
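For concreteness, here is a minimal sketch of matrices with the shapes described above, for a hypothetical problem with two states, two observations, and two actions. The values, and the convention that the first index of `transition` is the current state and the second the next state, are illustrative assumptions rather than anything specified by the package:

```r
n_s <- 2; n_z <- 2; n_a <- 2

## Transition: n_s x n_s x n_a; transition[i, j, a] assumed to be
## P(next state = j | current state = i, action = a)
transition <- array(0, dim = c(n_s, n_s, n_a))
transition[, , 1] <- matrix(c(0.9, 0.1,
                              0.2, 0.8), nrow = n_s, byrow = TRUE)
transition[, , 2] <- matrix(c(0.5, 0.5,
                              0.5, 0.5), nrow = n_s, byrow = TRUE)

## Observation: n_s x n_z x n_a; observation[j, z, a] assumed to be
## P(observe z | state = j, action = a)
observation <- array(0, dim = c(n_s, n_z, n_a))
observation[, , 1] <- matrix(c(0.8, 0.2,
                               0.3, 0.7), nrow = n_s, byrow = TRUE)
observation[, , 2] <- observation[, , 1]

## Reward: n_s x n_a; reward[i, a] = immediate reward for action a in state i
reward <- matrix(c(1, 0,
                   0, 1), nrow = n_s, byrow = TRUE)
```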
Returns a data frame providing the optimal policy (choice of action) and the corresponding value of that action for each possible belief state.
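A minimal sketch of inspecting that data frame; the assumption that each row pairs an optimal action index with its expected value follows the description above, and the exact column names are not guaranteed here:

```r
pol <- compute_policy(alpha, transition, observation, reward)
head(pol)  # assumed layout: one row per belief state, giving the optimal action and its value
```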
```r
## Not run automatically (takes > 5 s).
## Use the example code to generate matrices for the pomdp problem:
source(system.file("examples/fisheries-ex.R", package = "sarsop"))
alpha <- sarsop(transition, observation, reward, discount, precision = 10)
compute_policy(alpha, transition, observation, reward)
```
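Because `state_prior` and `a_0` let the belief be carried forward through time, one natural pattern is to re-call `compute_policy()` at each step of a simulation. The following is a hedged sketch that assumes the fisheries example matrices above and uses a hand-rolled Bayesian belief update (not a function from the package); the convention that `transition[i, j, a]` is the probability of moving from state i to state j is an assumption:

```r
## Sketch: carry the belief forward across time steps (belief update is hand-rolled).
n_s <- dim(observation)[[1]]
belief <- rep(1, n_s) / n_s   # uniform prior, matching the default state_prior
a_prev <- 1                   # previous action index, matching the default a_0

for (t in 1:5) {
  pol <- compute_policy(alpha, transition, observation, reward,
                        state_prior = belief, a_0 = a_prev)
  ## ... pick the action recommended by `pol` for the observation actually seen ...
  z_t <- 1   # placeholder: index of the observation received this step
  a_t <- 1   # placeholder: index of the action actually taken this step
  ## Bayesian filter: propagate the belief through the transition for a_t,
  ## weight by the likelihood of observing z_t, then renormalize.
  belief <- as.vector(belief %*% transition[, , a_t]) * observation[, z_t, a_t]
  belief <- belief / sum(belief)
  a_prev <- a_t
}
```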