Compare historical actions to what the POMDP recommendation would have been.
hindcast_pomdp(transition, observation, reward, discount, obs, action,
  state_prior = rep(1, dim(observation)[[1]]) / dim(observation)[[1]],
  alpha = NULL, ...)
Argument | Description
---|---
transition | Transition matrix, dimension n_s x n_s x n_a (shapes illustrated in the sketch below this table)
observation | Observation matrix, dimension n_s x n_z x n_a
reward | Reward matrix, dimension n_s x n_a
discount | The discount factor
obs | A given sequence of observations
action | The corresponding sequence of actions
state_prior | Initial belief state; optional, defaults to uniform over states
alpha | The matrix of alpha vectors returned by sarsop()
... | Additional arguments, passed on to the underlying call
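For orientation, here is a minimal sketch of the array shapes these arguments expect, using made-up values for 2 states, 2 observations, and 2 actions (the fisheries example in the Examples section builds realistic versions of these objects):

## Toy shapes only, not a meaningful model
n_s <- 2; n_z <- 2; n_a <- 2
transition  <- array(1 / n_s, dim = c(n_s, n_s, n_a))  # P(s' | s, a); each row sums to 1
observation <- array(1 / n_z, dim = c(n_s, n_z, n_a))  # P(z | s', a); each row sums to 1
reward      <- matrix(0, n_s, n_a)                     # R(s, a)
discount    <- 0.95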
A list containing: a data frame with columns for time, obs, action, and optimal action; and an array containing the posterior belief distribution at each time t.
## Takes > 5s
## Use example code to generate matrices for pomdp problem:
source(system.file("examples/fisheries-ex.R", package = "sarsop"))
alpha <- sarsop(transition, observation, reward, discount, precision = 10)
sim <- hindcast_pomdp(transition, observation, reward, discount,
                      obs = rnorm(21, 15, .1), action = rep(1, 21),
                      alpha = alpha)
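The element and column names of the returned list are not spelled out above, so the comparison below is only a sketch: str() shows what is actually returned, and the commented line assumes a data frame element df with columns action and optimal.

str(sim)  # inspect the returned list: the data frame of time/obs/action/optimal action plus the belief array
## Assuming an element sim$df with columns action and optimal (check the str() output first):
# with(sim$df, mean(action == optimal, na.rm = TRUE))  # share of historical actions matching the POMDP recommendation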