sarsop wraps the tasks of writing the pomdpx file defining the problem, running the pomdpsol (SARSOP) solver in C++, and reading the resulting policy file back into R. The returned alpha vectors and alpha_action information are then transformed into a more generic, user-friendly representation: a matrix whose columns correspond to actions and whose rows correspond to states. This function can thus be used at the heart of most POMDP applications.
```r
sarsop(transition, observation, reward, discount,
       state_prior = rep(1, dim(observation)[[1]]) / dim(observation)[[1]],
       verbose = TRUE, log_dir = tempdir(), log_data = NULL,
       cache = TRUE, ...)
```
| Argument | Description |
| --- | --- |
| `transition` | Transition probabilities, an array of dimension n_s x n_s x n_a (see the toy sketch following this table) |
| `observation` | Observation probabilities, an array of dimension n_s x n_z x n_a |
| `reward` | Reward matrix, dimension n_s x n_a |
| `discount` | The discount factor |
| `state_prior` | Initial belief state; optional, defaults to uniform over states |
| `verbose` | Logical; should the function print a message with POMDP diagnostics (timings, final precision, end condition)? |
| `log_dir` | Directory where the pomdpx and policyx files will be saved, along with a metadata file |
| `log_data` | A data.frame of additional columns to include in the log, such as model parameters. A unique id value for each run can be provided as one of the columns; otherwise, a globally unique id will be generated. |
| `cache` | Should results from the log directory be cached? Default TRUE. Identical function calls will quickly return previously cached alpha vectors from file rather than re-running. |
| `...` | Additional arguments passed on to the solver, `pomdpsol()`, such as `precision` (see the example below) |
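To make the expected shapes concrete, here is a minimal sketch (not taken from the package) of a two-state, two-action, two-observation toy problem. All probability values are arbitrary placeholders, and the code assumes the common conventions that `transition[s, s', a]` gives P(s' | s, a) (so each row of each action's slice sums to one) and `observation[s, z, a]` gives P(z | s, a):

```r
library(sarsop)

n_s <- 2  # number of states
n_z <- 2  # number of observations
n_a <- 2  # number of actions

transition  <- array(NA, dim = c(n_s, n_s, n_a))  # n_s x n_s x n_a
observation <- array(NA, dim = c(n_s, n_z, n_a))  # n_s x n_z x n_a
reward      <- array(NA, dim = c(n_s, n_a))       # n_s x n_a

## Placeholder dynamics, assuming transition[s, s', a] = P(s' | s, a):
transition[, , 1] <- rbind(c(0.9, 0.1),
                           c(0.2, 0.8))
transition[, , 2] <- rbind(c(0.5, 0.5),
                           c(0.5, 0.5))

## Placeholder observation model, assuming observation[s, z, a] = P(z | s, a);
## here the same imperfect signal regardless of action:
observation[, , 1] <- rbind(c(0.8, 0.2),
                            c(0.2, 0.8))
observation[, , 2] <- observation[, , 1]

## Placeholder rewards, reward[s, a]:
reward[, 1] <- c(1, 0)
reward[, 2] <- c(0, 1)

alpha <- sarsop(transition, observation, reward, discount = 0.95)
```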
A matrix of alpha vectors. The column index indicates the action associated with each alpha vector (1:n_actions); rows correspond to system states, x. Actions for which no alpha vector was found are included as columns of all -Inf, since such actions are not optimal regardless of belief and thus have no corresponding alpha vectors in the alpha_action list.
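Given this representation, the belief-weighted value of each action can be read off directly from the matrix. A minimal sketch, not package API (in practice `compute_policy()` handles this bookkeeping), where `alpha` is the matrix returned by `sarsop()` and `belief` is any probability vector over states:

```r
## Score each action under a belief using the returned alpha matrix.
## alpha: n_s x n_a; belief: length-n_s vector of state probabilities.
belief <- rep(1, nrow(alpha)) / nrow(alpha)      # e.g. a uniform belief
action_values <- as.vector(t(alpha) %*% belief)  # expected value per action
best_action <- which.max(action_values)          # all -Inf columns never win
```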
```r
## Not run: takes > 5s
## Use example code to generate matrices for the pomdp problem:
source(system.file("examples/fisheries-ex.R", package = "sarsop"))
alpha <- sarsop(transition, observation, reward, discount, precision = 10)
compute_policy(alpha, transition, observation, reward)
```
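Because `cache = TRUE` by default, repeating an identical call against the same `log_dir` should return the cached alpha vectors from file rather than re-running the solver. A sketch of that usage, under the caching behavior described above (the directory name is arbitrary):

```r
dir.create(log_dir <- tempfile("sarsop_log"))
alpha <- sarsop(transition, observation, reward, discount,
                log_dir = log_dir, precision = 10)
## Re-issuing the identical call reads the cached alpha
## vectors from log_dir instead of re-running the solver:
alpha_cached <- sarsop(transition, observation, reward, discount,
                       log_dir = log_dir, precision = 10)
```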