sarsop wraps the tasks of writing the pomdpx file defining the problem, running the pomdpsol (SARSOP) algorithm in C++, and then reading the resulting policy file back into R. The returned alpha vectors and alpha_action information are then transformed into a more generic, user-friendly representation: a matrix whose columns correspond to actions and whose rows correspond to states. This function can thus be used at the heart of most POMDP applications.

sarsop(transition, observation, reward, discount, state_prior = rep(1,
  dim(observation)[[1]])/dim(observation)[[1]], verbose = TRUE,
  log_dir = tempdir(), log_data = NULL, cache = TRUE, ...)

Arguments

transition

Transition matrix, dimension n_s x n_s x n_a, where n_s is the number of states and n_a the number of actions

observation

Observation matrix, dimension n_s x n_z x n_a, where n_z is the number of possible observations

reward

Reward matrix, dimension n_s x n_a

discount

The discount factor

state_prior

Initial belief state; optional, defaults to uniform over states

verbose

Logical; should the function display a message with POMDP diagnostics (timings, final precision, end condition)?

log_dir

Directory in which the pomdpx and policyx files will be saved, along with a metadata file

log_data

A data.frame of additional columns to include in the log, such as model parameters. A unique id value for each run can be provided as one of the columns; otherwise, a globally unique id will be generated.

cache

Should results from the log directory be cached? Default TRUE. Identical function calls will quickly return previously cached alpha vectors from file rather than re-running the algorithm.

...

Additional arguments passed to appl.
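
For concreteness, a minimal sketch of inputs with the documented dimensions, using a hypothetical two-state, two-observation, two-action toy problem. All numeric values, and the discount of 0.95, are illustrative assumptions; the sketch also assumes the indexing convention transition[i, j, a] = P(next state j | current state i, action a):

n_s <- 2; n_z <- 2; n_a <- 2

## transition[i, j, a]: probability of moving from state i to state j under action a
transition <- array(0, dim = c(n_s, n_s, n_a))
transition[, , 1] <- rbind(c(0.9, 0.1),
                           c(0.2, 0.8))
transition[, , 2] <- rbind(c(0.5, 0.5),
                           c(0.5, 0.5))

## observation[i, z, a]: probability of observing z when in state i after action a
observation <- array(0, dim = c(n_s, n_z, n_a))
observation[, , 1] <- observation[, , 2] <- rbind(c(0.8, 0.2),
                                                  c(0.3, 0.7))

## reward[i, a]: immediate reward for taking action a in state i
reward <- rbind(c(1, 0),
                c(0, 1))

alpha <- sarsop(transition, observation, reward, discount = 0.95)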

Value

A matrix of alpha vectors. The column index indicates the action associated with each alpha vector (1:n_actions); rows indicate the system state, x. Actions for which no alpha vector was found are included as columns of all -Inf, since such actions are not optimal regardless of belief, and thus have no corresponding alpha vectors in the alpha_action list.
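
Given this layout, the expected value of a belief b under the alpha vector for action a is sum over states x of b[x] * alpha[x, a], so an (approximately) optimal action at b is the column attaining the maximum. A hand-rolled sketch of that step (compute_policy performs the complete version; the belief below is a hypothetical example):

belief <- c(0.6, 0.4)                  # an example belief over the n_s states
values <- as.vector(belief %*% alpha)  # expected value of each action's alpha vector
values[!is.finite(values)] <- -Inf     # guard against NaN from 0 * -Inf entries
which.max(values)                      # index of the (approximately) optimal action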

Examples

# NOT RUN {
## Takes > 5s
## Use example code to generate matrices for pomdp problem:
source(system.file("examples/fisheries-ex.R", package = "sarsop"))
alpha <- sarsop(transition, observation, reward, discount, precision = 10)
compute_policy(alpha, transition, observation, reward)
# }