sim_pomdp
sim_pomdp(transition, observation, reward, discount,
          state_prior = rep(1, dim(observation)[[1]]) / dim(observation)[[1]],
          x0, a0 = 1, Tmax = 20, policy = NULL, alpha = NULL,
          reps = 1, ...)
Argument | Description |
---|---|
transition | Transition matrix, dimension n_s x n_s x n_a |
observation | Observation matrix, dimension n_s x n_z x n_a |
reward | Reward matrix, dimension n_s x n_a |
discount | The discount factor |
state_prior | Initial belief state; optional, defaults to uniform over states |
x0 | Initial state |
a0 | Initial action (defaults to action 1; this can be arbitrary if the observation process is independent of the action taken) |
Tmax | Duration of the simulation |
policy | Simulate using a pre-computed policy (e.g. an MDP policy) instead of the POMDP solution |
alpha | The matrix of alpha vectors returned by sarsop() |
reps | Number of replicate simulations to compute |
... | Additional arguments to mclapply |
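The dimension conventions above can be illustrated with a toy setup. The sketch below is not taken from the package documentation: the 2-state, 2-observation, 2-action numbers are invented for illustration, it assumes the first index of transition and observation refers to the current state, and the final call mirrors the constant-policy example shown further below.

library(sarsop)

## toy problem: 2 states, 2 observations, 2 actions (illustrative values only)
n_s <- 2; n_z <- 2; n_a <- 2

## transition[s, s', a]: probability of moving from state s to s' under action a
transition <- array(0, dim = c(n_s, n_s, n_a))
transition[, , 1] <- matrix(c(0.9, 0.1,
                              0.2, 0.8), n_s, n_s, byrow = TRUE)
transition[, , 2] <- matrix(c(0.5, 0.5,
                              0.5, 0.5), n_s, n_s, byrow = TRUE)

## observation[s, z, a]: probability of observing z given true state s
## (here identical for both actions)
observation <- array(0, dim = c(n_s, n_z, n_a))
observation[, , 1] <- observation[, , 2] <-
  matrix(c(0.8, 0.2,
           0.3, 0.7), n_s, n_z, byrow = TRUE)

## reward[s, a]: immediate reward for taking action a in state s
reward <- matrix(c(1, 0,
                   0, 1), n_s, n_a, byrow = TRUE)

discount <- 0.95

## simulate under a fixed pre-computed policy (always action 1),
## so no alpha vectors are needed
sim <- sim_pomdp(transition, observation, reward, discount,
                 x0 = 1, Tmax = 10, policy = rep(1, n_s))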
Returns a data frame with columns for time, state, obs, action, and (discounted) value.
The simulation assumes the following order of updating: for a system in state[t] at time t, an observation obs[t] of the system is made, and then action[t] is chosen based on that observation and the given policy, yielding (discounted) reward[t].
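For concreteness, here is a minimal sketch of this update order for the policy-based case. It is not the package source: it assumes the policy vector maps an observation index directly to an action (as when an MDP policy is supplied and the observation and state grids coincide), and that the observation at time t is generated under the previous action (a0 at t = 1).

## schematic re-implementation of the update order described above (assumption:
## policy[z] gives the action for observation z; observations use the previous action)
simulate_once <- function(transition, observation, reward, discount,
                          x0, a0 = 1, Tmax = 20, policy) {
  n_s <- dim(observation)[[1]]
  n_z <- dim(observation)[[2]]
  x <- z <- a <- integer(Tmax)
  value <- numeric(Tmax)
  state <- x0
  prev_a <- a0
  for (t in seq_len(Tmax)) {
    x[t] <- state                                               # system is in state[t]
    z[t] <- sample(n_z, 1, prob = observation[x[t], , prev_a])  # observation of state[t]
    a[t] <- policy[z[t]]                                        # action from observation + policy
    value[t] <- discount^(t - 1) * reward[x[t], a[t]]           # discounted reward[t]
    state <- sample(n_s, 1, prob = transition[x[t], , a[t]])    # transition to state[t+1]
    prev_a <- a[t]
  }
  data.frame(time = seq_len(Tmax), state = x, obs = z, action = a, value = value)
}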
# NOT RUN {
## Takes > 5s
## Use example code to generate matrices for pomdp problem:
source(system.file("examples/fisheries-ex.R", package = "sarsop"))
alpha <- sarsop(transition, observation, reward, discount, precision = 10)
sim <- sim_pomdp(transition, observation, reward, discount,
                 x0 = 5, Tmax = 20, alpha = alpha)

## compare to a simple constant harvest policy, with 4 replicates:
sim <- sim_pomdp(transition, observation, reward, discount,
                 x0 = 5, Tmax = 20,
                 policy = rep(2, length(states)), reps = 4)
# }
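As a usage note, the total discounted value of a run can be obtained by summing the value column; the sketch below assumes the data frame described above is returned directly or carried as sim$df, and for replicated simulations the sum would need to be grouped by replicate.

## total (discounted) value; the sim$df fallback is an assumed location
## for the data frame described in the Value section above
df_sim <- if (is.data.frame(sim)) sim else sim$df
sum(df_sim$value)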