Cognitive Robotics (English version): Partially Observable Markov Decision Processes, Part II

Partially Observable Markov Decision Processes, Part II

● Additional reading:
– Anthony R. Cassandra. Exact and Approximate Algorithms for Partially Observable Markov Decision Processes. Ph.D. Thesis, Brown University, Department of Computer Science, Providence, RI, 1998. (This lecture focuses mostly on Chapter 6.)

Issues

● Problem statement:
– If we do not know the current state of the world, what should we do to act optimally?
● Inputs:
– A model of states, actions, and observations; transition and emission functions; reward function; initial distribution; and discount factor
● Outputs from approximate algorithms:
– An approximate value function
● Outputs from heuristic algorithms:
– Policies that capture the kind of behaviour we want
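To fix notation for what follows, here is a minimal Python container for these inputs. It is an illustrative sketch only; the array layout and field names (T, O, R, b0, gamma) are assumptions, not from the lecture:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class POMDP:
    """Discrete POMDP: the inputs listed above, as dense arrays."""
    T: np.ndarray   # transition function, T[a, s, s'] = p(s' | a, s)
    O: np.ndarray   # emission function,   O[a, s', z] = p(z | s', a)
    R: np.ndarray   # reward function,     R[s, a]
    b0: np.ndarray  # initial state distribution, b0[s]
    gamma: float    # discount factor in [0, 1)

    @property
    def n_states(self) -> int:
        return self.T.shape[1]

    @property
    def n_actions(self) -> int:
        return self.T.shape[0]
```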

Point-Based Value Iteration (Pineau et al., 2003)

● Fix a set of beliefs
● Maintain a single α-vector per belief
● Update each α-vector according to Bayes' filter (one backup is sketched in code below):

$$\alpha_{t,i}^{a,z} = \operatorname*{argmax}_{\alpha_{t-1}} \left( \alpha_{t-1} \cdot b_i \right)$$

$$\alpha_{t,i}^{a}(s) = R(s,a) + \gamma \sum_{z \in Z} \sum_{s' \in S} p(s' \mid a, s)\, p(z \mid s')\, \alpha_{t,i}^{a,z}(s')$$

$$\alpha_{t,i} = \operatorname*{argmax}_{\alpha_{t,i}^{a}} \left( \alpha_{t,i}^{a} \cdot b_i \right)$$

[Figure: piecewise-linear value function over the belief simplex, with one α-vector per belief point b1, b2, b3]

Maximum-Likelihood State Heuristic (Nourbakhsh, Powers and Birchfield, 1995)

● Assume the MDP policy $\pi_{MDP} : S \rightarrow A$
● Given belief b:

$$\pi_{MLS}(b) = \pi_{MDP}\!\left( \operatorname*{argmax}_{s \in S} b(s) \right)$$

Voting Heuristic (Simmons & Koenig, 1996)

● Assume the MDP policy
● Given belief b (both heuristics are sketched in code below):

$$\pi_{AV}(b) = \operatorname*{argmax}_{a \in A} \sum_{s \in S} b(s)\, I\!\left(\pi_{MDP}(s), a\right), \qquad I(a_i, a_j) = \begin{cases} 1 & \text{if } a_i = a_j \\ 0 & \text{if } a_i \neq a_j \end{cases}$$
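A minimal sketch of one point-based backup, assuming the POMDP container above and dense NumPy arrays; it follows the update on the slide and is not Pineau et al.'s original implementation:

```python
import numpy as np

def pbvi_backup(pomdp, beliefs, alphas):
    """One point-based backup: compute one new alpha-vector per belief.

    beliefs: iterable of belief vectors b_i over states
    alphas:  array (n_vectors, n_states) from the previous iteration
    """
    new_alphas = []
    for b in beliefs:
        best_alpha, best_value = None, -np.inf
        for a in range(pomdp.n_actions):
            # alpha_a(s) = R(s,a) + gamma * sum_z sum_s' p(s'|a,s) p(z|s') alpha_az(s')
            alpha_a = pomdp.R[:, a].astype(float)
            for z in range(pomdp.O.shape[2]):
                # project previous vectors: g[i, s] = sum_s' T[a,s,s'] O[a,s',z] alphas[i,s']
                g = alphas @ (pomdp.T[a] * pomdp.O[a, :, z]).T
                # alpha_az: the projected vector that is best for this belief point
                alpha_az = g[np.argmax(g @ b)]
                alpha_a = alpha_a + pomdp.gamma * alpha_az
            value = alpha_a @ b
            if value > best_value:
                best_alpha, best_value = alpha_a, value
        new_alphas.append(best_alpha)
    return np.array(new_alphas)
```

Iterating this backup over a fixed belief set, starting from a single pessimistic α-vector, yields the PBVI value-function approximation.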

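The maximum-likelihood state and voting heuristics above each reduce to a few lines. A sketch under the same assumptions, where pi_mdp is assumed to be an array giving the greedy MDP action for each state:

```python
import numpy as np

def mls_action(b, pi_mdp):
    """Maximum-likelihood state heuristic: act as if the most
    likely state were the true state."""
    return int(pi_mdp[np.argmax(b)])

def voting_action(b, pi_mdp, n_actions):
    """Voting heuristic: each state votes for its MDP action
    with weight b(s); take the action with the most mass."""
    votes = np.zeros(n_actions)
    for s, p in enumerate(b):
        votes[pi_mdp[s]] += p
    return int(np.argmax(votes))
```

The two can disagree: if many individually unlikely states share one action, voting can outweigh the single most likely state.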
Q-MDP (Cassandra, Kaelbling and Littman, 1994)

● Assume the optimal MDP Q-function: $Q_{MDP} : S \times A \rightarrow \Re$
● Given belief b (sketched in code below):

$$\pi_{Q\text{-}MDP}(b) = \operatorname*{argmax}_{a \in A} \sum_{s \in S} b(s)\, Q(s,a)$$

Entropy Heuristic (Cassandra, Kaelbling & Kurien, 1996)

● Given belief b:

$$\pi(b) = \begin{cases} \operatorname*{argmin}_{a \in A} E_z\!\left[ H(p(s \mid a, z)) \right] & \text{if } H(b) > \kappa \\ \pi_{MDP}\!\left( \operatorname*{argmax}_{s} b(s) \right) & \text{otherwise} \end{cases}$$

Alternate Dual-Mode Heuristic

● Given belief b:

$$\pi(b) = \operatorname*{argmax}_{a \in A} \left[ H_\kappa(b) \sum_{s \in S} p(s)\, Q_{CU}(s,a) + \left(1 - H_\kappa(b)\right) \sum_{s \in S} p(s)\, Q_{MDP}(s,a) \right]$$

where $H_\kappa(b)$ is the normalized belief entropy and $Q_{CU}$ is the Q-function of the completely unobservable problem

Discretization

● Discretize belief space
● Convert to belief-space MDP
● Solve using conventional DP
● Subject to the "Curse of Dimensionality"
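A sketch of Q-MDP under the same assumed POMDP container: solve the underlying fully observable MDP by value iteration, then act greedily on the belief-weighted Q-values. Q-MDP implicitly assumes all uncertainty disappears after one step, so it never chooses purely information-gathering actions:

```python
import numpy as np

def solve_mdp_q(pomdp, iters=500):
    """Value iteration for the underlying fully observable MDP.
    Returns Q with shape (n_states, n_actions)."""
    S, A = pomdp.n_states, pomdp.n_actions
    Q = np.zeros((S, A))
    for _ in range(iters):
        V = Q.max(axis=1)                        # V(s) = max_a Q(s, a)
        for a in range(A):
            Q[:, a] = pomdp.R[:, a] + pomdp.gamma * pomdp.T[a] @ V
    return Q

def qmdp_action(b, Q):
    """pi_QMDP(b) = argmax_a sum_s b(s) Q(s, a)."""
    return int(np.argmax(b @ Q))
```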

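The entropy and dual-mode heuristics above can be sketched the same way. Here kappa, Q_cu (the Q-function used in the uncertain mode), and h_norm (the normalized entropy) are assumed inputs:

```python
import numpy as np

def entropy(b, eps=1e-12):
    """Shannon entropy of a belief vector."""
    return float(-np.sum(b * np.log(b + eps)))

def belief_update(pomdp, b, a, z):
    """Bayes' filter: b'(s') is proportional to p(z|s') sum_s p(s'|a,s) b(s)."""
    bp = pomdp.O[a, :, z] * (b @ pomdp.T[a])
    return bp / bp.sum()

def entropy_heuristic_action(pomdp, b, pi_mdp, kappa):
    """Entropy heuristic: if H(b) > kappa, minimize expected posterior
    entropy; otherwise act on the most likely state."""
    if entropy(b) <= kappa:
        return int(pi_mdp[np.argmax(b)])
    best_a, best_h = 0, np.inf
    for a in range(pomdp.n_actions):
        pred = b @ pomdp.T[a]                    # p(s' | b, a)
        pz = pomdp.O[a].T @ pred                 # p(z | b, a), shape (Z,)
        h = sum(pz[z] * entropy(belief_update(pomdp, b, a, z))
                for z in range(len(pz)) if pz[z] > 0)
        if h < best_h:
            best_a, best_h = a, h
    return best_a

def dual_mode_action(b, Q_mdp, Q_cu, h_norm):
    """Dual-mode heuristic: blend the two Q-functions by the
    normalized entropy h_norm in [0, 1]."""
    blended = h_norm * (b @ Q_cu) + (1.0 - h_norm) * (b @ Q_mdp)
    return int(np.argmax(blended))
```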
Coastal Navigation

● Represent beliefs using $\tilde{b} = \left\langle \operatorname*{argmax}_{s} b(s);\ H(b) \right\rangle$

1. Discretize state-entropy space
2. Compute reward function and transition function
3. Solve belief-state MDP

Model Parameters

● Reward function $R(\tilde{b})$: back-project $\tilde{b}$ to the high-dimensional belief, then compute the expected reward from that belief (see the sketch below):

$$R(\tilde{b}) = E_{\tilde{b}}\!\left[ R(s) \right] = \sum_{s \in S} p(s)\, R(s)$$

[Figure: the low-dimensional belief back-projected to a distribution p(s) over states s1, s2, s3]
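A sketch of the coastal-navigation belief compression and the reward back-projection. The uniform-tail back-projection below is an illustrative assumption (the lecture does not specify the inverse mapping); it is chosen so the reconstructed belief matches the stored entropy:

```python
import numpy as np

def compress_belief(b, eps=1e-12):
    """b~ = (most likely state, entropy of b)."""
    return int(np.argmax(b)), float(-np.sum(b * np.log(b + eps)))

def back_project(s_ml, h, n_states):
    """Assumed inverse map: put mass p on the most likely state and
    spread the rest uniformly, bisecting on p to match entropy h."""
    lo, hi = 1.0 / n_states, 1.0
    for _ in range(50):
        p = 0.5 * (lo + hi)
        b = np.full(n_states, (1.0 - p) / (n_states - 1))
        b[s_ml] = p
        ent = float(-np.sum(b * np.log(b + 1e-12)))
        lo, hi = (p, hi) if ent > h else (lo, p)  # entropy falls as p grows
    return b

def reward_tilde(s_ml, h, R_state):
    """R(b~) = E_b[R(s)] = sum_s p(s) R(s) after back-projection.
    R_state[s] is the per-state reward."""
    b = back_project(s_ml, h, len(R_state))
    return float(b @ R_state)
```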

Model Parameters

● Use the forward model: back-project the low-dimensional belief $\tilde{b}_i$ to the full-dimensional belief, simulate an action a and observation z with Bayes' filter, and project the result back down (see the sketch below)

[Figure: in the full-dimensional space, action a and observation z1 map belief $\tilde{b}_i$ deterministically to $\tilde{b}_j$; in the low-dimensional space, different observations z1, z2 make the process stochastic, leading to successors $\tilde{b}_j$, $\tilde{b}_q$, $\tilde{b}_k$]

$$T(\tilde{b}_i, a, \tilde{b}_j) \propto \begin{cases} p(z \mid s)\, \tilde{b}_i(s \mid a) & \text{if } \tilde{b}_j(s) = \tilde{b}_i(s \mid a, z) \\ 0 & \text{otherwise} \end{cases}$$
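A sketch of building one row of the compressed transition function by forward simulation, assuming the compress_belief and back_project helpers above; the grid argument (a hypothetical helper that snaps an entropy value onto the discretization) is needed so that distinct observations can land in the same $\tilde{b}_j$:

```python
import numpy as np

def transition_row(pomdp, s_ml, h, a, grid):
    """Returns {b~j: probability} for taking action a from b~i = (s_ml, h)."""
    b = back_project(s_ml, h, pomdp.n_states)
    pred = b @ pomdp.T[a]                        # p(s' | b~i, a)
    row = {}
    for z in range(pomdp.O.shape[2]):
        pz = float(pomdp.O[a, :, z] @ pred)      # p(z | b~i, a)
        if pz <= 0.0:
            continue
        b_next = pomdp.O[a, :, z] * pred / pz    # deterministic Bayes update
        s_next, h_next = compress_belief(b_next)
        key = (s_next, grid(h_next))             # several z may share one b~j
        row[key] = row.get(key, 0.0) + pz
    return row
```

The stochasticity of the compressed-space transition comes entirely from the observation: given a and z, the Bayes update itself is deterministic.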

What You Should Know

● Advantages and disadvantages of using POMDPs
● What an optimal policy looks like
● How to compute a policy
● Different heuristics and approximations: how they work and where they may fail