Cognitive Robotics: Partially Observable Markov Decision Processes

Partially Observable Markov Decision Processes

Additional reading:
- Leslie Pack Kaelbling, Michael L. Littman and Anthony R. Cassandra, "Planning and Acting in Partially Observable Stochastic Domains," Artificial Intelligence, Vol. 101, 1998.
- J. Pineau, G. Gordon and S. Thrun, "Point-based value iteration: An anytime algorithm for POMDPs," International Joint Conference on Artificial Intelligence (IJCAI), Acapulco, Mexico, Aug. 2003.

Issues
- Problem statement: if we do not know the current state of the world, what should we do to act optimally?
- Inputs: model of states, actions, observations, transition and emission functions, reward function, initial distribution, and discount factor.
- Outputs from different algorithms:
  - Complete/partial, exact/approximate value function
  - Complete/partial, exact/approximate finite state machine
- Choices:
  - Policy vs. plan
  - Exact vs. approximate
  - Representation
  - Discrete vs. continuous states, actions, observations
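The inputs listed above can be gathered into a single model object. The following is a minimal sketch of such a container, assuming a dictionary-based encoding; the class and field names (POMDPModel, T, O, R, b0, gamma) are illustrative, not from the lecture.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Illustrative container for the POMDP inputs listed above: states, actions,
# observations, transition and emission functions, reward function,
# initial belief, and discount factor.
@dataclass
class POMDPModel:
    states: List[str]
    actions: List[str]
    observations: List[str]
    T: Dict[Tuple[str, str, str], float]   # T[(s, a, s')] = p(s' | s, a)
    O: Dict[Tuple[str, str], float]        # O[(s', z)]   = p(z | s')
    R: Dict[Tuple[str, str], float]        # R[(s, a)]    = immediate reward
    b0: Dict[str, float]                   # initial distribution over states
    gamma: float                           # discount factor
```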

Graphical Models (Sondik, 1971)
- Hidden: states s1, s2, ...; rewards R1, R2, ...
- Observable: actions a1, ...; observations z1, z2, ...; beliefs b1, b2, ...
- Transition function T(sj | ai, si); observation function O(zj | si)
[Figure: dynamic Bayes net relating states, actions, observations, beliefs, and rewards]

Reliable Navigation
- Conventional trajectories may not be robust to localization error
[Figure: map showing estimated robot position, true robot position, and goal position]
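The belief in this graphical model is propagated from the transition model T(sj | ai, si) and the observation model O(zj | si). Below is a minimal sketch of that one-step Bayes-filter update; the function name and the dictionary layout (T keyed by (s, a, s'), O keyed by (s', z)) are my own illustrative choices.

```python
def belief_update(b, a, z, states, T, O):
    """One-step Bayes filter: b'(s') proportional to O(z|s') * sum_s T(s'|a,s) b(s)."""
    b_new = {}
    for s_next in states:
        pred = sum(T[(s, a, s_next)] * b[s] for s in states)   # prediction step
        b_new[s_next] = O[(s_next, z)] * pred                   # correction step
    norm = sum(b_new.values())
    return {s: p / norm for s, p in b_new.items()}              # renormalize
```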

Control Models
- Markov Decision Processes: the probabilistic perception model produces P(x) over world states, but the controller acts on a single state estimate (argmax P(x)), assuming it is correct.
- Partially Observable Markov Decision Processes: the controller acts on the full probabilistic perception model P(x).
[Figure: block diagrams of the two control loops, world state -> perception P(x) -> control]

Navigation as a POMDP
- The state is hidden from the controller; only actions and observations are available.
- The controller chooses actions based on probability distributions over states.
[Figure: state space with the action/observation loop between robot and controller]
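To make the contrast between the two control models concrete, here is a hedged sketch: an MDP-style controller collapses P(x) to argmax P(x) before acting, while a POMDP-style controller scores the full distribution, here against α-vectors of the kind introduced later in this lecture. The function names and the (action, values) layout for α-vectors are illustrative assumptions.

```python
def mdp_style_action(belief, state_policy):
    """MDP-style control model: collapse P(x) to its most likely state,
    then look up the action prescribed for that single state."""
    most_likely_state = max(belief, key=belief.get)   # argmax P(x)
    return state_policy[most_likely_state]

def pomdp_style_action(belief, alpha_vectors):
    """POMDP-style control model: choose the action of the alpha-vector
    with the highest expected value under the full belief P(x).
    alpha_vectors: list of (action, {state: value}) pairs (illustrative)."""
    def expected_value(entry):
        _, values = entry
        return sum(belief[s] * values[s] for s in belief)
    best_action, _ = max(alpha_vectors, key=expected_value)
    return best_action
```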

Taxonomy of Markov models:

                      Passive         Controlled
  Fully observable    Markov model    MDP
  Hidden state        HMM             POMDP

- Stationary policy: best action is fixed
- Non-stationary policy: best action depends on time
- States can be discrete, continuous, or hybrid

Tradeoffs
- MDP
  + tractable to solve
  + relatively easy to specify
  – assumes perfect knowledge of state
- POMDP
  + treats all sources of uncertainty (action, sensing, environment) in a uniform framework
  + allows for taking actions that gain information
  – difficult to specify all the conditional probabilities
  – hugely intractable to solve optimally

POMDP Advantages
- Models information gathering
- Computes the trade-off between getting reward and being uncertain ("Am I here, or over there?")
[Figure: robot uncertain between two possible locations]

A Simple POMDP
- The state is hidden: the agent maintains a belief p(s) over the states s1, s2, s3.
[Figure: belief distribution p(s) over the three hidden states]

POMDP Policies
- A policy maps the current belief p(s) to an optimal action.
[Figure: belief space annotated with the optimal action for the current belief]

POMDP Policies
- The value function is defined over belief space: the current belief determines the optimal action.
[Figure: value function over belief space]

Coastal Navigation
[Figure: coastal navigation example trajectory]

Example POMDP
- Two states: s1, s2
- Two actions: a1, a2
- Two observations: z1, z2
- Assume γ = 0.9
- Rewards: r(a1, s1) = 2, r(a1, s2) = 1, r(a2, s1) = 1, r(a2, s2) = 3
- Emission probabilities: p(z1|s1) = .9, p(z2|s1) = .1, p(z1|s2) = .5, p(z2|s2) = .5
- Transition probabilities: p(s1|s1, a1) = .3, p(s2|s1, a1) = .7, p(s1|s2, a1) = .6, p(s2|s2, a1) = .4, p(s1|s1, a2) = .1, p(s2|s1, a2) = .9, p(s1|s2, a2) = .8, p(s2|s2, a2) = .2

Example courtesy of Simmons and Veloso, 2002.

Exact Value Iteration
- Define a "policy tree": a root action, and for each observation a subtree of depth t−1.
- For a horizon of length t, there are $|A|^{(|Z|^t - 1)/(|Z| - 1)}$ possible policy trees.
- Idea: compute the value of trees of length t recursively from trees of length t−1.
- The optimal value of a belief b is given by the best α-vector at b:

  $V_t^*(b) = \max_{\alpha \in \Gamma_t} b \cdot \alpha$
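As a sketch, the example's model can be written out directly as Python dictionaries (keys follow the slide's notation; the encoding itself is an assumption of mine), together with the policy-tree count from the formula above:

```python
# The two-state example POMDP from the slide, written out as Python data.
gamma = 0.9
states = ["s1", "s2"]
actions = ["a1", "a2"]
observations = ["z1", "z2"]

R = {("s1", "a1"): 2, ("s2", "a1"): 1,    # r(a1, s1), r(a1, s2)
     ("s1", "a2"): 1, ("s2", "a2"): 3}    # r(a2, s1), r(a2, s2)

O = {("s1", "z1"): 0.9, ("s1", "z2"): 0.1,   # p(z | s)
     ("s2", "z1"): 0.5, ("s2", "z2"): 0.5}

T = {("s1", "a1", "s1"): 0.3, ("s1", "a1", "s2"): 0.7,   # p(s' | s, a)
     ("s2", "a1", "s1"): 0.6, ("s2", "a1", "s2"): 0.4,
     ("s1", "a2", "s1"): 0.1, ("s1", "a2", "s2"): 0.9,
     ("s2", "a2", "s1"): 0.8, ("s2", "a2", "s2"): 0.2}

def num_policy_trees(num_actions, num_obs, t):
    """|A| ** ((|Z|**t - 1) / (|Z| - 1)): one action choice per tree node."""
    nodes = (num_obs ** t - 1) // (num_obs - 1)
    return num_actions ** nodes

# Horizon 1: 2 trees; horizon 2: 2**3 = 8 trees, matching the later slides.
assert num_policy_trees(2, 2, 1) == 2 and num_policy_trees(2, 2, 2) == 8
```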

Exact Value Iteration
- Compute the value of each policy tree over the space of beliefs b by taking an expectation.
- Given tree p:

  $V_p(b) = E_b[V_p(s)] = \sum_{s \in S} b(s)\, V_p(s)$

- The expectation is linear in b: the value function of a policy tree is therefore linear, and can be expressed as an "α-vector" with $\alpha_p(s) = V_p(s)$.
- Compute $V_p(s)$ recursively with a Bayes-filter-style backup: for a depth-t tree p with root action a and subtree $p_z$ after observation z,

  $V_p^t(s) = R(s, a) + \gamma \sum_{s' \in S} p(s' \mid a, s) \sum_{z \in Z} p(z \mid s')\, V_{p_z}^{t-1}(s')$

[Figure: depth-2 policy trees, each root action branching on z1 and z2 into depth-1 subtrees]

Exact Value Iteration
- Set $V^{t=0} = 0$.
- $V^{t=1}$: two policy trees, p1 (root action a1) and p2 (root action a2):

  $\alpha_1 = [V_{p_1}(s_1), V_{p_1}(s_2)] = [R(s_1, a_1), R(s_2, a_1)]$
  $\alpha_2 = [V_{p_2}(s_1), V_{p_2}(s_2)] = [R(s_1, a_2), R(s_2, a_2)]$

[Figure: the two α-vectors plotted as lines over p(s2)]
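Here is a sketch of that backup for a single policy tree, reusing the model dictionaries written out for the example above. Because the tree fixes the root action and the subtree for each observation, there is no max inside the recursion; the helper name and argument layout are illustrative.

```python
def alpha_for_tree(action, subtrees, alphas_prev, states, observations, T, O, R, gamma):
    """Alpha-vector of a depth-t policy tree: root `action`, and for each
    observation z the depth-(t-1) subtree named in subtrees[z], whose
    alpha-vector is alphas_prev[subtrees[z]]."""
    alpha = {}
    for s in states:
        future = 0.0
        for s_next in states:
            # Expected value of the successor state under the chosen subtrees.
            exp_next = sum(O[(s_next, z)] * alphas_prev[subtrees[z]][s_next]
                           for z in observations)
            future += T[(s, action, s_next)] * exp_next
        alpha[s] = R[(s, action)] + gamma * future
    return alpha

# Horizon 1: the two one-node trees are just the immediate rewards,
# matching alpha_1 = [R(s1,a1), R(s2,a1)] and alpha_2 = [R(s1,a2), R(s2,a2)].
alpha_t1 = {"p1": {s: R[(s, "a1")] for s in states},
            "p2": {s: R[(s, "a2")] for s in states}}
```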

Exact Value Iteration
- $V^{t=2}$: 8 possible policy trees p111, p112, p121, p122, p211, p212, p221, p222, where $p_{ijk}$ has root action $a_i$, subtree $p_j$ after z1, and subtree $p_k$ after z2. For example:

  $\alpha_{111} = \begin{bmatrix} R(s_1, a_1) + \gamma \sum_{s' \in S} p(s' \mid a_1, s_1) \sum_{z \in Z} p(z \mid s')\, \alpha_1(s') \\ R(s_2, a_1) + \gamma \sum_{s' \in S} p(s' \mid a_1, s_2) \sum_{z \in Z} p(z \mid s')\, \alpha_1(s') \end{bmatrix}, \qquad \alpha_{112} = \ldots$

Exact Value Iteration
- Resulting α-vectors:
  α111 = [3.17, 2.44]      α211 = [1.99, 4.62]
  α112 = [3.773, 2.746]    α212 = [2.791, 4.728]
  α121 = [3.557, 2.314]    α221 = [2.719, 4.152]
  α122 = [4.16, 2.62]      α222 = [3.52, 4.26]
[Figure: the eight α-vectors plotted over p(s2); α122, α212, and α222 form the upper envelope]
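Continuing the sketch above, enumerating the eight depth-2 trees with the helper `alpha_for_tree` reproduces the α-vectors on this slide; the spot checks below use the slide's values for α111 and α222. Naming and enumeration order are my own choices.

```python
from itertools import product

# Enumerate the eight depth-2 trees p_{a, z1-subtree, z2-subtree}.
alpha_t2 = {}
for a, (sub1, sub2) in product(actions, product(["p1", "p2"], repeat=2)):
    name = a[1] + sub1[1] + sub2[1]                 # e.g. "111" for (a1, p1, p1)
    alpha_t2[name] = alpha_for_tree(a, {"z1": sub1, "z2": sub2}, alpha_t1,
                                    states, observations, T, O, R, gamma)

assert abs(alpha_t2["111"]["s1"] - 3.17) < 1e-9     # alpha_111 = [3.17, 2.44]
assert abs(alpha_t2["222"]["s2"] - 4.26) < 1e-9     # alpha_222 = [3.52, 4.26]
```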

Pruning Alpha Vectors
- Test each α-vector (i.e., each policy tree p) for dominance with a linear program:

  maximize $d$
  such that: $\sum_{s \in S} b(s)\, V_p(s) \ge \sum_{s \in S} b(s)\, V_{\tilde p}(s) + d$  for all $\tilde p \in V^t \setminus \{p\}$
  and: $\sum_{s \in S} b(s) = 1$
  and: $b(s) \ge 0$ for all $s \in S$

  If the optimal d is not positive, no belief prefers p, and its α-vector can be pruned.
- After pruning, $V^{t=2} = \{[2.791, 4.728],\; [3.52, 4.26],\; [4.16, 2.62]\}$.
[Figure: the three surviving α-vectors α212, α222, α122 plotted over p(s2)]

Execution
- Given the current belief over s1 and s2, the optimal action is the action of the maximizing α-vector at that point in belief space.
[Figure: belief space partitioned by the dominating α-vector, with the current belief and its optimal action a1]
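The dominance test is a small linear program. Continuing the sketch, here is one way to express it with scipy.optimize.linprog; the helper name and the dict-based α-vector layout are assumptions of mine.

```python
from scipy.optimize import linprog

def is_dominated(alpha, others, states):
    """LP from the slide: maximize d s.t. b.alpha >= b.alpha_tilde + d for every
    other vector, sum_s b(s) = 1, b(s) >= 0.  If the best achievable d is <= 0,
    no belief prefers `alpha`, so it can be pruned."""
    n = len(states)
    c = [0.0] * n + [-1.0]                                   # maximize d
    A_ub = [[other[s] - alpha[s] for s in states] + [1.0] for other in others]
    b_ub = [0.0] * len(others)
    A_eq = [[1.0] * n + [0.0]]                               # belief sums to 1
    bounds = [(0.0, 1.0)] * n + [(None, None)]               # b(s) >= 0, d free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0], bounds=bounds)
    return res.x[-1] <= 0.0                                  # optimal d

# Pruning the eight horizon-2 vectors leaves alpha_122, alpha_212, alpha_222,
# i.e. [4.16, 2.62], [2.791, 4.728], [3.52, 4.26], matching V_t=2 on the slide.
surviving = {name: vec for name, vec in alpha_t2.items()
             if not is_dominated(vec, [v for n, v in alpha_t2.items() if n != name],
                                 states)}
```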