Unit 8. Planning

Video 8.1. Introduction
AI is about finding appropriate actions for an agent, so planning is the core of AI.
A* search can find a path given a state-space description, but it only works when the environment is deterministic and fully observable.

Video 8.2. Problem Solving vs Planning
In the real world, finding a path and then executing it without sensors is "not pretty".
People can't keep to a straight path without feedback.
Example of hikers with limited vision:
  on a cloudy day they walk in circles
  on a sunny day they can see shadows and keep relatively straight
We need feedback from the environment; we need to interleave Planning and Executing.

Video 8.3. Planning vs Execution
Environment is Stochastic -- we don't know exactly what an action will do
  ...need to deal with contingencies... like traffic lights
Multi-agent environments -- need to react to other agents at execution time
Partial observability -- even with a good plan, some necessary information may not be available until later in the plan, or we may not know the exact starting state
Unknown world model -- inaccurate maps, etc.
Hierarchical plans -- a plan may be high level but involve lots of lower-level actions to execute in detail
Most of these problems can be addressed by changing from a "world state" view to a "belief state" point of view.

Video 8.4. Vacuum Cleaner Example
see image: 04week_HW_Q4_7.jpg vacuum environment states
8 states: vacuum location, dirt, locations
actions: move right, move left, suck
In a deterministic and fully observable environment, planning is simple:
  say we start in the left, dirt, dirt state -- suck, move right, suck, and done.
If the robot's sensors break, it can't perceive its location or the dirt state:
  now we have an unobservable world -- we could be in any one of the eight states, so the belief state is ANY state.
  We can then execute actions and explore states.

Video 8.5. Sensorless Vacuum Cleaner Problem
see image: 04week_HW_Q3.jpg vacuum belief states
If we execute actions we can gain knowledge about the world even without sensing.
Plans that reach the goal without sensing are called "Conformant Plans".
Start with the top eight-state belief state.
Move Right: now we know we are in the right-hand location -- the four-state belief state -- either we moved from left to right, or we bumped against the wall and stayed.
Note that in the belief-state world, Right and Left moves are not exact inverses, because going Left from the right-most belief state goes to the left-most belief state.
We can Suck dirt in the Right location, then move Left, then Suck, and end in a known state: Left location, both squares clean.
Quiz: How do I get from the belief state where I know my current square is clean but nothing else, to the belief state where I know I'm in the Right location and that location is clean?
Start in the four-state belief state in the middle (unknown location, but clean where the vacuum is)
(note: do NOT click the start state... the boxes are ACTIONS!)
Answer: Move Right, then Suck.
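A minimal sketch of how a conformant planner can track belief states in this two-square vacuum world. The state encoding and function names (result, predict) are my own, not the course's code; the physics follow the lecture's deterministic rules (moves bump against walls, Suck always works).

```python
from itertools import product

# A world state is (loc, dirt_left, dirt_right); loc is 'L' or 'R'.
ALL_STATES = {(loc, dl, dr) for loc in 'LR'
              for dl, dr in product([True, False], repeat=2)}

def result(state, action):
    """Deterministic physics of the two-square vacuum world."""
    loc, dl, dr = state
    if action == 'Right':
        return ('R', dl, dr)      # moving Right from R bumps the wall and stays
    if action == 'Left':
        return ('L', dl, dr)
    if action == 'Suck':
        return (loc, False, dr) if loc == 'L' else (loc, dl, False)
    raise ValueError(action)

def predict(belief, action):
    """Belief-state successor: apply the action to every possible world state."""
    return {result(s, action) for s in belief}

# Sensorless (conformant) plan from total ignorance: Right, Suck, Left, Suck
belief = set(ALL_STATES)          # could be in any of the 8 states
for a in ['Right', 'Suck', 'Left', 'Suck']:
    belief = predict(belief, a)
print(belief)   # a single state: ('L', False, False) -- left square, all clean
```

The belief set shrinks from 8 states to 1 without any sensing, which is exactly what makes the plan conformant.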
Video 8.6. Partially Observable Vacuum Cleaner Example
Local sensing -- the vacuum can see which location it is in and whether there is dirt in the current location, but not dirt in other locations.
Start in Left: we have two possible world states.
Perform a Right-move action: we still have two possible world states.
Get a Percept -- the Act -> Percept cycle -- the observation splits the belief state into two different states.
What does the Act-Observe cycle do to the sizes of the belief state?
In a deterministic world each state-action pair leads to exactly one other state, but with belief states actions may keep the size the same or decrease it; with observations we can then partition the belief state into smaller pieces.
(note: an observation doesn't introduce new possible states, it just sorts the existing ones; we might not learn anything new from an observation, but we won't be more confused)

Video 8.7. Stochastic Environment Problem
A robot with slipping wheels: sometimes a move action doesn't change location, sometimes it does (sucking always works perfectly).
Actions can increase uncertainty, so the belief-state size increases: actions can have multiple outcomes -- that's what stochastic means.
Observations work as before, so they still partition the belief state appropriately.
In a stochastic, partially observable environment:
  actions tend to increase uncertainty
  observations tend to decrease it again
Quiz: planning in this environment -- note: no observations, just stochastic actions.
Starting on the left in image 08class_stochasticEnv.jpg, get to the belief state where all squares are clean.
Does each plan Always or only Maybe achieve the goal, and is there some ideal plan that always achieves it?
  S R S -- Maybe
  R S L S -- Maybe
  S R R S -- Maybe
  S R S R S -- Maybe
  any plan -- Maybe
All are Maybes because we're never sure that a move actually moved us anywhere.
No finite plan is guaranteed to always work.
What about infinite sequences of actions?

Video 8.8. Infinite Sequences
Instead of writing plans in a linear structure like [S, R, S], use a tree structure, see image: 08class_infiniteSeq.jpg
  boxes are the states, circles are observation points which can branch to other states
Due to looping, this finite representation can express an infinite plan; in linear notation: [S, while A: R, S]
  Suck; while we observe state A, do Right; then Suck.
If the stochasticity is independent (each move sometimes works, sometimes doesn't), then with probability 1, in the limit, the plan achieves the goal,
  but we can't state any bounded number of steps needed -- we can only say it's guaranteed at infinity.

Video 8.9. Finding a Successful Plan
A plan can be found using search, just like with the trees before.
With belief states we do the same search-to-goal process, but the tree is more complicated.
Try paths until we find the portion of the tree that satisfies the success criterion of reaching the goal; that portion is the plan.

Video 8.10. Finding a Successful Plan Question
We did the search and found a branch that represents a single plan. What can we guarantee?
To guarantee finding the goal in an UNbounded number of steps, do we need:
  some leaf node is a goal? -- no, we could still reach a different leaf with nowhere left to go
  every leaf is a goal? XXXX -- yes: leaf nodes are the end points
  or is there no possible guarantee?
To guarantee finding the goal in a BOUNDED number of steps, do we need a plan that:
  has no branches?
  has no loops? XXXX -- yes: loops make it unbounded
  or is there no possible guarantee?

Video 8.11. Problem Solving via Mathematical Notation
Is the end state a goal state? For the path [A, S, F]:
  Result( Result( A, A->S ), S->F ) is an element of Goals
  the result of starting in A and doing A->S is the input to Result( ..., S->F )
  does that produce a state that is a Goal?
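Read mechanically, that check is just function composition over a transition model. A tiny sketch of the idea; the state names and transition table are stand-ins of my own, not from the lecture:

```python
# Hypothetical deterministic transition model: result[state][action] -> next state
result = {
    'A': {'A->S': 'S'},
    'S': {'S->F': 'F'},
}
goals = {'F'}

def final_state(state, plan):
    """Fold the plan through Result: Result(Result(A, A->S), S->F), and so on."""
    for action in plan:
        state = result[state][action]
    return state

print(final_state('A', ['A->S', 'S->F']) in goals)   # True: the plan reaches a goal
```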
In stochastic, partially observable worlds the equations are more complicated.
Instead of S' = Result( S, A )
  (the new state S' is the result of applying action A to initial state S)
we deal with belief states rather than individual states:
  B' = Update( Predict( B, A ), O )
  the new belief state B' is the result of Updating what we get from Predicting what action A will do in belief state B, based on our new Observation O of the world.
Predict() gives the whole belief-state set resulting from taking the action in B.
Update() applies the observation to that set, reducing it to a smaller (or same-size) new belief state B'.
We can use the Predict-Update cycle to keep track of where we are in belief-state space.

Video 8.12. Tracking the Predict-Update Cycle
see image: 08class_predictUpdate.jpg
Example of the P-U cycle: actions are guaranteed (not stochastic), but dirt can be deposited in any location at any time (the "kindergarten" world).
Start in belief state b1: in the A square, A dirty, B unknown.
Execute the Suck action.
Predict belief state b2 (Suck always works): in A, A clean, B unknown.
Observe [A, Clean].
Update the belief state to b3, the same as the prediction.
Execute the Right action; lots of things can happen: B is don't-care, A could have new dirt.
Predict b4: in the B square, any dirt configuration.
Observe [B, Dirty].
Update the belief state to b5: in B, B dirty, A unknown.
The P-U cycle gives a calculus of belief states that can tell us everything we need to know.
One weakness: belief-state sets can get large, so there are more succinct representations using variables.
For the four-state belief we can have:
  Vacuum-Location = Left or Right
  Location1-Dirt = T or F
  Location2-Dirt = T or F
and some formula over those variables; the full belief set can be expanded from the formula.
Large belief-state sets can be made small in terms of their description.

Video 8.13. Classical Planning 1
A notation for representing states, actions, and plans, and an approach for dealing with complexity by factoring the world into variables.
State space: all possible assignments to k boolean variables gives 2^k states.
  vacuum world: 3 variables (DirtA, DirtB, VacA) -> 8 states
World state: a complete assignment of a single value to each variable.
Belief state: in "core classical planning" it had to be a complete assignment -- useful for deterministic, fully observable problems.
  Extended to partial assignments: VacA=true, Dirt=unknown
  or to an arbitrary boolean formula, which can represent anything.
Action schema (represents many possible similar actions):
  represents a set of actions over possible variable values,
  says what we need to have in order to apply the action and what will happen in terms of the variable transitions.
e.g., flying cargo around the world:
  Action( Fly( p, x, y ),
    Precond: Plane(p) /\ Airport(x) /\ Airport(y) /\ At(p,x)
    Effects: ~At(p,x) /\ At(p,y) )
Action schema: fly plane p from x to y.
  Precond: what needs to be true in order to execute the action
    (p is a Plane, x and y are Airports, p is At x)
  Effects: what is going to happen when the action executes
    (p is no longer At x, p is At y)
It looks like a term in propositional logic but is actually first-order logic because it's a schema: we can apply the schema to specific world states, and then the variables take specific values.

Video 8.14. Classical Planning 2
see image: 08class_cargoPlan.jpg
A more complete description: actions to Load and Unload as well as Fly, more Cargos and Planes.
It has an initial state (Init) and a goal state (Goal).
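A rough sketch of how such a schema can be grounded and applied in code. The class and state encoding below are my own illustration (a simple add/delete-list representation), not the course's implementation:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GroundAction:
    name: str
    precond: frozenset   # literals that must hold in the current state
    add: frozenset       # positive effects
    delete: frozenset    # negative effects

def fly(p, x, y):
    """Ground instance of the Fly(p, x, y) schema from the lecture."""
    return GroundAction(
        name=f"Fly({p},{x},{y})",
        precond=frozenset({('Plane', p), ('Airport', x), ('Airport', y), ('At', p, x)}),
        add=frozenset({('At', p, y)}),
        delete=frozenset({('At', p, x)}),
    )

def applicable(state, action):
    return action.precond <= state

def apply_action(state, action):
    return (state - action.delete) | action.add

state = {('Plane', 'P1'), ('Airport', 'JFK'), ('Airport', 'SFO'), ('At', 'P1', 'JFK')}
a = fly('P1', 'JFK', 'SFO')
if applicable(state, a):
    state = apply_action(state, a)
print(('At', 'P1', 'SFO') in state)   # True: the plane is now at SFO
```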
Video 8.15. Progression Search
Forward or progression state-space search: do regular search.
Start in the initial state, branch on possible actions to make new states, and continue branching until we hit a state satisfying the Goal predicate.
We are searching through the space of exact states: each node is an individual world state, and if the actions are deterministic it's the same thing as before.
But because we now have this factored representation, there are other possibilities.

Video 8.16. Regression Search
Backwards or regression search: start at the Goal and work backwards.
Take the description of the goal state; the other variables can be arbitrary.
Search backwards by looking at the arcs coming into the goal state:
  look at the action schemas to find ones whose effects produce the current state
  figure out which variables of those actions are known and which are unknown
  (actions can be described as sets using specific variable values; states can be uncertain because some variables are not set)
Continue searching backwards until enough variables are filled in to match the initial state.

Video 8.17. Regression vs Progression
see image: 08class_regressionSearch.jpg
Where regression makes sense: buying a book.
  Action: Buy. Goal: own the book.
With forward search we start with branches for every possible book and have to search for the one we want.
With backward search we start with the goal Own(013...book); only one action schema, Buy(b), can match the goal, and in only one way: by buying the specified book. That links back to the Init state in just one way.

Video 8.18. Plan Space Search
We can also search the space of plans rather than states.
Start with the empty plan: just Init and Goal.
Modify the plan by adding new things: look at all the operators you have, pick one to add, and keep adding operators until you have a plan that is guaranteed to work.
This was popular in the 1980s, but we mostly do forward search now because there are good heuristics: because forward search deals with concrete world states, it's easier to find good heuristics.

Video 8.19. Sliding Puzzle Example
One action: slide a tile from square A to square B.
  Precond: the tile has to be on A, it has to be a tile, B has to be blank, A and B are adjacent.
  Effect: the tile is on B, the blank is on A, the tile is no longer on A, the blank is no longer on B.
With this representation, a program can come up with heuristics by relaxing the preconditions -- throw out some precondition to make the problem strictly easier:
  crossing out "B has to be blank" gives the Manhattan-distance heuristic
  also crossing out "A and B are adjacent" gives the misplaced-tile heuristic
We can also ignore negative effects: crossing out "the blank is no longer on B" makes the problem easier and "might" give a good heuristic.
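A small sketch of the two relaxed heuristics for the 8-puzzle. The board encoding is my own; the point is just that each heuristic corresponds to dropping preconditions of the Slide action as described above:

```python
# Board: dict mapping tile number -> (row, col); tile 0 is the blank.
GOAL = {t: divmod(t, 3) for t in range(9)}   # goal layout 0 1 2 / 3 4 5 / 6 7 8

def misplaced_tiles(board):
    """Relax 'B is blank' AND 'A, B adjacent': a tile can jump anywhere in one
    step, so the cost is just the number of tiles out of place."""
    return sum(1 for t, pos in board.items() if t != 0 and pos != GOAL[t])

def manhattan(board):
    """Relax only 'B is blank': tiles still move one adjacent square at a time,
    so each tile needs at least its Manhattan distance to its goal square."""
    return sum(abs(r - GOAL[t][0]) + abs(c - GOAL[t][1])
               for t, (r, c) in board.items() if t != 0)

# Example: the goal board with tiles 1 and 2 swapped
board = dict(GOAL)
board[1], board[2] = GOAL[2], GOAL[1]
print(misplaced_tiles(board), manhattan(board))   # 2 2
```

Because each relaxation only removes constraints, both heuristics are admissible for the original puzzle.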
Video 8.20. Situation Calculus 1
Another representation for planning.
Say we want to move ALL the cargo from A to B: with propositional logic we have to handle each piece of cargo separately, but with regular first-order logic we can express "all", using a set of conventions to represent states and actions:
  Actions are objects, e.g. a function term Fly(p,x,y) -- fly plane p from x to y.
  Situations are objects that correspond not to states, but to paths of actions in a state-space search; if you arrive at the same world state through two different sequences of actions, those are two different situations.
Start at the initial situation S0; S' = Result( S, A ): a new situation is the result of an action applied in a situation.
Instead of describing the actions that can be taken from a situation, we talk about the actions that are possible in a situation using a predicate: Poss( A, S ) -- is action A possible in situation S?
(Note: the lecture somewhat overloads "situation" and "state" here.)
Possibilities are usually expressed as preconditions implying Poss():
  SomePrecond(s) => Poss(a, s)
  some precondition holding in situation s implies that it's possible to do action a in s.
Example:
  Plane(p,s) /\ Airport(x,s) /\ Airport(y,s) /\ At(p,x,s) => Poss( Fly(p,x,y), s )
  (if) p is a plane in situation s, x and y are airports in s, and p is at x in s, (then) it is possible to fly p from x to y in situation s.
  (known as the Possibility Axiom for the action Fly)
Remember that a situation is a PATH of actions that got to a certain world state, even though "situation" and "state" get overloaded in some places.

Video 8.21. Situation Calculus 2
The convention in situation calculus is that predicates like At(p,x,s), which can vary from one situation to another, are called "fluents" (they are fluid, changing over time), and they refer to a particular situation given as the last argument s.
The trickiest part is describing what changes and what doesn't change as a result of an action.
(In classical planning the action schema described one action at a time and what it changed.)
In situation calculus we do it the other way around: instead of one axiom per action, we write one axiom per fluent (per predicate that can change), using "Successor-State Axioms", which describe what holds in the situation that is the successor of executing an action.
The general form is:
  Aa,s  Poss(a,s) => ( fluent is true <=> a made-it-true V ( it-was-already-true /\ a didn't-undo-it ) )
  for all actions a and situations s: if it's possible to execute a in s, then the fluent is true in the resulting situation IF AND ONLY IF a made it true, or it was true before and a didn't undo it.
  (either it wasn't true before and the action made it true, or it was true before and the action didn't break it)
Example: the successor-state axiom for the In() fluent (all variables implicitly universally quantified; note that this axiom mentions Load() and Unload() actions but describes the In() fluent):
  Poss(a,s) => ( In( c, p, Result(s,a) ) <=> ( a = Load(c,p,x) V ( In(c,p,s) /\ a != Unload(c,p,x) ) ) )
  If it's possible to execute action a in situation s, then cargo c is In plane p in the situation Result(s,a) IF AND ONLY IF
    a was a Load() action -- the result of Load() is that the cargo is in the plane -- OR
    the cargo was already in the plane in situation s AND a was not an Unload() action.

Video 8.22. Situation Calculus 3
see image: 08class_sitCalc.jpg
Start with the initial situation S0 with predicates:
  At(P1, JFK, S0) -- plane P1 is at JFK in the initial situation
  Ac Cargo(c) => At(c, JFK, S0) -- for all c, if c is Cargo then c is at JFK in situation S0
and the goal:
  Es Ac Cargo(c) => At(c, SFO, s) -- there exists a situation s such that, for all c, if c is Cargo then c is At SFO in s.
So the initial state and goal say: move all the cargo from JFK to SFO.
Anything can be asserted in situation calculus, and it is expressed as ordinary first-order logic, so theorem provers can be used to solve these problems: find a path (plan) that satisfies the descriptions.
But we still aren't doing probabilistic planning... up next!