Unit 8. Planning

Video 8.1. Introduction
AI is about finding appropriate actions for an agent, so planning is the core of AI.
A* search can find a path given a state-space description, but it only works when the environment is deterministic and fully observable.

Video 8.2. Problem Solving vs Planning
In the real world, finding a path and then executing it without sensors is "not pretty".
People can't keep to a straight path without feedback.
Example of hikers with limited vision:
  on a cloudy day they walk in circles
  on a sunny day they can see shadows and keep relatively straight
We need feedback from the environment; we need to interleave Planning and Executing.

Video 8.3. Planning vs Execution
Environment is Stochastic -- we don't know exactly what an action will do
  ...need to deal with contingencies... like traffic lights
Multi-agent environments -- need to react to other agents at execution time
Partial observability -- even with a good plan, some necessary information may not be available until later in the plan, or we may not know the exact starting state
Unknown world model -- inaccurate maps, etc.
Hierarchical plans -- a plan may be high level but involve lots of lower-level actions to execute in detail
Most of these problems can be addressed by changing from a "world state" view to a "belief state" point of view.

Video 8.4. Vacuum Cleaner Example
see image: 04week_HW_Q4_7.jpg vacuum environment states
8 states: vacuum location, dirt, locations
actions: move right, move left, suck
In a deterministic and fully observable environment, planning is simple:
  say we start in the left, dirt, dirt state -- suck, move right, suck, and done.
If the robot's sensors break, it can't perceive its location or the dirt state:
  now we have an unobservable world -- we could be in any one of the eight states, so the belief state is ANY state.
  We can then execute actions and explore states.

Video 8.5. Sensorless Vacuum Cleaner Problem
see image: 04week_HW_Q3.jpg vacuum belief states
If we execute actions we can gain knowledge about the world even without sensing.
Plans that reach the goal without sensing are called "Conformant Plans".
Start with the top eight-state belief state.
Move Right: now we know we are in the right-hand location -- the four-state belief state -- either we moved from left to right, or we bumped against the wall and stayed.
Note that in the belief-state world, Right and Left moves are not exact inverses, because going Left from the right-most belief state goes to the left-most belief state.
We can Suck dirt in the Right location, then move Left, then Suck, and end in a known state: Left location, both squares clean.
Quiz: How do I get from the belief state where I know my current square is clean but nothing else, to the belief state where I know I'm in the Right location and that location is clean?
Start in the four-state belief state in the middle (unknown location, but clean where the vacuum is)
(note: do NOT click the start state... the boxes are ACTIONS!)
Answer: Move Right, then Suck.
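A minimal sketch of how a conformant planner can track belief states in this two-square vacuum world. The state encoding and function names (result, predict) are my own, not the course's code; the physics follow the lecture's deterministic rules (moves bump against walls, Suck always works).

```python
from itertools import product

# A world state is (loc, dirt_left, dirt_right); loc is 'L' or 'R'.
ALL_STATES = {(loc, dl, dr) for loc in 'LR'
              for dl, dr in product([True, False], repeat=2)}

def result(state, action):
    """Deterministic physics of the two-square vacuum world."""
    loc, dl, dr = state
    if action == 'Right':
        return ('R', dl, dr)      # moving Right from R bumps the wall and stays
    if action == 'Left':
        return ('L', dl, dr)
    if action == 'Suck':
        return (loc, False, dr) if loc == 'L' else (loc, dl, False)
    raise ValueError(action)

def predict(belief, action):
    """Belief-state successor: apply the action to every possible world state."""
    return {result(s, action) for s in belief}

# Sensorless (conformant) plan from total ignorance: Right, Suck, Left, Suck
belief = set(ALL_STATES)          # could be in any of the 8 states
for a in ['Right', 'Suck', 'Left', 'Suck']:
    belief = predict(belief, a)
print(belief)   # a single state: ('L', False, False) -- left square, all clean
```

The belief set shrinks from 8 states to 1 without any sensing, which is exactly what makes the plan conformant.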
Video 8.6. Partially Observable Vacuum Cleaner Example
Local sensing -- the vacuum can see which location it is in and whether there is dirt in the current location, but not dirt in other locations.
Start in Left: we have two possible world states.
Perform a Right-move action: we still have two possible world states.
Get a Percept -- the Act -> Percept cycle -- the observation splits the belief state into two different states.
What does the Act-Observe cycle do to the sizes of the belief state?
In a deterministic world each state-action pair leads to exactly one other state, but with belief states actions may keep the size the same or decrease it; with observations we can then partition the belief state into smaller pieces.
(note: an observation doesn't introduce new possible states, it just sorts the existing ones; we might not learn anything new from an observation, but we won't be more confused)

Video 8.7. Stochastic Environment Problem
A robot with slipping wheels: sometimes a move action doesn't change location, sometimes it does (sucking always works perfectly).
Actions can increase uncertainty, so the belief-state size increases: actions can have multiple outcomes -- that's what stochastic means.
Observations work as before, so they still partition the belief state appropriately.
In a stochastic, partially observable environment:
  actions tend to increase uncertainty
  observations tend to decrease it again
Quiz: planning in this environment -- note: no observations, just stochastic actions.
Starting on the left in image 08class_stochasticEnv.jpg, get to the belief state where all squares are clean.
Does each plan Always or only Maybe achieve the goal, and is there some ideal plan that always achieves it?
  S R S -- Maybe
  R S L S -- Maybe
  S R R S -- Maybe
  S R S R S -- Maybe
  any plan -- Maybe
All are Maybes because we're never sure that a move actually moved us anywhere.
No finite plan is guaranteed to always work.
What about infinite sequences of actions?

Video 8.8. Infinite Sequences
Instead of writing plans in a linear structure like [S, R, S], use a tree structure, see image: 08class_infiniteSeq.jpg
  boxes are the states, circles are observation points which can branch to other states
Due to looping, this finite representation can express an infinite plan; in linear notation: [S, while A: R, S]
  Suck; while we observe state A, do Right; then Suck.
If the stochasticity is independent (each move sometimes works, sometimes doesn't), then with probability 1, in the limit, the plan achieves the goal,
  but we can't state any bounded number of steps needed -- we can only say it's guaranteed at infinity.

Video 8.9. Finding a Successful Plan
A plan can be found using search, just like with the trees before.
With belief states we do the same search-to-goal process, but the tree is more complicated.
Try paths until we find the portion of the tree that satisfies the success criterion of reaching the goal; that portion is the plan.

Video 8.10. Finding a Successful Plan Question
We did the search and found a branch that represents a single plan. What can we guarantee?
To guarantee finding the goal in an UNbounded number of steps, do we need:
  some leaf node is a goal? -- no, we could still reach a different leaf with nowhere left to go
  every leaf is a goal? XXXX -- yes: leaf nodes are the end points
  or is there no possible guarantee?
To guarantee finding the goal in a BOUNDED number of steps, do we need a plan that:
  has no branches?
  has no loops? XXXX -- yes: loops make it unbounded
  or is there no possible guarantee?

Video 8.11. Problem Solving via Mathematical Notation
Is the end state a goal state? For the path [A, S, F]:
  Result( Result( A, A->S ), S->F ) is an element of Goals
  the result of starting in A and doing A->S is the input to Result( ..., S->F )
  does that produce a state that is a Goal?
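Read mechanically, that check is just function composition over a transition model. A tiny sketch of the idea; the state names and transition table are stand-ins of my own, not from the lecture:

```python
# Hypothetical deterministic transition model: result[state][action] -> next state
result = {
    'A': {'A->S': 'S'},
    'S': {'S->F': 'F'},
}
goals = {'F'}

def final_state(state, plan):
    """Fold the plan through Result: Result(Result(A, A->S), S->F), and so on."""
    for action in plan:
        state = result[state][action]
    return state

print(final_state('A', ['A->S', 'S->F']) in goals)   # True: the plan reaches a goal
```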
In stochastic, partially observable worlds the equations are more complicated.
Instead of S' = Result( S, A )
  (the new state S' is the result of applying action A to initial state S)
we deal with belief states rather than individual states:
  B' = Update( Predict( B, A ), O )
  the new belief state B' is the result of Updating what we get from Predicting what action A will do in belief state B, based on our new Observation O of the world.
Predict() gives the whole belief-state set resulting from taking the action in B.
Update() applies the observation to that set, reducing it to a smaller (or same-size) new belief state B'.
We can use the Predict-Update cycle to keep track of where we are in belief-state space.

Video 8.12. Tracking the Predict-Update Cycle
see image: 08class_predictUpdate.jpg
Example of the P-U cycle: actions are guaranteed (not stochastic), but dirt can be deposited in any location at any time (the "kindergarten" world).
Start in belief state b1: in the A square, A dirty, B unknown.
Execute the Suck action.
Predict belief state b2 (Suck always works): in A, A clean, B unknown.
Observe [A, Clean].
Update the belief state to b3, the same as the prediction.
Execute the Right action; lots of things can happen: B is don't-care, A could have new dirt.
Predict b4: in the B square, any dirt configuration.
Observe [B, Dirty].
Update the belief state to b5: in B, B dirty, A unknown.
The P-U cycle gives a calculus of belief states that can tell us everything we need to know.
One weakness: belief-state sets can get large, so there are more succinct representations using variables.
For the four-state belief we can have:
  Vacuum-Location = Left or Right
  Location1-Dirt = T or F
  Location2-Dirt = T or F
and some formula over those variables; the full belief set can be expanded from the formula.
Large belief-state sets can be made small in terms of their description.

Video 8.13. Classical Planning 1
A notation for representing states, actions, and plans, and an approach for dealing with complexity by factoring the world into variables.
State space: all possible assignments to k boolean variables gives 2^k states.
  vacuum world: 3 variables (DirtA, DirtB, VacA) -> 8 states
World state: a complete assignment of a single value to each variable.
Belief state: in "core classical planning" it had to be a complete assignment -- useful for deterministic, fully observable problems.
  Extended to partial assignments: VacA=true, Dirt=unknown
  or to an arbitrary boolean formula, which can represent anything.
Action schema (represents many possible similar actions):
  represents a set of actions over possible variable values,
  says what we need to have in order to apply the action and what will happen in terms of the variable transitions.
e.g., flying cargo around the world:
  Action( Fly( p, x, y ),
    Precond: Plane(p) /\ Airport(x) /\ Airport(y) /\ At(p,x)
    Effects: ~At(p,x) /\ At(p,y) )
Action schema: fly plane p from x to y.
  Precond: what needs to be true in order to execute the action
    (p is a Plane, x and y are Airports, p is At x)
  Effects: what is going to happen when the action executes
    (p is no longer At x, p is At y)
It looks like a term in propositional logic but is actually first-order logic because it's a schema: we can apply the schema to specific world states, and then the variables take specific values.

Video 8.14. Classical Planning 2
see image: 08class_cargoPlan.jpg
A more complete description: actions to Load and Unload as well as Fly, more Cargos and Planes.
It has an initial state (Init) and a goal state (Goal).
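A rough sketch of how such a schema can be grounded and applied in code. The class and state encoding below are my own illustration (a simple add/delete-list representation), not the course's implementation:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GroundAction:
    name: str
    precond: frozenset   # literals that must hold in the current state
    add: frozenset       # positive effects
    delete: frozenset    # negative effects

def fly(p, x, y):
    """Ground instance of the Fly(p, x, y) schema from the lecture."""
    return GroundAction(
        name=f"Fly({p},{x},{y})",
        precond=frozenset({('Plane', p), ('Airport', x), ('Airport', y), ('At', p, x)}),
        add=frozenset({('At', p, y)}),
        delete=frozenset({('At', p, x)}),
    )

def applicable(state, action):
    return action.precond <= state

def apply_action(state, action):
    return (state - action.delete) | action.add

state = {('Plane', 'P1'), ('Airport', 'JFK'), ('Airport', 'SFO'), ('At', 'P1', 'JFK')}
a = fly('P1', 'JFK', 'SFO')
if applicable(state, a):
    state = apply_action(state, a)
print(('At', 'P1', 'SFO') in state)   # True: the plane is now at SFO
```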
Video 8.15. Progression Search
Forward or progression state-space search: do regular search.
Start in the initial state, branch on possible actions to make new states, and continue branching until we hit a state satisfying the Goal predicate.
We are searching through the space of exact states: each node is an individual world state, and if the actions are deterministic it's the same thing as before.
But because we now have this factored representation, there are other possibilities.

Video 8.16. Regression Search
Backwards or regression search: start at the Goal and work backwards.
Take the description of the goal state; the other variables can be arbitrary.
Search backwards by looking at the arcs coming into the goal state:
  look at the action schemas to find ones whose effects produce the current state
  figure out which variables of those actions are known and which are unknown
  (actions can be described as sets using specific variable values; states can be uncertain because some variables are not set)
Continue searching backwards until enough variables are filled in to match the initial state.

Video 8.17. Regression vs Progression
see image: 08class_regressionSearch.jpg
Where regression makes sense: buying a book.
  Action: Buy. Goal: own the book.
With forward search we start with branches for every possible book and have to search for the one we want.
With backward search we start with the goal Own(013...book); only one action schema, Buy(b), can match the goal, and in only one way: by buying the specified book. That links back to the Init state in just one way.

Video 8.18. Plan Space Search
We can also search the space of plans rather than states.
Start with the empty plan: just Init and Goal.
Modify the plan by adding new things: look at all the operators you have, pick one to add, and keep adding operators until you have a plan that is guaranteed to work.
This was popular in the 1980s, but we mostly do forward search now because there are good heuristics: because forward search deals with concrete world states, it's easier to find good heuristics.

Video 8.19. Sliding Puzzle Example
One action: slide a tile from square A to square B.
  Precond: the tile has to be on A, it has to be a tile, B has to be blank, A and B are adjacent.
  Effect: the tile is on B, the blank is on A, the tile is no longer on A, the blank is no longer on B.
With this representation, a program can come up with heuristics by relaxing the preconditions -- throw out some precondition to make the problem strictly easier:
  crossing out "B has to be blank" gives the Manhattan-distance heuristic
  also crossing out "A and B are adjacent" gives the misplaced-tile heuristic
We can also ignore negative effects: crossing out "the blank is no longer on B" makes the problem easier and "might" give a good heuristic.
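A small sketch of the two relaxed heuristics for the 8-puzzle. The board encoding is my own; the point is just that each heuristic corresponds to dropping preconditions of the Slide action as described above:

```python
# Board: dict mapping tile number -> (row, col); tile 0 is the blank.
GOAL = {t: divmod(t, 3) for t in range(9)}   # goal layout 0 1 2 / 3 4 5 / 6 7 8

def misplaced_tiles(board):
    """Relax 'B is blank' AND 'A, B adjacent': a tile can jump anywhere in one
    step, so the cost is just the number of tiles out of place."""
    return sum(1 for t, pos in board.items() if t != 0 and pos != GOAL[t])

def manhattan(board):
    """Relax only 'B is blank': tiles still move one adjacent square at a time,
    so each tile needs at least its Manhattan distance to its goal square."""
    return sum(abs(r - GOAL[t][0]) + abs(c - GOAL[t][1])
               for t, (r, c) in board.items() if t != 0)

# Example: the goal board with tiles 1 and 2 swapped
board = dict(GOAL)
board[1], board[2] = GOAL[2], GOAL[1]
print(misplaced_tiles(board), manhattan(board))   # 2 2
```

Because each relaxation only removes constraints, both heuristics are admissible for the original puzzle.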
Video 8.20. Situation Calculus 1
Another representation for planning.
Say we want to move ALL the cargo from A to B: with propositional logic we have to handle each piece of cargo separately, but with regular first-order logic we can express "all", using a set of conventions to represent states and actions:
  Actions are objects, e.g. a function term Fly(p,x,y) -- fly plane p from x to y.
  Situations are objects that correspond not to states, but to paths of actions in a state-space search; if you arrive at the same world state through two different sequences of actions, those are two different situations.
Start at the initial situation S0; S' = Result( S, A ): a new situation is the result of an action applied in a situation.
Instead of describing the actions that can be taken from a situation, we talk about the actions that are possible in a situation using a predicate: Poss( A, S ) -- is action A possible in situation S?
(Note: the lecture somewhat overloads "situation" and "state" here.)
Possibilities are usually expressed as preconditions implying Poss():
  SomePrecond(s) => Poss(a, s)
  some precondition holding in situation s implies that it's possible to do action a in s.
Example:
  Plane(p,s) /\ Airport(x,s) /\ Airport(y,s) /\ At(p,x,s) => Poss( Fly(p,x,y), s )
  (if) p is a plane in situation s, x and y are airports in s, and p is at x in s, (then) it is possible to fly p from x to y in situation s.
  (known as the Possibility Axiom for the action Fly)
Remember that a situation is a PATH of actions that got to a certain world state, even though "situation" and "state" get overloaded in some places.

Video 8.21. Situation Calculus 2
The convention in situation calculus is that predicates like At(p,x,s), which can vary from one situation to another, are called "fluents" (they are fluid, changing over time), and they refer to a particular situation given as the last argument s.
The trickiest part is describing what changes and what doesn't change as a result of an action.
(In classical planning the action schema described one action at a time and what it changed.)
In situation calculus we do it the other way around: instead of one axiom per action, we write one axiom per fluent (per predicate that can change), using "Successor-State Axioms", which describe what holds in the situation that is the successor of executing an action.
The general form is:
  Aa,s  Poss(a,s) => ( fluent is true <=> a made-it-true V ( it-was-already-true /\ a didn't-undo-it ) )
  for all actions a and situations s: if it's possible to execute a in s, then the fluent is true in the resulting situation IF AND ONLY IF a made it true, or it was true before and a didn't undo it.
  (either it wasn't true before and the action made it true, or it was true before and the action didn't break it)
Example: the successor-state axiom for the In() fluent (all variables implicitly universally quantified; note that this axiom mentions Load() and Unload() actions but describes the In() fluent):
  Poss(a,s) => ( In( c, p, Result(s,a) ) <=> ( a = Load(c,p,x) V ( In(c,p,s) /\ a != Unload(c,p,x) ) ) )
  If it's possible to execute action a in situation s, then cargo c is In plane p in the situation Result(s,a) IF AND ONLY IF
    a was a Load() action -- the result of Load() is that the cargo is in the plane -- OR
    the cargo was already in the plane in situation s AND a was not an Unload() action.

Video 8.22. Situation Calculus 3
see image: 08class_sitCalc.jpg
Start with the initial situation S0 with predicates:
  At(P1, JFK, S0) -- plane P1 is at JFK in the initial situation
  Ac Cargo(c) => At(c, JFK, S0) -- for all c, if c is Cargo then c is at JFK in situation S0
and the goal:
  Es Ac Cargo(c) => At(c, SFO, s) -- there exists a situation s such that, for all c, if c is Cargo then c is At SFO in s.
So the initial state and goal say: move all the cargo from JFK to SFO.
Anything can be asserted in situation calculus, and it is expressed as ordinary first-order logic, so theorem provers can be used to solve these problems: find a path (plan) that satisfies the descriptions.
But we still aren't doing probabilistic planning... up next!