Our aim is to design agents.
A rational agent is one that performs the actions that cause the agent to be most successful.
We use the term performance measure for the criteria that determine how successful an agent is. We will insist on an objective performance measure imposed by some authority.
Example Consider the case of an agent that is supposed to vacuum a dirty floor. A plausible performance measure would be amount of dirt cleaned in a certain period of time. A more sophisticated measure would include the amount of electricity consumed and amount of noise generated.
We need to be careful to distinguish between rationality and omniscience. If an agent is omniscient, it knows the actual outcomes of its actions. Rationality is concerned with expected success given what has been perceived. In other words, we cannot blame an agent for not taking into account something it could not perceive or for failing to take an action that it is not capable of taking.
What is rational at any given time depends on four things: the performance measure that defines degree of success; everything the agent has perceived so far (the percept sequence); what the agent knows about the environment; and the actions the agent can perform.
Ideal mapping from percept sequences to actions
For an ideal agent, we can simply make a table of the action it should take in response to each possible percept sequence. (For most agents, this would be an infinite table.) This table is called a mapping from the percept sequences to actions.
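Such a table-driven agent can be sketched directly. The tiny vacuum-world table below is a hypothetical illustration; a real table would be astronomically (usually infinitely) large:

```python
# Sketch of a table-driven agent: it accumulates the percept sequence and
# looks up the action for that entire sequence. Table entries are invented
# for illustration.

def make_table_driven_agent(table):
    percepts = []  # the percept sequence seen so far

    def program(percept):
        percepts.append(percept)
        # look up the action for the whole percept sequence
        return table.get(tuple(percepts))

    return program

# Hypothetical two-step vacuum-world table:
table = {
    ("dirty",): "suck",
    ("clean",): "move",
    ("clean", "dirty"): "suck",
}

agent = make_table_driven_agent(table)
print(agent("clean"))   # → move
print(agent("dirty"))   # → suck
```

Note that the table is indexed by the whole sequence, not the latest percept, which is exactly why it grows without bound.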
Specifying which action an agent ought to take in response to any given percept sequence provides a design for an ideal agent.
It is, of course, possible to specify the mapping for an ideal agent without creating a table for every possible percept sequence.
Example: The sqrt agent
The percept sequence for this agent is a sequence of keystrokes representing a number, and an action is to display a number on a screen. The ideal mapping, when the percept is a positive number x, is to display a positive number z such that z² = x. This specification does not require the designer to actually construct a table of square roots.
Algorithms exist that make it possible to encode the ideal sqrt agent very compactly. It turns out that the same is true for much more general agents.
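For instance, Newton's method encodes the ideal sqrt mapping in a few lines, with no table at all (a sketch; the tolerance value is an arbitrary choice):

```python
# Compact encoding of the ideal sqrt agent via Newton's method, instead of a
# table of square roots.

def sqrt_agent(x, tolerance=1e-12):
    """Return z such that z*z is within tolerance of x (for x >= 0)."""
    z = x if x > 1 else 1.0       # initial guess
    while abs(z * z - x) > tolerance:
        z = (z + x / z) / 2       # Newton update for z^2 - x = 0
    return z

print(sqrt_agent(2.0))  # ≈ 1.41421356...
```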
One more requirement for agents: Autonomy
If an agent's actions are based completely on built-in knowledge, such that it need pay no attention to its percepts, then we say that the agent lacks autonomy.
An agent's behaviour can depend both on its built-in knowledge and on its experience. A system is autonomous to the extent that its behaviour is determined by its own experience.
It seems likely that the most successful agents will have some built-in knowledge and will also have the ability to learn.
The job of AI is to design the agent program: a function that implements the agent mapping from percepts to actions. We assume this program will run on some sort of computing device called the architecture.
The architecture makes the percepts from the sensors available to the agent program, runs the program, and feeds the program's action choices to the effectors as they are generated.
Example A robot designed to inspect parts as they go by on a conveyer belt can make use of a number of simplifying assumptions: that the lighting will always be the same, that the only things on the conveyer belt will be parts of a certain kind, and that there are only two actions: accept and reject.
In contrast, some software agents (softbots) exist in rich, unlimited domains, e.g., a robot designed to fly a 747 flight simulator, or one designed to scan on-line news sources and show the interesting items to its customers. For the latter to do well, it must be able to process natural language, learn the interests of its customers, and dynamically change its plans as news sources become available or unavailable.
All agent programs have roughly the same skeleton; they accept percepts from the environment and generate actions.
Each agent uses some internal data structure that is updated as new percepts arrive. These data structures are operated on by the agent's decision-making procedures to generate an action choice, which is then passed to the architecture for execution. Good data structures are often very important in AI.
Agents will receive only single percepts as input. It is up to the agent to build up the percept sequence in memory if it so desires. In some environments it is possible to be quite successful without storing the percept sequence, while in complex domains it is infeasible to store the complete sequence.
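The skeleton shared by all agent programs can be sketched as follows; the memory-update and decision rules below are hypothetical placeholders for real perception and reasoning:

```python
# Sketch of the common agent-program skeleton: each call receives a single
# percept, updates an internal data structure, and returns an action.

def make_skeleton_agent(update_memory, choose_action):
    memory = {"percept_count": 0, "last_percept": None}

    def program(percept):
        update_memory(memory, percept)   # fold the new percept into memory
        return choose_action(memory)     # decide based on memory alone

    return program

# A trivial instance: remember only the latest percept and echo it back.
def update_memory(memory, percept):
    memory["percept_count"] += 1
    memory["last_percept"] = percept

def choose_action(memory):
    return f"respond-to-{memory['last_percept']}"

agent = make_skeleton_agent(update_memory, choose_action)
print(agent("dirt"))  # → respond-to-dirt
```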
Why not do table lookup?
An agent that can play chess would need a table with about 35^100 entries.
Furthermore, such an agent has no autonomy at all because the calculation of best actions is entirely built-in. If the environment changed in some way, the agent would be entirely lost.
Learning in the context of such a large table is hopeless.
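The infeasibility is easy to check directly: 35^100 (roughly 35 legal moves per position, games of about 100 half-moves) is a 155-digit number, vastly more entries than atoms in the observable universe:

```python
# Size of the hypothetical chess lookup table.
entries = 35 ** 100
print(len(str(entries)))  # → 155 (number of decimal digits)
```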
Example: The taxi driver agent
The full taxi driver task is extremely open-ended - there is no limit to the novel situations that can arise.
We start by specifying the percepts, actions, goals, and environment.
The taxi driver will need to know where it is, what else is on the road, and how fast it is going. This information can be obtained from the percepts provided by one or more controllable cameras, a speedometer, and an odometer. To control the vehicle properly it should have an accelerometer. It will need to know the state of the vehicle, so it will need the usual arrays of engine and electrical sensors. It might also have instruments like a GPS to give exact position with respect to an electronic map, and perhaps infrared or sonar sensors. Finally, it will need some way for the customer to communicate the destination.
The actions will include control over the engine through the accelerator pedal, and control over steering and braking. The agent will also need some way of talking to passengers, and perhaps some way to communicate with other vehicles.
The goals include getting to the correct destination, minimizing fuel consumption and wear and tear, minimizing trip time and cost, minimizing traffic violations and disturbance of other drivers, and maximizing safety and passenger comfort. Some of these goals conflict, and so there will be trade-offs.
What about the environment? City streets or highways? Snow and other road hazards? Driving on the right or on the left?
The more controlled the environment, the easier the problem.
We will now consider four types of agent programs: simple reflex agents, agents that keep track of the world, goal-based agents, and utility-based agents.
Simple reflex agents
Constructing a lookup table is out of the question. The visual input from a simple camera comes in at the rate of 50 megabytes per second, so the lookup table for an hour of driving would have 2^(60×60×50M) entries. However, we can summarize certain portions of the table by noting certain commonly occurring input/output associations. For example, if the car in front brakes, then the driver should also brake.
In other words, some processing is done on the visual input to establish the condition "brake lights in front are on", and this triggers an established connection to the action "start braking". Such a connection is called a condition-action rule, written as: if brake-lights-in-front-are-on then start-braking.
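A simple reflex agent can be sketched as a set of condition-action rules scanned in order; the rules and percept encoding here are hypothetical stand-ins for real visual processing:

```python
# Sketch of a simple reflex agent: each rule pairs a condition test with an
# action; the first matching rule fires.

def make_reflex_agent(rules):
    def program(percept):
        for condition, action in rules:
            if condition(percept):
                return action
        return "no-op"
    return program

# Hypothetical rule: if brake lights are visible ahead, start braking.
rules = [
    (lambda p: p.get("brake_lights_ahead"), "start-braking"),
]

driver = make_reflex_agent(rules)
print(driver({"brake_lights_ahead": True}))   # → start-braking
print(driver({}))                             # → no-op
```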
Agents that keep track of the world
Simple reflex agents only work if the correct action can be chosen based only on the current percept.
Even for the simple braking rule above, we need some sort of internal description of the world state. (To determine if the car in front is braking, we would probably need to compare the current image with the previous to see if the brake light has come on.)
From time to time the driver looks in the rear view mirror to check on the location of nearby vehicles. When the driver is not looking in the mirror, vehicles in the next lane are invisible. However, deciding on a lane change requires that the driver know the location of vehicles in the next lane.
This problem illustrates that for any complex domain, sensors do not provide access to the complete world state. In such domains, the agent must maintain an internal state that it updates as new sensor information becomes available.
Updating the state requires the agent to have two kinds of information: first, information about how the world changes over time; second, information about how its own actions affect the world.
Knowing about the state of the world is not always enough for the agent to know what to do next. For example, at an intersection, the taxi driver can either turn left, right, or go straight. Which turn it should make depends on where it is trying to get to: its goal.
Goal information describes states that are desirable and that the agent should try to achieve.
The agent can combine goal information with information about what its actions achieve in order to plan sequences of actions that achieve those goals. Search and planning are the subfields of AI devoted to finding action sequences that achieve goals.
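A minimal sketch of goal-based action selection is breadth-first search for an action sequence that reaches the goal; the street map of intersections below is invented for illustration:

```python
# Breadth-first search for a sequence of actions from start to goal.
from collections import deque

def plan(start, goal, successors):
    """Return a list of actions leading from start to goal, or None."""
    frontier = deque([(start, [])])
    visited = {start}
    while frontier:
        state, actions = frontier.popleft()
        if state == goal:
            return actions
        for action, next_state in successors(state):
            if next_state not in visited:
                visited.add(next_state)
                frontier.append((next_state, actions + [action]))
    return None

# Hypothetical intersections: from A the taxi can turn left to B or go
# straight to C, and so on.
street_map = {
    "A": [("left", "B"), ("straight", "C")],
    "B": [("right", "D")],
    "C": [("left", "D")],
    "D": [],
}
print(plan("A", "D", lambda s: street_map[s]))  # → ['left', 'right']
```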
Decision making of this kind is fundamentally different from condition-action rules, in that it involves consideration of the future. In the reflex agent design this information is not used because the designer has precomputed the correct action for the various cases. A goal-based agent could reason that if the car in front has its brake lights on, it will slow down. From the way in which the world evolves, the only action that will achieve the goal of not hitting the braking car is to slow down, and to do so requires hitting the brakes. The goal-based agent is more flexible but takes longer to decide what to do.
Goals alone are not enough to generate high-quality behaviour. For example, there are many action sequences that will get the taxi to its destination, but some are quicker, safer, more reliable, cheaper, etc.
Goals just provide a crude distinction between "happy" and "unhappy" states, whereas a more general performance measure should allow a comparison of different world states. The "happiness" of an agent is called its utility.
Utility can be represented as a function that maps states into real numbers. The larger the number the higher the utility of the state.
A complete specification of the utility function allows rational decisions in two kinds of cases where goals have trouble. First, when there are conflicting goals, only some of which can be achieved (e.g., speed vs. safety), the utility function specifies the appropriate trade-off. Second, when there are several goals that the agent can aim for, none of which can be achieved with certainty, utility provides a way in which the likelihood of success can be weighed up against the importance of the goals.
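Utility-based choice can be sketched as picking the action with the highest expected utility, weighing the likelihood of each outcome against the utility of its state. The outcome model and numbers below are invented to illustrate the speed-versus-safety trade-off:

```python
# Sketch of maximum-expected-utility action selection.

def expected_utility(outcomes, utility):
    # outcomes: list of (state, probability) pairs
    return sum(p * utility(state) for state, p in outcomes)

def choose(actions, outcome_model, utility):
    return max(actions,
               key=lambda a: expected_utility(outcome_model(a), utility))

# Hypothetical trade-off between a fast, risky route and a slow, safe one:
utilities = {"arrived-fast": 10, "arrived-slow": 6, "accident": -100}
model = {
    "highway":  [("arrived-fast", 0.95), ("accident", 0.05)],
    "backroad": [("arrived-slow", 1.0)],
}

# highway: 0.95*10 + 0.05*(-100) = 4.5; backroad: 6.0
print(choose(model.keys(), model.__getitem__, utilities.get))  # → backroad
```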
Properties of environments
An environment simulator captures the basic relationship between agents and environments: it takes one or more agents as input and arranges to repeatedly give each agent the right percepts and receive back an action. The simulator then updates the environment based on the actions, and possibly on other dynamic processes in the environment that are not considered to be agents.
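This loop can be sketched as follows; the percept, update, and performance functions are placeholders for a real simulator, and the trivial counter environment below is purely illustrative:

```python
# Sketch of the Run-Eval-Environment loop: give each agent its percept,
# collect the actions, update the environment state, and finally score it.

def run_eval_environment(state, update_fn, agents, percept_fn,
                         performance_fn, steps):
    for _ in range(steps):
        percepts = [percept_fn(state, agent) for agent in agents]
        actions = [agent(p) for agent, p in zip(agents, percepts)]
        state = update_fn(state, actions)
    return performance_fn(state)

# Trivial environment: the state is a number; each "inc" action adds one.
score = run_eval_environment(
    state=0,
    update_fn=lambda s, actions: s + actions.count("inc"),
    agents=[lambda percept: "inc"],
    percept_fn=lambda s, agent: s,
    performance_fn=lambda s: s,
    steps=5,
)
print(score)  # → 5
```

Note that the environment's state variable lives entirely inside the simulator, separate from any state the agent keeps internally.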
The environment is therefore defined by an initial state and an update function. Obviously, an agent that works in a simulated environment ought also to work in the real thing.
Run-Eval-Environment returns the performance measure for a single environment defined by a single initial state and a particular update function. Usually, an agent is designed to work in an environment class, a whole set of different environments. So, in order to measure the performance of an agent, we need an environment generator that selects particular environments in which to run the agent. We are then interested in average performance.
A possible confusion arises between the state variable in the environment simulator and the state variable in the agent itself. These must be kept strictly separate!