|
Introduction: The testbed for this project involves science education,
using the Lego Mindstorms construction materials with children, oversampled
for females and underserved populations.
The main focus of this project will be on developing Module 5, Learning
and Inferring Action Decisions, which includes the dialogue controller,
and on examining the effects of the resulting proactive computing on human
reactions and behavior. However, this work requires that the other
components of the system be in place as well. To
accomplish this, we will take the state detectors developed prior to and
early in the project and integrate them into a full, though initially
very limited, system. Only by having a simple but complete proactive computer
system can we begin to deal with the technical problems of integration
and the psychological issues involved in working with such a system. As
more sophisticated sensors, state detectors and computer action synthesis
modules are developed, they will be integrated into the system, and the
action decision and dialogue modules will be improved.
Testbed and participants: The testbed we have chosen is that of
Lego-Logo/Mindstorms (LL/M). Lego-Logo is a learning environment that
enables children to build working machines out of specialized Lego equipment
(including lights, sensors, and motors) and control them using computers.
Mindstorms extends the Lego-Logo environment with a GUI and advanced robotics
equipment that allows students to build and program sophisticated designs
[185,186,187,188,189,190]. It includes an on-board computer processor
into which programs can be downloaded from a PC via IR transmission. The
processor can control multiple motors and has connections for multiple
sensors, including light, touch and rotation sensors (others are being
developed by the company). These can be built right into a construction,
positioned as needed. There are several reasons for choosing this testbed.
· Computer coaching: There are many situations in education,
industry and military in which it would be desirable to have a computer
that can 'coach' people in tasks involving real-world materials, whether
for training, equipment maintenance or construction. This is a high-priority
application area. The LL/M environment provides a prototype environment
for these types of applications.
· Hands-on science education: Encouraging children in their
learning of science is a national priority, especially for females
and underrepresented groups, who traditionally have had less interest
and success.
· Rich, yet constrained, environment: Construction elements
and structures and the programming are well defined, yet the combinatorial
possibilities are virtually endless. Tasks can range from simple and
very structured to highly complex and creative, and can be produced
for an enormous range of skill levels.
· Emotionally rich: It is an environment in which some children
become energetically engaged and emotionally expressive while other
children are timid and hesitant in initiating task activity.
The project will focus on using proactive computing to achieve two
ends. First, helping children to begin work in the LL/M environment;
particularly, enticing children to participate who are timid, fearful
or generally disinterested in science or construction activities,
with the goal of helping them gain a vision of the benefits of these
activities, together with a basic skill level, that will sustain a
continued interest. Second, encouraging creativity and perseverance
in children who already feel comfortable with the LL/M environment,
developing a willingness to explore further possibilities and to accept
greater challenges. Thus, we will seek ways for a proactive computer
to recognize and reduce fear and frustration, as well as to gently
guide skill learning and issue challenges for independent initiative.
These are general goals of excellent hands-on science initiatives.
Children of age 10 to 12, as well as some college undergraduates,
will be the subjects, with an oversampling of females and of children
from underrepresented groups. The Don Moyers Boys' and Girls' Club
is just 5 blocks from Beckman Institute, providing access to many
minority group children. Denisha Tate, the director of research for
the Club, has agreed to a cooperation between the Club and our project.
In addition, one of the graduate research assistants who will work
on the project, Dean Grosshandler, conducts after-school children's
classes using LL/M materials. His present and past students constitute
a set of experienced users.
The physical arrangement will consist of a child sitting at a table
with a computer monitor about 70 cm in from the edge of the table.
Immediately before her will be a tray of LL/M parts needed for the
current task. Instructions and messages can be presented visually
on the monitor or aurally. Video cameras will be directed toward
the child, including an active camera focused on the face to record
facial expressions, and a wide-view camera giving information about
hand gestures and other body movements. Additional cameras will be
directed down from above at the LL/M workspace, giving information
about the current state of the task. A throat microphone will record
the child's vocalizations. An eyetracker and head-tracking apparatus
will record eye and head movements. A clock system will be used to
provide a synchronizing signal on the various data sources so that
they can be coordinated in later analyses, including examination of
multiple videotapes in synch with audio and eye movement information.
This environment will be developed in Beckman Institute's Integration
Support Lab where the integration of diverse equipment is occurring.
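The coordination of data sources on a shared clock can be illustrated with a minimal sketch. It assumes each source (video frames, eye-movement samples, audio frames) carries timestamps in seconds on the common clock; the function names and the nearest-sample policy are illustrative assumptions, not part of the planned apparatus.

```python
from bisect import bisect_left

def nearest_sample(timestamps, t):
    """Return the index of the sample whose timestamp is closest to t.

    `timestamps` must be sorted ascending (seconds on the shared clock).
    """
    if not timestamps:
        raise ValueError("empty stream")
    i = bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    # Choose the closer neighbour; ties go to the earlier sample.
    return i - 1 if t - timestamps[i - 1] <= timestamps[i] - t else i

def align(reference, other):
    """For each reference timestamp, index of the closest sample in `other`."""
    return [nearest_sample(other, t) for t in reference]
```

For example, aligning eye-movement samples to video frames amounts to calling `align(video_times, gaze_times)` and reading the gaze sample at each returned index.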
In initial observational studies, a tutor/experimenter will sit next
to the child, with video camera and microphone placement that allows
a video and audio record of their interactions as well as of the child's
behavior (body movement, gestures, etc.).
Developing a corpus for identifying user states and computer actions,
and for computer learning: Video and audio recordings will be made
of 40 children of different skill levels, as they carry out LL/M construction
and programming tasks. Instructions, in the form of CAD-type displays
and/or printed instructions, will be presented on the computer screen.
Tasks will be selected that are appropriate for the skill levels of the
different children. These will fall into three classes: following a CAD
diagram to build an object (guided construction task), building an object
to meet some specified goal (goal-specified construction task), or building
an object of their own design (free construction task). In the first two
cases, tasks will be chosen that exemplify a principle of science or engineering
or of programming, such as levers for increasing lift power, or a washing
machine with internal parts that turn only when the door is closed, using
touch or light sensing. For 15 of the children, an experienced tutor (Grosshandler)
will be sitting by their side, giving suggestions and encouragement as
appropriate. A complete video and audio record, including eye movement
recording, will be made.
A detailed analysis of the recordings of a few children will be conducted
by expert psychologists and educators to identify emotional, motivational
and cognitive states and behavioral indicators of those states. In
addition, types of tutor actions will be identified, as well as key
task events (completion of substructures and structures, errors and
their correction, etc.). Once the basic categories have been established
from the analysis of these records, undergraduate students will be
trained to conduct these analyses. The remainder of the videotaped
records will then be analyzed to identify and label tutor actions and the
periods in which mental and task states can be identified. In addition,
experienced tutors will view the tapes and indicate other potentially
useful tutor actions that could have been taken at various points
in time. The resulting labeled corpus, called the base corpus, is
crucial for the rest of the project. It will be used to train computer
models for state detection and for inferring appropriate computer
actions. The corpus will also be made publicly available for use by
other investigators. We suspect that it will be widely used in related
projects.
Recognition and tracking of human states: Prior research by team
members has led to significant progress in tracking faces, facial expressions
and body movement (including hand gestures), recognizing faces, analyzing
the speech signal for affective indicators, and using eye movement indicators
to detect confusion. Research on human motion tracking and understanding
will continue in this project, particularly using the base corpus.
Facial expression recognition and head tracking: This research involves
reliable tracking of the subject's face, extracting the features used
by the classifier. We have developed a reliable non-rigid facial motion-tracking
algorithm [191]. The outputs of this are the Action Units representing
shape and motion of facial features including lips, eyebrows and cheeks.
Using these features, a classifier has been constructed to classify facial
expressions. Chen [41] has shown that the classification can be done with
good accuracy with a small number of major emotions even on a single frame;
we will extend this to classification using temporal cues. This can be
achieved by a number of methods, such as Hidden Markov Models and time
warping using Dynamic Programming algorithms. These methods also provide
a record of the subject's head movements.
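As an illustration of the time-warping option mentioned above, the following is a minimal sketch of template matching with dynamic time warping. It assumes each expression is represented as a sequence of feature vectors (for instance, per-frame Action Unit intensities); the template labels and the nearest-template decision rule are illustrative assumptions rather than the classifier actually developed.

```python
def dtw_distance(a, b):
    """Dynamic-programming time-warping distance between two sequences
    of equal-dimension feature vectors."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Euclidean distance between the two frames being matched.
            d = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1])) ** 0.5
            # Extend the cheapest of the three admissible warping moves.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def classify(sequence, templates):
    """Return the label of the template closest to `sequence` under DTW."""
    return min(templates, key=lambda label: dtw_distance(sequence, templates[label]))
```

Because the warping path may repeat frames, a slow and a fast rendition of the same expression can match the same template at low cost.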
Body tracking research will focus both on tracking the person as a blob
and on tracking body parts such as arms and hands. We have already
developed algorithms for head/body/limb tracking using range data [192]
and hand/face tracking using adaptive color segmentation [193]. In our
proposed research, we shall take an integrated approach where the human
upper body (head, torso, arms/hands) is modeled as an articulated object
where each part is represented by a deformable object [194]. Based on
data taken by multiple video cameras, the parameters of the model (joint
angles, surface shape parameters) will be estimated as functions of time.
Deformable surfaces are needed in order to recognize shoulder movement,
shrugging, twisting, etc. The challenge lies in finding suitable deformable
models for the upper human body, and algorithms for estimating the model
parameters. The algorithms will rely heavily on statistical learning techniques
including switching state-space models, dynamic Bayesian networks, and
self-organizing maps.
Voice Analysis: For the purposes of this project we will need
bi-directional speech communication between the subject and the system.
This will require both speech recognition and text-to-speech synthesis.
Neither of these is a solved problem, but the state of the art is sufficient
for the construction of a useful experimental platform. We intend to use
the IBM "Via-Voice" system for speech recognition and the Bellcore
"Orator" system for synthesis. The technology for both of these
commercial products is well known. For the speech recognition part, the
method of choice is hidden Markov modeling [195]. For synthesis, it is
concatenative LPC [196]. Analysis of prosodic features is also well known
and will follow the autocorrelation technique [197]. Higher level linguistic
analysis will follow the method described in Levinson and Shipley [198].
We will also analyze the speech signal in order to detect mental states.
Most of the information about mental state is reflected in the prosodic
features of speech, namely pitch, energy and duration. Physical correlates
of these features can be extracted from the signal quite reliably without
the need to actually determine what words were spoken. Pitch is estimated
from extraction of the fundamental frequency, energy from the signal envelope
and duration from syllabic rate as determined by counts of peaks in the
energy contour. Since speech recognition is required for other aspects
of this project, we can use the syntactic and semantic analysis to provide
additional information about mental state, using specific words and phrase
structures as indicators of mental state.
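The three prosodic measurements described above can be sketched as follows. This is a minimal illustration, assuming a frame is a list of audio samples; the search range of 75 to 500 Hz, the peak threshold, and all function names are assumptions chosen for the example.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(x * x for x in frame) / len(frame)

def estimate_pitch(frame, sample_rate, fmin=75.0, fmax=500.0):
    """Estimate fundamental frequency (Hz) as the lag, within the
    fmin..fmax range, that maximizes the autocorrelation of the frame.
    Returns None when no positive autocorrelation peak is found."""
    lag_min = int(sample_rate / fmax)
    lag_max = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_val = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        r = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if r > best_val:
            best_lag, best_val = lag, r
    return sample_rate / best_lag if best_lag else None

def count_energy_peaks(contour, threshold):
    """Count local maxima above `threshold` in an energy contour;
    dividing the count by the contour duration gives a syllabic rate."""
    return sum(
        1
        for i in range(1, len(contour) - 1)
        if contour[i] > threshold and contour[i - 1] < contour[i] >= contour[i + 1]
    )
```

None of these steps requires knowing which words were spoken, which is the property the text relies on.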
Eyetracking: Initial research will be directed at identifying eye
movement indicators of various emotional and cognitive states. Episodes
will be identified in the base corpus in which a child is confused, when
she is progressing without difficulty, when she is daydreaming, when she
is planning her next steps, or is in other identifiable states. Analyses
will seek to identify eye movement patterns that are typical of these
different states.
In addition, children will be asked to perform different cognitive tasks
in the LL/M environment (read instructions carefully, scan textual instructions
quickly, scan a Lego CAD diagram, examine a Lego CAD diagram carefully,
identify the next component needed from a Lego CAD diagram, search for
a particular building component, check a computer program for consistency,
determine which command to include next in a program, etc.). Again, analyses
will be aimed at identifying properties of the eye movement record that
tend to accompany each type of cognitive activity. Discrimination methods
involving patterns of eye movements are exemplified in Althoff [81] and
Yang & McConkie [82].
Based on these findings, eye movement pattern detectors will be developed
and integrated into the proactive computer system.
Fusion: In the above subsections, research in visual face and body
tracking, voice analysis, and eye movement is discussed separately. A
major aim of our proposed research as a whole is to combine cues from
these different modalities so that the emotional and cognitive state of
a person could be assessed more accurately than is possible when only
cues from a single modality are available [199]. We propose to use a probabilistic
framework. In particular, various architectures of Dynamic Bayesian networks
[200] will be explored. Obviously, this research will depend critically
on the base corpus, in which user appearance is related to user state.
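The benefit of combining modalities can be seen in a deliberately simplified sketch. It is not a Dynamic Bayesian network: it assumes a single time step and that the modalities are conditionally independent given the user state. The state names and likelihood values are invented for illustration.

```python
def fuse(prior, likelihoods):
    """Combine per-modality likelihoods P(observation | state) with a
    prior over states, assuming the modalities are conditionally
    independent given the state. Returns the normalized posterior."""
    posterior = dict(prior)
    for modality in likelihoods:
        for state in posterior:
            posterior[state] *= modality[state]
    total = sum(posterior.values())
    if total == 0:
        raise ValueError("observations have zero probability under every state")
    return {state: p / total for state, p in posterior.items()}

# Hypothetical likelihoods from three detectors for one moment in time.
prior = {"confused": 0.5, "engaged": 0.5}
face = {"confused": 0.6, "engaged": 0.4}
voice = {"confused": 0.7, "engaged": 0.3}
gaze = {"confused": 0.8, "engaged": 0.2}
fused = fuse(prior, [face, voice, gaze])
```

With these made-up numbers, each detector alone is only mildly in favor of "confused", while the fused posterior is considerably more decisive.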
Tracking task state: Computer vision techniques will be used to
monitor the state of a user's assembly task. While easily obtained measures
such as the height of a construction may provide sufficient feedback to
the proactive system for certain tasks, our longer term objective for
this subsystem is to track the Lego parts and to continually determine
the configuration of a partially assembled project. One can view the process
as reverse engineering, taking image data as input and producing Computer
Aided Design (CAD) models. Since there is a modest number of different
Lego parts, each of which is monochromatic and has a well-defined geometry,
a combination of well-understood color image segmentation and model-based
vision techniques can be used to recognize and track isolated parts from
monocular, binocular or trinocular image data [12,86,87,96,88,89,90].
Our approach will exploit complete geometric models of each part annotated
with appearance-based information, particularly color. An overhead
commercial trinocular vision system (Triclops) will provide a depth
map of the entire workspace and be used to locate isolated parts and
assemblies; however, there is insufficient resolution to completely
determine an assembly's structure. To provide greater visual coverage
in the presence of occlusion, multiple color video cameras will observe
the scene, and we will develop algorithms for estimating and tracking
the assembly's structure from one or more video streams.
There are numerous challenges and opportunities in this context because
assemblies are composed of numerous parts, parts with the same color
may mate flush, obscuring the interpart boundaries, and parts may
be partially or wholly occluded. The importance and challenge of exploiting
part decompositions for object recognition was one of the main conclusions
of the "1995 NSF/ARPA Workshop on 3D Object Representations in
Computer Vision," yet very little research or progress has ensued,
in part due to successes of appearance-based methods [201]. Unfortunately,
appearance-based methods are most effective when there is limited
parametric variability in viewpoint, articulation, shape deformation
or lighting, and here we are directly confronting structural variability.
The construction process may involve the addition or deletion of a
single part or the mating or separation of sub-assemblies. By tracking
the state of assemblies and observing which new part is grasped from
a part pallet, kinematic constraints restrict the possible location
of the new part with respect to the current assembly. These kinematic
constraints might simply be a finite set of locations (two basic blocks
can only be attached together in a small number of ways) or they may
involve revolute or prismatic joints with continuous degrees of freedom.
In general, the number of configurations is exponential in the number
of parts, but since assemblies are constructed incrementally, we will
not have to consider the entire state space. Nonetheless when the
vision system cannot unambiguously determine the result of an operation,
a hypothesis space must be maintained, and it too may grow exponentially.
As further parts are added or as an assembly is rotated to provide
a different view, we believe that this hypothesis space can be effectively
pruned. Depending upon the task (e.g., building a specified structure
or freeform building to meet a specified design objective), we will
develop mechanisms to compare an observed assembly to the specified
goal. This will require establishing meaningful metrics between a
measured state and a goal state, or between the set of hypothesized
states and the goal.
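The incremental maintenance of a hypothesis space can be sketched in a toy form. The sketch assumes a hypothesis is a tuple of (part, slot) placements, that a caller-supplied function enumerates the kinematically allowed slots, and that an observation is a predicate over hypotheses. The two-slot "top"/"side" world, the height measurement, and the set-difference metric are all illustrative assumptions.

```python
def expand(hypotheses, part, attachments):
    """Adding `part` yields one new hypothesis per slot allowed by the
    kinematic constraints `attachments(hypothesis, part)`."""
    return [h + ((part, slot),) for h in hypotheses for slot in attachments(h, part)]

def prune(hypotheses, consistent):
    """Keep only hypotheses consistent with the latest observation."""
    return [h for h in hypotheses if consistent(h)]

def distance_to_goal(hypotheses, goal):
    """Smallest number of differing placements between any surviving
    hypothesis and the goal assembly (a simple stand-in metric)."""
    return min(len(set(h) ^ set(goal)) for h in hypotheses)

def toy_attachments(hypothesis, part):
    # Hypothetical constraint: every part can attach in two ways.
    return ["top", "side"]

def stack_height(hypothesis):
    return sum(1 for _, slot in hypothesis if slot == "top")

hypotheses = [()]
for part in ("A", "B"):
    hypotheses = expand(hypotheses, part, toy_attachments)
# A depth measurement reporting a height of 1 rules out half of them.
survivors = prune(hypotheses, lambda h: stack_height(h) == 1)
```

Only the survivors of each pruning step are expanded when the next part is added, which is why the full exponential state space need not be enumerated.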
Identifying candidate computer actions: The base corpus video/audio
recordings in which a tutor was present will be analyzed to identify the
tutor's actions and the conditions under which they occur, in attempting
to assist the child. We anticipate that these actions (mostly verbal,
but also with gestures) will fall into six categories: social (not directly
related to the task, such as greetings, jokes, and various side comments),
control (messages intended to set limits on a child's behavior and to
encourage her to abide by them; attention-attracting actions), affective
(expressing interest, giving assurance or encouragement, praising), procedural
(instructions for specific actions to take and how to carry them out),
exemplar (presenting an example of some construction, usually a substructure,
as a way of helping the child to see how to progress; presenting examples
of objects that might be constructed; presenting examples of computer
command sequences to produce different action patterns for computerized
objects), or question (posing a question to help a child recollect past
knowledge, or to help direct her thoughts).
Tutor actions in the base corpus will be labeled according to their
type. These data will provide a basis for computer learning algorithms
to identify appropriate conditions for different action types. In
addition, a collection of broadly-applicable actions (comments, displays,
sounds) of each type will be compiled that can be used by a proactive
computer in its communications. Frames for narrowly-applicable actions
will be identified, together with their conditions, and these will
be incorporated into the dialogue controller.
Develop models to guide computer action: We will investigate
and study the relative merits of two conceptually different approaches
to developing a system to guide computer actions. One is a direct classification
method that attempts to decide on the best action to use without an intermediate
step of density estimation and the other develops a probability distribution
over predetermined states and uses it to select a course of action. Beyond
that, the two approaches differ in the training policy (type of feedback
given to the learning program) as well as in the amount of manual intervention
required when building the system.
The first approach we will study is a direct approach that attempts
to learn a mapping directly from sensors to actions, using the data
from the base corpus described above. For this approach we will develop
a model for supervised learning of action strategies in dynamic stochastic
domains, and learn strategies represented by (generalized) rule-based
systems. In this model the learning program will be given access to
traces of good behavior, namely, of a human expert observing a student,
and will attempt to learn a strategy for behaving successfully in
similar situations. This general direction has roots in works on learning
to reason [202,203] and, more specifically, in works on learning to
take actions [204,205]. Technically, this framework is based on the
PAC model of learning from examples [206] but is applied here to problems
in which a program acts and needs to achieve goals. The formalization
studied considers stochastic partially observable worlds as in reinforcement
learning [207] where the state is described using relational information.
This general framework has been studied theoretically and has been
quite successful experimentally in several domains; see Khardon [208].
The main challenge in applying a framework of this sort is that our
domain is significantly more complex, both in terms of input
dimensionality and expected functional complexity of the actions.
We are planning to address this in two ways that involve representational
and computational issues. First, the basic action strategies will
be presented as rules of the form C --> A, where A is one of several
actions, and C is a condition (potentially an existentially quantified
expression) that is expressed as a simple function of some sensor
measurements along with state variables. Our representation for C
would be that of a generalized rule [209] that can be learned more
efficiently from examples, even in the presence of incomplete information
[203,210,211] and of very large input dimensionality that is composed
mostly of irrelevant variables, as is the case when interacting with
real world sensory data. A secondary advantage of this representation
is that it can be manually initialized and/or augmented by experts
to facilitate building a quick prototype. The second way we will address
the challenge is by studying more complex action strategies that are
composed hierarchically. The basic action strategies described above
will be used as subroutines in a hierarchically composed strategy.
These intermediate representations might include the generation (as
part of the learning process) of support predicates and the identification
of internal states. As before, it would be possible for a domain expert
to study the action strategies learned by the system, name internal
states or add states and rules.
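The rule representation C --> A can be illustrated with a minimal sketch. It assumes an observation is a dictionary of sensor measurements and state variables, that rules are tried in order, and that the first rule whose condition holds determines the action. The rule names, thresholds, and action labels are hypothetical; in the proposed work such rules would be learned from the base corpus, not written by hand as here.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Observation = Dict[str, Any]

@dataclass
class Rule:
    name: str
    condition: Callable[[Observation], bool]  # C
    action: str                               # A

def select_action(rules: List[Rule], observation: Observation, default: str = "wait") -> str:
    """Return the action of the first rule whose condition holds."""
    for rule in rules:
        if rule.condition(observation):
            return rule.action
    return default

# Hand-initialized rules of the kind an expert might supply for a prototype.
EXAMPLE_RULES = [
    Rule("stalled_and_upset",
         lambda o: o["frustration"] > 0.7 and o["idle_seconds"] > 20,
         "encourage"),
    # Existentially quantified condition: some part is misplaced.
    Rule("misplaced_part",
         lambda o: any(p["misplaced"] for p in o["parts"]),
         "show_example"),
]
```

Because each rule is a named, readable object, a domain expert can inspect, rename, or add rules, which is the secondary advantage noted above.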
Unlike the learning-centered approach that directly learns a mapping
from sensory input to actions (and, potentially, represents intermediate
states while doing that) the second approach assumes in its input
a more abstract representation of state information, such as that
represented in Figure 1 as component 3, and attempts to learn a joint
probability distribution over the space of internal states (of the
task and user) and actions. This probability distribution will then
be used to infer the most likely action given an observation. In this
formalization we separate the stage of recognizing task and user states
from that of action selection and assume that they were done earlier.
The focus of this approach will be on basic methods for constructing
a situation model that integrates the diverse pieces of information
we anticipate at the state level. Such information is quite dynamic
and can be fraught with uncertainty and incompleteness. The key technical
challenge in this direction lies in the coherent and efficient extension
of Bayesian networks to accommodate such diverse types of information,
with most of the work falling within the realm of probability theory,
knowledge representation and reasoning and, less than in the previous
approach, learning theory.
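The second approach can be illustrated, in its simplest possible form, by estimating a joint distribution over (state, action) pairs from labeled examples and selecting the most probable action for an observed state. The sketch below uses smoothed counts in place of a Bayesian network and treats the state as a single discrete label; both are simplifying assumptions, and the state and action names are invented.

```python
from collections import Counter

def fit_joint(pairs, states, actions, alpha=1.0):
    """Estimate P(state, action) from labeled (state, action) pairs,
    with add-alpha smoothing so unseen combinations keep some mass."""
    counts = Counter(pairs)
    total = len(pairs) + alpha * len(states) * len(actions)
    return {(s, a): (counts[(s, a)] + alpha) / total for s in states for a in actions}

def most_likely_action(joint, state, actions):
    """argmax over actions of P(state, action), which for a fixed state
    is the same as the argmax of P(action | state)."""
    return max(actions, key=lambda a: joint[(state, a)])
```

In use, `pairs` would come from the tutor-present portion of the base corpus, where each labeled period supplies a user or task state together with the action the tutor took.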
Bayesian networks are among the most successful approaches for managing
uncertain information and have been used in a variety of applications,
including diagnosis, planning, and course-of-action evaluation. The
success of this formalism stems mostly from its roots in probability
theory, which equips it with a well-accepted semantics, and from the
associated computational machinery which allows for practical implementations
in certain applications. The application we have at hand, however,
places technical demands that are outside the realm of standard Bayesian
networks. For example, a Bayesian network requires a complete probabilistic
model of a given situation, while the information pertaining to the
state of our student, observed and interpreted by a battery of recognition
programs, will be often incomplete.
This limitation has been long observed, with little progress achieved
so far on dealing with it in a principled manner. Although one can
define the notion of an incomplete Bayesian network, in terms of a
set of probabilistic models, a major unresolved difficulty remains
of reasoning efficiently with this class of networks. Intrinsic computational
difficulties in inference with Bayesian networks [212], which can often
be addressed using stochastic methods [213,214], seem to be too severe
in these cases. We plan to investigate here a new and promising direction
that is based on new results on compiling Bayesian networks into parameterized
arithmetic expressions [215]. Moreover, the situations we anticipate
require the ability to adapt the probabilistic model to user input,
sensory input or other forms of feedback. While adaptation is known
to be computationally hard when done directly with the Bayesian network
representation, results in learning theory indicate that it may be
easier to adapt the probabilistic representation in its arithmetic
expression form. This is another direction we intend to investigate.
Another major limitation of Bayesian networks is their static nature
as they do not include a standard representation of temporal information.
Representing time in dynamic Bayesian networks without paying a hefty
computational price remains an open challenge, which we plan
to address. Our approach will be based on recent results obtained
for networks with repetitive structures, a class that includes temporal
networks as a special case.
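To make the role of temporal structure concrete, the following sketch shows one filtering step in the simplest temporal network, a single hidden state variable repeated over time slices (a hidden Markov model). It is not the repetitive-structure method referred to above; the state names, transition probabilities, and likelihoods are invented for illustration.

```python
def filter_step(belief, transition, likelihood):
    """One predict-update step of forward filtering.

    belief[s]         : P(state = s | observations so far)
    transition[s][s2] : P(next state = s2 | current state = s)
    likelihood[s]     : P(new observation | state = s)
    """
    states = list(belief)
    # Predict: push the current belief through the transition model.
    predicted = {s2: sum(belief[s1] * transition[s1][s2] for s1 in states) for s2 in states}
    # Update: weight by how well each state explains the new observation.
    unnormalized = {s: predicted[s] * likelihood[s] for s in states}
    z = sum(unnormalized.values())
    return {s: p / z for s, p in unnormalized.items()}
```

The same two-slice computation is repeated at every time step, so the cost per step stays constant for a single small state variable; the difficulty discussed above arises when many interacting variables are carried across slices.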
In summary, the focus of this component of our research program will
be on extending current models of learning and Bayesian networks in
several key dimensions: The former will focus on learning action strategies
in dynamic stochastic domains and the generation of intermediate state
information to allow for complex action strategies to be composed.
The latter would allow for the use of incomplete/missing information,
the adaptation of the representation given additional information
and for an efficient and canonical integration of different kinds
of information into Bayesian networks.
Intelligent dialog system and system integration: The dialog manager
is the interface between the subject and the system. It has two main functions.
First, it interprets the spoken input from the subject. Second, in conjunction
with the task-learning and user state modules, it gives appropriate spoken
information to the subject by providing encouragement, evaluating her
activities or responding to her need for help. Whatever part of that action
is in the form of a verbal response will be controlled by the dialog manager.
In particular, the dialogue system must be proactive in initiating conversation
under different state conditions, rather than simply responding to the
user's verbalizations.
The dialog manager has three parts: an interpreter, a memory and a
response generator. The interpreter comprises a syntactic and
semantic analyzer. It takes the structure of the subject's input and,
based on the contents of the task model stored in memory, generates
a response and updates the task model. The memory contains, in addition
to the dynamic task model and user state information, all the factual
and procedural knowledge necessary to understand the subject's input
and to take responsive or proactive actions. The response generator
uses a formal grammar to compose appropriate output sentences. These
sentences must then be marked prosodically so that the correct intonation
can be applied to convey the affective aspects of the response. This
information is then given to the text-to-speech synthesizer so that
a spoken response can be made to the subject. The technical details
of such a system are given in Levinson and Shipley [198]. These techniques
will be modified for the Lego-Logo task.
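The division of labor among the three parts can be sketched as a toy class. Keyword spotting stands in for syntactic and semantic analysis and fixed templates stand in for a formal grammar; these substitutions, and every name, phrase, and prosody tag below, are illustrative assumptions and do not reflect the method of Levinson and Shipley [198].

```python
class DialogManager:
    def __init__(self):
        # Memory: dynamic task model plus user state (hypothetical fields).
        self.memory = {"task_step": 0, "user_state": "neutral"}

    def interpret(self, utterance):
        """Toy interpreter: map an utterance to an intent by keyword."""
        words = [w.strip(".,!?") for w in utterance.lower().split()]
        if "help" in words or "stuck" in words:
            return "request_help"
        if "done" in words or "finished" in words:
            return "report_done"
        return "other"

    def generate(self, intent):
        """Toy response generator: returns (sentence, prosody tag) and
        updates the task model when a step is reported complete."""
        if intent == "request_help":
            step = self.memory["task_step"] + 1
            return ("Let's look at step {} together.".format(step), "reassuring")
        if intent == "report_done":
            self.memory["task_step"] += 1
            step = self.memory["task_step"] + 1
            return ("Nice work! On to step {}.".format(step), "excited")
        return ("Tell me more.", "neutral")

    def respond(self, utterance):
        """Responsive path: interpret the input, then generate a reply."""
        return self.generate(self.interpret(utterance))

    def proactive_prompt(self):
        """Proactive path: speak first when the sensed state warrants it."""
        if self.memory["user_state"] == "frustrated":
            return ("Would you like a hint?", "gentle")
        return None
```

The prosody tag returned with each sentence corresponds to the prosodic marking step described above, which would be passed on to the text-to-speech synthesizer.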
A special technique that we have found effective in the past for the
integration of such a complex system is to build the basic framework
of the entire system and add specific capabilities one at a time.
This methodology should be used at a very fine-grained level so that
the ability of the system to respond to very specific sentential forms
will be added one at a time. A goal of our development will be to
accomplish the purposes of the system with a minimum of speech understanding
requirements.
Affective synthesized speech and face displays: Some communication
between the system and the subject will be by means of an animated face
and a corresponding voice with emotional content. Both the articulation
as manifest in the face and the prosodic features of the voice must reflect
the desired emotions.
For face animation, we choose the 3D-model rather than the appearance-based
approach, because for our application complete realism is not needed
(or even desired) and because 3D models are more flexible. We have
already developed a generic 3D face/head model (which we call the
iFace) which can be fitted to any particular person's face based on
range or multiple 2D image data. The iFace will be driven by text
and a facial expression script (provided by the "computer action
decision" module). We have developed a preliminary mapping from
phonemes and facial expressions (smile, frown, etc.) to facial movements.
This mapping needs to be improved. Of particular interest is the co-articulation
of speech-related lip movement and expression-related facial movement.
To do this research, we are currently using Microsoft's TTS (Text-To-Speech)
software. In the near future, we plan to get an open version of either
Lucent Technologies' or Motorola's TTS, which will provide much more
flexibility.
Initially, the voice will be provided by a text-to-speech synthesizer
in which the prosodic features of an utterance may be altered by the
insertion of special symbols in the text. Affective speech is thus
generated by applying a set of hand-crafted rules to the text of the
desired utterance so that the appropriate symbols are placed in the
text prior to synthesis. Later, we will build our own synthesizer
so that we can get full control of the internal parameters. Combined
with our word modeling and intelligent agent research, we want to
apply semantic information when the speech is synthesized so prosodic
parameters can be selected more sensibly. Our preliminary results
show that pitch contours play important roles in expressing emotions
and they are dependent on the content that is being synthesized.
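The rule-based insertion of prosodic symbols can be sketched as follows. The bracketed control symbols are hypothetical placeholders, not the markup of any particular synthesizer, and the two affect rules are invented examples of the hand-crafted rules described above.

```python
# Hypothetical rules: a prefix setting global pitch and rate, plus a set
# of words to receive local emphasis, for each affect label.
PROSODY_RULES = {
    "encouraging": {"prefix": "[pitch+10][rate-5]", "emphasize": {"great", "good", "nice"}},
    "calm": {"prefix": "[pitch-5][rate-10]", "emphasize": set()},
}

def mark_up(text, affect):
    """Insert control symbols into `text` according to the rule for `affect`."""
    rule = PROSODY_RULES[affect]
    words = []
    for word in text.split():
        core = word.strip(".,!?").lower()
        words.append("[emph]" + word if core in rule["emphasize"] else word)
    return rule["prefix"] + " " + " ".join(words)
```

For instance, `mark_up("Great job, keep going!", "encouraging")` places the global symbols first and an emphasis symbol before "Great", leaving the remaining words untouched.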
Evaluation: There are three levels of evaluation planned: 1)
does the system produce acceptable interaction dialogues with the students,
2) what effects do various interactions have on the students, and 3) does
proactive computing help the students achieve desired learning, motivational,
and affective goals? The first level will be the focus of evaluation during
the early stages of the grant, and the second and third levels will become
the focus during later stages when the system is consistent in taking
reasonably appropriate actions.
At the initial stages of the project, the gaps between the system's
performance and reasonable tutor performance will likely be large
enough that little formal evaluation will be necessary. However, as
these gaps are narrowed, more sensitive formative evaluation will
be needed. To accomplish this, sessions of children working with the
computer will be videotaped and segmented. Segments will then be presented
to other children and to teachers for their judgments of the appropriateness
of the computer's actions, and for suggestions for improvement. In
addition, interactional analyses [166] will be carried out to examine
subtleties in the types of actions being taken by the computer under
different user and task state conditions.
Once the system is capable of a reasonable degree of interaction,
we can examine the effects that the various computer actions have
on the students. This can be accomplished either through analysis
of videotapes (including going through a tape with the subject herself
to probe her reactions) or through using 'think-aloud' methods in
which children verbalize their reactions as they are engaged in the
task.
As the system matures we will be able to evaluate the extent to which
different aspects of proactive interaction facilitate the achievement
of learning, motivational, and affective goals. Since the goals are
varied, this evaluation must be multidimensional. It is expected to
include the length of time children remain at the computer, the proportion
of this time that the child is actually engaged in the task, the frequencies
of negative and positive reactions, whether children are more successful
in accomplishing tasks with proactive assistance and encouragement,
and whether principles learned in building one object transfer to
building another. The proactive computing system will be constructed
in such a way that parts of its interaction capability can be disabled,
thus allowing investigations of the effects of the presence or absence
of these capabilities.
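One of the measures listed above, the proportion of session time in which the child is engaged in the task, can be computed from labeled intervals as in this minimal sketch. It assumes intervals are (start, end) pairs in seconds on the session clock and that overlapping labels should be counted once; the function name is an assumption.

```python
def engaged_proportion(session, engaged_intervals):
    """Fraction of the session covered by engaged intervals.

    `session` is (start, end); overlapping or out-of-range intervals
    are clipped so that no moment is counted twice."""
    start, end = session
    if end <= start:
        raise ValueError("session must have positive duration")
    covered = 0.0
    cursor = start  # everything before the cursor is already counted
    for a, b in sorted(engaged_intervals):
        a, b = max(a, cursor), min(b, end)
        if b > a:
            covered += b - a
            cursor = b
    return covered / (end - start)
```

Comparing this proportion between sessions run with a given interaction capability enabled and disabled is one way the planned ablation studies could be quantified.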
Summary: An interdisciplinary team will attempt to construct and
evaluate the effects of a proactive computer system that responds, not
just to user requests, but to sensed mental and task states, as well.
This is an example of an attempt to create a true "human centered"
computer system by providing the computer with the means of acquiring
a great deal of information about its user and basing its actions on that
information in addition to direct requests. This is a high risk project:
no one has previously attempted to make this much real-time user information
available to the computer, and developing the ability to act based on
this information presents serious challenges. At the same time, it is
a major attempt at defining and exploring characteristics of a new human-computer
interface paradigm. This type of interface, should it be refined and available
generally, would dramatically change people's relationships with their
computers. The computer itself would carry much of the burden of the dialogue,
thereby greatly easing requirements on the human for the benefits of successful
interaction to occur. We believe that, used wisely, this approach can
facilitate new users' entry into computer use, and increase the value
of computer assistance to experienced users. It is much more than just
adjusting to users' expressed preferences; it is a process of getting
to know the user and becoming a close and helpful companion.
|