Adaptive Computing Lab
The Adaptive Computing Lab will be located in one of the Atheneum
rooms on level 2 of S16 starting in July. Students, research fellows,
research assistants and engineers who are working on adaptive
computing projects will be housed there.
Research Program
The research program aims to enable the development of computing
systems that can respond and adapt to rapid changes in complex
environments, with a focus on networked, high-performance, and
pervasive computing systems. We propose to develop new algorithms,
software, and technologies to support robust, simple, flexible, and
economical usage of diverse resources. Our research is a step toward
addressing the nearly universal need for adaptivity in computer
systems.
The need for computing systems that adapt automatically
Today's computer technology is brittle, not only in catastrophic
emergencies but also under simple changes, such as installing a new
version of the operating system, changing the network vendor, adding
a new computer, or coping with a crashed router. If we only had to
worry about emergencies, we'd be okay most of the time.
Increasingly, however, human effort is devoted to adapting computer
systems to such routine changes.
Manual adaptation of computer systems cannot continue. The
complexity of the systems is growing, and the complexity of the data
that systems must work with is also growing. We are rapidly evolving
toward computer systems of worldwide scope and immense complexity.
These will be not just worldwide communication networks of somewhat
isolated systems, but worldwide pervasive, embedded computing with
literally millions of computational components. In the past, for small,
isolated systems, it was possible for a human to understand their
circumstances well enough to engineer and configure appropriate
solutions directly. As systems become global, pervasive, and
embedded, however, manual adaptivity will cease to be a realistic
option for two reasons:
- Systems contain too many components and are used in too many
different circumstances for humans to engineer and configure them
effectively for all possible future situations.
- The components and their environments are all changing
simultaneously and in real time: individual nodes fail, bandwidth
changes, software is upgraded, applications change, hardware is
swapped out, and so on.
To address these scaling problems, we need to enable systems to
adapt automatically to their changing environments.
Currently, large computer systems typically run applications that
deal with digital and unambiguous data, such as inventory
management, accounting, and database processing. In contrast, the
data representing changes in real-world environments are often analog
and ambiguous, frequently in the form of signals, including the
outputs of a wide variety of sensors (medical, seismic, and radar,
for example), as well as voice, photos, and video. The downstream
processing required to extract the essence of these signals, in
terms of what is actually occurring in the world, can be enormous,
stretching the capabilities of today's high-performance
multiprocessor clusters. Crucially, effective adaptivity to users'
needs and goals requires a detailed, real-time understanding of what
users are doing or trying to do, which cannot be achieved without
extensive awareness of the physical world.
To deal with the scaling up of system and data complexity, systems
must exhibit on-line, real-time adaptivity. Their behavior must
respond automatically, without human intervention, to changes in the
state of the environment. This adaptivity can be achieved partly by
new algorithms that are inherently robust to operating conditions.
To achieve adaptivity in many cases, however, we must also enable
computer systems to have a richer and more complete perception of
their environment. The system must be aware of the behavior of its
own components, both software and hardware, and aware of its
external environment, whether that environment consists of other
machines or human users.
Furthermore, the adaptivity cannot generally be achieved by some
preprogrammed set of rules. Adaptation must be based on active,
autonomic learning of flexible new behaviors.
Application Scenario: Adaptive computing in enterprises
Server farms in today's companies provide computation for many
business applications, including money management, stock trading,
and media technology. These farms can harbor several thousand
CPUs running under different operating systems, with hundreds of
disk arrays, interconnected in a complicated network topology, with
transactions coming from many possible paths. No individual human
can manage such complex high-performance systems under fluctuating
demand and an unpredictable operating environment. Configuring the
system for efficiency, or reconfiguring it to overcome faults,
normally takes weeks.
We envisage adaptive technology to manage resources in a server farm
autonomously. An adaptive application monitors the state of the
system, adjusting its operation as required. The overall system is
governed adaptively by machine-learned policies, allowing
distributed checkpointing and rollback for recovery. The processors,
distributed over a global scale, join or leave the system without
central control. The network topology reconfigures itself on demand,
assuring a sufficient level of bandwidth and latency while adapting
to the incoming traffic and communication patterns. Databases
support each other, offering system-wide consistency and
fault-tolerance.
The scenario calls for two important new capabilities:
- The capacity to monitor the internal state of complex computer
systems without imposing undue overhead.
- The ability to learn actions to cope with complex, time-varying
environments.
Application Scenario: Health monitoring and care for the elderly
The longer lifespans made possible by modern medicine mean that an
increasing percentage of our society is composed of elderly people.
Enabling the elderly to lead productive, independent lives benefits
society substantially. Many of the elderly, however, will ultimately
have health issues, ranging from heart disease to Alzheimer's, that
threaten their independence. Enabling them to stay at home longer
both benefits their well-being and reduces demands for caregivers
and medical facilities.
We envisage monitoring the health and activities of elderly
people in their homes through an extensive set of pervasive sensors,
including fixed cameras, microphones, and wearable medical devices.
The data from these sensors is processed to detect anomalies that
indicate a need for human intervention. Remote family members or
caregivers are notified, and can rest assured that professional or
emergency help will be summoned promptly when it is needed.
This scenario calls for two important new capabilities:
- A robust, adaptive software and hardware infrastructure for the
integration of large numbers of unreliable sensors and computing
devices.
- Mechanisms for interpreting the sensory data and relating it to
appropriate actions, such as notifying a caregiver.
Adaptive Computing Infrastructure
We seek to understand the trade-offs in the design of adaptive
computing systems, to develop an adaptive infrastructure that
supports flexible and economical computing frameworks, and to develop
techniques for diverse applications of adaptive computing systems.
Unlike current static systems, which are designed to perform only a
single task within anticipated operating conditions, we propose to
enable the development of systems exhibiting the following traits:
- Self-healing and self-reconfiguring: The system
acknowledges that software and hardware components fail. It attempts
to continue to operate successfully in spite of these failures. It
supports the easy addition and automatic reconfiguration of software
and hardware components.
- Aware: The system monitors both its internal and external
environments to maintain an up-to-date model of the world.
- Responsive: The system uses what it knows about the world
to configure, reconfigure, and marshal resources. It offers
assurances of functionality under changing environmental conditions.
It automatically learns rules for achieving its goals.
The adaptive infrastructure development is organized as two
complementary thrusts:
- Adaptive Intelligence: A perception framework for
creating awareness of physical environments. Algorithms for machine
learning and adaptive control.
- Adaptive Software: An infrastructure architecture for
robust, self-aware, self-healing, adaptive software, with a focus on
networked, high-performance, and pervasive systems.
These two thrusts, working together, will make adaptive systems
easy to develop. The project's adaptive infrastructure will enable a
new generation of applications on networked, high-performance, and
pervasive systems.
Adaptive Intelligence
The Adaptive Intelligence thrust is divided into three tasks.
Object and Activity Recognition addresses the problems of making
a system aware of its physical environment. On-Line Learning
will develop a toolkit of learning algorithms that run in real time
as the environment changes. Learning Complex Models will extend
simple models for representation and learning to the kind of complex
domains that arise in adaptive computing.
- Object and Activity Recognition: Any system that works
with people and for people must be able to understand what people
are doing. Some systems have attempted to understand a human's
physical activity simply by tracking their body pose over time. This
method gives high-level clues as to what the user is doing, but
without knowing what the human is interacting with, it cannot be
sufficiently informative on its own. Future systems must embody much
more sophisticated recognition of what people are doing and how they
are interacting with their environments. They must understand
people's emotional and mental states, their physical location, and
the objects they are working with.
Adaptive systems must sense rich perceptual information about
people: not only who is in a scene, but also their physical and
mental state. This information can be made
available through tracking a person's gaze, analyzing their gait,
and studying their facial expressions. We plan to start from
existing methods and develop a library of fundamental techniques for
perception that can be broadly deployed.
Any adaptive system that can be hand-carried needs to recognize
places. Gross localization in space can be achieved outdoors via the
Global Positioning System (GPS). More fine-grained localization can
be achieved indoors via systems of beacons. Our systems should be
deployable without modifying the environment, however. Furthermore,
there is more to knowing where you are than having accurate
estimates of your latitude and longitude. A system should be able to
"see" what room it's in, where people are located with respect to
it, where the important facilities in the room are, etc. A system
with these abilities could help a user navigate in relatively
unfamiliar buildings or support cognitively challenged people in
everyday activity.
For a system to really understand a user's activity, it must be able
to recognize the objects he or she is interacting with. It might be
important, for example, to understand what an Alzheimer's patient is
eating or drinking, what objects they are putting into the
refrigerator, whether their bedroom is in disarray, or whether they
are turning the stove off or on.
These perceptual problems can all be handled by a common underlying
structure:
- Perform signal-processing to extract basic information about
scenes or scene sequences.
- Construct probabilistic models of "typical" distributions
from training data.
- Take a current set of sensory data and use the learned models
to "parse" the objects or activities in order to recognize them
as instances of a previously known category or to signal them as
unusual and therefore potentially significant.
- Use prior information (derived from learned expectations about
the spatial and temporal relationships between objects, people,
and activities) to bias the recognition process.
This framework applies whether we are recognizing emotional
states, gait, the layout of furniture in a room, the kinds of dishes
in the sink, or the flow of a meeting. It applies to perception
based on a wide variety and combination of sensors, including
visual, acoustic, and location sensors.
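The last three steps of this framework can be illustrated with a
deliberately small sketch. It assumes that signal processing has
already reduced each observation to a fixed-length feature vector,
models each known category as an independent (diagonal) Gaussian
learned from training data, flags an observation as unusual when no
category explains it well, and otherwise lets a prior over
categories bias the decision. The function names, the example
categories, and the threshold value are hypothetical, and the
diagonal Gaussian stands in for the much richer models the task
calls for.

```python
import math
from statistics import mean, pvariance


def fit_category(samples):
    """Learn a diagonal Gaussian (per-feature mean and variance) from training vectors."""
    dims = list(zip(*samples))
    # A small variance floor keeps the model usable when a feature barely varies.
    return [(mean(d), max(pvariance(d), 1e-3)) for d in dims]


def log_likelihood(model, x):
    """Log density of feature vector x under a diagonal Gaussian model."""
    return sum(-0.5 * math.log(2 * math.pi * var) - (xi - mu) ** 2 / (2 * var)
               for (mu, var), xi in zip(model, x))


def recognize(models, priors, x, unusual_threshold=-20.0):
    """Return the most probable known category, or 'unusual' if none fits well.

    The threshold is an illustrative assumption, not a tuned value.
    """
    best_ll = max(log_likelihood(m, x) for m in models.values())
    if best_ll < unusual_threshold:
        # No learned category explains the observation: flag it as potentially significant.
        return "unusual"
    # Otherwise combine likelihood with the prior (log posterior up to a constant).
    return max(models, key=lambda c: math.log(priors[c]) + log_likelihood(models[c], x))
```

For example, with a "sitting" model trained on low-motion feature
vectors and a "walking" model trained on high-motion ones, a vector
far outside both training sets is reported as "unusual" instead of
being forced into whichever category happens to be nearer.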
An enormous amount of research is necessary to develop reliable
technology for perceptual understanding of people, places, and
things. The Object and Activity Recognition task will concentrate on
people and things, building a library of tools for recognizing basic
properties first, and gradually building to perception of more
complex attributes of the environment.
- On-Line Learning: In recent years, machine-learning
algorithms have matured to the point that many can be used
off the shelf by nonexperts. These algorithms are predominantly
aimed at basic supervised learning, in which the system is presented
with a set of input-output pairs, and is expected to find a function
that does a good job of describing the input-output relationship,
and that can be used to predict the outputs for previously unseen
inputs. A variety of effective methods for supervised learning
exist, including neural networks, decision trees, and support vector
machines. These basic algorithms provide a good starting point, but
they are not perfectly suited to on-line learning in complex
environments, because they are typically run off-line, in batch
mode, and with no ability to adapt quickly to a changing
environment.
The On-Line Learning task will build a toolkit of on-line learning
methods that can be used in all parts of the adaptive
infrastructure. We will begin with basic supervised methods, but
then focus our research on extending these methods to on-line,
real-time learning algorithms that run quickly while tracking a
changing environment. Effective algorithms with good theoretical
guarantees on performance exist for this problem in simple
situations. We propose to extend these algorithms to work in the
more complex environments found in adaptive-computing applications.
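One of the simplest on-line algorithms of this kind is the
least-mean-squares (LMS) rule, shown here as an illustration rather
than as the toolkit's design. It updates a linear predictor after
every example, and because its step size stays constant instead of
decaying, it keeps tracking the target when the environment drifts.
The function name, the step size, and the synthetic drifting stream
are assumptions.

```python
def lms_update(weights, x, y, step=0.05):
    """One on-line LMS step: predict, measure the error, nudge weights against it."""
    prediction = sum(w * xi for w, xi in zip(weights, x))
    error = y - prediction
    return [w + step * error * xi for w, xi in zip(weights, x)], error


# Synthetic stream (an assumption) whose true relationship changes halfway:
# y = 2x for the first 1000 examples, then y = -x.
weights = [0.0, 0.0]  # [slope, bias]
for t in range(2000):
    x_val = ((t * 7) % 11) / 10.0 - 0.5  # deterministic inputs in [-0.5, 0.5]
    true_slope = 2.0 if t < 1000 else -1.0
    weights, _ = lms_update(weights, [x_val, 1.0], true_slope * x_val)
```

After the switch, the learned slope moves from about 2 to about -1
without any retraining phase; a batch learner fit once on the early
data would keep predicting with the stale slope.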
The On-Line Learning toolkit will also include algorithms for
reinforcement learning (RL), which allows control systems to adapt
based on an external, indirect measure of performance, rather than
on a supplied set of training examples. The RL framework is well
suited to many problems in adaptive systems, including load
balancing, caching, power management, and user-preference modeling.
Current RL algorithms have had notable success when applied in
simulated domains that allow many thousands or even millions of
trials, but on-line applications must learn quickly from a much
smaller number of examples. This task will concentrate on developing
small, efficient, reliable RL modules that can be distributed
throughout a complex system, to tune aspects of the hardware, system
software, and application behavior.
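Tabular Q-learning, one standard RL algorithm (not necessarily the
one this task will adopt), shows what a small RL module of this kind
looks like. In the toy power-management problem below, which is
invented for illustration, the module observes whether load is low
or high, chooses between sleeping and running, and learns from the
reward signal alone which choice suits each state. States, actions,
rewards, and parameter values are all assumptions.

```python
import random

# Hypothetical toy problem: states describe observed load, actions choose a power mode.
STATES = ("low_load", "high_load")
ACTIONS = ("sleep", "run")


def reward(state, action):
    """Assumed reward: save energy when idle, serve requests when busy."""
    if state == "low_load":
        return 1.0 if action == "sleep" else 0.0
    return 1.0 if action == "run" else -1.0


def q_learning(steps=5000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    state = rng.choice(STATES)
    for _ in range(steps):
        # Epsilon-greedy action selection: mostly exploit, occasionally explore.
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        r = reward(state, action)
        # In this toy model, load changes independently of the chosen action.
        next_state = rng.choice(STATES)
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        # Q-learning update toward the one-step bootstrapped target.
        q[(state, action)] += alpha * (r + gamma * best_next - q[(state, action)])
        state = next_state
    return q


def greedy_policy(q):
    """The action the module would choose in each state after learning."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES}
```

The module never sees labeled examples of the "right" power mode;
it infers the policy from rewards, which is what makes RL attractive
for tuning components whose correct settings are not known in
advance.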
- Learning Complex Models: Most work in supervised and
reinforcement learning can only handle situations that can be
modeled as fixed-length vectors of attributes. Moreover, most
methods can learn from only a single source of information. The
Learning Complex Models task aims to learn much more complex models
and behaviors from a much more diverse range of training information.
Consider the problem of modeling a complex activity, such as a
business meeting, or the performance of a chemistry experiment. Such
an activity has arbitrary duration and a rich, systematic, but
flexible structure. Another example of a complex distribution is
that of the shapes of classes of man-made objects, such as chairs.
Chairs come in many varieties with different structures, shapes, and
materials. A grammatical model in three-dimensional space may
account for this structural variability. Probabilistic grammatical
models, such as stochastic logic programs or hierarchical hidden
Markov models, seem appropriate for modeling such activities and
objects. These models currently have only basic learning algorithms.
Moreover, they have typically been used to model language, with
words or phonemes as the atomic grammatical
elements. In our domains, the primitives might be highly complex
combinations of sensory data (images, audio, etc.). A major focus of
the Learning Complex Models task will be to extend these models to
represent and learn the complex probability distributions inherent
in adaptive systems.
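Hierarchical hidden Markov models build on the ordinary hidden
Markov model (HMM), whose core computation is the forward algorithm:
the probability that a model generated an observed sequence, summed
over all hidden-state paths in time linear in the sequence length.
The flat two-phase "meeting" model below, with its states,
observations, and probabilities, is invented for illustration; a
hierarchical model would nest such sub-models inside higher-level
states.

```python
def forward_probability(start, trans, emit, observations):
    """P(observations | HMM) computed with the forward algorithm.

    start[s]    : probability of beginning in hidden state s
    trans[s][t] : probability of moving from state s to state t
    emit[s][o]  : probability that state s produces observation o
    """
    states = list(start)
    # alpha[s] = P(observations so far, current hidden state = s)
    alpha = {s: start[s] * emit[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {t: sum(alpha[s] * trans[s][t] for s in states) * emit[t][obs]
                 for t in states}
    return sum(alpha.values())


# Hypothetical meeting model: a "presentation" phase and a "discussion" phase.
start = {"presentation": 0.9, "discussion": 0.1}
trans = {"presentation": {"presentation": 0.8, "discussion": 0.2},
         "discussion": {"presentation": 0.1, "discussion": 0.9}}
emit = {"presentation": {"one_speaker": 0.8, "many_speakers": 0.2},
        "discussion": {"one_speaker": 0.3, "many_speakers": 0.7}}
```

Enumerating every path would cost time exponential in the sequence
length; the forward recursion reuses partial sums, which is what
makes such models usable on long sensor streams.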
Another problem that arises in complex systems is learning from
combinations of data from different sources. We might need to learn
to categorize the well-being of an elderly person from a combination
of information, including audio, video, results of tracking motion
through their apartment, and so on. In any particular training or
testing example, only some of these information sources may be
present. Learning in such circumstances is challenging, requiring
such techniques as bootstrapping and co-training. We propose to
extend and apply these methods to learn from combinations of visual
and text data, and to focus on their on-line, real-time
applicability.
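Co-training, one of the techniques named above, can be sketched in
a few lines. Each example has two views (for instance, an audio
feature and a video feature); a separate classifier is trained on
each view, and in every round each classifier labels the unlabeled
example it is most confident about, adding it to the shared training
pool. The nearest-centroid classifiers, the margin-based confidence
score, and the one-dimensional synthetic views are simplifying
assumptions, not the methods the task will develop.

```python
def centroids(labeled, view):
    """Mean of each class along one view (index 0 or 1 of the example pair)."""
    sums, counts = {}, {}
    for x, label in labeled:
        sums[label] = sums.get(label, 0.0) + x[view]
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(cents, value):
    """Nearest-centroid label plus a confidence margin.

    Assumes at least two classes are present in the labeled data.
    """
    ranked = sorted(cents, key=lambda c: abs(value - cents[c]))
    margin = abs(value - cents[ranked[1]]) - abs(value - cents[ranked[0]])
    return ranked[0], margin


def co_train(labeled, unlabeled, rounds=10):
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(rounds):
        for view in (0, 1):
            if not unlabeled:
                return labeled
            cents = centroids(labeled, view)
            # This view's classifier picks the unlabeled example it is surest about...
            best = max(unlabeled, key=lambda x: predict(cents, x[view])[1])
            label, _ = predict(cents, best[view])
            # ...and the newly labeled example becomes training data for both views.
            labeled.append((best, label))
            unlabeled.remove(best)
    return labeled
```

Starting from one labeled example per class, the two views take
turns growing the labeled pool, so information from one sensor helps
train the classifier for the other.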
Although basic RL methods are designed with only a single learning
agent in mind, many complex systems are appropriately thought of as
being made up of multiple learning agents. To help elderly people,
intelligent physical devices may react to the changes in the
environment by reconfiguring their software and hardware components.
These independently operating modules may cooperate with one another
to achieve a common goal, such as locomotion. The system must
determine the geometric and physical structures suitable for a set
of tasks and have the ability to transform from one structure into
another. One area where relevant work has been done is in modular
self-reconfigurable robots, which consist of hundreds or even
thousands of identical modules. Eventually, a "smart house" may
have thousands of computational, sensory, and effector elements.
Coordinating them to achieve tasks is an enormous problem, given the
range of possible tasks and elements. We will focus on two aspects
of the problem: (i) identifying a set of elements important for
common daily tasks; (ii) exploring representations and algorithmic
approaches that support efficient structural transformation.
Adaptive Software
The Adaptive Software thrust is broken into four tasks. Survival
addresses the problem of making a system continue to run in the face
of unexpected events or operating conditions without human
intervention. Monitoring aims to understand how a system can
be made to observe properties of its external and internal operating
conditions. Reconfiguration will study methods for allowing a
system to grow and adapt. Adaptation Policies will develop
algorithms for updating a system so that it productively adapts to
observed changes. Overall, the Adaptive Software thrust will develop
a software architecture that leverages the algorithms and methods
developed by the Adaptive Intelligence thrust.
The problems addressed by these four tasks have important
implications for the vast majority of all software systems, but they
are especially pertinent to the successful operation of the
pervasive, high-performance, and widely distributed systems that are
the focus of this proposal. Because these systems interact with the
physical world, they must continually deal with new (and often
unexpected) inputs and operating conditions. Existing systems are
notoriously brittle in this situation: inputs or operating
conditions that diverge from envisioned and tested scenarios can
easily trigger unexpected behavior, causing the system to fail,
sometimes in dangerous ways.
The ideal result of this thrust will be to develop an architecture
to support systems that can survive unexpected events and operating
conditions. They will have the ability to dynamically discover and
manage available resources, reconfigure themselves to adapt to
varying requirements, and self-heal into desired configurations when
nodes join, leave, or fail. With current computer systems, these
capabilities require intensive human intervention and configuration.
By obviating human intervention, we can (1) dramatically decrease
the cost (and hence increase the availability) of complex computer
systems, and (2) enable a new level of responsiveness and adaptation
that was previously unavailable at any price.
- Survival: Current systems either stop or fail whenever
they detect an internal error or unexpected input or environment.
This behavior is clearly unacceptable when there is little or no
prospect that a human administrator will come along to fix the
system. The Survival task will develop a set of techniques that
enable the system to continue to execute successfully through errors.
Failure-oblivious computing offers the kind of survival mechanisms
we propose to develop. The idea is for software to do whatever it
can to keep the system alive long enough to bring the higher-level
reconfiguration mechanisms to bear. Failure-oblivious computing helps
prevent catastrophic cascading errors, in which the failure of a
single component propagates to cause the entire system to fail. This
kind of error is characteristic of complex distributed systems, such
as networked, high-performance, and pervasive computing systems.
Using techniques such as failure-oblivious computing, the Survival
task hopes to dramatically reduce the need for human intervention.
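In the research literature, failure-oblivious computing is
implemented by compiler instrumentation of C programs: invalid
memory writes are discarded and invalid reads return manufactured
values, so execution continues instead of crashing or corrupting
adjacent memory. The Python class below only imitates that policy
for a single buffer to make the idea concrete; the class name, the
error counter, and the choice of 0 as the manufactured value are
assumptions.

```python
class ObliviousBuffer:
    """Fixed-size buffer that keeps running through out-of-bounds accesses."""

    def __init__(self, size, manufactured_value=0):
        self._data = [0] * size
        self._manufactured = manufactured_value
        self.ignored_errors = 0  # suppressed errors, exposed for later monitoring

    def write(self, index, value):
        if 0 <= index < len(self._data):
            self._data[index] = value
        else:
            # Discard the invalid write instead of raising or corrupting other state.
            self.ignored_errors += 1

    def read(self, index):
        if 0 <= index < len(self._data):
            return self._data[index]
        # Manufacture a value for the invalid read so the caller can continue.
        self.ignored_errors += 1
        return self._manufactured
```

Counting suppressed errors, rather than hiding them completely,
leaves a signal that higher-level reconfiguration mechanisms can act
on once the system has survived the immediate fault.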
- Monitoring: For any complex system to adapt, it must be
aware of the operating conditions of its components. Obtaining and
integrating this information is a daunting task for any system, and
all the more daunting for networked, high-performance, and pervasive
systems. Bandwidth and availability problems can complicate the
acquisition of information, and components do not implement any
standard interface to facilitate the uniform transmission of system
health. The Monitoring task will investigate software techniques
for acquiring information about system health. It will develop
standard monitoring interfaces and build
experimental systems with components that implement these
interfaces. It will also develop algorithms for transmitting and
combining the monitoring information.
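A standard monitoring interface could be as small as a common report
format plus a rule for combining reports. The sketch below assumes a
hypothetical `HealthReport` record and `Monitor` class, and treats
each report as a heartbeat: a component whose latest report is older
than a timeout is suspected to have failed. The field names, the
timeout, and the shared clock are assumptions.

```python
from dataclasses import dataclass


@dataclass
class HealthReport:
    """Hypothetical uniform report that every component would emit periodically."""
    component: str
    timestamp: float  # seconds on a shared clock (an assumption)
    load: float       # fraction of capacity in use, 0.0 to 1.0


class Monitor:
    def __init__(self, timeout=5.0):
        self.timeout = timeout
        self.latest = {}  # component name -> most recent HealthReport

    def receive(self, report):
        current = self.latest.get(report.component)
        # Keep only the newest report; late or duplicate messages are ignored.
        if current is None or report.timestamp > current.timestamp:
            self.latest[report.component] = report

    def summary(self, now):
        """Split components into live and suspected-failed, and average the live load."""
        live = [r for r in self.latest.values() if now - r.timestamp <= self.timeout]
        suspected = sorted(r.component for r in self.latest.values()
                           if now - r.timestamp > self.timeout)
        avg_load = sum(r.load for r in live) / len(live) if live else 0.0
        return {"live": len(live), "suspected_failed": suspected, "avg_load": avg_load}
```

Because every component reports in the same shape, the combining
logic does not need to know what kind of component produced a
report.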
- Reconfiguration: Applications can reconfigure themselves
in a variety of ways. They can move or redirect computations. They
can colonize new resources. They can restart activities previously
running on failed resources. They can scale back system activities
in the face of failures. The required functionality is generic
across most envisioned applications. Current solutions, however,
usually rely on centralized mechanisms that are vulnerable to
performance bottlenecks and failures. The Reconfiguration task will
investigate new, decentralized approaches that scale with the
system. The goal is to provide reliable services that remain
available despite partial failures in any specific part of the
system. As part of this task, we will deliver a dynamic distributed
information structure that provides basic services such as
discovery, name-based communication, and publish/subscribe
functionality. All of these services will be implemented in a robust
way across the distributed system. We will then build on this
experience to develop a range of basic services that provide the
full range of application support for reconfigurable systems.
Examples of such services include storage management, data
distribution, searching, and group services.
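One widely used way to provide decentralized, name-based lookup is
consistent hashing: nodes and names are hashed onto the same ring,
and each name belongs to the first node clockwise from it. When a
node joins or leaves, only the names in its arc move, so no central
directory has to be rebuilt. The proposal does not commit to this
technique; the sketch below, including the `HashRing` class and its
method names, is illustrative.

```python
import bisect
import hashlib


def ring_position(key):
    """Map a string to a point on a 2**32 ring using a stable hash."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:4], "big")


class HashRing:
    def __init__(self, nodes=()):
        self._points = []  # sorted ring positions of nodes
        self._owner = {}   # ring position -> node name
        for node in nodes:
            self.join(node)

    def join(self, node):
        pos = ring_position(node)
        bisect.insort(self._points, pos)
        self._owner[pos] = node

    def leave(self, node):
        pos = ring_position(node)
        self._points.remove(pos)
        del self._owner[pos]

    def lookup(self, name):
        """Return the node responsible for a name: the first node clockwise from it."""
        if not self._points:
            raise LookupError("ring is empty")
        i = bisect.bisect_right(self._points, ring_position(name)) % len(self._points)
        return self._owner[self._points[i]]
```

Deployed systems usually place several virtual points per node to
balance load and handle hash collisions; this sketch omits both for
brevity.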
- Adaptation Policies: Reconfiguration support is of little
use to applications that cannot exploit the reconfiguration
abilities productively. The Adaptation Policies task will develop
several core applications that will (1) enable us to evaluate our
system, and (2) provide models that others can use as they develop
additional applications. The Adaptation Policies task will initially
focus on two specific applications: distributed databases and
adaptive scientific computations.
Our distributed database research focuses on how to execute queries
against a database distributed across a large computational
infrastructure. The problems that such a system must confront
include partial failures, bandwidth fluctuations, and performance
anomalies. We have already developed techniques to reorder the
operations of a distributed query processing strategy at runtime.
This reordering enables the database to adapt to some workload and
bandwidth fluctuations in the underlying computational
infrastructure. We propose to extend this research to include a
broader range of performance fluctuations and to incorporate
self-healing techniques that will enable the system to successfully
recover from failures.
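The idea of reordering operations at runtime can be illustrated at
the scale of a single selection: evaluate filter predicates in order
of how many rows they have recently let through, so the most
selective filter rejects rows first, and re-sort as the observed
pass rates change. This is a simplified single-machine sketch in the
spirit of adaptive query processing, not the group's distributed
technique; the class name and the counter-decay scheme are
assumptions.

```python
class AdaptiveFilter:
    """Evaluates a conjunction of predicates, reordering them by observed selectivity."""

    def __init__(self, predicates, decay=0.99):
        # Per predicate: [function, decayed rows seen, decayed rows passed].
        # Starting at 1 seen / 1 passed gives each predicate an optimistic initial rate.
        self.stats = [[p, 1.0, 1.0] for p in predicates]
        self.decay = decay

    @staticmethod
    def pass_rate(entry):
        return entry[2] / entry[1]

    def accept(self, row):
        # Most selective (lowest pass rate) predicates first, to reject rows early.
        self.stats.sort(key=self.pass_rate)
        for entry in self.stats:
            passed = entry[0](row)
            # Decayed counts let the ordering follow changes in the workload.
            entry[1] = entry[1] * self.decay + 1.0
            entry[2] = entry[2] * self.decay + (1.0 if passed else 0.0)
            if not passed:
                return False
        return True

    def order(self):
        return [entry[0].__name__ for entry in sorted(self.stats, key=self.pass_rate)]
```

The answer is the same in any order; only the amount of work
changes, which is why the ordering can be adjusted freely while the
query runs.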
Our adaptive scientific computation research will build on our
previous FFTW research (winner of the prestigious J. H. Wilkinson
Prize for Numerical Software in 1999). The current version of FFTW
implements an adaptive version of the fast Fourier transform (FFT).
When FFTW starts up, it spends a couple of seconds running
experiments to measure various ways of running the FFT on the
particular machine, and then it determines a plan for implementing
the FFT for that architecture, cache size, and operating
environment. Across a broad range of computing environments, FFTW
outperforms laboriously hand-coded FFTs tailored specifically for
each environment. FFTW's adaptation occurs only at start-up,
however. It does not adapt once it has chosen an execution strategy,
making its performance brittle to changes in its environment. We
propose to explore ways to enable applications like FFTW to adapt
continuously during the computation to take advantage of new
computational resources and appropriately reapportion load when
existing resources become unavailable.
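The measure-and-select idea behind FFTW's start-up planning, and the
proposed move to continuous adaptation, can be sketched generically:
time each candidate implementation on the actual machine and input,
keep the fastest, and repeat the measurement periodically during a
long computation. The names `measure`, `AdaptivePlanner`, and
`replan_every` are hypothetical; this sketch neither uses nor
describes FFTW's real API, and it assumes the candidates are
side-effect free so they can be run repeatedly while timing.

```python
import time


def measure(fn, sample, repeats=3):
    """Best-of-n wall-clock time of fn on a representative input."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(sample)
        best = min(best, time.perf_counter() - start)
    return best


class AdaptivePlanner:
    """Chooses the fastest candidate and re-measures every `replan_every` calls."""

    def __init__(self, candidates, replan_every=1000):
        self.candidates = candidates
        self.replan_every = replan_every
        self.calls = 0
        self.plan = None

    def run(self, data):
        if self.plan is None or self.calls % self.replan_every == 0:
            # (Re)plan on the current input so the choice tracks changing conditions.
            self.plan = min(self.candidates, key=lambda fn: measure(fn, data))
        self.calls += 1
        return self.plan(data)
```

Replanning on a fixed schedule is the simplest policy; a continuously
adaptive system might instead replan when observed run times drift
or when processors join or leave.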
Application Projects
Health Monitoring and Care for the Elderly
As the world population ages, technology is becoming increasingly
important for reducing the cost and improving the quality of elderly
care. Such "assisted-living" technologies help the elderly stay
safe and look after their own health when caregivers are away,
thereby reducing or delaying the need for hospitalization or other
costly care arrangements. Efforts in computer-aided health
monitoring for elderly care started decades ago. Early solutions did
not work well, mainly because of the high cost of monitoring devices
and communication links, and because caregivers and clinicians had
to adapt in inflexible ways to the complex technologies used to
monitor and analyze the relevant data.
Costs for the monitoring devices have decreased sharply over the
years, and new generations of devices that are small, robust, and
multifunctional are now available. For instance, the April 2004
issue of Technology Review reported a ring-size wearable monitor,
developed by MIT researchers, that can monitor the wearer's
temperature, heart rate, and blood oxygen level. Similarly, new
modes of communication via the Internet and wireless networks are
increasingly available. These technological advances have brought
effective health monitoring for elderly care closer to reality.
Many scientific and implementation challenges still remain, however,
especially in integrating multimodal, multisource, noisy data and
analyzing it effectively to produce timely, relevant decisions and
actions. Recent advances in analytic technologies have shown
promising results, but they are usually restricted to limited
domains with a restricted set of data. This showcase project will
highlight the feasibility of developing a cost-effective,
adaptive-computing framework to support a significant real
application.
The proposed Showcase application prototype would, for instance,
allow signals to be collected from an elderly patient through simple
and robust wearable devices or environmental sensors. The signals
could be processed and analyzed in a stand-alone home computer when
the patient is at home, or they could be sent over a wireless
network to a grid of public-health computer clusters located in
various area hospitals or institutions when he is walking along the
street. Such a public-health grid may in turn be implemented in an
adaptive manner to support hardware, software, and network
self-configuration for different task types and load patterns.
The Showcase thrust will proceed in phases. In the initial phase,
we will develop a networked wearable device system for the elderly.
The system will adaptively collect and analyze physiological data,
estimate the wearer's health status, communicate with doctors, and
help the wearer perform biofeedback exercises to improve his or her
health. We expect that the initial system will provide an ideal
platform for systematically studying the health problems of the
elderly, as well as of people in other age groups, by building a
physiological database, which has never been done before.
The initial prototype system will consist of three main modules:
physiosignal acquisition, analysis and suggestion, and biofeedback.
The wearable device is equipped with microelectromechanical system
(MEMS) sensors, such as accelerometers, skin conductivity sensors,
ECG sensors, and blood-volume pulse sensors. With these sensors, the
device can acquire signals related to the wearer's activities and
gait, and to early symptoms of heart disease. The adaptive analysis
model is tuned to the wearer. It performs real-time adaptive
analysis of the acquired signals and estimates the physiological
status of the wearer. Biofeedback is a scientifically validated
method for treating a variety of health problems, such as
cardiovascular and respiratory disorders.
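One simple way for an analysis model to be tuned to the wearer is to
learn each person's own baseline on-line and flag readings that
deviate strongly from it. The sketch below keeps exponentially
weighted estimates of the mean and variance of a single signal, such
as heart rate. The class name, smoothing factor, warm-up length, and
4-standard-deviation threshold are illustrative assumptions, not
clinically validated values.

```python
import math


class AdaptiveBaseline:
    """Tracks one wearer's signal with exponentially weighted mean and variance."""

    def __init__(self, smoothing=0.05, threshold=4.0, warmup=20):
        self.smoothing = smoothing
        self.threshold = threshold  # deviation (in standard deviations) counted as anomalous
        self.warmup = warmup        # readings to observe before any alarm is raised
        self.count = 0
        self.mean = 0.0
        self.var = 0.0

    def update(self, value):
        """Return True if value is anomalous for this wearer, then adapt the baseline."""
        self.count += 1
        if self.count == 1:
            self.mean = value
            return False
        deviation = value - self.mean
        std = math.sqrt(self.var)
        anomalous = (self.count > self.warmup and std > 0
                     and abs(deviation) > self.threshold * std)
        if not anomalous:
            # Only normal readings update the baseline, so an emergency is not "learned".
            self.mean += self.smoothing * deviation
            self.var = (1 - self.smoothing) * (self.var + self.smoothing * deviation ** 2)
        return anomalous
```

Because the baseline is learned per wearer, a resting heart rate
that is normal for one person but unusual for another is judged
against the right reference, and slow, benign drift is absorbed
rather than reported.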
The adaptive signal analysis and feedback modules in the wearable
device are based on physiological models and adaptive schemes.
Signals acquired by the wearable device are transmitted to a central
database, where data analysis and mining are carried out over data
collected from several persons wearing the device. With
physiologists' expert knowledge, the data-mining results are
validated and captured as physiological models. Based on the
physiological models and system optimization methods, an adaptation
scheme is created and updated. The adaptation schemes are then
injected into those wearable devices to support adaptive signal
analysis and feedback modules. The wearable device also communicates
with other wearable devices and with the doctor.
Equipment
The lab has purchased a 54-node cluster of dual-processor
2.4 GHz Opteron machines. The cluster will be set up by May/June. The
lab has around 20 HP iPAQ PDAs running Linux and fitted with various
sensors for pervasive computing research.