Adaptive Computing Lab

The Adaptive Computing Lab will be located in one of the Atheneum rooms in S16 level 2 starting in July. Students, research fellows, research assistants, and engineers who are working on adaptive computing projects will be housed there.

Research Program
The research program aims to enable the development of computing systems that can respond and adapt to rapid changes in complex environments, with a focus on networked, high-performance, and pervasive computing systems. We propose to develop new algorithms, software, and technologies to support robust, simple, flexible, and economical usage of diverse resources. Our research is a step toward addressing the nearly universal need for adaptivity in computer systems.

The need for computing systems that adapt automatically
Today's computer technology is brittle, not only in the face of catastrophic emergencies, but also in the face of simple changes, such as installing a new version of the operating system, changing the network vendor, adding a new computer, or coping with a crashed router. If we only had to worry about emergencies, we'd be okay most of the time. However, more and more human effort is devoted to adapting computer systems to such routine changes.

Manual adaptation of computer systems cannot continue. The complexity of the systems is growing, and the complexity of the data that systems must work with is also growing. We are rapidly evolving toward computer systems of worldwide scope and immense complexity: not just worldwide communication networks of somewhat isolated systems, but worldwide pervasive, embedded computing with literally millions of computational components. In the past, for small, isolated systems, it was possible for a human to understand their circumstances well enough to engineer and configure appropriate solutions directly. As systems become global, pervasive, and embedded, however, manual adaptivity will cease to be a realistic option for two reasons:

  • Systems contain too many components and are used in too many different circumstances for humans to engineer and configure them effectively for all possible future situations.
  • The components and their environments are all changing simultaneously and in real time: individual nodes fail, bandwidth changes, software is upgraded, applications change, hardware is swapped out, and so on.

To address these scaling problems, we need to enable systems to adapt automatically to their changing environments.

Currently, large computer systems typically run applications that deal with digital and unambiguous data, such as inventory management, accounting, and database processing. In contrast, the data representing changes in real-world environments are often analog and ambiguous, frequently in the form of signals, including the outputs of a wide variety of sensors (medical, seismic, and radar, for example), as well as voice, photos, and video. The downstream processing required to extract the essence of these signals, in terms of what is actually occurring in the world, can be enormous, stretching the capabilities of today's high-performance multiprocessor clusters. Crucially, effective adaptivity to users' needs and goals requires a detailed, real-time understanding of what users are doing or trying to do, which cannot be achieved without extensive awareness of the physical world.

To deal with the scaling up of system and data complexity, systems must exhibit on-line, real-time adaptivity. Their behavior must respond automatically, without human intervention, to changes in the state of the environment. This adaptivity can be achieved partly by new algorithms that are inherently robust to operating conditions. To achieve adaptivity in many cases, however, we must also enable computer systems to have a richer and more complete perception of their environment. The system must be aware of the behavior of its own components, both software and hardware, and aware of its external environment, whether that consists of other machines or human users. Furthermore, the adaptivity cannot generally be achieved by some preprogrammed set of rules. Adaptation must be based on active, autonomic learning of flexible new behaviors.

Application Scenario: Adaptive computing in enterprises
Server farms in today's companies provide computation for many business applications, including money management, stock trading, and media technology. These farms can harbor several thousand CPUs running under different operating systems, with hundreds of disk arrays, interconnected in a complicated network topology, with transactions coming from many possible paths. No human individual can manage such complex high-performance systems under fluctuating demand and an unpredictable operating environment. Configuring the system to assure efficiency or reconfiguring it to overcome faults normally takes weeks.

We envisage adaptive technology to manage resources in a server farm autonomously. An adaptive application monitors the state of the system, adjusting its operation as required. The overall system is governed adaptively by machine-learned policies, allowing distributed checkpointing and rollback for recovery. The processors, distributed over a global scale, join or leave the system without central control. The network topology reconfigures itself on demand, assuring a sufficient level of bandwidth and latency while adapting to the incoming traffic and communication patterns. Databases support each other, offering system-wide consistency and fault-tolerance.

The scenario calls for two important new capabilities:

  • The capacity to monitor the internal state of complex computer systems without imposing undue overhead.
  • The ability to learn actions to cope with complex, time-varying environments.

Application Scenario: Health monitoring and care for the elderly
The longer lifespans made possible by modern medicine mean that an increasing percentage of our society is composed of elderly people. Enabling the elderly to lead productive, independent lives benefits society substantially. Many of the elderly, however, will ultimately have health issues, ranging from heart disease to Alzheimer's, that threaten their independence. Enabling them to stay at home longer both benefits their well-being and reduces demands for caregivers and medical facilities.

We envisage monitoring the health and activities of elderly people in their homes through an extensive set of pervasive sensors, including fixed cameras, microphones, and wearable medical devices. The data from these sensors is processed to detect anomalies that indicate a need for human intervention. Remote family or caregivers are notified and can rest assured that professional or emergency help is called in a timely manner in case of an emergency.

This scenario calls for two important new capabilities:

  • A robust, adaptive software and hardware infrastructure for the integration of large numbers of unreliable sensors and computing devices.
  • Mechanisms for interpreting the sensory data and relating it to appropriate actions, such as notifying a caregiver.

Adaptive Computing Infrastructure
We seek to develop an adaptive infrastructure to understand the various trade-offs in the design of a computing system, to support flexible and economical computing frameworks, and to develop techniques for diverse applications of adaptive computing systems. Unlike current static systems, which are designed to perform only a single task within anticipated operating conditions, we propose to enable the development of systems exhibiting the following traits:

  • Self-healing and self-reconfiguring: The system acknowledges that software and hardware components fail. It attempts to continue to operate successfully in spite of these failures. It supports the easy addition and automatic reconfiguration of software and hardware components.
  • Aware: The system monitors both its internal and external environments to maintain an up-to-date model of the world.
  • Responsive: The system uses what it knows about the world to configure, reconfigure, and marshal resources. It offers assurances of functionality under changing environmental conditions. It automatically learns rules for achieving its goals.

The adaptive infrastructure development is organized as two complementary thrusts:

  • Adaptive Intelligence: A perception framework for creating awareness of physical environments. Algorithms for machine learning and adaptive control.
  • Adaptive Software: An infrastructure architecture for robust, self-aware, self-healing, adaptive software, with a focus on networked, high-performance, and pervasive systems.

These two thrusts, working together, will allow adaptive systems to be easily developed. The project's adaptive infrastructure will enable a new generation of applications to be developed on networked, high-performance, and pervasive systems.

Adaptive Intelligence
The Adaptive Intelligence thrust is divided into three tasks. Object and Activity Recognition addresses the problems of making a system aware of its physical environment. On-Line Learning will develop a toolkit of learning algorithms that run in real time as environments change. Learning Complex Models will extend simple models for representation and learning to the kind of complex domains that arise in adaptive computing.

  • Object and Activity Recognition: Any system that works with people and for people must be able to understand what people are doing. Some systems have attempted to understand a human's physical activity simply by tracking their body pose over time. This method gives high-level clues as to what the user is doing, but without knowing what the human is interacting with, it cannot be sufficiently informative on its own. Future systems must embody much more sophisticated recognition of what people are doing and how they are interacting with their environments. These systems must understand their emotional and mental state, their physical location, and what objects they are working with.

    Adaptive systems must sense rich perceptual information about people, including who is in a scene, but also including information about their physical and mental state. This information can be made available through tracking a person's gaze, analyzing their gait, and studying their facial expressions. We plan to start from existing methods and develop a library of fundamental techniques for perception that can be broadly deployed.

    Any adaptive system that can be hand-carried needs to recognize places. Gross localization in space can be achieved outdoors via the Global Positioning System (GPS). More fine-grained localization can be achieved indoors via systems of beacons. Our systems should be deployable without modifying the environment, however. Furthermore, there is more to knowing where you are than having accurate estimates of your latitude and longitude. A system should be able to "see" what room it's in, where people are located with respect to it, where the important facilities in the room are, etc. A system with these abilities could help a user navigate in relatively unfamiliar buildings or support cognitively challenged people in everyday activity.

    For a system to really understand a user's activity, it must be able to recognize the objects he or she is interacting with. It might be important, for example, to understand what an Alzheimer's patient is eating or drinking, what objects they are putting into the refrigerator, whether their bedroom is in disarray, or whether they are turning the stove off or on.

    These perceptual problems can all be handled by a common underlying structure:
     
    • Perform signal-processing to extract basic information about scenes or scene sequences.
    • Construct probabilistic models of "typical" distributions from training data.
    • Take a current set of sensory data and use the learned models to "parse" the objects or activities in order to recognize them as instances of a previously known category or to signal them as unusual and therefore potentially significant.
    • Use prior information (derived from learned expectations about the spatial and temporal relationships between objects, people, and activities) to bias the recognition process.
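
As a rough illustration, the four steps above might be sketched as follows. This is a minimal sketch under strong assumptions, not the project's method: the signal-processing step is assumed to have already reduced each observation to a single number, each category is modeled as a one-dimensional Gaussian, and the category names, training values, priors, and `unusual_threshold` are all hypothetical.

```python
import math

def fit_gaussian(samples):
    """Estimate the mean and variance of a 1-D feature from training samples."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    return mean, max(var, 1e-6)  # floor the variance to avoid division by zero

def log_likelihood(x, mean, var):
    """Log density of x under a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def recognize(x, models, priors, unusual_threshold=-10.0):
    """Return the best-scoring known category, or 'unusual' if none fits well."""
    scores = {
        label: log_likelihood(x, mean, var) + math.log(priors[label])
        for label, (mean, var) in models.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] >= unusual_threshold else "unusual"

# Hypothetical training data: one extracted feature per observation
# (for example, a motion-energy value computed from video).
training = {
    "sitting": [0.9, 1.0, 1.1, 1.0],
    "walking": [4.8, 5.0, 5.2, 5.0],
}
models = {label: fit_gaussian(xs) for label, xs in training.items()}
priors = {"sitting": 0.7, "walking": 0.3}  # assumed prior expectations

print(recognize(1.05, models, priors))
print(recognize(5.1, models, priors))
print(recognize(20.0, models, priors))
```

The prior enters only as an additive bias on the log score, so a strong enough observation can still override it; an observation that no learned model explains is reported as unusual rather than forced into a known category.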

    This framework applies, whether we are recognizing emotional states, gait, the layout of furniture in a room, the kinds of dishes in the sink, or the flow of a meeting. It applies to perception based on a wide variety and combination of sensors, including vision, acoustic, locational, etc.

    An enormous amount of research is necessary to develop reliable technology for perceptual understanding of people, places, and things. The Object and Activity Recognition task will concentrate on people and things, building a library of tools for recognizing basic properties first, and gradually building to perception of more complex attributes of the environment.
     

  • On-Line Learning: In recent years, machine-learning algorithms have become mature, to the point that many can be used off-the-shelf by nonexperts. These algorithms are predominantly aimed at basic supervised learning, in which the system is presented with a set of input-output pairs, and is expected to find a function that does a good job of describing the input-output relationship, and that can be used to predict the outputs for previously unseen inputs. A variety of effective methods for supervised learning exist, including neural networks, decision trees, and support vector machines. These basic algorithms provide a good starting point, but they are not perfectly suited to on-line learning in complex environments, because they are typically run off-line, in batch mode, and with no ability to adapt quickly to a changing environment.

    The On-Line Learning task will build a toolkit of on-line learning methods that can be used in all parts of the adaptive infrastructure. We will begin with basic supervised methods, but then focus our research on extending these methods to on-line, real-time learning algorithms that run quickly while tracking a changing environment. Effective algorithms with good theoretical guarantees on performance exist for this problem in simple situations. We propose to extend these algorithms to work in the more complex environments found in adaptive-computing applications.
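
To make the contrast with batch learning concrete, here is a minimal sketch of an on-line learner that takes one gradient step per example and therefore follows the environment when it changes. It uses the classical least-mean-squares update for a linear model, chosen here only for illustration; the class name, learning rate, and synthetic data stream are assumptions and are not part of the proposed toolkit.

```python
class OnlineLinearModel:
    """Least-mean-squares learner: one gradient step per example."""

    def __init__(self, n_features, learning_rate=0.5):
        self.weights = [0.0] * n_features
        self.learning_rate = learning_rate

    def predict(self, x):
        return sum(w * xi for w, xi in zip(self.weights, x))

    def update(self, x, y):
        """Predict, compare with the observed output y, then adjust the weights."""
        error = y - self.predict(x)
        self.weights = [
            w + self.learning_rate * error * xi
            for w, xi in zip(self.weights, x)
        ]
        return error

model = OnlineLinearModel(n_features=2)

# Each input is (value, 1.0); the constant 1.0 lets the model learn an intercept.
stream = [(x / 10.0, 1.0) for x in range(10)] * 200  # 2000 examples

# First 1000 examples follow y = 2x + 1; then the environment changes to y = -x + 3.
for i, x in enumerate(stream):
    slope, intercept = (2.0, 1.0) if i < 1000 else (-1.0, 3.0)
    model.update(x, slope * x[0] + intercept)

print([round(w, 3) for w in model.weights])
```

No example is stored and no retraining pass is needed: after the change, the same update rule pulls the weights toward the new relationship.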

    The On-Line Learning toolkit will also include algorithms for reinforcement learning (RL), which allows control systems to adapt based on an external, indirect measure of performance, rather than on a supplied set of training examples. The RL framework is well suited to many problems in adaptive systems including load-balancing, caching, power-management, user-preference modeling, etc. Current RL algorithms have had notable success when applied in simulated domains that allow many thousands or even millions of trials, but on-line applications must learn quickly from a much smaller number of examples. This task will concentrate on developing small, efficient, reliable RL modules that can be distributed throughout a complex system, to tune aspects of the hardware, system software, and application behavior.
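
As one hedged example of a small RL module of this kind, the sketch below uses an epsilon-greedy agent to learn which of three servers to route requests to, using only the observed latency as an indirect measure of performance. The server latencies, noise level, and parameter values are invented for illustration, and the bandit formulation is a deliberate simplification of the load-balancing problem.

```python
import random

class EpsilonGreedyAgent:
    """Keeps a running reward estimate per action and mostly picks the best one."""

    def __init__(self, n_actions, epsilon=0.1, seed=0):
        self.values = [0.0] * n_actions  # estimated reward per action
        self.counts = [0] * n_actions    # how often each action was taken
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def choose(self):
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.values))  # explore
        return max(range(len(self.values)), key=self.values.__getitem__)  # exploit

    def learn(self, action, reward):
        self.counts[action] += 1
        # Incremental sample mean of the rewards seen for this action.
        self.values[action] += (reward - self.values[action]) / self.counts[action]

# Hypothetical environment: three servers with different mean latencies (ms).
MEAN_LATENCY_MS = [50.0, 20.0, 80.0]
env_rng = random.Random(1)

def observe_latency(server):
    return MEAN_LATENCY_MS[server] + env_rng.gauss(0.0, 2.0)

agent = EpsilonGreedyAgent(n_actions=3)
for _ in range(500):
    server = agent.choose()
    agent.learn(server, reward=-observe_latency(server))  # lower latency is better

best_server = max(range(3), key=agent.values.__getitem__)
print(best_server, agent.counts)
```

The agent never sees labeled examples of the right server; it only sees the consequence of each choice. A module meant to track a changing environment would likely replace the sample mean with a constant step size so that old observations are gradually forgotten.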
     
  • Learning Complex Models: Most work in supervised and reinforcement learning can only handle situations that can be modeled as fixed-length vectors of attributes. Moreover, it can only learn from a single source of information. The Learning Complex Models task aims to learn much more complex models and behaviors from a much more diverse range of training information.

    Consider the problem of modeling a complex activity, such as a business meeting, or the performance of a chemistry experiment. Such an activity has arbitrary duration and a rich, systematic, but flexible structure. Another example of a complex distribution is that of the shapes of classes of man-made objects, such as chairs. Chairs come in many varieties with different structures, shapes, and materials. A grammatical model in three-dimensional space may allow the variability of structure to be accounted for. In particular, probabilistic grammatical models, such as stochastic logic programs or hierarchical hidden Markov models seem to be appropriate for modeling such activities. These models currently have only basic learning algorithms. Moreover, they have been typically used to model language with words or phonemes as the atomic grammatical elements. In our domains, the primitives might be highly complex combinations of sensory data (images, audio, etc.). A major focus of the Learning Complex Models task will be to extend these models to represent and learn the complex probability distributions inherent in adaptive systems.

    Another problem that arises in complex systems is learning from combinations of data from different sources. We might need to learn to categorize the well-being of an elderly person from a combination of information, including audio, video, results of tracking motion through their apartment, and so on. In any particular training or testing example, only some of these information sources may be present. Learning in such circumstances is challenging, requiring such techniques as bootstrapping and co-training. We propose to extend and apply these methods to learn from combinations of visual and text data, as well as focusing on their on-line, real-time applicability.

    Although basic RL methods are only designed with a single learning agent in mind, many complex systems are appropriately thought of as being made up of multiple learning agents. To help elderly people, intelligent physical devices may react to the changes in the environment by reconfiguring their software and hardware components. These independently operating modules may cooperate with one another to achieve a common goal, such as locomotion. The system must determine the geometric and physical structures suitable for a set of tasks and have the ability to transform from one structure into another. One area where relevant work has been done is in modular self-reconfigurable robots, which consist of hundreds or even thousands of identical modules. Eventually, a "smart house" may have thousands of computational, sensory and effector elements. Coordinating them to achieve tasks is an enormous problem, given the range of possible tasks and elements. We will focus on two aspects of the problem: (i) identifying a set of elements important for common daily tasks; (ii) exploring representations and algorithmic approaches that support efficient structural transformation.

Adaptive Software
The Adaptive Software thrust is broken into four tasks. Survival addresses the problem of making a system continue to run in the face of unexpected events or operating conditions without human intervention. Monitoring aims to understand how a system can be made to observe properties of its external and internal operating conditions. Reconfiguration will study methods for allowing a system to grow and adapt. Adaptation Policies will develop algorithms for updating a system so that it productively adapts to observed changes. Overall, the Adaptive Software thrust will develop a software architecture based on leveraging the algorithms and methods developed by the Adaptive Intelligence thrust.

The problems addressed by these four tasks have important implications for the vast majority of all software systems, but they are especially pertinent to the successful operation of the pervasive, high-performance, and widely distributed systems that are the focus of this proposal. Because these systems interact with the physical world, they must continually deal with new (and often unexpected) inputs and operating conditions. Existing systems are notoriously brittle in this situation: inputs or operating conditions that diverge from envisioned and tested scenarios can easily trigger unexpected behavior, causing the system to fail, sometimes in dangerous ways.

The ideal result of this thrust will be to develop an architecture to support systems that can survive unexpected events and operating conditions. They will have the ability to dynamically discover and manage available resources, reconfigure themselves to adapt to varying requirements, and self-heal into desired configurations when nodes join, leave, or fail. With current computer systems, these capabilities require intensive human intervention and configuration. By obviating human intervention, we can (1) dramatically decrease the cost (and hence increase the availability) of complex computer systems, and (2) enable a new level of responsiveness and adaptation that was previously unavailable at any price.

  • Survival: Current systems either stop or fail whenever they detect an internal error or unexpected input or environment. This behavior is clearly unacceptable when there is little or no prospect that a human administrator will come along to fix the system. The Survival task will develop a set of techniques that enable the system to continue to execute successfully through errors.

    Failure-oblivious computing offers the kind of survival mechanisms we propose to develop. The idea is for software to do whatever it can to keep the system alive long enough to bring the higher-level reconfiguration mechanisms to bear. Failure-oblivious computing helps prevent catastrophic cascading errors, in which the failure of a single component propagates to cause the entire system to fail. This kind of error is characteristic of complex distributed systems, such as networked, high-performance, and pervasive computing systems. Using techniques such as failure-oblivious computing, the Survival task hopes to dramatically reduce the need for human intervention.
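
As the idea is usually described, a failure-oblivious program discards invalid memory writes and supplies manufactured values for invalid reads, so that execution continues past the error. The sketch below imitates that behavior for a fixed-size buffer. It is an illustration only: real failure-oblivious systems apply the checks in the compiler or runtime rather than in a wrapper class, and the class name, the manufactured value of 0, and the error counter are assumptions.

```python
class FailureObliviousBuffer:
    """Fixed-size buffer that keeps running through out-of-bounds accesses.

    Invalid writes are discarded and invalid reads return a manufactured
    value, so the caller continues instead of crashing.
    """

    def __init__(self, size, manufactured_value=0):
        self._data = [0] * size
        self._manufactured_value = manufactured_value
        self.suppressed_errors = 0  # kept so that a monitor can see what happened

    def _in_bounds(self, index):
        return 0 <= index < len(self._data)

    def write(self, index, value):
        if self._in_bounds(index):
            self._data[index] = value
        else:
            self.suppressed_errors += 1  # discard the invalid write

    def read(self, index):
        if self._in_bounds(index):
            return self._data[index]
        self.suppressed_errors += 1
        return self._manufactured_value  # manufacture a value for the invalid read

buf = FailureObliviousBuffer(size=4)
for i in range(6):  # buggy loop: runs two elements past the end
    buf.write(i, i * 10)
readings = [buf.read(i) for i in range(6)]
print(readings, buf.suppressed_errors)
```

Counting the suppressed errors matters: the point is to survive long enough for higher-level mechanisms to react, and those mechanisms need to know that something went wrong.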
     
  • Monitoring: For any complex system to adapt, it must be aware of the operating conditions of its components. Obtaining and integrating this information is a daunting task for any system, and all the more daunting for networked, high-performance, and pervasive systems. Bandwidth and availability problems can complicate the acquisition of information, and components do not implement any standard interface to facilitate the uniform transmission of system health. The Monitoring task will investigate software systems techniques for facilitating the acquisition of information about system health. It will develop standard monitoring interfaces and build experimental systems with components that implement these interfaces. It will also develop algorithms for transmitting and combining the monitoring information.
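
To suggest what a standard monitoring interface could look like, here is a minimal sketch in which every component exposes the same `health()` method and a collector combines the reports. The interface, the `HealthReport` fields, the `DiskArray` component, and the overload threshold are all hypothetical; the actual interfaces are a subject of the proposed research.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class HealthReport:
    component: str
    healthy: bool
    load: float  # fraction of capacity in use, from 0.0 to 1.0

class Monitorable(ABC):
    """Hypothetical uniform interface that every component would implement."""

    @abstractmethod
    def health(self) -> HealthReport:
        ...

class DiskArray(Monitorable):
    def __init__(self, name, used_fraction, failed=False):
        self.name = name
        self.used_fraction = used_fraction
        self.failed = failed

    def health(self) -> HealthReport:
        return HealthReport(self.name, not self.failed, self.used_fraction)

def summarize(components, overload_threshold=0.9):
    """Combine per-component reports into one system-level view."""
    reports = [c.health() for c in components]
    return {
        "failed": [r.component for r in reports if not r.healthy],
        "overloaded": [
            r.component for r in reports
            if r.healthy and r.load > overload_threshold
        ],
        "mean_load": sum(r.load for r in reports) / len(reports),
    }

farm = [
    DiskArray("disk-a", 0.5),
    DiskArray("disk-b", 0.2, failed=True),
    DiskArray("disk-c", 0.95),
]
summary = summarize(farm)
print(summary)
```

Because the collector depends only on the interface, new kinds of components can be added without changing the code that combines their reports.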
     
  • Reconfiguration: Applications can reconfigure themselves in a variety of ways. They can move or redirect computations. They can colonize new resources. They can restart activities previously running on failed resources. They can scale back system activities in the face of failures. The required functionality is generic across most envisioned applications. Current solutions, however, usually rely on centralized mechanisms that are vulnerable to performance bottlenecks and failures. The Reconfiguration task will investigate new, decentralized approaches that scale with the system. The goal is to provide reliable services that are invulnerable to partial failures involving any specific part of the system. As part of this task, we will deliver a dynamic distributed information structure that provides basic services such as discovery, name-based communication, and publish/subscribe functionality. All of these services will be implemented in a robust way across the distributed system. We will then build on this experience to develop a range of basic services that provide the full range of application support for reconfigurable systems. Examples of such services include storage management, data distribution, searching, and group services.
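
The basic services named above (discovery, name-based lookup, and publish/subscribe) can be illustrated with a small single-process stand-in. This sketch shows only the shape of the service interface: it is centralized and in-memory, which is exactly what the Reconfiguration task aims to avoid, and the class name, topic names, and address are hypothetical.

```python
from collections import defaultdict

class ServiceBus:
    """In-memory stand-in for discovery and publish/subscribe services."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of callbacks
        self._registry = {}                    # service name -> address

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        for callback in list(self._subscribers[topic]):
            callback(message)

    def register(self, name, address):
        """A service joins: record its address and announce it."""
        self._registry[name] = address
        self.publish("service/joined", name)

    def unregister(self, name):
        """A service leaves or fails: forget it and announce the departure."""
        if self._registry.pop(name, None) is not None:
            self.publish("service/left", name)

    def discover(self, name):
        """Name-based lookup; returns None if the service is not known."""
        return self._registry.get(name)

bus = ServiceBus()
events = []
bus.subscribe("service/joined", lambda name: events.append(("joined", name)))
bus.subscribe("service/left", lambda name: events.append(("left", name)))

bus.register("storage", "10.0.0.5:9000")
address = bus.discover("storage")
bus.unregister("storage")
print(address, bus.discover("storage"), events)
```

Applications that subscribe to join and leave events can react to membership changes without polling, which is the behavior a decentralized implementation would need to preserve.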
     
  • Adaptation Policies: Reconfiguration support is of little use to applications that cannot exploit the reconfiguration abilities productively. The Adaptation Policies task will develop several core applications that will (1) enable us to evaluate our system, and (2) provide models that others can use as they develop additional applications. The Adaptation Policies task will initially focus on two specific applications: distributed databases and adaptive scientific computations.

    Our distributed database research focuses on how to execute queries against a database distributed across a large computational infrastructure. The problems that such a system must confront include partial failures, bandwidth fluctuations, and performance anomalies. We have already developed techniques to reorder the operations of a distributed query processing strategy at runtime. This reordering enables the database to adapt to some workload and bandwidth fluctuations in the underlying computational infrastructure. We propose to extend this research to include a broader range of performance fluctuations and to incorporate self-healing techniques that will enable the system to successfully recover from failures.
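
As a simplified, single-machine illustration of runtime reordering, the sketch below applies a chain of filter predicates to a stream of rows and periodically moves the most selective predicate to the front, based on the pass rates it has observed so far. This is a generic selectivity-based heuristic used only for illustration, not the technique referred to above; the class name, the predicates, and the `reorder_every` interval are assumptions.

```python
class AdaptiveFilterPipeline:
    """Applies predicates in sequence and reorders them by observed selectivity."""

    def __init__(self, predicates, reorder_every=100):
        # Each entry holds: [name, function, rows_seen, rows_passed]
        self.stats = [[name, fn, 0, 0] for name, fn in predicates]
        self.reorder_every = reorder_every
        self.rows_processed = 0
        self.evaluations = 0  # total predicate calls, a rough cost measure

    @staticmethod
    def _pass_rate(entry):
        _, _, seen, passed = entry
        return passed / seen if seen else 1.0

    def accept(self, row):
        self.rows_processed += 1
        if self.rows_processed % self.reorder_every == 0:
            # Put the predicate that rejects the most rows first.
            self.stats.sort(key=self._pass_rate)
        for entry in self.stats:
            entry[2] += 1
            self.evaluations += 1
            if entry[1](row):
                entry[3] += 1
            else:
                return False  # rejected: later predicates are skipped
        return True

    def order(self):
        return [entry[0] for entry in self.stats]

pipeline = AdaptiveFilterPipeline([
    ("even", lambda x: x % 2 == 0),
    ("multiple_of_50", lambda x: x % 50 == 0),
])
accepted = [x for x in range(1000) if pipeline.accept(x)]
print(pipeline.order(), len(accepted), pipeline.evaluations)
```

The result set is the same under any ordering; only the amount of work changes. Note that pass rates for later predicates are measured only on rows that survived earlier ones, so they are conditional estimates, which a more careful design would account for.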

    Our adaptive scientific computation research will build on our previous FFTW research (winner of the prestigious J. H. Wilkinson Prize for Numerical Software in 1999). The current version of FFTW implements an adaptive version of the fast Fourier transform (FFT). When FFTW starts up, it spends a couple of seconds running experiments to measure various ways of running the FFT on the particular machine, and then it determines a plan for implementing the FFT for that architecture, cache size, and operating environment. Across a broad range of computing environments, FFTW outperforms laboriously hand-coded FFTs tailored specifically for each environment. FFTW's adaptation occurs only at start-up, however. It does not adapt once it has chosen an execution strategy, making its performance brittle to changes in its environment. We propose to explore ways to enable applications like FFTW to adapt continuously during the computation to take advantage of new computational resources and appropriately reapportion load when existing resources become unavailable.
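
The measure-then-plan idea can be shown in miniature. The sketch below times two interchangeable transform implementations on the current machine and keeps whichever ran faster. It does not use FFTW or reproduce its planner: the two candidate implementations, the `make_plan` function, and the trial count are stand-ins written for this illustration, and the recursive version assumes the input length is a power of two.

```python
import cmath
import time

def dft_naive(x):
    """Direct O(n^2) evaluation of the discrete Fourier transform."""
    n = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
        for j in range(n)
    ]

def fft_recursive(x):
    """Radix-2 Cooley-Tukey FFT; assumes len(x) is a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_recursive(x[0::2])
    odd = fft_recursive(x[1::2])
    twiddled = [cmath.exp(-2j * cmath.pi * k / n) * odd[k] for k in range(n // 2)]
    return ([even[k] + twiddled[k] for k in range(n // 2)] +
            [even[k] - twiddled[k] for k in range(n // 2)])

CANDIDATES = {"naive": dft_naive, "radix2": fft_recursive}

def make_plan(n, trials=3):
    """Time every candidate on a sample input of size n and return the fastest."""
    sample = [complex(i % 7, 0) for i in range(n)]
    best_name, best_time = None, float("inf")
    for name, fn in CANDIDATES.items():
        start = time.perf_counter()
        for _ in range(trials):
            fn(sample)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best_name, best_time = name, elapsed
    return best_name

plan = make_plan(64)  # measured once, up front
signal = [complex(i, 0) for i in range(64)]
spectrum = CANDIDATES[plan](signal)
print(plan, round(abs(spectrum[0]), 3))
```

Like the start-up adaptation described above, this plan is fixed once chosen. Adapting continuously would mean re-measuring during the computation and switching plans when conditions change.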

Application Projects

Health Monitoring and Care for the Elderly
As the world population ages, technologies that help reduce the cost and improve the quality of elderly care are increasingly important. Such "living-assisting" technologies help the elderly stay safe and look after their own health when caregivers are away, thereby reducing and delaying the need for hospitalization or other costly care arrangements. Efforts in computer-aided health monitoring for elderly care started decades ago. Early solutions did not work well, mainly because of the high costs of the monitoring devices and communication modes, and because caregivers and clinicians had to adapt to the complex, inflexible technologies used to monitor and analyze the relevant data.

Costs for the monitoring devices have decreased sharply over the years, and new generations of devices that are small, robust, and multifunctional are now available. For instance, the April 2004 issue of Technology Review reported the invention of a ring-size wearable monitor developed by MIT researchers that can monitor the wearer's temperature, heart rate, and blood oxygen level. Similarly, new modes of communication via the Internet and wireless networks are increasingly available. These technology advancements have brought effective health monitoring for elderly care closer to reality.

Many scientific and implementation challenges still remain, however, especially in integrating multimodal, multisource, noisy data and analyzing it effectively to produce timely, relevant decisions and actions. Recent advances in analytic technologies have seen promising results, but they are usually restricted to limited domains with a restricted set of data. This showcase project will highlight the feasibility of developing a cost-effective, adaptive-computing framework to support a significant real application.

The proposed Showcase application prototype would, for instance, allow signals to be collected from an elderly patient through simple and robust wearable devices or environmental sensors. The signals could be processed and analyzed in a stand-alone home computer when the patient is at home, or they could be sent over a wireless network to a grid of public-health computer clusters located in various area hospitals or institutions when the patient is walking along the street. Such a public-health grid may in turn be implemented in an adaptive manner to support hardware, software, and network self-configuration for different task types and load patterns.

The Showcase thrust will proceed in phases. In the initial phase, we will develop a networked wearable device system for the elderly. The system will adaptively collect and analyze physiological data, suggest health status, communicate with doctors, and help the wearer perform certain biofeedback exercises to improve their health. We expect that the initial system will provide an ideal platform to systematically study health problems for the elderly, as well as for people in other age groups, by building a physiological database, which has never been done before.

The initial prototype system will consist of three main modules: physiosignal acquisition, analysis and suggestion, and biofeedback. The wearable device is equipped with microelectromechanical system (MEMS) sensors, such as accelerometers, skin conductivity sensors, ECG sensors, and blood-volume pulse sensors. With these sensors, the device can acquire signals related to the wearer's activities, gait, and early symptoms of heart disease. The adaptive analysis model is tuned to the wearer. It performs real-time adaptive analysis of the acquired signals and suggests the physiological status of the wearer. Biofeedback is a scientifically validated method for treating a variety of health problems, such as those affecting the cardiovascular and respiratory systems.
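
One simple way an analysis model could be tuned to the wearer is to maintain a running baseline of each signal and flag readings that deviate strongly from it. The sketch below does this for heart rate with an exponentially weighted mean and variance. It is an assumption-laden illustration rather than the prototype's design: the class name, smoothing factor, threshold, warm-up length, minimum standard deviation, and sample readings are all hypothetical, and it is not a clinical algorithm.

```python
class AdaptiveBaseline:
    """Tracks a wearer-specific baseline and flags readings far from it."""

    def __init__(self, alpha=0.1, threshold=3.0, warmup=5, min_std=2.0):
        self.alpha = alpha          # how quickly the baseline follows new data
        self.threshold = threshold  # deviations beyond this many std devs are flagged
        self.warmup = warmup        # readings to observe before flagging anything
        self.min_std = min_std      # floor so a very steady signal is not over-flagged
        self.mean = None
        self.var = 0.0
        self.n = 0

    def update(self, value):
        """Return True if value is anomalous relative to the learned baseline."""
        self.n += 1
        if self.mean is None:
            self.mean = value
            return False
        deviation = value - self.mean
        std = max(self.var ** 0.5, self.min_std)
        anomalous = self.n > self.warmup and abs(deviation) > self.threshold * std
        if not anomalous:
            # Adapt only on normal readings so that anomalies do not shift the baseline.
            self.mean += self.alpha * deviation
            self.var = (1 - self.alpha) * (self.var + self.alpha * deviation ** 2)
        return anomalous

heart_rate_bpm = [72, 74, 71, 73, 75, 72, 74, 140, 73, 72]  # hypothetical readings
baseline = AdaptiveBaseline()
flags = [baseline.update(bpm) for bpm in heart_rate_bpm]
anomalous_indices = [i for i, flagged in enumerate(flags) if flagged]
print(anomalous_indices)
```

Because the baseline is learned per wearer, a resting rate that is normal for one person can still be flagged as unusual for another.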

The adaptive signal analysis and feedback modules in the wearable device are based on physiological models and adaptive schemes. Signals acquired by the wearable device are transmitted to a central database, where data analysis and mining are carried out over data collected from several persons wearing the device. With physiologists' expert knowledge, the data-mining results are validated and captured as physiological models. Based on the physiological models and system optimization methods, an adaptation scheme is created and updated. The adaptation schemes are then injected into those wearable devices to support adaptive signal analysis and feedback modules. The wearable device also communicates with other wearable devices and with the doctor.

Equipment
The lab has purchased a 54-node cluster of dual-processor 2.4 GHz Opteron machines. The cluster will be set up by May/June. The lab also has around 20 HP iPAQ PDAs installed with Linux and various sensors for pervasive computing research.

Page Maintained by: Catharine Tan
Last Modified on: Wednesday, December 31, 2003

© Singapore - MIT Alliance Computer Science 2003