Embedded Systems Weekly Seminar |
Semester 1, 2004-05
This seminar is a forum for presenting research papers and for reporting recent work done by graduate students and faculty members in the broad area of embedded systems design. Therefore, anyone who is interested in this area is invited to attend this seminar and is also welcome to give a talk. Graduate students working in the area of embedded systems/computer engineering are especially encouraged to do so.
For new students, this seminar will also provide an opportunity to form an impression about the kind of research work that is going on in this area. No prior knowledge in the area of embedded systems is required to benefit from this seminar.
Each week there will be a presentation on some topic related to embedded systems. The presentation should last for about 45 minutes and can be quite informal. The primary idea is to have plenty of discussions on the topic of the presentation.
Faculty members involved:
| Time and Venue: | Every Wednesday from 2 - 3pm at TR 6 (SOC1, Level 3) |
| Date | Talk |
| 11 August | Lothar Thiele: "Embedded Systems Design Part I: Design Space Exploration of Embedded Systems" |
| 18 August | Abhik Roychoudhury: "Symbolic Execution of Behavioral Requirements" |
| 25 August | Lothar Thiele: "Embedded Systems Design Part II: The FunState Model of Computation" |
| 1 September | Phan Thi Xuan Linh: "Analysis of Stream Processing Systems" |
| 8 September | Pan Yu: "Scalable Custom Instructions Identification for Instruction-Set Extensible Processors" |
| 15 September | Edward Sim: "Hardware/Software Partitioning in the context of multi-function configurations" |
| 22 September | Yajun Ha: "Design Space Exploration of Hardware Virtual Machines for Networked and Co-Designed Applications" |
| 29 September | Girish Venkataramani: "Asynchronous Logic Design: What, Why and How?" |
| 6 October | Girish Venkataramani: "From C to Asynchronous Circuits: A Toolflow for Nanoscale Technologies" |
| 13 October | No Seminar. CS Seminar on Monday, October 11, from 4-5pm at
EC, SoC1, Level 5 Room 46. M. Balakrishnan (IIT Delhi): "SRIJAN: Synthesis of Application Specific Multiprocessors" See https://mysoc.nus.edu.sg/~cmsem/seminar_files/1856.txt |
| 20 October | Weng-Fai Wong: "Compiler-Orchestrated Prefetching via Speculation and Predication" |
| 27 October | Sun Meng: "Modelling and Refinement of State-based Software Systems: A Coalgebraic Perspective" |
| 3 November | Tulika Mitra: "Configuration Bitstream Compression for Dynamically Reconfigurable FPGAs" |
| 10 November | Supratik Chakraborty: "Automatic Disjunctive Partitioning for Symbolic Reachability Analysis" |
| Lothar Thiele: "Embedded Systems Design Part I: Design
Space Exploration of Embedded Systems" Abstract: Design space exploration (DSE) techniques are known to be essential at all levels of abstraction, for the hardware software codesign of complex embedded systems. The goal of is to evaluate different implementation alternatives for such systems. The two essential components of an exploration strategy are: (1) the estimation of system properties and (2) the exploration of alternatives or "walking" through the design space. In many cases, the choice of (1) restricts the possible alternatives for (2). This talk will deal with both these components of a DSE framework and in particular describe a number of new models and methods for system-level DSE, taking into account embedded software, process scheduling, computation on distributed resources, and communication related issues. Note: Lothar Thiele was a visiting professor at NUS during July-August 2004. During his stay he gave a series of four talks. The slides used for these talks, as well as the relevant papers may be found here . Two of these talks were joint CS and Embedded Systems seminars, and the remaining two were only CS seminars. |
| Abhik Roychoudhury: "Symbolic Execution of Behavioral
Requirements" Abstract: Message Sequence Charts (MSC) have traditionally been used as a weak form of behavioral requirements in software design; they denote scenarios which may happen. Live Sequence Charts (LSC) extend Message Sequence Charts by also allowing the designer to specify scenarios which must happen. Live Sequence Chart specifications are executable; their simulation allows the designer to play out potentially aberrant scenarios prior to software construction. In this paper, we propose the use of Constraint Logic Programming (CLP) for symbolic execution of requirements described as Live Sequence Charts. The utility of CLP stems from its ability to execute in the presence of uninstantiated variables. This allows us to simulate multiple scenarios at one go. For example, several scenarios which only differ from each other in the value of a variable may be executed as a single scenario where the variable is left uninstantiated. Similarly, we can simulate scenarios with an unbounded number of processes. We use the power of CLP to also simulate charts with non-trivial timing constraints. Current works on MSC/LSCs use data/control variables mainly for ease of specification; they are instantiated to concrete values during simulation. Thus, our work advances the state-of-the-art in simulation and checking of MSC based software requirements. |
| Lothar Thiele: "Embedded Systems Design Part II: The
FunState Model of Computation" Abstract: Recently, model-based design of embedded systems has become very popular. Towards this, formal models of computation are used as a starting point for the whole design process, which includes the mapping of an application onto an underlying hardware and the verification of properties required out of the design. This talk will present a class of functional models, called FunState, which explicitly separates control aspects from data processing. It will be shown that the FunState class of models includes various other well known models such as Petri Nets and extended finite state automata. Methods based on Interval Decision Diagrams (IDD) will also be presented for the verification of certain system properties and for deriving quasi-static schedules for the FunState model. |
| Phan Thi Xuan Linh: "Analysis of Stream Processing Systems"
Abstract: Stream processing systems have become more widespread and
prevalent today. With the extensive use of such systems, it comes as no
surprise that much work have been done in the area. Of these, an exception
is formal analysis techniques wherein little work has been done to date.
There were existing approaches such as Network Calculus and Timed Automata,
however, they have limitations that leave much to be desired. |
| Pan Yu: "Scalable Custom Instructions Identification for
Instruction-Set Extensible Processors" Abstract: Extensible processors allow the addition of application-specific custom instructions to the core instruction set architecture. However, it is computationally expensive to automatically identify and select the optimal set of custom instructions. Therefore, heuristic techniques are often employed to quickly search the design space. In this talk, we discuss about an efficient algorithm for exact enumeration of all the possible candidate instructions given a dataflow graph (DFG) corresponding to a code fragment. Even though this is similar to the subgraph enumeration problem (which is exponential), we find that most subgraphs are not feasible candidates for various reasons. In fact, the number of candidates is quite small compared to the size of the DFG. Compared to previous approaches, our technique achieves orders of magnitude speedup in enumerating these candidate custom instructions for very large DFGs.
|
| Edward Sim: "Hardware/Software Partitioning in the context
of multi-function configurations" Abstract: Reconfiguration time is a major performance bottleneck for reconfigurable architectures. We believe that multi-function configurations can help to alleviate this problem. However, an efficient evaluation function is required for searching the design space. The difficulty is two-fold. 1) How to divide various functions into separate configuration. 2) How to implement each of these configurations. The talk will show that both aspects are to be included in the evaluation function and the focus will be upon the former issue rather than the latter.
|
| Yajun Ha: "Design Space Exploration of Hardware Virtual
Machines for Networked and Co-Designed Applications" Abstract: In the near
future, networked applications should support services that need substantial
more digital signal processing. Accordingly, they will consist of both
software and hardware parts, where the software part implements the less
computationally intensive work, while the hardware part accelerates the
computationally intensive work. In addition, networked applications should
conform to industry standards, which keep on evolving. As a result, they
require the client side implementation platforms to be networked,
high-performance and flexible. |
| Girish Venkataramani: "Asynchronous Logic Design: What, Why
and How?" Abstract: Synchronous circuits are sequenced by one or more globally distributed periodic timing signals called clocks. The advantage of the clock is that it provides the designer with a sequential abstraction of hardware design, which is inherently parallel. The legacy of circuit design is dominated by this design paradigm primarily because it is easier to reason about. However, as feature size shrinks and we approach billion-transistor SoCs, the distribution and synchronization of this clock signal across a (now much larger) nanoscale circuit is becoming increasingly complex. Asynchronous circuit design, on the other hand, refers to circuits without such clock signals. They have been shown to have the potential to alleviate complexity, produce low-power circuits, and to be scalable to future technologies. In this talk, I will provide an introduction to the asynchronous design paradigm describing basic design principles, how synchronization is usually achieved in the absence of clocks, and some common asynchronous design flows. In the process of the talk I will highlight features of these circuits that make them an attractive design alternative in the deep sub-micron era. In a follow-up talk, I will describe in detail an instance of an asynchronous design flow, CASH, developed by the Phoenix project at Carnegie Mellon University. Slides: http://www.cs.cmu.edu/~girish/talks/nus_es_seminar1.ppt Bio: Girish Venkataramani
is currently a Phd candidate in the Electrical and Computer Engineering
department at Carnegie Mellon University, USA. His research interests focus
on asynchronous circuit design and high-level circuit synthesis. In the
past, he has investigated the integration of reconfigurable fabrics with
mainstream processors, and optimizing compilers that target such
architectures. He works in the Phoenix research project at CMU, and is
advised by Dr. Seth Goldstein. At NUS, he is a student intern working with
Dr. Thiagarajan. |
| Girish Venkataramani: "From C to Asynchronous Circuits: A
Toolflow for Nanoscale Technologies" Abstract: For five decades, Moore's law has supplied chip designers with an exponentially increasing amount of computational resources. Architects have taken advantage of these additional resources to produce faster and more parallel machines. This relentless advance is proving to be a double-edged sword as chip complexity is also out-pacing design productivity and designers are unable to efficiently take advantage of the wealth of available resources. In this talk, I will describe a CAD toolflow, CASH, that addresses the design scalability problem. CASH automatically synthesizes energy efficient asynchronous circuits directly from ANSI-C programs. The key insight here is fully automated synthesis of circuits with distributed control and localized communication. CASH is built around a compiler that converts the C program into a functional, dataflow intermediate representation, exposing instruction-level, pipeline and memory parallelism. The compiler performs optimizations and synthesizes pipelined asynchronous circuits, in which control is distributed, communication is achieved through local wires, and arbitration for datapath resources is unnecessary. Kernels synthesized from the Mediabench suite exhibit excellent energy-delay characteristics. Slides: http://www.cs.cmu.edu/~girish/talks/nus_es_seminar2.ppt
|
| No Seminar |
| Weng-Fai Wong: "Compiler-Orchestrated Prefetching via
Speculation and Predication" Abstract: We introduce a compiler-orchestrated prefetching system as a unified framework geared toward ameliorating the gap between processing speeds and memory access latencies. We focus the scope of the optimization on specific subsets of the program dependence graph that succinctly characterize the memory access pattern of both regular array-based applications and irregular pointer-intensive programs. We illustrate how program embedded precomputation via speculative execution can accurately predict and effectively prefetch future memory references with negligible overhead. The proposed techniques reduce the total running time of seven SPEC benchmarks and two OLDEN benchmarks by an average of 27% on an Itanium~2 processor. The improvements are in addition to several state-of-the-art optimizations including software pipelining and data prefetching. In addition, we use cycle-accurate simulations to identify important and lightweight architectural innovations that further mitigate the memory system bottleneck. In particular, we focus on the notoriously challenging class of pointer-chasing applications, and demonstrate how they may benefit from a novel scheme of sentineled prefetching. Our results for twelve SPEC benchmarks demonstrate that 45% of the processor stalls that are caused by the memory system are avoidable. The techniques in this paper can effectively mask long memory latencies with little instruction overhead, and can readily contribute to the performance of processors today. |
| Sun Meng: "Modelling and Refinement of State-based Software
Systems: A Coalgebraic Perspective" Abstract: Both algebras and coalgebras
are devices that provide abstract descriptions of a variety of essential
aspects in computer science, in particular of data and behavioral
structures, respectively. The former are specified by a set of constructors
and the latter by a collection of destructors (observers). In recent years,
coalgebras have been recognized as a promising framework to model and reason
about dynamical, state-based systems, to which a user only has limited
access by specified operations. |
| Tulika Mitra: "Configuration Bitstream Compression for
Dynamically Reconfigurable FPGAs" Abstract: Field Programmable Gate Arrays
(FPGAs) holds the possibility of dynamic reconfiguration. The key advantages
of dynamic reconfiguration are the ability to rapidly adapt to dynamic
changes and better utilization of the programmable hardware resources for
multiple applications. However, with the advent of multi-million gate
equivalent FPGAs, configuration time is increasingly becoming a concern.
High reconfiguration cost can potentially wipe out any gains from dynamic
reconfiguration. One solution to alleviate this problem is to exploit the
high levels of redundancy in the configuration bitstream by compression. In
this work, we propose a novel configuration compression technique that
exploits redundancies both within a configuration's bitstream as well as
between bitstreams of multiple configurations. By maximizing reuse, our
results show that the proposed technique performs 26.5--75.8% better than
the previously proposed techniques. To the best of our knowledge, ours is
the first work that performs inter-configuration compression. |
| Supratik Chakraborty: "Automatic Disjunctive Partitioning
for Symbolic Reachability Analysis" Abstract: Reachability analysis is a fundamental problem in verification, analysis and optimization of digital systems. It is well known that transition relation partitioning often leads to significant advantages during reachability analysis. In the context of synchronous sequential circuits, conjunctive partitioning has been the predominant choice so far. However, disjunctive partitioning holds a lot of promise, if the partitioning method can be automated and made efficient. In this talk, we explore a technique for automatically decomposing a symbolic transition relation into a disjunction of symbolic transition relations (additive decomposition). Each transition relation thus obtained describes a Kripke structure with a unique successor/predecessor of each state. The special structure of the decomposed relations allows us to use composition instead of quantification during reachability analysis. We describe different optimizations and design choices in the decomposition algorithm and the corresponding tradeoffs. We also describe some characterization results for Kripke structure and their impact on decomposition. Preliminary experiments show that automated decomposition plus composition-based reachability analysis has low overhead when analyzing a large no. of ISCAS89 benchmark circuits. Bio: Supratik Chakraborty is the James R. Isaac Chair Assistant Professor
of Computer Science and Engineering at IIT Bombay, where he has been since
1999. From 1998 to 1999, he was a member of the research staff of the
Advanced CAD Research group at Fujitsu Laboratories of America, Inc, where
he worked on timing optimization of high-speed circuits by logical
transformations. He received his B. Tech. in Computer Science and
Engineering from IIT Kharagpur in 1993, and was awarded the President of
India Gold Medal. Subsequently, he received his M.S. and Ph.D. degrees in
Electrical Engineering from Stanford University in 1995 and 1998. His Ph.D.
thesis was on polynomial-time techniques for approximate timing analysis of
asynchronous systems. |
Samarjit Chakraborty, October 2004