Embedded Systems Weekly Seminar

Semester 1, 2004-05

This seminar is a forum for presenting research papers and for reporting recent work done by graduate students and faculty members in the broad area of embedded systems design. Therefore, anyone who is interested in this area is invited to attend this seminar and is also welcome to give a talk. Graduate students working in the area of embedded systems/computer engineering are especially encouraged to do so.

For new students, this seminar will also provide an opportunity to form an impression about the kind of research work that is going on in this area. No prior knowledge in the area of embedded systems is required to benefit from this seminar.

Each week there will be a presentation on some topic related to embedded systems. The presentation should last for about 45 minutes and can be quite informal. The primary idea is to have plenty of discussions on the topic of the presentation.

Faculty members involved:

Time and Venue: Every Wednesday from 2 - 3pm at TR 6 (SOC1, Level 3)

Date Talk
11 August Lothar Thiele: "Embedded Systems Design Part I: Design Space Exploration of Embedded Systems"
18 August Abhik Roychoudhury: "Symbolic Execution of Behavioral Requirements"
25 August Lothar Thiele: "Embedded Systems Design Part II: The FunState Model of Computation"
1 September Phan Thi Xuan Linh: "Analysis of Stream Processing Systems"
8 September Pan Yu: "Scalable Custom Instructions Identification for Instruction-Set Extensible Processors"
15 September Edward Sim: "Hardware/Software Partitioning in the context of multi-function configurations"
22 September Yajun Ha: "Design Space Exploration of Hardware Virtual Machines for Networked and Co-Designed Applications"
29 September Girish Venkataramani: "Asynchronous Logic Design: What, Why and How?"
6 October Girish Venkataramani: "From C to Asynchronous Circuits: A Toolflow for Nanoscale Technologies"
13 October No Seminar. CS Seminar on Monday, October 11, from 4-5pm at EC, SoC1, Level 5 Room 46.
M. Balakrishnan (IIT Delhi): "SRIJAN: Synthesis of Application Specific Multiprocessors"
See https://mysoc.nus.edu.sg/~cmsem/seminar_files/1856.txt
20 October Weng-Fai Wong: "Compiler-Orchestrated Prefetching via Speculation and Predication"
27 October Sun Meng: "Modelling and Refinement of State-based Software Systems: A Coalgebraic Perspective"
3 November Tulika Mitra: "Configuration Bitstream Compression for Dynamically Reconfigurable FPGAs"
10 November Supratik Chakraborty: "Automatic Disjunctive Partitioning for Symbolic Reachability Analysis"

Abstracts

Lothar Thiele: "Embedded Systems Design Part I: Design Space Exploration of Embedded Systems"

Abstract: Design space exploration (DSE) techniques are known to be essential at all levels of abstraction, for the hardware software codesign of complex embedded systems. The goal of is to evaluate different implementation alternatives for such systems. The two essential components of an exploration strategy are: (1) the estimation of system properties and (2) the exploration of alternatives or "walking" through the design space. In many cases, the choice of (1) restricts the possible alternatives for (2). This talk will deal with both these components of a DSE framework and in particular describe a number of new models and methods for system-level DSE, taking into account embedded software, process scheduling, computation on distributed resources, and communication related issues.

Note: Lothar Thiele was a visiting professor at NUS during July-August 2004. During his stay he gave a series of four talks. The slides used for these talks, as well as the relevant papers may be found here . Two of these talks were joint CS and Embedded Systems seminars, and the remaining two were only CS seminars.

Abhik Roychoudhury: "Symbolic Execution of Behavioral Requirements"

Abstract: Message Sequence Charts (MSC) have traditionally been used as a weak form of behavioral requirements in software design; they denote scenarios which may happen. Live Sequence Charts (LSC) extend Message Sequence Charts by also allowing the designer to specify scenarios which must happen. Live Sequence Chart specifications are executable; their simulation allows the designer to play out potentially aberrant scenarios prior to software construction. In this paper, we propose the use of Constraint Logic Programming (CLP) for symbolic execution of requirements described as Live Sequence Charts. The utility of CLP stems from its ability to execute in the presence of uninstantiated variables. This allows us to simulate multiple scenarios at one go. For example, several scenarios which only differ from each other in the value of a variable may be executed as a single scenario where the variable is left uninstantiated. Similarly, we can simulate scenarios with an unbounded number of processes. We use the power of CLP to also simulate charts with non-trivial timing constraints. Current works on MSC/LSCs use data/control variables mainly for ease of specification; they are instantiated to concrete values during simulation. Thus, our work advances the state-of-the-art in simulation and checking of MSC based software requirements.

Lothar Thiele: "Embedded Systems Design Part II: The FunState Model of Computation"

Abstract: Recently, model-based design of embedded systems has become very popular. Towards this, formal models of computation are used as a starting point for the whole design process, which includes the mapping of an application onto an underlying hardware and the verification of properties required out of the design. This talk will present a class of functional models, called FunState, which explicitly separates control aspects from data processing. It will be shown that the FunState class of models includes various other well known models such as Petri Nets and extended finite state automata. Methods based on Interval Decision Diagrams (IDD) will also be presented for the verification of certain system properties and for deriving quasi-static schedules for the FunState model.

Phan Thi Xuan Linh: "Analysis of Stream Processing Systems"

Abstract: Stream processing systems have become more widespread and prevalent today. With the extensive use of such systems, it comes as no surprise that much work have been done in the area. Of these, an exception is formal analysis techniques wherein little work has been done to date. There were existing approaches such as Network Calculus and Timed Automata, however, they have limitations that leave much to be desired.
This talk presents a brief survey of recent work that has been done in the area of stream processing. We will propose a new model for representing event stream systems and discuss our ongoing work on developing analysis techniques for this model.

Pan Yu: "Scalable Custom Instructions Identification for Instruction-Set Extensible Processors"

Abstract: Extensible processors allow the addition of application-specific custom instructions to the core instruction set architecture. However, it is computationally expensive to automatically identify and select the optimal set of custom instructions. Therefore, heuristic techniques are often employed to quickly search the design space. In this talk, we discuss about an efficient algorithm for exact enumeration of all the possible candidate instructions given a dataflow graph (DFG) corresponding to a code fragment. Even though this is similar to the subgraph enumeration problem (which is exponential), we find that most subgraphs are not feasible candidates for various reasons. In fact, the number of candidates is quite small compared to the size of the DFG. Compared to previous approaches, our technique achieves orders of magnitude speedup in enumerating these candidate custom instructions for very large DFGs.

 

Edward Sim: "Hardware/Software Partitioning in the context of multi-function
configurations"

Abstract: Reconfiguration time is a major performance bottleneck for reconfigurable architectures. We believe that multi-function configurations can help to alleviate this problem. However, an efficient evaluation function is required for searching the design space. The difficulty is two-fold. 1) How to divide various functions into separate configuration. 2) How to implement each of these configurations. The talk will show that both aspects are to be included in the evaluation function and the focus will be upon the former issue rather than the latter.

 

Yajun Ha: "Design Space Exploration of Hardware Virtual Machines for Networked and Co-Designed Applications"

Abstract: In the near future, networked applications should support services that need substantial more digital signal processing. Accordingly, they will consist of both software and hardware parts, where the software part implements the less computationally intensive work, while the hardware part accelerates the computationally intensive work. In addition, networked applications should conform to industry standards, which keep on evolving. As a result, they require the client side implementation platforms to be networked, high-performance and flexible.
Networked reconfigurable platforms that combine processors and reconfigurable architectures (RAs) are a good choice to meet the above requirements. Their processors can be re-programmed with software binaries to implement the software parts of the networked applications, and their RAs can be re-programmed with configuration bitstreams to implement the hardware parts of the networked applications. Both software binaries and configuration bitstreams can be downloaded from the network.
However, networked reconfigurable platforms bring extra complexity to the networked application design. The complexity results from the fact that different reconfigurable platforms use different combinations of processors, RAs and their communication interfaces. Consequently, we have to prepare different versions of software binaries and RA configuration bitstreams even for the same application.
To solve this problem, we develop an abstract reconfigurable platform, which builds an abstraction for each of the three reconfigurable platform components: processor, RA and processor/RA interface. As an abstraction for the processor, we use the existing Java virtual machine (JVM). In this talk, we focus on developing a hardware virtual machine (HVM) as an abstraction for the RA, and a virtual communication interface as an abstraction for the processor/RA interface.
In the first part of the talk, we describe our research on the HVM. We will implement four HVM instances, measure and compare their quality metrics, build an analytical framework to characterize their design space, validate the framework and use it to perform the design space exploration.
In the second part of the talk, we describe our research on the integration of the HVM with the JVM. We will develop a virtual interface that provides the platform independent communication and a run-time manager that efficiently assigns a Java method to be implemented on the JVM or the HVM.
By using the abstract reconfigurable platform in a client terminal, a networked application containing both software and hardware parts needs to be designed only once on the server, then it can be deployed on any client.
 

Girish Venkataramani: "Asynchronous Logic Design: What, Why and How?"

Abstract: Synchronous circuits are sequenced by one or more globally distributed periodic timing signals called clocks. The advantage of the clock is that it provides the designer with a sequential abstraction of hardware design, which is inherently parallel. The legacy of circuit design is dominated by this design paradigm primarily because it is easier to reason about. However, as feature size shrinks and we approach billion-transistor SoCs, the distribution and synchronization of this clock signal across a (now much larger) nanoscale circuit is becoming increasingly complex. Asynchronous circuit design, on the other hand, refers to circuits without such clock signals. They have been shown to have the potential to alleviate complexity, produce low-power circuits, and to be scalable to future technologies. In this talk, I will provide an introduction to the asynchronous design paradigm describing basic design principles, how synchronization is usually achieved in the absence of clocks, and some common asynchronous design flows. In the process of the talk I will highlight features of these circuits that make them an attractive design alternative in the deep sub-micron era. In a follow-up talk, I will describe in detail an instance of an asynchronous design flow, CASH, developed by the Phoenix project at Carnegie Mellon University.

Slides: http://www.cs.cmu.edu/~girish/talks/nus_es_seminar1.ppt

Bio: Girish Venkataramani is currently a Phd candidate in the Electrical and Computer Engineering department at Carnegie Mellon University, USA. His research interests focus on asynchronous circuit design and high-level circuit synthesis. In the past, he has investigated the integration of reconfigurable fabrics with mainstream processors, and optimizing compilers that target such architectures. He works in the Phoenix research project at CMU, and is advised by Dr. Seth Goldstein. At NUS, he is a student intern working with Dr. Thiagarajan.
 

Girish Venkataramani: "From C to Asynchronous Circuits: A Toolflow for Nanoscale Technologies"

Abstract: For five decades, Moore's law has supplied chip designers with an exponentially increasing amount of computational resources. Architects have taken advantage of these additional resources to produce faster and more parallel machines. This relentless advance is proving to be a double-edged sword as chip complexity is also out-pacing design productivity and designers are unable to efficiently take advantage of the wealth of available resources. In this talk, I will describe a CAD toolflow, CASH, that addresses the design scalability problem. CASH automatically synthesizes energy efficient asynchronous circuits directly from ANSI-C programs. The key insight here is fully automated synthesis of circuits with distributed control and localized communication. CASH is built around a compiler that converts the C program into a functional, dataflow intermediate representation, exposing instruction-level, pipeline and memory parallelism. The compiler performs optimizations and synthesizes pipelined asynchronous circuits, in which control is distributed, communication is achieved through local wires, and arbitration for datapath resources is unnecessary. Kernels synthesized from the Mediabench suite exhibit excellent energy-delay characteristics.

Slides: http://www.cs.cmu.edu/~girish/talks/nus_es_seminar2.ppt

 

No Seminar
Weng-Fai Wong: "Compiler-Orchestrated Prefetching via Speculation and Predication"

Abstract: We introduce a compiler-orchestrated prefetching system as a unified framework geared toward ameliorating the gap between processing speeds and memory access latencies. We focus the scope of the optimization on specific subsets of the program dependence graph that succinctly characterize the memory access pattern of both regular array-based applications and irregular pointer-intensive programs. We illustrate how program embedded precomputation via speculative execution can accurately predict and effectively prefetch future memory references with negligible overhead. The proposed techniques reduce the total running time of seven SPEC benchmarks and two OLDEN benchmarks by an average of 27% on an Itanium~2 processor. The improvements are in addition to several state-of-the-art optimizations including software pipelining and data prefetching. In addition, we use cycle-accurate simulations to identify important and lightweight architectural innovations that further mitigate the memory system bottleneck. In particular, we focus on the notoriously challenging class of pointer-chasing applications, and demonstrate how they may benefit from a novel scheme of sentineled prefetching. Our results for twelve SPEC benchmarks demonstrate that 45% of the processor stalls that are caused by the memory system are avoidable. The techniques in this paper can effectively mask long memory latencies with little instruction overhead, and can readily contribute to the performance of processors today.

Sun Meng: "Modelling and Refinement of State-based Software Systems: A Coalgebraic Perspective"

Abstract: Both algebras and coalgebras are devices that provide abstract descriptions of a variety of essential aspects in computer science, in particular of data and behavioral structures, respectively. The former are specified by a set of constructors and the latter by a collection of destructors (observers). In recent years, coalgebras have been recognized as a promising framework to model and reason about dynamical, state-based systems, to which a user only has limited access by specified operations.

In this talk, we provide an introduction to coalgebraic modelling of state-based software systems, and a refinement theory for such systems based on this coalgebraic framework. Finally, we show how such an approach may be applied in the area of software development.

Tulika Mitra: "Configuration Bitstream Compression for Dynamically Reconfigurable FPGAs"

Abstract: Field Programmable Gate Arrays (FPGAs) holds the possibility of dynamic reconfiguration. The key advantages of dynamic reconfiguration are the ability to rapidly adapt to dynamic changes and better utilization of the programmable hardware resources for multiple applications. However, with the advent of multi-million gate equivalent FPGAs, configuration time is increasingly becoming a concern. High reconfiguration cost can potentially wipe out any gains from dynamic reconfiguration. One solution to alleviate this problem is to exploit the high levels of redundancy in the configuration bitstream by compression. In this work, we propose a novel configuration compression technique that exploits redundancies both within a configuration's bitstream as well as between bitstreams of multiple configurations. By maximizing reuse, our results show that the proposed technique performs 26.5--75.8% better than the previously proposed techniques. To the best of our knowledge, ours is the first work that performs inter-configuration compression.
 

Supratik Chakraborty: "Automatic Disjunctive Partitioning for Symbolic Reachability Analysis"

Abstract: Reachability analysis is a fundamental problem in verification, analysis and optimization of digital systems. It is well known that transition relation partitioning often leads to significant advantages during reachability analysis. In the context of synchronous sequential circuits, conjunctive partitioning has been the predominant choice so far. However, disjunctive partitioning holds a lot of promise, if the partitioning method can be automated and made efficient. In this talk, we explore a technique for automatically decomposing a symbolic transition relation into a disjunction of symbolic transition relations (additive decomposition). Each transition relation thus obtained describes a Kripke structure with a unique successor/predecessor of each state. The special structure of the decomposed relations allows us to use composition instead of quantification during reachability analysis. We describe different optimizations and design choices in the decomposition algorithm and the corresponding tradeoffs. We also describe some characterization results for Kripke structure and their impact on decomposition. Preliminary experiments show that automated decomposition plus composition-based reachability analysis has low overhead when analyzing a large no. of ISCAS89 benchmark circuits.

Bio: Supratik Chakraborty is the James R. Isaac Chair Assistant Professor of Computer Science and Engineering at IIT Bombay, where he has been since 1999. From 1998 to 1999, he was a member of the research staff of the Advanced CAD Research group at Fujitsu Laboratories of America, Inc, where he worked on timing optimization of high-speed circuits by logical transformations. He received his B. Tech. in Computer Science and Engineering from IIT Kharagpur in 1993, and was awarded the President of India Gold Medal. Subsequently, he received his M.S. and Ph.D. degrees in Electrical Engineering from Stanford University in 1995 and 1998. His Ph.D. thesis was on polynomial-time techniques for approximate timing analysis of asynchronous systems.

His research interests include design, analysis and verification of asynchronous and mixed synchronous-asynchronous systems, and formal verification of hardware and software systems. He has published 6 journal papers, 13 conference papers and several workshop papers. He has also presented tutorials and invited talks on formal verification of digital systems, performance analysis of asynchronous systems and timing analysis techniques for GALS interfaces at several conferences and workshops. He received a best paper award at the 1998 IEEE International Conference on Computer Design.


Samarjit Chakraborty, October 2004