Goal and Challenges
The overall goal of our project is to provide architectures, both Embedded Systems (ES)- and High Performance Computing (HPC)-oriented, with efficient mechanisms that offer performance dependability guarantees in the presence of unreliable time-dependent variations and aging throughout the lifetime of the system. This will be done using both proactive techniques (in the absence of hard failures) and reactive techniques (in the presence of hard failures). The term "performance dependability guarantee" refers to time-criticality in ES (i.e., meeting deadlines) and, in HPC, to a predefined bound on the performance deviation from the nominal specifications.
The aim is to achieve this guarantee in both domains with a reasonable energy overhead (e.g., less than 10% on average). This would be a significant improvement over the state of the art (SotA), which currently provides such guarantees at the cost of at least 50% overhead. In addition, we will provide greater flexibility in platform design while still achieving power savings of at least 20%. To the best of our knowledge, this is the first project to attempt a holistic approach to providing dependable performance guarantees on both ES and HPC systems, taking into account various non-functional factors such as timing, reliability, power, and aging effects. The HARPA project aims to address several scientific challenges in this direction:
Shaving margins. Similar to the Razor circuit technique, but using different techniques at the microarchitecture and middleware levels, we aim to introduce margin-shaving concepts into those aspects of a system that are typically over-provisioned for the worst case.
A more predictable system with real-time guarantees, where needed. The monitors, knobs, and HARPA engine will make the target system more predictable and will act proactively on performance variability before hard failures occur.
Implementation of effective platform monitors and knobs. HARPA will select the appropriate monitors and knobs, and implement them correctly, so as to minimize the overheads they impose on efficiency and performance.
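The margin-shaving idea can be illustrated with a minimal sketch of an error-rate-driven voltage controller in the spirit of Razor: the supply voltage is lowered step by step while the monitored timing-error rate stays within a small budget, and the search stops as soon as it would not. The function names, voltages, error budget and the synthetic error model are all hypothetical, and this is not HARPA's own microarchitecture- or middleware-level technique.

```python
# Hypothetical sketch of Razor-style margin shaving: accept a small,
# monitored timing-error rate in exchange for a lower supply voltage.
# All names, voltages and the error model are illustrative assumptions.


def shave_voltage(error_rate_at, v_nominal=1.00, v_min=0.70,
                  step=0.01, target_error_rate=1e-4):
    """Step the supply voltage down from v_nominal and return the lowest
    voltage whose observed error rate stays within target_error_rate.

    error_rate_at: callable mapping a voltage (V) to an observed error
    rate; on real hardware this would come from error-detection monitors.
    """
    v = v_nominal
    while True:
        candidate = round(v - step, 6)  # rounding avoids float drift
        if candidate < v_min:
            break  # never cross the hard safety floor
        if error_rate_at(candidate) > target_error_rate:
            break  # shaving further would exceed the error budget
        v = candidate
    return v


def synthetic_error_rate(v, v_critical=0.85):
    """Toy error model: error-free at or above v_critical, with the error
    rate growing tenfold for every 10 mV below it."""
    if v >= v_critical:
        return 0.0
    return 10 ** (-4 + (v_critical - v) * 100)


if __name__ == "__main__":
    v = shave_voltage(synthetic_error_rate)
    print(f"{v:.2f} V, margin shaved: {1.00 - v:.2f} V")
    # prints "0.85 V, margin shaved: 0.15 V"
```

The returned voltage sits just above the point where errors would exceed the budget, so the 0.15 V between it and the nominal 1.00 V is worst-case margin that this toy workload did not need.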
HARPA Engine Overview
Figure 1(a) below shows the main concepts of the HARPA architecture and the main components of an architecture that can provide performance-dependability guarantees. This generic framework applies to both embedded systems and high-performance general-purpose systems. The main elements that distinguish a HARPA-enabled system are: (i) monitors and knobs, (ii) user requirements, and (iii) the HARPA Engine.
Conceptually, the HARPA Engine consists mainly of a feedback loop (Figure 1(b)) in which the different metrics of the system (performance, timing, power, temperature, errors, manifestations of time-dependent variations, etc.) are continuously monitored. Based on the state of the system and the performance (timing/throughput) requirements of the application, the HARPA Engine actuates the knobs to bias the execution flow as desired. It is the HARPA Engine that will implement the various control strategies aimed at providing dependable performance in the presence of (highly) unreliable time-dependent variations. The goal is to exploit different manifestations of what we term platform slack (i.e., slack in performance, power, energy, temperature, lifetime, and structures/components) in order to ensure timing guarantees throughout the lifetime of the device, in spite of time-dependent variability, and to maintain the expected lifetime of the system.
By combining performance-dependability techniques from both the embedded and high-performance worlds, HARPA will enable cross-fertilization of techniques and mechanisms between these two converging domains and transfer methodologies from one to the other. The underlying architecture, depicted in Figure 1(a), is a heterogeneous single-ISA multicore in which cores have different performance and power characteristics.
Figure 1: HARPA concepts and framework
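The monitor/decide/actuate loop of the HARPA Engine can be sketched as follows, assuming a single-ISA heterogeneous multicore whose knobs are (core type, frequency) operating points. The requirement check follows the two notions of performance dependability given earlier: a deadline for ES, and a bounded deviation from the nominal execution time for HPC. The `Knob` type, the throughput-scaling prediction and all numbers are hypothetical illustrations, not the HARPA Engine's actual control strategy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Knob:
    """A hypothetical operating point on a single-ISA heterogeneous multicore."""
    core: str       # core type, e.g. "big" or "little"
    freq_mhz: int   # clock frequency
    speed: float    # assumed relative throughput (higher is faster)
    power_w: float  # assumed average power at this operating point


def meets_requirement(exec_time, deadline=None, nominal=None, max_dev=None):
    """ES-style check: finish by the deadline.
    HPC-style check: stay within max_dev (a fraction) of the nominal time."""
    if deadline is not None:
        return exec_time <= deadline
    return exec_time <= nominal * (1.0 + max_dev)


def control_step(knobs, current, measured_time, **requirement):
    """One monitor -> decide -> actuate iteration.

    Predicts the execution time at each knob by scaling the time monitored
    at the current knob with relative throughput (a deliberately crude
    model), then returns the lowest-power knob that still meets the
    requirement, or the fastest knob if none does.
    """
    feasible = [
        k for k in knobs
        if meets_requirement(measured_time * current.speed / k.speed,
                             **requirement)
    ]
    if not feasible:
        return max(knobs, key=lambda k: k.speed)
    return min(feasible, key=lambda k: k.power_w)


knobs = [
    Knob("little", 600, 1.0, 0.3),
    Knob("little", 1200, 2.0, 0.8),
    Knob("big", 1000, 3.0, 1.5),
    Knob("big", 2000, 6.0, 4.0),
]

# ES case: 40 ms measured on big@2000 MHz with a 100 ms deadline. The
# performance slack lets the loop drop to the cheaper big@1000 MHz knob.
print(control_step(knobs, knobs[3], 40.0, deadline=100.0))
# prints Knob(core='big', freq_mhz=1000, speed=3.0, power_w=1.5)

# HPC case: at most 10% deviation from a 40 ms nominal time.
print(control_step(knobs, knobs[3], 40.0, nominal=40.0, max_dev=0.10))
# prints Knob(core='big', freq_mhz=2000, speed=6.0, power_w=4.0)
```

Picking the lowest-power feasible knob is one simple way to turn performance slack into power savings; a real controller would also have to weigh the other forms of platform slack (temperature, lifetime) and the cost of switching between knobs.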
Complete Workpackage List
The research strategy of the HARPA project is directly derived from, and aligned with, its main objectives and principles (Figure 2). Exploratory research activities will therefore be dedicated to investigating and advancing the principles of the mixed proactive/reactive HARPA engine, which is distributed among the operating system (OS) engine (WP1), the run-time engine (WP2), and the monitors/knobs used to observe and control the behaviour of the system (WP3). The reliability guarantees will be provided by the HARPA engine based on a parametric reliability model, abstracted from device and wire models for deeply scaled technologies up to the level of the HARPA platform architecture (WP4). A full-system simulator and an experimental board (also in WP4) will be designed as the experimental platform, to demonstrate the HARPA engine approach at the proof-of-concept level and to support the other related research activities. These technical activities will be driven by the application requirements and guided by case-study applications designed and tested in WP5. A dissemination and exploitation plan (WP6) has been defined to disseminate the project results to the scientific and industrial communities and to the broader public, and to govern their exploitation and IP protection. The project will be coordinated throughout its lifetime according to a structured project management plan (WP7), ensuring that the scientific and technical (S&T) objectives are achieved while keeping costs and deadlines under careful planning and control.
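As an illustration of how a parametric reliability model can be lifted from components to the platform level, the sketch below assumes a Weibull wear-out model per component and a series composition in which the platform fails when any component fails. The Weibull form, the component names and all parameter values are assumptions for illustration only; the project's model is derived from device and wire models, which are not reproduced here.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class WeibullComponent:
    """Hypothetical wear-out model for one platform component."""
    name: str
    scale_years: float  # characteristic life eta (about 63.2% failed by then)
    shape: float        # beta > 1 models wear-out, e.g. aging

    def reliability(self, t_years):
        """Probability that the component still works at time t_years."""
        return math.exp(-((t_years / self.scale_years) ** self.shape))


def system_reliability(components, t_years):
    """Series composition: the platform works only if every component works
    (failures assumed independent)."""
    r = 1.0
    for c in components:
        r *= c.reliability(t_years)
    return r


def lifetime_at(components, target, hi=100.0, iters=60):
    """Approximate largest t (years, searched in [0, hi]) with
    system_reliability(t) >= target, found by bisection; valid because
    reliability decreases monotonically with time."""
    lo = 0.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if system_reliability(components, mid) >= target:
            lo = mid
        else:
            hi = mid
    return lo


platform = [
    WeibullComponent("core", scale_years=15.0, shape=2.0),
    WeibullComponent("interconnect", scale_years=20.0, shape=3.0),
]
print(f"R(5 years) = {system_reliability(platform, 5.0):.3f}")
print(f"years until R drops to 0.9: {lifetime_at(platform, 0.9):.2f}")
```

The series assumption is pessimistic for a HARPA-style platform, whose reactive techniques aim to keep delivering guaranteed performance after some hard failures; capturing that would call for a redundancy-aware composition (e.g. k-out-of-n) instead.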
Figure 2: Overall Workpackage Structure
WP1: HARPA OS engine [M1-M33]
WP Leader: POLIMI
WP2: HARPA RT [M1-M30]
WP Leader: ICCS
WP3: Monitors, knobs and models [M1-M33]
WP Leader: UCY
WP4: Experimental Platform Reliability Modeling and Usage [M1-M36]
WP Leader: IMEC
WP5: Application and Validation [M1-M36]
WP Leader: THALES
WP6: Dissemination and Exploitation [M1-M36]
WP Leader: IT4I
WP7: Project management [M1-M36]
WP Leader: POLIMI