Co-Design of Massively Parallel Embedded Processor Architectures
CoMap
Abstract
The CoMap project deals with the systematic (a) mapping, (b) evaluation, and (c)
exploration of massively parallel processor architectures that are designed for
special purpose applications in the world of embedded computers.
The investigated class of computer architectures can be described by massively
parallel networked processing elements which, using today's hardware technology,
may be implemented on a single chip (SoC - System on a Chip).
Existing approaches for mapping computation-intensive algorithms to parallel
architectures either consider the implementation in dedicated hardware or
the implementation on a given supercomputer.
Many intermediate solutions between these extremes are emerging,
ranging from fine-grain FPGAs to large-grain processor arrays such as the
PACT XPP, Quicksilver's ACM, Bresca from Silicon Hive (Philips), NSC NAPA-1000,
and many others. In order to exploit all technological benefits and levels of
parallelism, the question remains how these array-like processing architectures
may be programmed efficiently.
The fact is that, as technology advances according to Moore's Law, the gap
between technology and mapping efficiency is steadily increasing. What is
needed are more flexible and generic tools that can easily be retargeted to
a special-purpose architecture. Also needed, in order to avoid redesigns
of an architecture due to unforeseen inefficiencies, is a methodology
that can evaluate architectures and mappings (a) early in the design and
(b) for a whole range of architectures rather than only one particular architecture.
Under the name CoMap, we summarize the main aspects of
our research proposal, whose major goal is to enable co-design, that is,
to design special-purpose parallel processor architectures and efficient mapping
tools simultaneously. The major research aspects can be summarized as follows:
Architectural Research
We focus on a new class of massively parallel architectures
that we call weakly-programmable arrays. Such architectures consist
of an array of processing elements (PEs) that contain sub-word processing units
with very little memory and a regular
interconnect structure. In order to efficiently implement a certain algorithm,
each PE may implement only a certain range of functions. Also, the instruction set
is limited and may be configured at compile time or even dynamically at run time.
The PEs are called weakly-programmable because the control overhead of each
PE is optimized and kept small.
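As a rough illustration of the idea of a restricted, configurable function
range, the following Python sketch models a single processing element. The
class name `WeakPE`, the three operations, and the register count are
assumptions made for this example only; they are not taken from the CoMap
architecture itself.

```python
import operator
from dataclasses import dataclass, field

# Hypothetical pool of operations a PE could be configured with.
OPERATIONS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


@dataclass
class WeakPE:
    """Toy processing element with a restricted instruction set."""
    allowed_ops: frozenset      # function range configured for this PE
    num_registers: int = 4      # deliberately small local memory
    regs: list = field(default_factory=list)

    def __post_init__(self):
        self.regs = [0] * self.num_registers

    def execute(self, op, dst, src_a, src_b):
        """Run one instruction; reject operations outside the function range."""
        if op not in self.allowed_ops:
            raise ValueError(f"operation {op!r} is not configured on this PE")
        self.regs[dst] = OPERATIONS[op](self.regs[src_a], self.regs[src_b])


# A PE configured for additions only.
pe = WeakPE(allowed_ops=frozenset({"add"}))
pe.regs[0], pe.regs[1] = 2, 3
pe.execute("add", 2, 0, 1)      # regs[2] = regs[0] + regs[1]
```

Reconfiguring a PE in this toy model amounts to replacing `allowed_ops`,
which mirrors configuration at compile time or at run time.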
Array Architecture Modeling and Simulation
In order to model and evaluate an architecture before actually designing it,
a formalism is needed to describe the major aspects and to clearly define the
interfaces needed by the mapping tools in order to generate efficient code.
Parameters such as topology, size, number and types of processing elements,
interconnect structures, memory architectures, etc. are needed in order to
map a given algorithm properly. Furthermore, the architecture language
should be parameterizable in order to be able to study effects such as scalability
and to describe coarse-grain as well as fine-grain massively parallel architectures.
In addition, there should be a path for synthesizing, simulating, and
prototyping each modeled architecture in real hardware.
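A parameterized architecture description of this kind can be sketched as a
small data structure. The example below is only a minimal illustration: the
class `ArrayArchitecture`, its fields, and the two topologies ("mesh" and
"torus") are assumed for the sketch and do not reflect the actual
architecture language of the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArrayArchitecture:
    """Hypothetical parameterized description of a 2D processor array."""
    rows: int
    cols: int
    topology: str = "mesh"       # assumed choices: "mesh" or "torus"
    registers_per_pe: int = 4    # assumed memory parameter

    def num_pes(self):
        return self.rows * self.cols

    def neighbors(self, r, c):
        """Return the PEs directly connected to PE (r, c), sorted."""
        result = set()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if self.topology == "torus":
                # Wrap around at the array borders.
                nr, nc = nr % self.rows, nc % self.cols
            elif not (0 <= nr < self.rows and 0 <= nc < self.cols):
                continue         # a mesh has no link across the border
            if (nr, nc) != (r, c):
                result.add((nr, nc))
        return sorted(result)


mesh = ArrayArchitecture(rows=2, cols=3)
torus = ArrayArchitecture(rows=3, cols=3, topology="torus")
```

Because all properties are ordinary parameters, the same description can be
instantiated at different sizes, which is what a scalability study needs.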
The generation of fast yet cycle-accurate simulators is another important research
topic. Such simulators are necessary in order to be able to evaluate an architecture
and compiler co-design.

FFT on WPPA programming editor
Retargetable Mapping Methodology
In order to exploit the benefits of massively parallel array architectures, computation-intensive
algorithms must be mapped efficiently. As nested loop programs exhibit such
parallelism to a large extent, a retargetable mapping strategy should be able
to cope with the parameters of the architecture and to generate
programming and configuration code.
For a parameterized architecture model, we would like to study the influence of the
mapping algorithms on the quality of the generated code. The goal is a generic compiler
that extracts the parallelism from the given program and uses the architectural parameters
to steer the requirements and objectives of the mapping, e.g., bandwidth, memory,
and throughput constraints. Research on the influence of architectural parameters, such as
the number of PEs and the interconnect topology, on the mapping parameters has started only recently,
and only a few approaches are able to map loop specifications down to hardware at all.
Based on our long-term experience, we intend to close this gap and to evaluate
the interplay of architectural parameters and mapping parameters, such as the
type and order of algorithmic transformations like localization, scheduling,
partitioning, and control generation.
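To make the partitioning step more concrete, the sketch below distributes the
iterations of a two-dimensional loop nest over a grid of PEs in rectangular
tiles, with each PE processing its tile sequentially. This is one simple
block-partitioning scheme chosen for illustration, not necessarily the
transformation used in the project; the function name and the assumption that
iterations are independent are ours.

```python
from math import ceil


def partition_iterations(n_i, n_j, pe_rows, pe_cols):
    """Assign each iteration (i, j) of an n_i x n_j loop nest to one PE.

    Every PE receives one rectangular tile of the iteration space and is
    assumed to execute its iterations one after another.
    """
    tile_i = ceil(n_i / pe_rows)   # tile height
    tile_j = ceil(n_j / pe_cols)   # tile width
    mapping = {(p, q): [] for p in range(pe_rows) for q in range(pe_cols)}
    for i in range(n_i):
        for j in range(n_j):
            mapping[(i // tile_i, j // tile_j)].append((i, j))
    return mapping


mapping = partition_iterations(n_i=4, n_j=6, pe_rows=2, pe_cols=2)

# With independent iterations, the slowest PE determines the step count.
steps = max(len(iterations) for iterations in mapping.values())
```

Changing `pe_rows` and `pe_cols` shows directly how an architectural
parameter (the array size) changes a mapping result (the tile size and the
number of sequential steps per PE).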
In cooperation with the other project partners, we would like to
(a) couple our parameterized mapping methodology with different code generators
such as placement and routing tools for fine-grain hardware designs and (b)
test the quality of our simulations with real-world applications.
Co-exploration of Architectures and Mappings
The ultimate goal is to co-explore and co-simulate applications on
architecture prototypes without actually designing them. Here, the
major question is which transformations must be applied, and how,
in order to efficiently use the features of a given architecture.
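One common way to compare candidate architecture and mapping combinations is
to keep only the Pareto-optimal ones. The sketch below assumes that each
candidate has already been evaluated to a pair (cost, latency), both to be
minimized; the cost model in the usage lines is a made-up placeholder and not
a result of the project.

```python
from math import ceil


def pareto_front(points):
    """Return the points that no other point dominates (minimization).

    A point q dominates p if q is no worse than p in every objective and
    differs from p.
    """
    front = []
    for p in points:
        dominated = any(
            q != p and all(qi <= pi for qi, pi in zip(q, p))
            for q in points
        )
        if not dominated:
            front.append(p)
    return front


# Placeholder evaluation: cost = number of PEs, latency = iterations per PE.
iterations = 64
candidates = [(pes, ceil(iterations / pes)) for pes in (1, 2, 4, 8)]
candidates.append((8, 20))   # a deliberately poor mapping on 8 PEs

front = pareto_front(candidates)
```

In a real exploration, the two objective values would come from simulation
of the mapped application rather than from a closed-form placeholder.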

Design Space Exploration
Project partners
Technical University of Dresden
Institut für Grundlagen der Elektrotechnik und Elektronik,
Fakultät Elektrotechnik und Informationstechnik, 01069 Dresden

ENST-Bretagne
Laboratoire d'Informatique des Télécommunications, ENST Bretagne, CS 83818, 29238 Plouzané, France

Université de Bretagne Occidentale
Dept. Informatique,
Équipe Architectures et Systèmes et EPML POMARD (47), s/c IRISA
Université de Bretagne Occidentale et STIC/CNRS
20 av. Le Gorgeu, 29285 Brest
France

ENSSAT-Lannion
Équipe R2D2 et EPML POMARD (47)
IRISA/Université de Rennes 1 et STIC/CNRS
ENSSAT, BP 447 - 6 Rue de Kerampont, 22305 Lannion, France

Supported in part by the German Science Foundation (DFG) under
contract TE 163/13-1.