Co-Design of Massively Parallel Embedded Processor Architectures

CoMap

Abstract

The CoMap project deals with the systematic (a) mapping, (b) evaluation, and (c) exploration of massively parallel processor architectures that are designed for special-purpose applications in the world of embedded computers.

The investigated class of computer architectures consists of massively parallel, networked processing elements which, using today's hardware technology, can be implemented on a single chip (SoC, System on a Chip).

Existing approaches for mapping computation-intensive algorithms to parallel architectures consider either an implementation in dedicated hardware or an implementation on a given supercomputer. Many intermediate solutions between these extremes are emerging, ranging from fine-grain FPGAs to large-grain processor arrays such as the PACT XPP, Quicksilver's ACM, Bresca from Silicon Hive (Philips), NSC NAPA-1000, and many others. The question remains how these array-like processing architectures can be programmed efficiently so as to exploit all technological benefits and levels of parallelism.

While hardware technology advances according to Moore's Law, the gap between what the technology offers and what mapping tools can exploit efficiently is steadily increasing. What is needed are more flexible and generic tools that can easily be retargeted to a special-purpose architecture. In addition, to avoid redesigns of an architecture due to unforeseen inefficiencies, a methodology is needed that can evaluate architectures and mappings (a) early in the design and (b) for a range of architectures rather than for only one particular architecture.

Under the name CoMap, we summarize the main aspects of our research proposal, whose major goal is to enable co-design in the sense of designing special-purpose parallel processor architectures and efficient mapping tools simultaneously. The major research aspects are as follows:

Architectural Research

Our special focus is on a new class of massively parallel architectures that we call weakly-programmable arrays. Such architectures consist of an array of processing elements (PEs) that contain sub-word processing units with very little memory and a regular interconnect structure. In order to efficiently implement a certain algorithm, each PE may implement only a certain function range. Also, the instruction set is limited and may be configured at compile-time or even dynamically at run-time. The PEs are called weakly-programmable because the control overhead of each PE is optimized and kept small.

Array Architecture Modeling and Simulation

In order to model and evaluate an architecture before actually designing it, a formalism is needed to describe its major aspects and to clearly define the interfaces needed by the mapping tools in order to generate efficient code. Parameters such as topology, size, number and types of processing elements, interconnect structures, memory architectures, etc. are needed in order to transform a given algorithm properly. Furthermore, the architecture language should be parameterizable in order to be able to study effects such as scalability and to describe coarse-grain as well as fine-grain massively parallel architectures. There should also be a path for synthesizing, simulating, and prototyping each modeled architecture in real hardware. The generation of fast yet cycle-accurate simulators is another important research topic. Such simulators are necessary in order to be able to evaluate an architecture and compiler co-design.
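As a rough illustration of what such a parameterized description could capture, the following Python sketch models an array by its size, PE type, and interconnect topology. All class, field, and method names are hypothetical and chosen only for this example; it is not the project's architecture description language.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class PEType:
    """Hypothetical description of one processing element type."""
    name: str
    functional_units: Dict[str, int]  # e.g. {"add": 2, "mul": 1}
    registers: int
    word_width: int


@dataclass
class ArrayArchitecture:
    """Hypothetical parameterized model of a 2D processor array."""
    rows: int
    cols: int
    pe_type: PEType
    topology: str = "mesh"  # assumed options: "mesh" or "torus"

    def num_pes(self) -> int:
        return self.rows * self.cols

    def neighbors(self, r: int, c: int) -> List[Tuple[int, int]]:
        """Return the PEs directly connected to PE (r, c)."""
        candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
        if self.topology == "torus":
            # Wrap around the borders; drop duplicates and self-links.
            wrapped = {(rr % self.rows, cc % self.cols) for rr, cc in candidates}
            return sorted(wrapped - {(r, c)})
        # Mesh: keep only candidates that lie inside the array.
        return [(rr, cc) for rr, cc in candidates
                if 0 <= rr < self.rows and 0 <= cc < self.cols]


pe = PEType("alu", {"add": 2, "mul": 1}, registers=8, word_width=16)
mesh = ArrayArchitecture(rows=4, cols=4, pe_type=pe)
torus = ArrayArchitecture(rows=4, cols=4, pe_type=pe, topology="torus")
```

Because size and topology are plain parameters, the same description can be instantiated at different scales, which is the property needed to study scalability.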



FFT on WPPA programming editor

Retargetable Mapping Methodology

In order to exploit the benefits of massively parallel array architectures, computation-intensive algorithms must be mapped efficiently. As nested loop programs exhibit such parallelism to a large extent, a retargetable mapping strategy should be able to cope with the parameters of the architecture and to generate programming and configuration code. For a parameterized architecture model, we would like to study the influence of the mapping algorithms on the quality of the generated code. The goal is that such a generic compiler extracts the parallelism from the given program and uses the architectural parameters to steer the requirements and objectives of the mapping, e.g., bandwidth, memory, and throughput constraints. The study of how architectural parameters such as the number of PEs and the interconnect topology influence the mapping parameters has only recently begun, and only a few approaches are able to map loop specifications down to hardware at all. Based on our long-term experience, we intend to close this gap and to evaluate the influence of architectural parameters and of mapping parameters, such as the type and order of algorithmic transformations (localization, scheduling, partitioning, and control generation). In cooperation with the other project partners, we would like to (a) couple our parameterized mapping methodology with different code generators, such as placement and routing tools for fine-grain hardware designs, and (b) test the quality of our simulations with real-world applications.
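To make the notions of scheduling and allocation concrete, here is a minimal Python sketch of a textbook-style linear space-time mapping for a two-dimensional loop nest. The FIR-like example, the dependence vectors, and all function names are assumptions made for illustration; this is not the project's mapping tool.

```python
from itertools import product


def dot(a, b):
    """Inner product of two integer vectors."""
    return sum(x * y for x, y in zip(a, b))


def is_valid_schedule(schedule, dependencies):
    """A linear schedule is causal if every dependence takes >= 1 time step."""
    return all(dot(schedule, d) >= 1 for d in dependencies)


def map_iterations(bounds, schedule, allocation):
    """Assign each iteration point a (PE, start time) pair."""
    mapping = {}
    for point in product(*(range(b) for b in bounds)):
        mapping[point] = (allocation(point), dot(schedule, point))
    return mapping


def has_conflict(mapping):
    """True if two iterations land on the same PE at the same time."""
    slots = list(mapping.values())
    return len(slots) != len(set(slots))


# Assumed example: FIR-like loop nest over (i, j) with 4 samples and 3 taps.
bounds = (4, 3)
dependencies = [(0, 1), (1, 0), (1, 1)]  # assumed dependence vectors
schedule = (1, 1)                        # start time t = i + j


def allocation(point):
    return point[1]                      # PE index = tap index j


mapping = map_iterations(bounds, schedule, allocation)
times = [t for _, t in mapping.values()]
latency = max(times) - min(times) + 1
```

With these assumptions the schedule is causal and conflict-free, and each of the three PEs processes one tap. Changing the allocation function or the number of available PEs is the point at which architectural parameters start to steer the mapping.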

Co-exploration of Architectures and Mappings

The ultimate goal is to co-explore and co-simulate applications on architecture prototypes without actually designing them. Here, the major question is which transformations must be applied, and in which way, in order to efficiently use the features of a certain architecture.
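One common way to organize such a co-exploration is to evaluate many candidate architectures and keep only the Pareto-optimal ones. The Python sketch below does this for a deliberately simple cost model (area proportional to the number of PEs, latency as evenly divided work plus a size-dependent communication term). The cost model and every name in it are hypothetical and serve only to illustrate the idea.

```python
def evaluate(rows, cols, work=64):
    """Toy cost model (assumption): returns (area, latency) for an array."""
    num_pes = rows * cols
    area = num_pes
    compute = (work + num_pes - 1) // num_pes  # ceil(work / num_pes)
    communication = rows + cols                # grows with array size
    return (area, compute + communication)


def dominates(a, b):
    """a dominates b if it is no worse in all objectives and better in one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(points):
    """Keep the (label, objectives) pairs that no other point dominates."""
    return [(label, obj) for label, obj in points
            if not any(dominates(other, obj) for _, other in points)]


candidates = [(1, 1), (2, 2), (4, 4), (8, 8), (2, 8)]
evaluated = [(shape, evaluate(*shape)) for shape in candidates]
front = pareto_front(evaluated)
```

In a real flow, the `evaluate` step would be replaced by simulation or synthesis estimates for each architecture and mapping pair; the filtering step stays the same.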



Design Space Exploration

Project partners

Technical University of Dresden

Institut für Grundlagen der Elektrotechnik und Elektronik,
Fakultät Elektrotechnik und Informationstechnik,
01069 Dresden

ENST-Bretagne

Laboratoire d'Informatique des Télécommunications
ENST Bretagne CS 83818
29238  Plouzané
France

Université de Bretagne Occidentale

Dept. Informatique, Équipe Architectures et Systèmes et EPML POMARD (47) s/c IRISA
Université de Bretagne Occidentale et STIC/CNRS
20 av. Le Gorgeu
29285  Brest
France

ENSSAT-Lannion

Équipe R2D2 et EPML POMARD (47)
IRISA/Université de Rennes 1 et STIC/CNRS
ENSSAT
BP 447 - 6 Rue de Kerampont
22305  Lannion
France

Supported in part by the German Science Foundation (DFG) under contracts TE 163/13-1 and TE 163/13-2.

Contact

Frank Hannig
Hardware/Software Co-Design
Department of Computer Science
University of Erlangen-Nuremberg
Cauerstr. 11
91058 Erlangen, Germany

Publications

2011
52 D. Kissler, D. Gran, Z. Salcic, F. Hannig and J. Teich.
Scalable Many-Domain Power Gating in Coarse-grained Reconfigurable Processor Arrays.
IEEE Embedded Systems Letters, 3(2):58-61, 2011.
51 D. Kissler, F. Hannig and J. Teich.
Efficient Evaluation of Power/Area/Latency Design Trade-offs for Coarse-Grained Reconfigurable Processor Arrays.
Journal of Low Power Electronics, 7(1):29-40, 2011.
2010
50 F. Hannig.
Communication Synthesis of Loop Accelerator Pipelines.
Talk, Workshop on Compiler-Assisted System-On-Chip Assembly (CASA), Embedded Systems Week (ESWEEK), Scottsdale, AZ, USA, October 28, 2010.
49 F. Hannig.
Retargetable Mapping of Loop Programs on Coarse-grained Reconfigurable Arrays.
Talk, International Conference on Hardware-Software Codesign and System Synthesis (CODES+ISSS), Scottsdale, AZ, USA, October 26, 2010.
48 T. Vander Aa, P. Raghavan, S. Mahlke, B. De Sutter, A. Shrivastava and F. Hannig.
Compilation Techniques for CGRAs: Exploring All Parallelization Approaches.
In Proceedings of the International Conference on Hardware-Software Codesign and System Synthesis (CODES+ISSS), pp. 185-186, Scottsdale, AZ, USA, October 24-29, 2010.
47 H. Dutta, F. Hannig, M. Schmid and J. Keinert.
Modeling and Synthesis of Communication Subsystems for Loop Accelerator Pipelines.
In Proceedings of the 21st IEEE International Conference on Application-specific Systems, Architectures, and Processors (ASAP), pp. 125-132, Rennes, France, July 7-9, 2010.
46 H. Dutta, F. Hannig and J. Teich.
PARO - A Design Tool for Synthesis of Hardware Accelerators for SoCs.
Tool Presentation at the University Booth at Design, Automation and Test in Europe (DATE), Dresden, Germany, March 8-12, 2010.
2009
45 F. Hannig.
Scheduling Techniques for High-Throughput Loop Accelerators.
Dissertation, Hardware/Software Co-Design, Department of Computer Science, University of Erlangen-Nuremberg, Germany, August, 2009. ISBN 978-3-86853-220-3, Verlag Dr. Hut, Munich, Germany.
44 H. Dutta, J. Zhai, F. Hannig and J. Teich.
Impact of Loop Tiling on the Controller Logic of Hardware Acceleration Engines.
Proceedings of 20th IEEE International Conference on Application-specific Systems, Architectures, and Processors (ASAP), pp. 161-168, Boston, MA, USA, July 7-9, 2009.
43 J. Keinert, H. Dutta, F. Hannig, C. Haubelt and J. Teich.
Model-Based Synthesis and Optimization of Static Multi-Rate Image Processing Algorithms.
Proceedings of Design, Automation and Test in Europe (DATE 2009), IEEE Computer Society, Nice, France, April 20-24, 2009, pp. 135-140.
42 J. Teich.
Invasives Rechnen.
GIBU-Jahrestreffen 2009, Schloss Dagstuhl, Wadern, April 6, 2009. Invited Talk.
41 H. Dutta, F. Hannig and J. Teich.
Performance Matching of Hardware Acceleration Engines for Heterogeneous MPSoC using Modular Performance Analysis.
In Proceedings of the 22nd International Conference on Architecture of Computing Systems (ARCS), Delft, The Netherlands, pp. 233-245, March 10-13, 2009.
40 F. Hannig, H. Dutta and J. Teich.
Parallelization Approaches for Hardware Accelerators - Loop Unrolling versus Loop Partitioning.
In Proceedings of the 22nd International Conference on Architecture of Computing Systems (ARCS), Delft, The Netherlands, pp. 16-27, March 10-13, 2009.
39 D. Kissler, A. Strawetz, F. Hannig and J. Teich.
Power-efficient Reconfiguration Control in Coarse-grained Dynamically Reconfigurable Architectures.
Journal of Low Power Electronics, 5(1):96-105, American Scientific Publishers, 2009.
38 H. Dutta, D. Kissler, F. Hannig, A. Kupriyanov, J. Teich and B. Pottier.
A Holistic Approach for Tightly Coupled Reconfigurable Parallel Processors.
Microprocessors and Microsystems, 33(1):53-62, 2009.
2008
37 J. Teich.
Reconfigurability of Future Massively Parallel SoCs.
Talk at the Department of Electrical and Computer Engineering, National University of Singapore, October 17, 2008.
36 C. Wolinski, K. Kuchcinski, J. Teich and F. Hannig.
Area and Reconfiguration Time Minimization of the Communication Network in Regular 2D Reconfigurable Architectures.
Proceedings of the International Conference on Field Programmable Logic and Applications (FPL), pp. 391-396, Heidelberg, Germany, September 8-10, 2008.
35 C. Wolinski, K. Kuchcinski, J. Teich and F. Hannig.
Communication Network Reconfiguration Overhead Optimization in Programmable Processor Array Architectures.
Proceedings of the 11th Euromicro Conference on Digital System Design (DSD), pp. 345-352, Parma, Italy, September 3-5, 2008.
34 R. Schaffer, R. Merker, F. Hannig and J. Teich.
Utilization of all Levels of Parallelism in a Processor Array with Subword Parallelism.
Proceedings of the 11th Euromicro Conference on Digital System Design (DSD), pp. 391-398, Parma, Italy, September 3-5, 2008.
33 A. Kupriyanov, F. Hannig, D. Kissler and J. Teich.
MAML: An ADL for Designing Single and Multiprocessor Architectures.
In Prabhat Mishra and Nikil Dutt (eds.). Chapter 12 in Processor Description Languages, pp. 295-327. In Systems on Silicon Series, Morgan Kaufmann, June 2008.
32 D. Kissler, A. Strawetz, F. Hannig and J. Teich.
Power-efficient Reconfiguration Control in Coarse-Grained Dynamically Reconfigurable Architectures.
In Proceedings of the 18th International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS), Lisbon, Portugal, September 10-12, 2008. LNCS vol. 5349, pp. 307-317. Springer, Heidelberg (2009).
31 C. Wolinski, K. Kuchcinski, J. Teich and F. Hannig.
Optimization of Routing and Reconfiguration Overhead in Programmable Processor Array Architectures.
In Proceedings of the 16th IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM), pp. 306-309, Palo Alto, CA, USA, April 14-15, 2008.
30 D. Kissler, H. Dutta, A. Kupriyanov, F. Hannig and J. Teich.
A High-Speed Dynamic Reconfigurable Multilevel Parallel Architecture.
Hardware and Software Demo at the University Booth at Design, Automation and Test in Europe (DATE), Munich, Germany, March 10-14, 2008.
29 F. Hannig, H. Ruckdeschel and J. Teich.
The PAULA Language for Designing Multi-Dimensional Dataflow-Intensive Applications.
In Proceedings of the GI/ITG/GMM-Workshop - Methoden und Beschreibungssprachen zur Modellierung und Verifikation von Schaltungen und Systemen, pp. 129-138, Freiburg, Germany, March 3-5, 2008.
2007
28 H. Dutta, F. Hannig, A. Kupriyanov, D. Kissler, J. Teich, R. Schaffer, S. Siegel, R. Merker and B. Pottier.
Massively Parallel Processor Architectures: A Co-design Approach.
Proceedings of the 3rd International Workshop on Reconfigurable Communication Centric System-on-Chips (ReCoSoC), pp. 61-68, Montpellier, France, June 18-20, 2007.
27 J. Teich, F. Hannig, H. Ruckdeschel, H. Dutta, D. Kissler and A. Stravet.
A Unified Retargetable Design Methodology for Dedicated and Re-Programmable Multiprocessor Arrays: Case Study and Quantitative Evaluation.
In Proceedings of the International Conference on Engineering of Reconfigurable Systems and Algorithms (ERSA), Invited paper, pp. 14-24, Las Vegas, NV, USA, June 25-28, 2007.
26 H. Dutta, F. Hannig, H. Ruckdeschel and J. Teich.
Efficient Control Generation for Mapping Nested Loop Programs onto Processor Arrays.
In Journal of Systems Architecture, 53(5-6):300-309, 2007.
25 A. Kupriyanov, D. Kissler, F. Hannig and J. Teich.
Efficient Event-driven Simulation of Parallel Processor Architectures.
In Proceedings of the 10th International Workshop on Software and Compilers for Embedded Systems (SCOPES), Nice, France, pp. 71-80, April 20, 2007.
24 A. Kupriyanov, F. Hannig, D. Kissler, J. Teich, J. Lallet, O. Sentieys and S. Pillement.
Modeling of Interconnection Networks in Massively Parallel Processor Architectures.
In Proceedings of the 20th International Conference on Architecture of Computing Systems (ARCS 2007), Springer LNCS series, Swiss Federal Institute of Technology (ETH), Zurich, Switzerland, pp. 268-282, March 12-15, 2007.
2006
23 D. Kissler, F. Hannig, A. Kupriyanov and J. Teich.
A Highly Parameterizable Parallel Processor Array Architecture.
In Proceedings of the IEEE International Conference on Field Programmable Technology (FPT 2006), pp. 105-112, Bangkok, Thailand, December 13-15, 2006.
22 D. Kissler, F. Hannig, A. Kupriyanov and J. Teich.
Hardware Cost Analysis for Weakly Programmable Processor Arrays.
In Proceedings of the International Symposium on System-on-Chip (SoC), pp. 179-182, Tampere, Finland, November 14-16, 2006.
21 S. Siegel, R. Merker, F. Hannig and J. Teich.
Communication-conscious Mapping of Regular Nested Loop Programs onto Massively Parallel Processor Arrays.
In Proceedings of the 18th International Conference on Parallel and Distributed Computing and Systems (PDCS), pp. 71-76, Dallas, TX, USA, November 13-15, 2006.
20 H. Dutta, F. Hannig and J. Teich.
Hierarchical Partitioning for Piecewise Linear Algorithms.
In Proceedings of the 5th International Symposium on Parallel Computing in Electrical Engineering (PARELEC), pp. 153-159, Bialystok, Poland, September 13-17, 2006.
19 H. Dutta, F. Hannig, J. Teich, B. Heigl and H. Hornegger.
A Design Methodology for Hardware Acceleration of Adaptive Filter Algorithms in Image Processing.
In Proceedings of IEEE 17th International Conference on Application-specific Systems, Architectures and Processors (ASAP), pp. 331-337, Steamboat Springs, CO, USA, September 11-13, 2006.
18 A. Kupriyanov, F. Hannig, D. Kissler, J. Teich, J. Lallet, O. Sentieys and S. Pillement.
Modeling of Interconnection Networks in Massively Parallel Processor Architectures.
Technical Report 05-2006, University of Erlangen-Nuremberg, Department of CS 12, Hardware-Software-Co-Design, Am Weichselgarten 3, 91058 Erlangen, Germany, August 2006.
17 D. Kissler, F. Hannig, A. Kupriyanov and J. Teich.
A Dynamically Reconfigurable Weakly Programmable Processor Array Architecture Template.
In Proceedings of the 2nd International Workshop on Reconfigurable Communication-Centric System-on-Chips (ReCoSoC), pp. 31-37, France, July 3-5, 2006.
16 D. Kissler, A. Kupriyanov, F. Hannig, D. Koch and J. Teich.
A Generic Framework for Rapid Prototyping of System-on-Chip Designs.
In Proceedings of the International Conference on Computer Design (CDES), pp. 189-195, Las Vegas, NV, USA, June 2006.
15 F. Hannig, H. Dutta and J. Teich.
Mapping a Class of Dependence Algorithms to Coarse-grained Reconfigurable Arrays: Architectural Parameters and Methodology.
In International Journal of Embedded Systems, Vol. 2, Nos. 1/2, pp. 114-127, 2006.
14 H. Dutta, F. Hannig and J. Teich.
A Formal Methodology for Hierarchical Partitioning of Piecewise Linear Algorithms.
Technical Report 04-2006, University of Erlangen-Nuremberg, Department of CS 12, Hardware-Software-Co-Design, Am Weichselgarten 3, 91058 Erlangen, Germany, April 2006.
13 A. Kupriyanov, F. Hannig, D. Kissler, R. Schaffer and J. Teich.
MAML - An Architecture Description Language for Modeling and Simulation of Processor Array Architectures, Part I.
Technical Report 03-2006, University of Erlangen-Nuremberg, Department of CS 12, Hardware-Software-Co-Design, Am Weichselgarten 3, 91058 Erlangen, Germany, March 2006.
12 H. Dutta, F. Hannig and J. Teich.
Controller Synthesis for Mapping Partitioned Programs on Array Architectures.
In Proceedings of the 19th International Conference on Architecture of Computing Systems (ARCS), Frankfurt/Main, Germany, pp. 176-191, March 13-16, 2006.
11 A. Kupriyanov, F. Hannig, D. Kissler, J. Teich, R. Schaffer and R. Merker.
An Architecture Description Language for Massively Parallel Processor Architectures.
In Proceedings of the 9th ITG/GMM/GI Workshop, Methoden und Beschreibungssprachen zur Modellierung und Verifikation von Schaltungen und Systemen, Dresden, Germany, pp. 11-20, February 20-22, 2006.
10 H. Dutta, F. Hannig and J. Teich.
Mapping of Nested Loop Programs onto Massively Parallel Processor Arrays with Memory and I/O Constraints.
In Friedhelm Meyer auf der Heide and Burkhard Monien, editors, Proceedings of the 6th International Heinz Nixdorf Symposium, New Trends in Parallel & Distributed Computing, volume 181 of HNI-Verlagsschriftenreihe, pp. 97-119, Paderborn, Germany, January 17-18, 2006.
2005
9 H. Dutta, F. Hannig and J. Teich.
Control Path Generation for Mapping Partitioned Dataflow-dominant Algorithms onto Array Architectures.
Technical Report 03-2005, University of Erlangen-Nuremberg, Department of CS 12, Hardware-Software-Co-Design, Am Weichselgarten 3, 91058 Erlangen, Germany, November 2005.
8 H. Ruckdeschel, H. Dutta, F. Hannig and J. Teich.
Automatic FIR Filter Generation for FPGAs.
In Proceedings of the International Workshop on Embedded Computer Systems, Architectures, Modeling, and Simulation (SAMOS), Samos, Greece, pp. 51-61, July 18-20, 2005.
7 F. Hannig and J. Teich.
Output Serialization for FPGA-based and Coarse-grained Processor Arrays.
In Proceedings of the International Conference on Engineering of Reconfigurable Systems and Algorithms (ERSA), Las Vegas, NV, USA, pp. 78-84, June 27-30, 2005.
6 F. Hannig, H. Dutta, A. Kupriyanov, J. Teich, R. Schaffer, S. Siegel, R. Merker, R. Keryell, B. Pottier, D. Chillet, D. Ménard and O. Sentieys.
Co-Design of Massively Parallel Embedded Processor Architectures.
In Proceedings of the first ReCoSoC Workshop. Montpellier, France, June 27-29, 2005.
2004
5 F. Hannig and J. Teich.
Resource Constrained and Speculative Scheduling of an Algorithm Class with Run-Time Dependent Conditionals.
In Proceedings of the 15th IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP 2004), pp. 17-27, Galveston, TX, USA, September 27-29, 2004.
4 F. Hannig and J. Teich.
Dynamic Piecewise Linear/Regular Algorithms.
In Proceedings of the Fourth International Conference on Parallel Computing in Electrical Engineering (PARELEC 2004), pp. 79-84, Dresden, Germany, September 7-10, 2004.
3 F. Hannig and J. Teich.
Resource Constrained and Speculative Scheduling of Dynamic Piecewise Regular Algorithms.
Technical Report 01-2004, University of Erlangen-Nuremberg, Department of CS 12, Hardware-Software-Co-Design, Am Weichselgarten 3, 91058 Erlangen, Germany, June 2004.
2 F. Hannig, H. Dutta and J. Teich.
Regular Mapping for Coarse-grained Reconfigurable Architectures.
In Proceedings of the 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2004), Vol. V, pp. 57-60, Montréal, Quebec, Canada, May 17-21, 2004.
1 F. Hannig, H. Dutta and J. Teich.
Mapping of Regular Nested Loop Programs to Coarse-grained Reconfigurable Arrays -- Constraints and Methodology.
In Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), Santa Fe, NM, USA, April 26-30, 2004.
Last updated: 22 March 2012. F.H.