3-Kernel Offloading for FPGA with Optimized Remote Accesses/ClipID:2197

Recording date: 2012-05-18
Via: Free
Language: German
Organisational Unit: Sonderforschungsbereich/Transregio 89 Invasives Rechnen
Producer: MultiMediaZentrum

Some data- and compute-intensive applications can be accelerated by offloading portions of their code to platforms such as GPGPUs or FPGAs. However, to obtain high performance from these kernels, the application must be restructured, adequate communication mechanisms must be generated for the transfer of remote data, and the memory bandwidth must be used well. In the context of high-level synthesis (HLS) of FPGA hardware accelerators from a C program, we show how to automatically generate optimized remote accesses for an accelerator communicating with an external DDR memory. Loop tiling is used to enable block communications, which suit DDR memories. Pipelined communication processes are generated to overlap communications and computations, thereby hiding some latencies, in a way similar to double buffering. Finally, data reuse among tiles is exploited to avoid remote accesses when data are already available in local memory.
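
To make the tiling and double-buffering idea concrete, here is a minimal software sketch in C. It is not the generator described in the talk. The names `TILE`, `load_tile`, `store_tile` and `scale_tiled` are hypothetical, the DDR memory is modelled as an ordinary array, and block transfers are modelled with `memcpy`. In plain C the load of the next tile and the computation on the current tile run one after the other; an HLS tool could schedule them as concurrent processes, which is where the latency hiding would come from. Data reuse among tiles is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TILE 8  /* hypothetical tile size: elements moved per block transfer */

/* Stand-in for a DDR burst read: copy one contiguous tile into local memory. */
static void load_tile(int *local, const int *ddr, size_t tile_idx) {
    memcpy(local, ddr + tile_idx * TILE, TILE * sizeof(int));
}

/* Stand-in for a DDR burst write: copy one local tile back to remote memory. */
static void store_tile(int *ddr, const int *local, size_t tile_idx) {
    memcpy(ddr + tile_idx * TILE, local, TILE * sizeof(int));
}

/* Writes 2 * ddr_in[i] to ddr_out[i] for all i < n, one tile at a time.
   Assumes n is a multiple of TILE. Returns the number of block reads issued. */
size_t scale_tiled(const int *ddr_in, int *ddr_out, size_t n) {
    assert(n % TILE == 0);

    int in_buf[2][TILE];  /* two local input buffers, used alternately */
    int out_buf[TILE];    /* local output buffer */
    size_t ntiles = n / TILE;
    size_t reads = 0;

    if (ntiles == 0) {
        return 0;
    }

    /* Prologue: fetch the first tile before any computation starts. */
    load_tile(in_buf[0], ddr_in, 0);
    reads++;

    for (size_t t = 0; t < ntiles; t++) {
        size_t cur = t % 2;
        size_t nxt = (t + 1) % 2;

        /* Fetch tile t+1 into the buffer that is not being computed on. */
        if (t + 1 < ntiles) {
            load_tile(in_buf[nxt], ddr_in, t + 1);
            reads++;
        }

        /* Compute on tile t using only local memory. */
        for (size_t i = 0; i < TILE; i++) {
            out_buf[i] = 2 * in_buf[cur][i];
        }

        /* Write the finished tile back as one block. */
        store_tile(ddr_out, out_buf, t);
    }
    return reads;
}
```

Calling `scale_tiled(in, out, 32)` with `TILE` set to 8 issues four block reads instead of 32 element-wise remote accesses, which is the property that makes tiling attractive for DDR memories.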
