KFusion: obtaining modularity and performance with regards to general purpose GPU computing and co-processors

Date

2012-12-14

Authors

Kiemele, Liam

Abstract

Concurrency has come to the forefront of computing as multi-core processors become increasingly common. General-purpose graphics processing unit (GPGPU) computing brings with it new language support, such as OpenCL and CUDA, for dealing with co-processor environments. Programming language support for multi-core architectures introduces a fundamentally new mechanism for modularity: the kernel. Developers attempting to leverage this mechanism to separate concerns often incur unanticipated performance penalties. My proposed solution aims to preserve the benefits of kernel boundaries for modularity while eliminating these inherent costs at compile time and during execution. KFusion is a prototype tool for transforming programs written in OpenCL to make them more efficient. By leveraging loop fusion and deforestation, it can eliminate the costs associated with compositions of kernels that share data. Case studies show that KFusion can address key memory bandwidth and latency bottlenecks, resulting in substantial performance improvements.
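
The idea of fusing kernels that share data can be illustrated with a small sketch. This is not KFusion's implementation and it is not OpenCL: it is plain Python, and the function names (`scale`, `offset`, `unfused`, `fused`) are hypothetical, chosen only for illustration. It assumes two element-wise stages, where the second consumes the output of the first.

```python
def scale(xs, k):
    # Stage 1 ("kernel" A): reads every element, writes an intermediate buffer.
    return [k * x for x in xs]


def offset(xs, b):
    # Stage 2 ("kernel" B): reads the intermediate buffer, writes the result.
    return [x + b for x in xs]


def unfused(xs, k, b):
    # Modular composition: two passes over the data plus a temporary list.
    tmp = scale(xs, k)
    return offset(tmp, b)


def fused(xs, k, b):
    # Fused composition: one pass, no intermediate buffer (deforestation).
    return [k * x + b for x in xs]


data = [1.0, 2.0, 3.0]
# Both compositions compute the same values; only the memory traffic differs.
assert unfused(data, 2.0, 0.5) == fused(data, 2.0, 0.5)
```

The unfused version keeps each stage as a separate unit at the price of an extra write and read of the intermediate data. The fused version removes that traffic but merges the two concerns into one body, which is the trade-off the abstract describes a tool resolving automatically.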

Keywords

modularity, performance, OpenCL, concurrency, parallelism
