Cuda vs opencl benchmark

1/6/2024
In the state-of-the-art parallel programming approaches OpenCL and CUDA, so-called host code is required for a program's execution. Efficiently implementing host code is often a cumbersome task, especially when executing OpenCL and CUDA programs on systems with multiple nodes, each comprising different devices, e.g., a multi-core CPU and graphics processing units (GPUs): the programmer is responsible for explicitly managing each node's and device's memory, for synchronizing computations with data transfers between the devices of potentially different nodes, and for optimizing data transfers between devices' memories and nodes' main memories, e.g., by using pinned main memory to accelerate data transfers and by overlapping the transfers with computations.

We develop the distributed OpenCL/CUDA abstraction layer (dOCAL), a novel high-level C++ library that simplifies the development of host code. dOCAL combines major advantages over state-of-the-art high-level approaches: (1) it simplifies implementing both OpenCL and CUDA host code by providing a simple-to-use, high-level abstraction API; (2) it supports executing arbitrary OpenCL and CUDA programs; (3) it allows conveniently targeting the devices of different nodes by automatically managing node-to-node communication; (4) it simplifies implementing data-transfer optimizations by providing different, specially allocated memory regions, e.g., pinned main memory for overlapping data transfers with computations; (5) it optimizes memory management by automatically avoiding unnecessary data transfers; (6) it enables interoperability between OpenCL and CUDA host code for systems with devices from different vendors.

Our experiments show that dOCAL significantly simplifies the development of host code for heterogeneous and distributed systems, with a low runtime overhead.