
Compile OpenCL Kernel into LLVM-IR or Nvidia PTX

At the time of writing, the latest LLVM release is 3.7, and libclc supports up to the OpenCL 1.1 standard.


If you are new to LLVM, refer to Getting Started with the LLVM System to build and install LLVM/Clang first.

You should now have LLVM/Clang installed, so let's install libclc.

git clone
cd libclc
./configure.py
make -j4
make install
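
Before moving on, it can be worth checking that the tools used later in this post are actually on your PATH. The following is only a minimal sketch; check_tools is a hypothetical helper name, not something shipped with LLVM or libclc:

```shell
# Hypothetical helper: succeed only if every named tool is found on PATH.
check_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing tool: $tool" >&2
            return 1
        fi
    done
}

# Typical use for this post:
#   check_tools clang llvm-link && echo "toolchain looks OK"
```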

Grab a test OpenCL kernel, like this one:

__kernel void matvec_mult(__global float4* matrix,
                          __global float4* vector,
                          __global float* result) {
   int i = get_global_id(0);
   result[i] = dot(matrix[i], vector[0]);
}

Save the kernel to a file (matvec.cl is assumed here) and compile it with the following command:

clang -Dcl_clang_storage_class_specifiers -isystem libclc/generic/include -include clc/clc.h -target nvptx--nvidiacl -xcl matvec.cl -emit-llvm -S -o matvec.ll

Some notes on the command:

  • -target specifies the target GPU hardware and driver, e.g. -target nvptx--nvidiacl, -target nvptx64--nvidiacl, or -target r600 (for the AMD R600 GPU driver).
  • -xcl tells clang to treat the input as an OpenCL kernel file.
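
The two NVPTX triples differ in pointer width: nvptx is the 32-bit target and nvptx64 the 64-bit one. If you script the build, a small selector like the one below can keep the choice in one place. It is only a sketch, and nvptx_triple is a hypothetical name introduced here for illustration:

```shell
# Hypothetical helper: map a pointer width to the NVPTX triple used with -target.
nvptx_triple() {
    case "$1" in
        32) echo "nvptx--nvidiacl" ;;
        64) echo "nvptx64--nvidiacl" ;;
        *)  echo "unsupported pointer width: $1" >&2
            return 1 ;;
    esac
}

# Typical use:
#   clang -target "$(nvptx_triple 64)" ...
```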

This gives us the LLVM IR for the OpenCL kernel.

Next, we'll compile it into Nvidia PTX.

The following command links the OpenCL Kernel with built-in implementations from libclc:

llvm-link /usr/local/lib/clc/nvptx--nvidiacl.bc matvec.ll -o matvec.linked.bc

(Remember to link the .bc file that matches your -target; it is not always nvptx--nvidiacl.bc.)
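
One way to avoid linking the wrong library is to derive the path from the triple instead of typing it by hand. The sketch below assumes the bitcode file is named after the triple, as with the nvptx--nvidiacl.bc path above; that may not hold for every target, so check what is actually installed. clc_bitcode_path and the CLC_LIBDIR variable are hypothetical names used only for this example:

```shell
# Hypothetical helper: build the libclc bitcode path for a target triple.
# CLC_LIBDIR is an assumed override; the default is the install path used in this post.
clc_bitcode_path() {
    target="$1"
    libdir="${CLC_LIBDIR:-/usr/local/lib/clc}"
    printf '%s/%s.bc\n' "$libdir" "$target"
}

# Typical use:
#   llvm-link "$(clc_bitcode_path nvptx64--nvidiacl)" matvec.ll -o matvec.linked.bc
```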

Then use clang to generate the final Nvidia PTX code with the following command:

clang -target nvptx--nvidiacl matvec.linked.bc -S -o matvec.nvptx.s

matvec.nvptx.s is the final Nvidia PTX code.
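
If you run these steps often, they can be chained in a small shell function. This is a sketch under a few assumptions: the kernel file ends in .cl (matvec.cl is an assumed name), the libclc headers and bitcode live at the paths used in this post, and build_ptx, run and DRY_RUN are hypothetical names introduced here. Setting DRY_RUN=1 prints the commands instead of running them, which is handy for checking the paths first.

```shell
# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper: OpenCL source -> LLVM IR -> linked bitcode -> PTX.
build_ptx() {
    src="$1"                          # e.g. matvec.cl (assumed file name)
    target="${2:-nvptx--nvidiacl}"    # triple; must match the libclc .bc file
    base="${src%.cl}"                 # strip the .cl suffix for output names

    run clang -Dcl_clang_storage_class_specifiers \
        -isystem libclc/generic/include -include clc/clc.h \
        -target "$target" -xcl "$src" -emit-llvm -S -o "$base.ll" &&
    run llvm-link "/usr/local/lib/clc/$target.bc" "$base.ll" \
        -o "$base.linked.bc" &&
    run clang -target "$target" "$base.linked.bc" -S -o "$base.nvptx.s"
}
```

For example, DRY_RUN=1 build_ptx matvec.cl prints the three commands from this post without invoking clang or llvm-link.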

