OMPi: Offloading to Vulkan GPUs

Introduction

The latest version of OMPi enables offloading to Vulkan-capable GPUs through the vulkan module. OMPi analyzes the OpenMP target directives appearing in a user application and generates kernel source files written in the OpenGL Shading Language (GLSL).

The vulkan module can:

  1. Compile the shader files to SPIR-V binaries,
  2. Optimize the generated binaries in terms of size and performance,
  3. Initialize and finalize a Vulkan device,
  4. Transfer data between the CPU and a Vulkan device,
  5. Offload compute shaders to the requested Vulkan device.

Requirements

The current requirements of the vulkan module are the following:

  • Vulkan Loader
    • sudo apt-get install libvulkan-dev
      OR
    • sudo dnf install vulkan-loader-devel
  • Vulkan Validation layers and SPIR-V tools
    • sudo apt install vulkan-validationlayers spirv-tools glslang-tools
      OR
    • sudo dnf install mesa-vulkan-drivers vulkan-validation-layers-devel glslang
  • Vulkan Utilities
    • sudo apt-get install vulkan-tools
      OR
    • sudo dnf install vulkan-tools

On Arch Linux, you can run sudo pacman -S vulkan-devel to install all the required tools.

Before proceeding with the installation, verify the Vulkan installation and SPIR-V utilities by executing:

$ vulkaninfo

$ glslangValidator -v

The commands should display information about the available Vulkan devices and the Glslang version, respectively.

Installing OMPi

The first step is to deploy OMPi on your system by building it as usual:

meson setup build --prefix=<install-dir>
cd build/
meson compile
meson install

During the installation process, if your system satisfies the minimum requirements and has at least one Vulkan-enabled GPU installed, OMPi will automatically build the vulkan module. The correct installation of the module can be verified by running:

ompiconf --devvinfo

which lists all the identified modules/devices, along with their numeric device IDs. If the above command fails to show a Vulkan GPU, please revisit the minimum requirements and ensure you have installed OMPi correctly.

Sample output of devvinfo

The ompiconf --devvinfo command should give an output similar to the following (here an Ampere A2 GPU was identified by OMPi):

MODULE [vulkan]:
------
OMPi Vulkan device module.
Available devices : 1

device id < 1 > { 
 Name: NVIDIA A2
 API version: 4210991 - Driver version: 2392933632
 Vendor ID: 4318
 Maximum workgroup sizes: (2147483647, 65535, 65535)
 Maximum invocations: 1024
 15724544 Kbytes of memory
}
------

Quick start

Compiling your OpenMP application with Vulkan GPU offloading requires a simple --devs=all argument:

ompicc --devs=all app.c

Note

You can also use the --devs=vulkan option to limit execution to Vulkan GPUs only.

Be careful as this option changes the Vulkan device ID offsets. For example, if ompiconf reported:

MODULE [cuda] provides device(s) : 1
MODULE [vulkan] provides device(s) : 2 3

With this option, the Vulkan IDs become 1 and 2, respectively, since the CUDA device is excluded.

The compiler will produce the main application executable (a.out) and several SPIR-V kernel binaries, one for each OpenMP target construct in the application.

Sample OpenMP application utilizing the GPU

Here is a sample application that can be used to verify the successful deployment of the vulkan module:

#include <stdio.h>
#include <omp.h>

int main(void)
{
    /* 
     * Expected result (assuming that the GPU device ID is 1):
     * Running on Vulkan device 
     */
    int is_initial_device = 1;
    #pragma omp target map(tofrom:is_initial_device) device(1)
    {
        is_initial_device = omp_is_initial_device();
    }
    printf("Running on %s.\n", (is_initial_device == 0) ? 
                               "Vulkan device" : "host");
    return 0; 
}

Advanced usage: Multiple compilation jobs

When an application contains several target regions, compilation times may become substantial. You can compile different kernels in parallel by passing the -j<number> device option through --devopt. For example, to have 4 parallel kernel compilation jobs, use ompicc as follows:

ompicc --devs=all --devopt -j4 app.c

Limitations

Currently, the vulkan module has a number of limitations:

  1. Pointers and conditional jumps are not allowed in the offloaded kernels,
  2. Structs are allowed, but the above restriction also applies to struct fields,
  3. Only static loop scheduling is supported in target teams distribute parallel for directives.

Notes

The OMPi vulkan module has been tested with the following GPUs:

  • NVIDIA Ampere A2
  • NVIDIA Tesla P40
  • AMD RX 550
  • Intel UHD 770
  • VideoCore VII (Raspberry Pi 5)
  • VideoCore VI (Raspberry Pi 4)