OMPi on the Raspberry Pi 4/5 boards

Starting from version 3.5.0, OMPi officially supports the Raspberry Pi 4 and 5 boards, with their VideoCore VI and VII GPUs, respectively, through the Vulkan module. Setting up the compiler for these boards is not much different from deploying it on any other system.

Minimum requirements

  • Meson 0.58.0 — needed for building OMPi
  • Raspberry Pi OS — needed for supporting OpenMP on the CPU
  • Vulkan — additionally needed for OpenMP offloading to the VideoCore GPU

Installation

Set up OMPi as follows:

meson setup build --prefix=‹install-dir›

Then OMPi can be compiled and installed as usual:

cd build/
meson compile
meson install

Note that the user does not need to provide any additional flags; the OpenMP vulkan module is installed by default.
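
As an example, assuming OMPi is to be installed under $HOME/ompi (an arbitrary, illustrative location), the whole procedure would be:

meson setup build --prefix=$HOME/ompi
cd build/
meson compile
meson install

After installation, make sure the directory containing ompicc and ompiconf (typically ‹install-dir›/bin) is in your PATH.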

Usage

Compilation

Compiling OpenMP applications with OMPi on a Raspberry Pi board is quite straightforward. Simply run:

ompicc --devs=vulkan app.c # or --devs=all

The compiler produces the main application executable (a.out) as well as a number of kernel executables, one for each OpenMP target construct that appears in the application.

Advanced usage: Multiple compilation jobs

When an application contains several target regions, the user can additionally have the kernels compiled by N parallel jobs, using the -jN device option as follows:

ompicc --devs=vulkan --devopt -jN app.c
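
For instance, to compile the kernels of app.c using 4 parallel jobs:

ompicc --devs=vulkan --devopt -j4 app.c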

Testing the Vulkan module

The correct installation of OMPi, along with its vulkan module, can be verified by running:

ompiconf --devvinfo

On a Raspberry Pi 5, this command should print the following information, assuming that vulkan is the only module installed:

1 configured device module(s): vulkan

MODULE [vulkan]:
------
OMPi Vulkan device module.
Available devices : 1

device id < 1 > { 
 Name: VideoCore VII
 API version: XYZ - Driver version: XYZ
 Vendor ID: 5549
 Maximum workgroup sizes: (2147483647, 65535, 65535)
 Maximum invocations: 1024
 4194304 Kbytes of memory
}
------

The programmer can further verify that the module works properly by compiling and running the following sample application:

Source code: Sample application utilizing the GPU
#include <stdio.h>
#include <omp.h>

int main(void)
{
    /* 
     * Expected result (assuming that the GPU device ID is 1):
     * Running on the VideoCore GPU.
     */
    int is_initial_device = 1;
    #pragma omp target map(tofrom:is_initial_device) device(1)
    {
        is_initial_device = omp_is_initial_device();
    }

    printf("Running on the %s.\n", (is_initial_device == 0) ? 
                                   "VideoCore GPU" : "CPU");

    return 0; 
} 
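
Assuming the above is saved as app.c, it can be compiled and executed as follows; if the GPU is exposed as device 1, as in the ompiconf output above, the program should report that it is running on the VideoCore GPU:

ompicc --devs=vulkan app.c
./a.out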

Mini-tutorial: Writing OpenMP applications which exploit the GPU

Targeting the VideoCore GPU of a Raspberry Pi 4/5 board is done through the OpenMP target-related constructs. With these constructs, the programmer indirectly launches a Vulkan grid, which consists of multiple Vulkan workgroups, each containing multiple Vulkan invocations. Below you can find information about the syntax and usage of the most popular constructs. For detailed information, please consult the official OpenMP specifications.

Construct: #pragma omp target parallel
#pragma omp target parallel
    <structured-block>

This construct launches a single Vulkan workgroup consisting of multiple invocations (OpenMP threads). The default number of launched invocations is 128; a different number can be specified using the num_threads(N) clause.
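
For example, the following sketch launches a single workgroup of 64 invocations, each of which records its own thread ID; the workgroup size, the array size and the device(1) clause are illustrative assumptions (device 1 being the GPU, as in the sample application above):

#include <stdio.h>
#include <omp.h>

#define NTHREADS 64

int main(void)
{
    int tid[NTHREADS] = { 0 };

    /* A single workgroup of NTHREADS invocations; each invocation stores its own ID.
       device(1) assumes the GPU is device 1, as in the sample application above. */
    #pragma omp target parallel num_threads(NTHREADS) map(tofrom: tid) device(1)
    {
        tid[omp_get_thread_num()] = omp_get_thread_num();
    }

    printf("tid[%d] = %d\n", NTHREADS - 1, tid[NTHREADS - 1]); /* expected: 63 */
    return 0;
}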

Construct: #pragma omp target teams
#pragma omp target teams
    <structured-block>

This construct launches multiple Vulkan workgroups (OpenMP teams), each consisting of a single invocation. The default number of launched workgroups is 1; a different number can be specified using the num_teams(N) clause.
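
A corresponding sketch, where up to 8 single-invocation workgroups each mark their own entry of an array (the number of teams and the device(1) clause are again illustrative assumptions):

#include <stdio.h>
#include <omp.h>

#define NTEAMS 8

int main(void)
{
    int i, count = 0, visited[NTEAMS] = { 0 };

    /* Up to NTEAMS workgroups of one invocation each; every team marks its own entry.
       device(1) assumes the GPU is device 1. */
    #pragma omp target teams num_teams(NTEAMS) map(tofrom: visited) device(1)
    {
        visited[omp_get_team_num()] = 1;
    }

    for (i = 0; i < NTEAMS; i++)
        count += visited[i];
    printf("%d workgroup(s) were launched\n", count);

    return 0;
}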

Construct: #pragma omp target teams distribute parallel for
#pragma omp target teams distribute parallel for
    <for-loops>

This construct combines the previous two: it launches multiple Vulkan workgroups, each consisting of multiple invocations, and distributes the iterations of the loop across the launched workgroups and invocations.
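
For illustration, the following sketch offloads a simple vector addition; the array size and the device(1) clause are illustrative assumptions, and the numbers of workgroups and invocations are left unspecified here (they can be controlled with the num_teams(N) and num_threads(N) clauses):

#include <stdio.h>
#include <omp.h>

#define N 1024

int main(void)
{
    int a[N], b[N], c[N], i;

    for (i = 0; i < N; i++) {
        a[i] = i;
        b[i] = 2*i;
    }

    /* The loop iterations are distributed across multiple workgroups, each
       consisting of multiple invocations; device(1) assumes the GPU is device 1. */
    #pragma omp target teams distribute parallel for map(to: a, b) map(from: c) device(1)
    for (i = 0; i < N; i++)
        c[i] = a[i] + b[i];

    printf("c[%d] = %d\n", N - 1, c[N - 1]); /* expected: 3069 */
    return 0;
}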