Installing and using OMPi on an NVIDIA Jetson Nano board is not much different from deploying OMPi on any other system. Below, you can find detailed instructions for a Jetson Nano environment.
Minimum requirements
- Meson 0.58.0 — needed for building OMPi
- Jetson Linux (L4T) — needed for supporting OpenMP on the CPU
- JetPack SDK 4.2 — additionally needed for OpenMP offloading to the GPU
Installation
Set up OMPi as follows:
meson setup build --prefix=<install-dir>
Then OMPi can be compiled and installed as usual:
cd build/
meson compile
meson install
Note that the user does not need to provide any additional flags; the OpenMP cuda module is installed by default.
Usage
Compilation
Compiling OpenMP applications with OMPi on a Jetson Nano board is quite straightforward. Simply run:
ompicc --devs=cuda app.c # or --devs=all
The compiler will produce the main application executable (a.out) and several kernel executables, one for each OpenMP target construct present in the application.
Testing the CUDA module
The correct installation of OMPi, along with its cuda module, can be verified by running:
ompiconf --devvinfo
On a Jetson Nano 2GB, this command should print the following information, assuming that cuda is the only module installed:
1 configured device module(s): cuda
MODULE [cuda]:
------
OMPi CUDA device module.
Available devices : 1
device id < 1 > {
GPU device name : NVIDIA Tegra X1
Compute capability : 5.3
CUDA toolkit version : 10.2
Num of multiprocessors : 1
Cores per multiprocessor : 128
Total num of cores : 128
Maximum thread block size : 1024
Global memory : 1.9 GBytes
Shared memory per block : 48 KBytes
}
------
The programmer can further verify that the module works properly by compiling and running a small test application, such as the sketch below.
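The following is a minimal sketch (not an official OMPi sample; the file name test.c used afterwards is arbitrary). It uses the standard omp_is_initial_device() routine to report whether the target region actually executed on the GPU:

/* Minimal offloading check (illustrative sketch, not an official OMPi sample).
 * Reports whether the target region ran on the GPU or fell back to the host. */
#include <stdio.h>
#include <omp.h>

int main(void)
{
	int on_host = 1;

	/* Offload a trivial region to the default device. */
	#pragma omp target map(from: on_host)
	on_host = omp_is_initial_device();

	if (on_host)
		printf("Target region executed on the host (no offloading).\n");
	else
		printf("Target region executed on the GPU device.\n");
	return 0;
}

Assuming the file is named test.c, it can be compiled with ompicc --devs=cuda test.c and executed with ./a.out; the second message indicates that offloading to the GPU is working.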
Mini-tutorial: Writing OpenMP applications which exploit the GPU
Targeting the Maxwell GPU of a Jetson Nano board is made possible by the OpenMP target-related constructs. With these constructs, the programmer indirectly launches a CUDA grid consisting of multiple CUDA blocks, each containing multiple CUDA threads. Below you can find information about the syntax and usage of the most popular constructs. For detailed information, please consult the official OpenMP specifications.
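As a quick illustration of how these constructs compose (a sketch with arbitrary clause values, not taken from the OMPi documentation), the snippet below offloads a simple loop; conceptually, num_teams controls the number of CUDA blocks in the grid and thread_limit bounds the number of CUDA threads per block:

/* Illustrative sketch: offload a simple loop and hint at the launched grid shape.
 * The clause values (32 teams, 128 threads per team) are arbitrary examples. */
#include <stdio.h>

#define N 4096

int main(void)
{
	double x[N];

	/* Each team maps to a CUDA block; the threads of a team map to CUDA threads. */
	#pragma omp target teams distribute parallel for \
	        num_teams(32) thread_limit(128) map(from: x)
	for (int i = 0; i < N; i++)
		x[i] = 0.5 * i;

	printf("x[%d] = %f\n", N - 1, x[N - 1]);
	return 0;
}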