OpenMP adds constructs for shared-memory threading to C, C++ and Fortran
Fork-Join Model:
Consists of compiler directives, runtime library routines and environment variables
Versions 4.0 (July 2013), 4.5 (Nov 2015) and 5.0 (Nov 2018) add support for accelerators (target directives), vectorization (SIMD directives), thread affinity and cancellation.
https://www.openmp.org/resources/openmp-compilers-tools/
GCC: From GCC 6.1, OpenMP 4.5 is fully supported for C and C++
echo | cpp -fopenmp -dM | grep -i open
#define _OPENMP 201511
Go to the OpenMP Specifications page to discover the mapping between the date reported by _OPENMP and the actual OpenMP version number (201511 corresponds to 4.5)
Compile with -fopenmp on Linux
Begin execution as a single process (master thread)
Start of a parallel construct (using special directives): Master thread creates team of threads
Fork-join model of parallel execution
Program ex1.c
#include <omp.h>
#include <stdio.h>

int main() {
    int var1, var2, var3;

    // Serial code executed by the master thread

    #pragma omp parallel private(var1, var2) shared(var3)  // OpenMP directive
    {
        // Parallel section executed by all threads
        printf("hello from %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
        // omp_get_thread_num() and omp_get_num_threads() are OpenMP routines
    }

    // Resume serial code executed by the master thread
}
Run gcc with the -fopenmp flag:
gcc -O2 -fopenmp ex1.c
Beware: If you forget -fopenmp, then all OpenMP directives are ignored!
The default number of threads is the number of processors of the node (see /proc/cpuinfo)
To control the number of threads used to run an OpenMP program, set the OMP_NUM_THREADS environment variable:
% ./a.out
hello from 0 of 2
hello from 1 of 2
% env OMP_NUM_THREADS=3 ./a.out
hello from 2 of 3
hello from 0 of 3
hello from 1 of 3
The number of threads can also be imposed with an OpenMP routine:
omp_set_num_threads(4);
Variables declared outside a parallel region are shared by default; variables declared inside a parallel region are private (allocated on each thread's stack)
Programmers can modify this default through the private() and shared() clauses:
Program ex2.c
#include <omp.h>
#include <stdio.h>

int main() {
    int t, j = 0, i;

    #pragma omp parallel private(t, i) shared(j)
    {
        t = omp_get_thread_num();
        printf("running %d\n", t);
        for (i = 0; i < 1000000; i++)
            j++;              /* race! */
        printf("ran %d\n", t);
    }
    printf("%d\n", j);
}
gcc -O2 -fopenmp ex2.c
./a.out
It is the programmer's responsibility to ensure that multiple threads properly access SHARED variables (such as via CRITICAL sections)
Elapsed wall-clock time can be measured using omp_get_wtime()
Program ex3.c
#include <omp.h>
#include <iostream>
#include <unistd.h>
using namespace std;

int main() {
    double t1, t2;
    cout << "Start timer" << endl;
    t1 = omp_get_wtime();
    // Do something long
    sleep(2);
    t2 = omp_get_wtime();
    cout << t2 - t1 << endl;
}
g++ ex3.c -fopenmp
./a.out
Main directives are of two types:
#pragma omp <directive-name> [clause, ...]
{
    // parallelized region
}  // implicit synchronization
#pragma omp barrier //explicit synchronization
A parallel region is a block of code that will be executed by multiple threads. This is the fundamental OpenMP parallel construct.
Examples: ex1.c ex2.c parallel-single.c
The for workshare directive
#pragma omp parallel
#pragma omp for
for (i = 0; i < 5; i++)
    printf("hello from %d at %d\n", omp_get_thread_num(), i);
Or use the combined directive #pragma omp parallel for
Examples: for.c for-schedule.c
The SINGLE directive specifies that the enclosed code is to be executed by only one thread in the team.
The MASTER directive specifies a region that is to be executed only by the master thread of the team. All other threads on the team skip this section of code.
Examples: parallel-single.c
A sections workshare directive is followed by a block that has section directives, one per task
#pragma omp parallel
#pragma omp sections
{
    #pragma omp section
    printf("Task A: %d\n", omp_get_thread_num());
    #pragma omp section
    printf("Task B: %d\n", omp_get_thread_num());
    #pragma omp section
    printf("Task C: %d\n", omp_get_thread_num());
}
There is an implied barrier at the end of a SECTIONS directive
Examples: sections.c
The CRITICAL directive specifies a region of code that must be executed by only one thread at a time.
Example:
#include <omp.h>

int main() {
    int x;
    x = 0;

    #pragma omp parallel shared(x)
    {
        #pragma omp critical
        x = x + 1;
    }  /* end of parallel section */
}
Examples: ex2.c
The reduction clause of the parallel directive
int t;
#pragma omp parallel reduction(+:t)
{
    t = omp_get_thread_num() + 1;
    printf("local %d\n", t);
}
printf("reduction %d\n", t);
Examples: reduction.c
int array[8] = { 1, 1, 1, 1, 1, 1, 1, 1 };
int sum = 0, i;
#pragma omp parallel for reduction(+:sum)
for (i = 0; i < 8; i++) {
    sum += array[i];
}
printf("total %d\n", sum);
Using just
#pragma omp for
leaves the distribution of loop iterations among threads up to the implementation
When you want to specify it yourself, use the schedule clause:
#pragma omp for schedule(kind[, chunk_size])  // kind: static, dynamic, guided, auto or runtime
Examples: for-schedule.c