====== OpenMP ====== ====Tutorials, external links ==== [[https://hpc-tutorials.llnl.gov/openmp/| LLNL]] **[[http://didattica-linux.unipr.it/~alfieri/matdid/HPC/openmp/base/ | Exercises repository ]]** ==== What is openMP ==== [[http://openmp.org/ | OpenMP]] adds constructs for shared-memory threading to C, C++ and Fortran {{:roberto.alfieri:pub:shared_mem.png?200|}} Fork--Join Model: {{:roberto.alfieri:pub:fork_join2.gif?400|}} Consists of Compiler directives, runtime routines and environment variables Version 4.0 (July 2013), 4.5 (Nov 2015 ) and 5.0 (Nov 2018) add support for accelerators (target directives), vectorization (SIMD directives), thread affinity and cancellation. === openMP support in the C/C++ Compilers === https://www.openmp.org/resources/openmp-compilers-tools/ GCC: From GCC 6.1, OpenMP 4.5 is fully supported for C and C++ To know the openMP installed version : echo |cpp -fopenmp -dM |grep -i open #define _OPENMP 201511 Go to [[ http://www.openmp.org/specifications/| openMP Specifications]] to discover the mapping between the date provided and the actual OpenMP version number == How to compile with openMP library == Compile with -fopenmp on Linux ===== Execution model ===== Begin execution as a single process (master thread) Start of a parallel construct (using special directives): Master thread creates team of threads Fork-join model of parallel execution {{:roberto.alfieri:pub:omp_exec_model.png?200|}} === Execution Example === program ex1.c #include #include main() { int var1, var2, var3; //Serial code executed by master thread #pragma omp parallel private(var1, var2) shared(var3) //openMP directive { // Parallel section executed by all threads printf("hello from %d of %d\n", omp_get_thread_num(), omp_get_num_threads()); // omp_get_thread_num() and omp_get_num_threads() are openMP routines } // Resume serial code executed by master thread } Run gcc with the -fopenmp flag: gcc -O2 -fopenmp ex1.c Beware: If you forget -fopenmp, then all OpenMP directives are ignored! ==== How Many Threads? ==== The default threads number is the number of processors of the node (see /proc/cpuinfo) To control the number of threads used to run an OpenMP program, set the OMP_NUM_THREADS environment variable: % ./a.out hello from 0 of 2 hello from 1 of 2 % env OMP_NUM_THREADS=3 ./a.out hello from 2 of 3 hello from 0 of 3 hello from 1 of 3 The threads number can be imposed with a openMP routine: omp_set_num_threads(4); ==== OpenMP variables ==== Variables outside a parallel are shared, and variables inside a parallel are private (allocated in the thread stack) Programmers can modify this default through the private() and shared() clauses: Program ex2.c #include #include int main() { int t, j, i; #pragma omp parallel private(t, i) shared(j) { t = omp_get_thread_num(); printf("running %d\n", t); for (i = 0; i < 1000000; i++) j++; /* race! */ printf("ran %d\n", t); } printf("%d\n", j); } gcc -O2 -fopenmp ex2.c ./a.out It is the programmer's responsibility to ensure that multiple threads properly access SHARED variables (such as via CRITICAL sections) ==== OpenMP timing ==== Elapsed wall clock time can be taken using [[https://gcc.gnu.org/onlinedocs/libgomp/omp_005fget_005fwtime.html | omp_get_wtime() ]] Program ex3.c #include #include #include using namespace std; int main() { double t1,t2; cout << "Start timer" << endl; t1=omp_get_wtime(); // Do something long sleep(2); t2=omp_get_wtime(); cout << t2-t1 << endl; } g++ ex3.c -fopenmp ./a.out ===== OpenMP Directives ===== Main directives are of 2 types: * Fork: PARALLEL FOR SECTION SINGLE MASTER CRITICAL * BARRIER == Syntax == #pragma omp [clause, ..] { // parallelized region } //implicit synchronization #pragma omp barrier //explicit synchronization ==== Parallel directive ==== A parallel region is a block of code that will be executed by multiple threads. This is the fundamental OpenMP parallel construct. Examples: ex1.c ex2.c parallel-single.c ==== For directive ==== The for workshare directive * requires that the following statement is a for loop * makes the loop index private to each thread * runs a subset of iterations in each thread {{roberto.alfieri:pub:omp_for.png?150|}} #pragma omp parallel #pragma omp for for (i = 0; i < 5; i++) printf("hello from %d at %d\n", omp_get_thread_num(), i); Or use #pragma omp parallel for Examples: for.c for-schelule.c ==== Single and Master directives ==== The SINGLE directive specifies that the enclosed code is to be executed by only one thread in the team. {{roberto.alfieri:pub:omp_single.png?150|}} The MASTER directive specifies a region that is to be executed only by the master thread of the team. All other threads on the team skip this section of code. Examples: parallel-single.c ==== Sections Directive ==== A sections workshare directive is followed by a block that has section directives, one per task #pragma omp parallel #pragma omp sections { #pragma omp section printf("Task A: %d\n", omp_get_thread_num()); #pragma omp section printf("Task B: %d\n", omp_get_thread_num()); #pragma omp section printf("Task C: %d\n", omp_get_thread_num()); } There is an implied barrier at the end of a SECTIONS directive Examples: sections.c ==== Critical Directive ==== The CRITICAL directive specifies a region of code that must be executed by only one thread at a time. Example: #include main() { int x; x = 0; #pragma omp parallel shared(x) { #pragma omp critical x = x + 1; } /* end of parallel section */ } Examples: ex2.c ==== Main clauses ==== === Reduce Clause === The reduction clause of parallel * makes the specified variable private to each thread * combines private results on exit int t; #pragma omp parallel reduction(+:t) { t = omp_get_thread_num() + 1; printf("local %d\n", t); } printf("reduction %d\n", t); Examples: reduction.c == Combining FOR and reduce == int array[8] = { 1, 1, 1, 1, 1, 1, 1, 1}; int sum=0, i; #pragma omp parallel for reduction(+:sum) for (i = 0; i < 8; i++) { sum += array[i]; } printf("total %d\n", sum); === Schedule clause === Using just #pragma omp for leaves the decision of data allocation up to the compiler When you want to specify it yourself, use schedule: #pragma omp for schedule(....) Examples: for-schedule.c