These are the differences between the selected revision and the current version of the page.
roberto.alfieri:pub:vectorization [12/06/2017 16:37] roberto.alfieri |
roberto.alfieri:pub:vectorization [13/06/2017 20:15] roberto.alfieri |
||
---|---|---|---|
Line 1: | Line 1: | ||
====== Vectorization ====== | ====== Vectorization ====== | ||
+ | [[ https://hpc-forge.cineca.it/files/CoursesDev/public/2016/Milan/Enabling_software_for_high_scalable_intel_arch/course_part1.pdf | Introduction to Intel scalable architectures ]] | ||
+ | |||
+ | |||
+ | A processor's peak performance figure includes the speed-up provided by vector instructions, | ||
+ | but exploiting that speed-up requires specific programming techniques. | ||
+ | |||
+ | |||
+ | |||
+ | {{:roberto.alfieri:pub:vectortrend.png?200|}} | ||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | {{:roberto.alfieri:pub:waytovectorize.png?200|}} | ||
+ | |||
+ | |||
+ | |||
+ | Auto-vectorization is the easiest and most portable way to vectorize code. | ||
+ | |||
+ | The compiler recognizes several vectorization options. | ||
+ | |||
+ | Main vectorization options: | ||
+ | |||
+ | ^ Target ^ Intel compiler option ^ | ||
+ | ^ KNL (Knights Landing) | -xMIC-AVX512 | | ||
+ | ^ BDW (Broadwell) | -xCORE-AVX2 | | ||
+ | ^ Disable vectorization | -no-vec | | ||
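To make the options above concrete, here is a minimal sketch of a loop that an auto-vectorizer can typically handle. The function name `saxpy` and its signature are illustrative assumptions, not part of the original page. The loop has independent iterations, unit-stride accesses, and a trip count known on entry.

```c
#include <stddef.h>

/* Hypothetical example (not from the original page): y[i] = a * x[i] + y[i].
 * 'restrict' tells the compiler that x and y do not overlap, so every
 * iteration is independent of the others.
 *
 * Assumed build lines, using the options from the table above:
 *   icc -O2 -xCORE-AVX2  -c saxpy.c    (BDW)
 *   icc -O2 -xMIC-AVX512 -c saxpy.c    (KNL)
 *   icc -O2 -no-vec      -c saxpy.c    (vectorization disabled)
 */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}
```

Building the same file with and without `-no-vec` is one simple way to estimate how much of the run time comes from vectorization.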
+ | |||
+ | Not all loops can be vectorized. | ||
+ | |||
+ | Some examples: | ||
+ | |||
+ | * Loop with dependencies between iterations | ||
+ | |||
+ | <code> | ||
+ | for (i=1; i<MAX; i++) { | ||
+ | d[i] = e[i] - a[i-1]; | ||
+ | a[i] = b[i] + c[i]; | ||
+ | } | ||
+ | </code> | ||
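For this particular loop the dependency can be removed by hand. The sketch below is one possible rewrite, not the page author's method: it splits the loop in two (loop fission), so that every `a[i]` is written before any `d[i]` reads `a[i-1]`. The function names are hypothetical, and the code assumes the arrays do not overlap.

```c
#include <stddef.h>

/* Original form: d[i] reads a[i-1], which the previous iteration wrote. */
void update_fused(size_t n, double *a, double *d,
                  const double *b, const double *c, const double *e)
{
    for (size_t i = 1; i < n; i++) {
        d[i] = e[i] - a[i - 1];
        a[i] = b[i] + c[i];
    }
}

/* Split form (assumes a, d, b, c, e do not alias each other).
 * Each of the two loops has independent iterations. */
void update_split(size_t n, double *restrict a, double *restrict d,
                  const double *restrict b, const double *restrict c,
                  const double *restrict e)
{
    /* First pass: produce all new values of a[1..n-1]. */
    for (size_t i = 1; i < n; i++) {
        a[i] = b[i] + c[i];
    }
    /* Second pass: a[] is now read-only, so no cross-iteration dependency. */
    for (size_t i = 1; i < n; i++) {
        d[i] = e[i] - a[i - 1];
    }
}
```

Both versions produce the same `a` and `d`, because in the original loop `a[i-1]` has already received its new value by the time iteration `i` reads it (and `a[0]` is never written).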
+ | |||
+ | |||
+ | * Complex loops (for example, with multiple exits or a trip count that is not known on entry) | ||
+ | |||
+ | * Function calls inside the loop: | ||
+ | |||
+ | <code> | ||
+ | for (int i = 0; i < N; i++) a[i] = foo(b[i]); | ||
+ | </code> | ||
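A function call is less of an obstacle when the compiler can see the function body and inline it. The sketch below assumes a small, side-effect-free `foo` defined in the same translation unit. Its body here is a made-up stand-in, and `apply_foo` is a hypothetical wrapper name.

```c
#include <stddef.h>

/* Hypothetical stand-in for foo(): the body is visible in this file,
 * so the compiler is free to inline it into the loop below. */
static inline float foo(float x)
{
    return 2.0f * x + 1.0f;
}

/* After inlining, the loop body is plain arithmetic on b[i],
 * which is the kind of loop an auto-vectorizer can usually handle. */
void apply_foo(size_t n, float *restrict a, const float *restrict b)
{
    for (size_t i = 0; i < n; i++) {
        a[i] = foo(b[i]);
    }
}
```

If the function lives in another source file, OpenMP's `#pragma omp declare simd` on its declaration is another possible route, provided the compiler is run with OpenMP SIMD support enabled.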
+ | |||
+ | * Loops on data that are not aligned in memory | ||
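For the alignment case, one option is to allocate arrays on a vector-width boundary. This is a minimal sketch assuming a C11 library that provides `aligned_alloc`. The 64-byte value matches the width of an AVX-512 register, and the helper name `alloc_aligned_floats` is hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

#define VEC_ALIGN 64  /* bytes: one AVX-512 register */

/* Hypothetical helper: returns storage for n floats that starts on a
 * 64-byte boundary, or NULL on failure. Release it with free(). */
float *alloc_aligned_floats(size_t n)
{
    size_t bytes = n * sizeof(float);
    /* C11 requires the size to be a multiple of the alignment,
     * so round the request up to the next multiple of VEC_ALIGN. */
    size_t rounded = (bytes + VEC_ALIGN - 1) / VEC_ALIGN * VEC_ALIGN;
    if (rounded == 0) {
        rounded = VEC_ALIGN;
    }
    return aligned_alloc(VEC_ALIGN, rounded);
}
```

Allocating the data this way does not by itself tell the compiler that a pointer is aligned inside a given loop. Compilers offer their own hints for that, which are outside the scope of this sketch.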
+ | |