roberto.depietri:user:speed_testing
Differences
This shows the differences between the selected revision and the current version of the page.
| Next revision | Previous revision | ||
| roberto.depietri:user:speed_testing [16/12/2012 18:29] – created roberto.depietri | roberto.depietri:user:speed_testing [15/01/2013 17:11] (current version) – roberto.depietri | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| + | ======= Speed Testing ======= | ||
| + | Here I report notes on my activity (and, in this directory tree, all the results I obtain) in characterizing the performance of the Einstein Toolkit on the various machines I have access to. This log book will cover all my activity | ||
| + | |||
| + | The main directory where I store the results of the Cactus Speed Test is "/ | ||
| + | |||
| + | |||
| + | ===== General Considerations ===== | ||
| + | |||
| + | I decided to consider the November 2012 version, announced as follows: | ||
| + | We are pleased to announce the sixth release (code name " | ||
| + | community developed software infrastructure for relativistic astrophysics. | ||
| + | |||
| + | The main problems with the previous tests I did were the strange scaling properties of Carpet | ||
| + | when going to 256 or more processors and the lack of a proper log of the activity I did. | ||
| + | Thanks to Frank Loeffler I realized that the main scaling problems I observed | ||
| + | were due to CARPET IOASCII for 1d output. He pointed out to me that all the processors | ||
| + | write in an ordered sequence to the 1d files, and indeed the writing time scales | ||
| + | linearly with the number of MPI processes involved. Lesson learned: do not produce output | ||
| + | when testing speed and scaling. Do separate IO testing and do not mix up the two | ||
| + | types of speed testing. | ||
| + | |||
| + | The good lesson I learned in the previous tests is the need to have standardized | ||
| + | configurations to compare and use as a reference. Always do both strong and weak scaling | ||
| + | checks. | ||
| + | |||
| + | ===== UNIGRID tests ===== | ||
| + | |||
| + | First check UNIGRID: | ||
| + | |||
| + | PUGH: | ||
| + | CARPET: CARPETit32.rpar generates par files like CARPETdx1.000it32.par | ||
| + | ################################################################################# | ||
| + | ### dx=[1.5 ....... 0.15]; nx=(60./dx *2 +1 +4); | ||
| + | ################################################################################## | ||
| + | ## 2.00 1.50 1.00 0.75 0.625 0.60 0.50 0.40 0.30 0.25 0.20 0.15 0.125 | ||
| + | ## 65 85 | ||
| + | ## 0.44 1.00 3.18 7.31 12.45 14.1 23.9 46.2 | ||
| + | ## | ||
| + | ## dx=1.0 Carpet requires 4312.518 MB | ||
| + | ## dx=1.5 Carpet requires 1356.006 MB | ||
| + | ## dx=2.0 Carpet requires | ||
| + | ################################################################################# | ||
| + | |||
| + | ===== CARPET tests ===== | ||
| + | |||
| + | Then check with 3 refinement levels: outer boundaries at 120 and subgrids at 60 and 30. | ||
| + | Also in this case we do 32 integration steps on the finest grid. The resolution | ||
| + | dx refers to the finest grid. | ||
| + | |||
| + | CARPET: CARPET_RL3_it32.rpar generates par files like CARPET_RL3_dx1.000it32.par | ||
| + | ################################################################################# | ||
| + | ### dx=[1.5 ....... 0.15]; nx=(120./ | ||
| + | ################################################################################## | ||
| + | ## 2.00 1.50 1.00 0.75 0.625 0.60 0.50 0.40 0.30 0.25 0.20 0.15 0.125 | ||
| + | ## | ||
| + | ## 0.21 0.45 1.34 | ||
| + | ## | ||
| + | ## dx=0.75 Carpet requires 6468.078 MB | ||
| + | ## dx=1.0 Carpet requires 3318.366 MB | ||
| + | ## dx=1.5 Carpet requires 1471.679 MB | ||
| + | ## dx=2.0 Carpet requires 1068.414 MB | ||
| + | ## Total time for simulation | ||
| + | ## On Blue Gene Q, assuming perfect scaling, it will require (1024 cores) | ||
| + | ## 2.00 1.50 1.00 0.75 0.625 0.60 0.50 0.40 0.30 0.25 0.20 0.15 0.125 | ||
| + | ## 0.01 0.03 0.09 0.23 0.37 | ||
| + | ################################################################################# | ||
| + | |||
| + | |||
| + | |||
| + | ===== General problem with the testing ===== | ||
| + | |||
| + | |||
| + | The first tests showed that the use of | ||
| + | |||
| + | ActiveThorns = " | ||
| + | TimerReport:: | ||
| + | TimerReport:: | ||
| + | TimerReport:: | ||
| + | TimerReport:: | ||
| + | TimerReport:: | ||
| + | |||
| + | deeply affects the test results. For example " | ||
| + | |||
| + | Blue Gene Size Np Nt Total time for simulation | ||
| + | With TimerReport | ||
| + | 64 128 | ||
| + | 64 256 | ||
| + | 64 512 | ||
| + | 64 1024 | ||
| + | 128 2048 | ||
| + | Without | ||
| + | 128 2048 | ||
| + | 256 4096 | ||
| + | |||
| + | All the speed tests will be performed without the activation of " | ||
| + | |||
| + | |||
| + | |||
| + | |||
| + | |||
| + | |||
| + | ===== Second stage of Testing ===== | ||
| + | |||
| + | The second stage of testing involved just the output of various reductions of " | ||
| + | |||
| + | ^ Parfile: CARPET_RL3_dx(...)it32.par ^^^^^^^ | ||
| + | ^ dx ^ BG Size ^ # of cores ^ OMP size ^ simulation ^ CCT_EVOLV ^ WALL Time ^ | ||
| + | | 0.150 | 256| 4096| | ||
| + | ^ ^^^^^^^ | ||
| + | | 0.200 | 256| 4096| | ||
| + | | 0.200(*) | 128| 2048| | ||
| + | ^ ^^^^^^^ | ||
| + | | 0.250 | 256| 4096| | ||
| + | | 0.250 | 128| 2048| | ||
| + | | 0.250 | | ||
| + | ^ ^^^^^^^ | ||
| + | | 0.300 | 256| 4096| | ||
| + | | 0.300 | 128| 2048| | ||
| + | | 0.300 | | ||
| + | ^ ^^^^^^^ | ||
| + | | 0.400 | 256| 4096| | ||
| + | | 0.400 | 128| 2048| | ||
| + | | 0.400 | | ||
| + | ^ ^^^^^^^ | ||
| + | | 0.500 | 256| 4096| | ||
| + | | 0.500 | 128| 2048| | ||
| + | | 0.500 | | ||
| + | ^ OpenMP vs pure MPI ^^^^^^^ | ||
| + | | 0.250 | | ||
| + | | 0.250 | | ||
| + | | 0.250 | | ||
| + | | 0.250 | | ||
| + | | 0.250 | | ||
| + | ^ ^^^^^^^ | ||
| + | | 0.500 | | ||
| + | | 0.500 | | ||
| + | | 0.500 | | ||
| + | | 0.500 | | ||
| + | | 0.500 | | ||
| + | |||
| + | (*) This run was also performed with four times the number of time integration steps (it=128); | ||
| + | the corresponding CCTK_EVOL changed from 510 to 2100 and simulation from 696 to 2524. | ||
| + | |||
| + | ===== Evaluation of the time to checkpoints ===== | ||
| + | |||
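
As a cross-check of the UNIGRID table above, the grid-size rule nx=(60./dx *2 +1 +4) can be evaluated for each listed dx. The sketch below is a minimal Python illustration and not part of the original test scripts: the function names `unigrid_points` and `relative_volume` are hypothetical, and reading the third row of that table as the grid volume relative to dx=1.50 is an assumption (it reproduces the listed 1.00, 3.18 and 7.31).

```python
def unigrid_points(dx, half_width=60.0, extra=4):
    """Points per axis from the rule nx = half_width/dx*2 + 1 + extra.

    half_width=60 and extra=4 are taken from the formula in the log;
    what the 4 extra points represent is not stated there.
    """
    # round() guards against floating-point noise such as 60/0.6*2 = 200.00000000000003
    return int(round(half_width / dx * 2)) + 1 + extra


def relative_volume(dx, dx_ref=1.5):
    """Number of grid points (nx**3) relative to the dx_ref grid (assumed reading)."""
    return (unigrid_points(dx) / unigrid_points(dx_ref)) ** 3


# Resolutions listed in the header row of the UNIGRID table.
resolutions = [2.00, 1.50, 1.00, 0.75, 0.625, 0.60, 0.50,
               0.40, 0.30, 0.25, 0.20, 0.15, 0.125]

for dx in resolutions:
    nx = unigrid_points(dx)
    print(f"dx={dx:<6} nx={nx:4d}  relative volume={relative_volume(dx):8.2f}")
```

For dx=2.00 and dx=1.50 this gives nx=65 and nx=85, the two values that survive in the table.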
