roberto.depietri:user:speed_testing
roberto.depietri:user:speed_testing [15/01/2013 16:59] – roberto.depietri | roberto.depietri:user:speed_testing [15/01/2013 17:11] (versione attuale) – roberto.depietri | ||
+ | ======= Speed Testing ======= | ||
+ | Here I record notes on my activity and (in this directory tree) all the results I obtain in characterizing the performance of the Einstein Toolkit on the various machines I have access to. This log book will cover all my activity | ||
+ | |||
+ | The main directory where I store the results of the Cactus Speed Test is "/ | ||
+ | |||
+ | |||
+ | ===== General Considerations ===== | ||
+ | |||
+ | I decided to use the November 2012 version, announced as follows: | ||
+ | We are pleased to announce the sixth release (code name " | ||
+ | community developed software infrastructure for relativistic astrophysics. | ||
+ | |||
+ | The main problems in the previous tests I did were the strange scaling properties of Carpet | ||
+ | when going to 256 or more processors and the lack of a proper log of my activity. | ||
+ | Thanks to Frank Loeffler I realized that the main scaling problem I observed | ||
+ | was due to CARPET IOASCII for 1d output. He pointed out to me that all the processors | ||
+ | write to the 1d files in an ordered sequence, and indeed the writing time scales | ||
+ | linearly with the number of MPI processes involved. Lesson learned: do no output | ||
+ | when testing speed and scaling. Do the IO testing separately and do not mix up the two | ||
+ | types of speed testing. | ||
+ | |||
+ | The good lesson I learned from the previous tests is the need for standardized | ||
+ | configurations to compare against and use as a reference. Always do both strong and weak scaling | ||
+ | checks. | ||
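As a reminder of what the two checks measure, here is a minimal sketch of the usual efficiency definitions. The function names and the timings are hypothetical, chosen only for illustration; they are not measurements from this log.

```python
def strong_scaling_efficiency(t_ref, n_ref, t, n):
    """Strong scaling: the total problem size is fixed, so the ideal
    run time drops as n_ref / n when going from n_ref to n cores."""
    return (t_ref * n_ref) / (t * n)


def weak_scaling_efficiency(t_ref, t):
    """Weak scaling: the work per core is fixed, so the ideal
    run time stays constant as cores are added."""
    return t_ref / t


# Hypothetical timings (cores -> wall time), NOT data from this log book.
strong_runs = {128: 800.0, 256: 420.0, 512: 230.0}
n_ref = min(strong_runs)
for n, t in sorted(strong_runs.items()):
    eff = strong_scaling_efficiency(strong_runs[n_ref], n_ref, t, n)
    print(f"{n:5d} cores: strong-scaling efficiency {eff:.2f}")
```

An efficiency close to 1 means the run behaves as ideal scaling would predict; values well below 1 at 256 or more cores would correspond to the kind of problem described above.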
+ | |||
+ | ===== UNIGRID tests ===== | ||
+ | |||
+ | First check UNIGRID: | ||
+ | |||
+ | PUGH: | ||
+ | CARPET: CARPETit32.rpar generates par files like CARPETdx1.000it32.par | ||
+ | ################################################################################# | ||
+ | ### dx=[1.5 ....... 0.15]; nx=(60./dx *2 +1 +4); | ||
+ | ################################################################################## | ||
+ | ## 2.00 1.50 1.00 0.75 0.625 0.60 0.50 0.40 0.30 0.25 0.20 0.15 0.125 | ||
+ | ## 65 85 | ||
+ | ## 0.44 1.00 3.18 7.31 12.45 14.1 23.9 46.2 | ||
+ | ## | ||
+ | ## dx=1.0 Carpet requires 4312.518 MB | ||
+ | ## dx=1.5 Carpet requires 1356.006 MB | ||
+ | ## dx=2.0 Carpet requires | ||
+ | ################################################################################# | ||
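The grid-size formula in the comment block above, nx = 60/dx*2 + 1 + 4, can be checked with a short sketch. The function names are hypothetical, and the meaning of the extra 4 points is not stated in the notes, so it is kept as a plain parameter. The third row of the table appears to be consistent with (nx/85)^3, that is, the number of grid points relative to dx=1.5; this reading is an assumption, although it also matches the ratio of the two memory figures quoted (4312.518 / 1356.006 ≈ 3.18).

```python
def grid_points(dx, half_width=60.0, extra=4):
    """Points per direction: nx = 60/dx*2 + 1 + 4, as in the comment block.
    round() guards against floating-point noise in the division."""
    return round(2 * half_width / dx) + 1 + extra


def relative_size(dx, dx_ref=1.5):
    """Total number of grid points relative to the reference resolution
    (assumed reading of the third row of the table)."""
    return (grid_points(dx) / grid_points(dx_ref)) ** 3


for dx in (2.0, 1.5, 1.0, 0.75, 0.625, 0.60, 0.50, 0.40):
    print(f"dx={dx:<6} nx={grid_points(dx):4d} relative size={relative_size(dx):6.2f}")
```

With these definitions dx=2.0 and dx=1.5 give nx=65 and nx=85, the two values that survive in the second row of the table.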
+ | |||
+ | ===== CARPET tests ===== | ||
+ | |||
+ | Next, check 3 refinement levels, with borders at 120 and subgrids at 60 and 30. | ||
+ | In this case too we do 32 integration steps on the finest grid. The resolution | ||
+ | dx refers to the finest grid. | ||
+ | |||
+ | CARPET: CARPET_RL3_it32.rpar generates par files like CARPET_RL3_dx1.000it32.par | ||
+ | ################################################################################# | ||
+ | ### dx=[1.5 ....... 0.15]; nx=(120./ | ||
+ | ################################################################################## | ||
+ | ## 2.00 1.50 1.00 0.75 0.625 0.60 0.50 0.40 0.30 0.25 0.20 0.15 0.125 | ||
+ | ## | ||
+ | ## 0.21 0.45 1.34 | ||
+ | ## | ||
+ | ## dx=0.75 Carpet requires 6468.078 MB | ||
+ | ## dx=1.0 Carpet requires 3318.366 MB | ||
+ | ## dx=1.5 Carpet requires 1471.679 MB | ||
+ | ## dx=2.0 Carpet requires 1068.414 MB | ||
+ | ## Total time for simulation | ||
+ | ## On Blue Gene Q, assuming perfect scaling, this will require (1024 cores) | ||
+ | ## 2.00 1.50 1.00 0.75 0.625 0.60 0.50 0.40 0.30 0.25 0.20 0.15 0.125 | ||
+ | ## 0.01 0.03 0.09 0.23 0.37 | ||
+ | ################################################################################# | ||
+ | |||
+ | |||
+ | |||
+ | ===== General problem with the testing ===== | ||
+ | |||
+ | |||
+ | The first tests showed that the use of | ||
+ | |||
+ | ActiveThorns = " | ||
+ | TimerReport:: | ||
+ | TimerReport:: | ||
+ | TimerReport:: | ||
+ | TimerReport:: | ||
+ | TimerReport:: | ||
+ | |||
+ | deeply affects the test results. For example " | ||
+ | |||
+ | Blue Gene Size Np Nt Total time for simulation | ||
+ | With TimerReport | ||
+ | 64 128 | ||
+ | 64 256 | ||
+ | 64 512 | ||
+ | 64 1024 | ||
+ | 128 2048 | ||
+ | Without | ||
+ | 128 2048 | ||
+ | 256 4096 | ||
+ | |||
+ | All the speed tests will be performed without the activation of " | ||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | ===== Second stage of Testing ===== | ||
+ | |||
+ | The second stage of testing involved just the output of various reductions of " | ||
+ | |||
+ | ^ Parfile: CARPET_RL3_dx(...)it32.par ^^^^^^^ | ||
+ | ^ dx ^ BG Size ^ # of cores ^ OMP size ^ simulation ^ CCTK_EVOL ^ WALL Time ^ | ||
+ | | 0.150 | 256| 4096| | ||
+ | ^ ^^^^^^^ | ||
+ | | 0.200 | 256| 4096| | ||
+ | | 0.200(*) | 128| 2048| | ||
+ | ^ ^^^^^^^ | ||
+ | | 0.250 | 256| 4096| | ||
+ | | 0.250 | 128| 2048| | ||
+ | | 0.250 | | ||
+ | ^ ^^^^^^^ | ||
+ | | 0.300 | 256| 4096| | ||
+ | | 0.300 | 128| 2048| | ||
+ | | 0.300 | | ||
+ | ^ ^^^^^^^ | ||
+ | | 0.400 | 256| 4096| | ||
+ | | 0.400 | 128| 2048| | ||
+ | | 0.400 | | ||
+ | ^ ^^^^^^^ | ||
+ | | 0.500 | 256| 4096| | ||
+ | | 0.500 | 128| 2048| | ||
+ | | 0.500 | | ||
+ | ^ OpenMP vs pure MPI ^^^^^^^ | ||
+ | | 0.250 | | ||
+ | | 0.250 | | ||
+ | | 0.250 | | ||
+ | | 0.250 | | ||
+ | | 0.250 | | ||
+ | ^ ^^^^^^^ | ||
+ | | 0.500 | | ||
+ | | 0.500 | | ||
+ | | 0.500 | | ||
+ | | 0.500 | | ||
+ | | 0.500 | | ||
+ | |||
+ | (*) This run was also performed with four times as many time integration steps (it=128); | ||
+ | the corresponding CCTK_EVOL changed from 510 to 2100 and simulation from 696 to 2524. | ||
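The two runs in the footnote allow a rough split of each timer into a fixed part and a per-iteration part. This is a minimal sketch that assumes the cost is linear in the number of iterations and that the shorter run used it=32 (as the par file name suggests); the function name is hypothetical, and the times are in the same units as the table.

```python
def fit_linear_cost(it_a, t_a, it_b, t_b):
    """Fit t = fixed + per_step * it through two (iterations, time) points."""
    per_step = (t_b - t_a) / (it_b - it_a)
    fixed = t_a - per_step * it_a
    return fixed, per_step


# Values quoted in the footnote: it=32 versus it=128.
fixed_sim, step_sim = fit_linear_cost(32, 696.0, 128, 2524.0)
fixed_evol, step_evol = fit_linear_cost(32, 510.0, 128, 2100.0)

print(f"simulation: fixed {fixed_sim:7.2f}, per iteration {step_sim:6.3f}")
print(f"CCTK_EVOL : fixed {fixed_evol:7.2f}, per iteration {step_evol:6.3f}")
```

Under these assumptions the simulation timer has a sizeable fixed part, while CCTK_EVOL grows by slightly more than a factor of four (2100/510 ≈ 4.12), which is why its fitted fixed part comes out slightly negative.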
+ | |||
+ | ===== Evaluation of the time to checkpoints ===== | ||
+ |