roberto.depietri:user:speed_testing
| Both sides previous revision / Previous revision / Next revision | Previous revision | ||
| roberto.depietri:user:speed_testing [16/12/2012 18:33] – roberto.depietri | roberto.depietri:user:speed_testing [15/01/2013 17:11] (current version) – roberto.depietri | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| + | ======= Speed Testing ======= | ||
| + | Here I report notes on my activity (and, in this directory tree, all the results I obtain) in characterizing the performance of the Einstein Toolkit on the various machines I have access to. This logbook will cover all my activity | ||
| + | |||
| + | The main directory where I store the results of the Cactus Speed Test is "/ | ||
| + | |||
| + | |||
| + | ===== General Considerations ===== | ||
| + | |||
| + | I decided to use the November 2012 version, announced as follows: | ||
| + | We are pleased to announce the sixth release (code name " | ||
| + | community developed software infrastructure for relativistic astrophysics. | ||
| + | |||
| + | The main problems in my previous tests were the strange scaling properties of Carpet | ||
| + | when going to 256 or more processors, and the lack of a proper log of what I did. | ||
| + | Thanks to Frank Loeffler I realized that the main scaling problem I observed | ||
| + | was due to CarpetIOASCII 1D output. He pointed out to me that all the processors | ||
| + | write to the 1D files in an ordered sequence, so the writing time scales | ||
| + | linearly with the number of MPI processes involved. Lesson learned: do no output | ||
| + | when testing speed and scaling. Do I/O testing separately and do not mix the two | ||
| + | types of speed testing. | ||
| + | |||
| + | The good lesson I learned from the previous tests is the need for standardized | ||
| + | configurations to compare against and use as a reference. Always do both strong and weak scaling | ||
| + | checks. | ||
| + | |||
| + | ===== UNIGRID tests ===== | ||
| + | |||
| + | First check UNIGRID: | ||
| + | |||
| + | PUGH: | ||
| + | CARPET: CARPETit32.rpar generates par files like CARPETdx1.000it32.par | ||
| + | ################################################################################# | ||
| + | ### dx=[1.5 ....... 0.15]; nx=(60./dx *2 +1 +4); | ||
| + | ################################################################################## | ||
| + | ## 2.00 1.50 1.00 0.75 0.625 0.60 0.50 0.40 0.30 0.25 0.20 0.15 0.125 | ||
| + | ##    65    85 | ||
| + | ##  0.44  1.00  3.18  7.31  12.45  14.1  23.9  46.2 | ||
| + | ## | ||
| + | ## dx=1.0 Carpet requires 4312.518 MB | ||
| + | ## dx=1.5 Carpet requires 1356.006 MB | ||
| + | ##  dx=2.0 Carpet requires | ||
| + | ################################################################################# | ||
| + | |||
| + | ===== CARPET tests ===== | ||
| + | |||
| + | Then check 3 refinement levels, with the outer boundaries at 120 and the subgrids at 60 and 30. | ||
| + | In this case too we will do 32 integration steps on the finest grid. The resolution | ||
| + | dx refers to the finest grid. | ||
| + | |||
| + | CARPET: CARPET_RL3_it32.rpar generates par files like CARPET_RL3_dx1.000it32.par | ||
| + | ################################################################################# | ||
| + | ### dx=[1.5 ....... 0.15]; nx=(120./ | ||
| + | ################################################################################## | ||
| + | ## 2.00 1.50 1.00 0.75 0.625 0.60 0.50 0.40 0.30 0.25 0.20 0.15 0.125 | ||
| + | ## | ||
| + | ##  0.21  0.45  1.34 | ||
| + | ## | ||
| + | ## dx=0.75 Carpet requires 6468.078 MB | ||
| + | ## dx=1.0 Carpet requires 3318.366 MB | ||
| + | ## dx=1.5 Carpet requires 1471.679 MB | ||
| + | ## dx=2.0 Carpet requires 1068.414 MB | ||
| + | ##     Total time for simulation | ||
| + | ## On Blue Gene/Q, assuming perfect scaling, it will require (1024 cores) | ||
| + | ## 2.00 1.50 1.00 0.75 0.625 0.60 0.50 0.40 0.30 0.25 0.20 0.15 0.125 | ||
| + | ##  0.01  0.03  0.09  0.23  0.37 | ||
| + | ################################################################################# | ||
| + | |||
| + | |||
| + | |||
| + | ===== General problem with the testing ===== | ||
| + | |||
| + | |||
| + | The first tests showed that the use of | ||
| + | |||
| + | ActiveThorns = " | ||
| + | TimerReport:: | ||
| + | TimerReport:: | ||
| + | TimerReport:: | ||
| + | TimerReport:: | ||
| + | TimerReport:: | ||
| + | |||
| + | deeply affects the test results. For example " | ||
| + | |||
| + | Blue Gene Size Np Nt Total time for simulation | ||
| + | With TimerReport | ||
| + | 64  128 | ||
| + | 64  256 | ||
| + | 64  512 | ||
| + | 64 1024 | ||
| + | 128 2048 | ||
| + | Without | ||
| + | 128 2048 | ||
| + | 256 4096 | ||
| + | |||
| + | All the speed tests will be performed without the activation of " | ||
| + | |||
| + | |||
| + | |||
| + | |||
| + | |||
| + | |||
| + | ===== Second stage of Testing ===== | ||
| + | |||
| + | The second stage of testing involved just the output of various reductions of " | ||
| + | |||
| + | ^ Parfile: CARPET_RL3_dx(...)it32.par ^^^^^^^ | ||
| + | ^ dx ^ BG Size ^ # of cores ^ OMP size ^ simulation ^ CCTK_EVOL ^ WALL Time ^ | ||
| + | | 0.150  |  256|  4096| | ||
| + | ^ ^^^^^^^ | ||
| + | | 0.200  |  256|  4096| | ||
| + | | 0.200(*) |  128|  2048| | ||
| + | ^ ^^^^^^^ | ||
| + | | 0.250  |  256|  4096| | ||
| + | | 0.250  |  128|  2048| | ||
| + | | 0.250  | | ||
| + | ^ ^^^^^^^ | ||
| + | | 0.300  |  256|  4096| | ||
| + | | 0.300  |  128|  2048| | ||
| + | | 0.300  | | ||
| + | ^ ^^^^^^^ | ||
| + | | 0.400  |  256|  4096| | ||
| + | | 0.400  |  128|  2048| | ||
| + | | 0.400  | | ||
| + | ^ ^^^^^^^ | ||
| + | | 0.500  |  256|  4096| | ||
| + | | 0.500  |  128|  2048| | ||
| + | | 0.500  | | ||
| + | ^ OpenMP vs pure MPI ^^^^^^^ | ||
| + | | 0.250  | | ||
| + | | 0.250  | | ||
| + | | 0.250  | | ||
| + | | 0.250  | | ||
| + | | 0.250  | | ||
| + | ^ ^^^^^^^ | ||
| + | | 0.500  | | ||
| + | | 0.500  | | ||
| + | | 0.500  | | ||
| + | | 0.500  | | ||
| + | | 0.500  | | ||
| + | |||
| + | (*) This run was also performed with four times the number of time integration steps (it=128); | ||
| + | the corresponding CCTK_EVOL time changed from 510 to 2100 and the simulation time from 696 to 2524. | ||
| + | |||
| + | ===== Evaluation of the time to checkpoints ===== | ||
| + | |||
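
As a quick check of the grid-size formula quoted in the UNIGRID tests section, nx=(60./dx *2 +1 +4), the following Python sketch evaluates it for the listed resolutions. The function name `grid_points`, its keyword arguments, and the rounding to the nearest integer are assumptions made here for illustration; the rpar file itself may compute the value differently.

```python
# Hypothetical helper (not taken from the rpar file): evaluates
# nx = 60/dx * 2 + 1 + 4 as written in the UNIGRID tests section.
DX_VALUES = [2.00, 1.50, 1.00, 0.75, 0.625, 0.60, 0.50,
             0.40, 0.30, 0.25, 0.20, 0.15, 0.125]


def grid_points(dx, half_width=60.0, extra=4):
    """Points per direction: 2*half_width/dx intervals, +1 node, +extra points."""
    # Rounding guards against floating-point noise (assumption: the
    # interval count is meant to be an integer).
    return int(round(half_width / dx * 2)) + 1 + extra


if __name__ == "__main__":
    for dx in DX_VALUES:
        nx = grid_points(dx)
        print(f"dx={dx:<6} nx={nx:<5} nx^3={nx ** 3}")
```

For dx=2.00 and dx=1.50 this gives 65 and 85, which matches the first two entries of the nx row in that section.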

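The footnote (*) in the second stage of testing compares the same run at it=32 and it=128. A minimal sketch of how those two points can be read, assuming a simple linear model time = fixed + per_step * it (the model and the name `two_point_fit` are assumptions for illustration, not part of the original analysis):

```python
def two_point_fit(it_a, t_a, it_b, t_b):
    """Fit time = fixed + per_step * it through two measurements."""
    per_step = (t_b - t_a) / (it_b - it_a)
    fixed = t_a - per_step * it_a
    return fixed, per_step


if __name__ == "__main__":
    # Values from the footnote: it=32 -> it=128 (units as reported by the timers).
    timers = {
        "simulation": (696.0, 2524.0),
        "CCTK_EVOL": (510.0, 2100.0),
    }
    for name, (t32, t128) in timers.items():
        fixed, per_step = two_point_fit(32, t32, 128, t128)
        print(f"{name}: fixed={fixed:.2f} per_step={per_step:.4f}")
```

With the footnote's numbers, the simulation timer gives about 19.04 per step with a fixed part of about 86.7, while CCTK_EVOL gives 16.5625 per step and an intercept of -20. In other words, the evolution time grew slightly faster than linearly (2100 versus 4 × 510 = 2040), and the total grew more slowly than fourfold because of the fixed startup cost.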