We will first look at the steps needed to obtain a running executable on a local machine, and then at how to transpose these steps to the TRAMONTANA grid cluster.
The web page of the project is einsteintoolkit.org, and the tutorial for new users of the Einstein Toolkit is located at the following web page: TUTORIAL.
The main prerequisite for downloading the source code is that the revision control tools cvs, svn and git are installed on the main node.
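A quick way to verify this prerequisite is sketched below. The function name check_tools is only an illustration and is not part of the Einstein Toolkit; it relies solely on the standard shell builtin "command -v".

```shell
# Report which of the given command-line tools are missing from PATH.
# Returns 0 when all of them are present, 1 otherwise.
# (check_tools is a hypothetical helper name used for this example.)
check_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Typical use before fetching the sources:
#   check_tools cvs svn git || echo "install the missing tools first"
```

The check only tells us that the tools are on the PATH; it does not test that they can reach the remote repositories.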
The first step is to have the source code available on our machine. We first create a directory to hold our compilation tree:
mkdir EinsteinToolkit
cd EinsteinToolkit/
We now get the download script and make it executable:
wget --no-check-certificate https://github.com/gridaphobe/CRL/raw/ET_2011_05/GetComponents
chmod 755 GetComponents
and we proceed to fetch all the components:
./GetComponents -a http://svn.einsteintoolkit.org/manifest/branches/ET_2011_05/einsteintoolkit.th
At this point we have a complete source tree ready to be compiled. To appreciate the size of the problem (compiling and running our program), we note that the source tree is 567 MB in size and contains 55451 files.
Indeed, if we give the commands
tar czf Cactus_tree.tgz Cactus/
du -hs Cactus*
we get the following output:
567M Cactus
145M Cactus_tree.tgz
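The number of files can be checked in a similar way. The following is a minimal sketch; the helper name count_files is an assumption of this example, and the text does not say how the figure of 55451 files was actually obtained, so the two counts may differ slightly (for instance if directories or symbolic links were included).

```shell
# Count the regular files below a directory.
# (count_files is a hypothetical helper name used for this example.)
count_files() {
    find "$1" -type f | wc -l | tr -d ' '
}

# Typical use on the source tree:
#   count_files Cactus
```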
The build configuration is then created with
make Einstein-config F77=gfortran F90=gfortran GSL=yes JPEG=yes SSL=yes \
 MPI=OpenMPI OPENMPI_DIR=/usr/lib64/openmpi/1.4-gcc/ \
 OPENMP=yes
and it is required to answer "yes" to allow the creation of a new executable configuration.
The next step is the actual compilation of our source tree into a running executable. A Cactus executable is built from a source tree by selecting components through a "Thornlist" file. Other possibilities are to edit the list of all the components present in the source tree, or to accept the complete list of available components. The latter is achieved by running the command:
make Einstein
and answering "no". In this way a full compilation of all the available modules is obtained.
At the end we obtain a file called "cactus_Einstein" in the exe subdirectory, ready to be launched.
We should note that the only real configuration needed on a Linux machine was to provide the flavor of the available MPI implementation and the directory where it is located. In the case of the grid-ui user interface present here in Parma, these are the same as those available on TRAMONTANA.
The executable is quite easy to use, since it accepts a parameter file and has MPI compiled and linked in. To run the simulation one typically gives the command
mpirun -np 4 -x OMP_NUM_THREADS=1 Einstein static_tov.par
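Since the executable is built with both MPI and OpenMP, the number of MPI processes and the number of threads per process are usually chosen so that their product equals the number of available cores. A minimal sketch of this bookkeeping follows; the function name mpirun_cmd is hypothetical, and it only prints the command line (with the same options used above) instead of running it.

```shell
# Print the mpirun command for CORES cores and NT OpenMP threads per
# MPI process, so that (number of processes) x (threads) = CORES.
# (mpirun_cmd is a hypothetical helper name used for this example.)
mpirun_cmd() {
    local cores=$1 nt=$2 parfile=$3
    if [ $((cores % nt)) -ne 0 ]; then
        echo "error: $nt threads do not divide $cores cores" >&2
        return 1
    fi
    echo "mpirun -np $((cores / nt)) -x OMP_NUM_THREADS=$nt Einstein $parfile"
}

# Example: 8 cores with 2 threads per process gives 4 MPI processes.
#   mpirun_cmd 8 2 static_tov.par
```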
and an example of a parameter file is:
## mclachlan tov_static
ActiveThorns = "Time MoL"
ActiveThorns = "Coordbase CartGrid3d Boundary StaticConformal"
ActiveThorns = "SymBase ADMBase TmunuBase HydroBase InitBase ADMCoupling ADMMacros"
ActiveThorns = "IOUtil Formaline"
ActiveThorns = "SpaceMask CoordGauge Constants LocalReduce aeilocalinterp LoopControl"
ActiveThorns = "Carpet CarpetLib CarpetReduce CarpetRegrid2 CarpetInterp"
ActiveThorns = "CarpetIOASCII CarpetIOScalar CarpetIOHDF5 CarpetIOBasic"
ActiveThorns = "ADMConstraints NaNChecker"
ActiveThorns = "TerminationTrigger"

# grid parameters
CartGrid3D::type = "coordbase"
CartGrid3D::domain = "full"
CartGrid3D::avoid_origin = "no"
CoordBase::xmin = 0.0
CoordBase::ymin = 0.0
CoordBase::zmin = 0.0
CoordBase::xmax = 240.0
CoordBase::ymax = 240.0
CoordBase::zmax = 240.0
CoordBase::dx = 8
CoordBase::dy = 8
CoordBase::dz = 8
CoordBase::boundary_size_x_lower = 3
CoordBase::boundary_size_y_lower = 3
CoordBase::boundary_size_z_lower = 3
CoordBase::boundary_size_x_upper = 3
CoordBase::boundary_size_y_upper = 3
CoordBase::boundary_size_z_upper = 3
CoordBase::boundary_shiftout_x_lower = 1
CoordBase::boundary_shiftout_y_lower = 1
CoordBase::boundary_shiftout_z_lower = 1
CoordBase::boundary_shiftout_x_upper = 0
CoordBase::boundary_shiftout_y_upper = 0
CoordBase::boundary_shiftout_z_upper = 0

ActiveThorns = "ReflectionSymmetry"
ReflectionSymmetry::reflection_x = "yes"
ReflectionSymmetry::reflection_y = "yes"
ReflectionSymmetry::reflection_z = "yes"
ReflectionSymmetry::avoid_origin_x = "no"
ReflectionSymmetry::avoid_origin_y = "no"
ReflectionSymmetry::avoid_origin_z = "no"

TmunuBase::stress_energy_storage = yes
TmunuBase::stress_energy_at_RHS = yes
TmunuBase::timelevels = 1
TmunuBase::prolongation_type = none
HydroBase::timelevels = 3
ADMMacros::spatial_order = 4
ADMBase::metric_type = "physical"
ADMConstraints::bound = "static"
ADMConstraints::constraints_timelevels = 3
ADMConstraints::constraints_persist = yes
SpaceMask::use_mask = "yes"

Cactus::terminate = "time"
Cactus::cctk_final_time = 1000

Carpet::domain_from_coordbase = "yes"
Carpet::enable_all_storage = no
Carpet::use_buffer_zones = "yes"
Carpet::poison_new_timelevels = "yes"
Carpet::check_for_poison = "no"
Carpet::poison_value = 113
Carpet::init_3_timelevels = no
Carpet::init_fill_timelevels = "yes"
CarpetLib::poison_new_memory = "yes"
CarpetLib::poison_value = 114

# system specific Carpet paramters
Carpet::max_refinement_levels = 10
driver::ghost_size = 3
Carpet::prolongation_order_space = 3
Carpet::prolongation_order_time = 2
CarpetRegrid2::regrid_every = 0
CarpetRegrid2::num_centres = 1
CarpetRegrid2::num_levels_1 = 5
CarpetRegrid2::radius_1[1] =120.0
CarpetRegrid2::radius_1[2] = 60.0
CarpetRegrid2::radius_1[3] = 30.0
CarpetRegrid2::radius_1[4] = 15.0

time::dtfac = 0.25
MoL::ODE_Method = "rk4"
MoL::MoL_Intermediate_Steps = 4
MoL::MoL_Num_Scratch_Levels = 1

# check all physical variables for NaNs
NaNChecker::check_every = 1
NaNChecker::action_if_found = "just warn" #"terminate", "just warn", "abort"
NaNChecker::check_vars = "ADMBase::metric ADMBase::lapse ADMBase::shift HydroBase::rho HydroBase::eps HydroBase::press HydroBase::vel"

## Lapse Condition: \partial_t alpha = - alpha K
## Shift Condition: \partial_t beta^i = 0

# Hydro paramters
ActiveThorns = "EOS_Omni"
ActiveThorns = "GRHydro"
HydroBase::evolution_method = "GRHydro"
GRHydro::riemann_solver = "Marquina"
GRHydro::GRHydro_eos_type = "Polytype"
GRHydro::GRHydro_eos_table = "2D_Polytrope"
GRHydro::recon_method = "ppm"
GRHydro::GRHydro_stencil = 3
GRHydro::bound = "none"
GRHydro::rho_abs_min = 1.e-10
#GRHydro::GRHydro = 18 # Tmunu(10), rho,press,eps,w_lorentz,vel, tau
#GRHydro::GRHydro = 10 # gij(6), alpha, beta(3)

ActiveThorns = "GenericFD NewRad"
ActiveThorns = "ML_BSSN ML_BSSN_Helper"
ADMBase::evolution_method = "ML_BSSN"
ADMBase::lapse_evolution_method = "ML_BSSN"
ADMBase::shift_evolution_method = "ML_BSSN"
ADMBase::dtlapse_evolution_method= "ML_BSSN"
ADMBase::dtshift_evolution_method= "ML_BSSN"
TmunuBase::support_old_CalcTmunu_mechanism = "no"
ML_BSSN::timelevels = 3
ML_BSSN::harmonicN = 1 # 1+log
ML_BSSN::harmonicF = 1.0 # 1+log
ML_BSSN::LapseACoeff = 1.0
ML_BSSN::ShiftBCoeff = 1.0
ML_BSSN::ShiftGammaCoeff = 0.0
ML_BSSN::AlphaDriver = 0.0
ML_BSSN::BetaDriver = 0.0
ML_BSSN::LapseAdvectionCoeff = 0.0
ML_BSSN::ShiftAdvectionCoeff = 0.0
ML_BSSN::MinimumLapse = 1.0e-8
ML_BSSN::my_initial_boundary_condition = "extrapolate-gammas"
ML_BSSN::my_rhs_boundary_condition = "NewRad"
ML_BSSN::ML_log_confac_bound = "none"
ML_BSSN::ML_metric_bound = "none"
ML_BSSN::ML_Gamma_bound = "none"
ML_BSSN::ML_trace_curv_bound = "none"
ML_BSSN::ML_curv_bound = "none"
ML_BSSN::ML_lapse_bound = "none"
ML_BSSN::ML_dtlapse_bound = "none"
ML_BSSN::ML_shift_bound = "none"
ML_BSSN::ML_dtshift_bound = "none"

# init parameters
InitBase::initial_data_setup_method = "init_some_levels"
ActiveThorns = "TOVSolver"
ADMBase::initial_data = "tov"
ADMBase::initial_lapse = "tov"
ADMBase::initial_shift = "tov"
ADMBase::initial_dtlapse = "zero"
ADMBase::initial_dtshift = "zero"
TOVSolver::TOV_Rho_Central[0] = 1.28e-3
TOVSolver::TOV_Gamma[0] = 2.0
TOVSolver::TOV_K[0] = 100.0

IOBasic::outInfo_every = 1
IOBasic::outInfo_vars = "HydroBase::rho ADMBase::lapse"
IO::out_dir = $parfile
IOScalar::outScalar_every = 32
IOScalar::one_file_per_group = yes
IOScalar::outScalar_vars = "
 HydroBase::rho
 HydroBase::press
 HydroBase::eps
 HydroBase::vel
 ADMBase::lapse
 ADMBase::metric
 ADMBase::curv
 ADMConstraints::ham
 ADMConstraints::momentum
"
IOASCII::out1D_every = 128
IOASCII::one_file_per_group = yes
IOASCII::output_symmetry_points = no
IOASCII::out3D_ghosts = no
IOASCII::out3D_outer_ghosts = no
IOASCII::out1D_vars = "
 HydroBase::rho
 HydroBase::press
 HydroBase::eps
 HydroBase::vel
 ADMBase::lapse
 ADMBase::metric
 ADMBase::curv
 ADMConstraints::ham
 ADMConstraints::momentum
"
iohdf5::out_every = 256
iohdf5::out_vars = "
 hydrobase::rho
 hydrobase::press
 hydrobase::eps
 hydrobase::vel
 ADMBase::lapse
 ADMBase::shift
 ADMBase::curv
 ADMBase::metric
"
IO::out_mode = "proc"
IO::out_unchunked = "no"

#==================================
# Checkpoint parameters
#==================================
IO::checkpoint_dir = "CHECKPOINT"
IO::recover_dir = "CHECKPOINT"
IO::checkpoint_every = 512
IO::checkpoint_keep = 2
IO::recover = "autoprobe"
IO::checkpoint_on_terminate = "yes"
IO::recover_file = "checkpoint.chkpt"
IOHDF5::checkpoint = "yes"
IOHDF5::use_reflevels_from_checkpoint = "yes"
#--------------------------------------------------
TerminationTrigger::on_remaining_walltime = 5
TerminationTrigger::max_walltime = 1
TerminationTrigger::create_termination_file = yes
TerminationTrigger::termination_file = "cactus_terminate"
TerminationTrigger::check_file_every = 1
TerminationTrigger::termination_from_file = yes
One of the nice features of this program is that it automatically produces a checkpoint every given number of iterations. Moreover, if it finds a checkpoint, it automatically restarts from it. It also checks the wall time and stops accordingly.
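The directory searched for a checkpoint is set by IO::recover_dir in the parameter file. When a restart has to read the checkpoints from a different directory, the parameter file can be rewritten on the fly with sed, as the run script later in this section does. The following is a minimal sketch; the function name set_recover_dir is hypothetical, and the pattern tolerates extra spaces around the "=" sign, which is an assumption about how the parameter file is laid out.

```shell
# Copy a parameter file, changing IO::recover_dir from "CHECKPOINT"
# to the directory name given as third argument.
# (set_recover_dir is a hypothetical helper name used for this example.)
set_recover_dir() {
    local src=$1 dst=$2 newdir=$3
    sed "s|IO::recover_dir *= *\"CHECKPOINT\"|IO::recover_dir = \"$newdir\"|" \
        "$src" > "$dst"
}

# Example:
#   set_recover_dir static_tov.par static_tov_restart.par CHECKPOINT_RECOVER
```

Only the IO::recover_dir line is changed, so new checkpoints are still written to the directory given by IO::checkpoint_dir.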
This is a very different application from the "CHROMA" lattice QCD one. The main difference is that this is a very data-intensive application, and parallelism is used not to speed up execution but to make it possible to simulate bigger problems, where "bigger" should be understood as increased resolution. This is also an application that requires a huge amount of data to be saved and transferred once the run is finished. Moreover, we will need to restart the simulation more than once in order to overcome the queue wall-time limits. That means that we need to save checkpoints!
For the above reasons, creating a "tar" file with all the data to be moved out to the storage elements is not an effective approach.
The compilation procedure can be performed with this simple shell script:
#!/bin/bash
###################################################################
## We can compile on a directory that would not be automatically
## DELETED once the Jobs end
###################################################################
umask 007
echo "pwd $(pwd)"
export BaseDir="$(cd ../../../thogea10 ; pwd)/CompileET"
echo "BaseDir= $BaseDir"
mkdir -p ${BaseDir}
cd ${BaseDir}
## ----------------------------------------
## First we get the tar file we previously
## saved on the GRID storage
## But we can also use direct FS access
## ----------------------------------------
export BASESRM=/gpfs/gpfshds/srm/theophys/IS_OG51/Parma/CorsoGrid
tar xzvf ${BASESRM}/Cactus_tree.tgz
cd Cactus
echo "yes" | \
make Einstein-config F77=gfortran F90=gfortran GSL=yes JPEG=yes SSL=yes \
 MPI=OpenMPI OPENMPI_DIR=/usr/lib64/openmpi/1.4-gcc/ \
 OPENMP=yes
echo "no" | \
make -j 8 Einstein
lcg-cr -v --vo theophys -d srm://gridsrm.pi.infn.it/theophys/IS_OG51/Parma/CorsoGrid/Einstein.v2 \
 -l lfn:/grid/theophys/IS_OG51/Parma/CorsoGrid/Einstein.v2 file://$(pwd)/exe/cactus_Einstein
Note the "make -j 8 Einstein", which allows us to use all 8 cores we allocated for compiling. Caution is needed with this trick, because hand-written makefiles do not always encode all the dependencies properly, in which case a parallel build may fail.
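One possible way to guard against such failures is to retry the build serially when the parallel one fails. This is only a sketch and not part of the original procedure: the function name build_with_fallback is hypothetical, and it simply appends "-j N" to whatever build command it is given.

```shell
# Run a build command in parallel first; if that fails, retry it serially.
# Usage: build_with_fallback JOBS COMMAND [ARGS...]
# (build_with_fallback is a hypothetical helper name used for this example.)
build_with_fallback() {
    local jobs=$1
    shift
    if "$@" -j "$jobs"; then
        return 0
    fi
    echo "parallel build failed, retrying with -j 1" >&2
    "$@" -j 1
}

# Example:
#   build_with_fallback 8 make Einstein
```

Note that an answer piped into the function on standard input would be consumed by the first attempt, so a prompt such as the one answered with "no" above may need to be handled separately for the retry.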
But first we may want to have a look at our typical script:
#!/bin/bash
### -----------------------------------------------------------------
###
### These are the settings on Tramontana
###
### -----------------------------------------------------------------
export LOCALSRM=/gpfs/gpfshds/srm/theophys/IS_OG51/Parma
export CATALOG=lfn:/grid/theophys/IS_OG51/Parma
export SRM=srm://gridsrm.pi.infn.it/theophys/IS_OG51/Parma
export EXECUTABLE=Whisky.exe
### -----------------------------------------------------------------
###
### Setting to Run Whisky:
###
### We assume that the precompiled exe are already stored on
### Tramontana. The same is assumed for the parameter files
###
### We also need an area where JOBS are executed and results saved
### -----------------------------------------------------------------
export DIRpar=${LOCALSRM}/CACTUS/par
export EXEcommand="${EXECUTABLE} $2.par" #${DIRpar}/$2.par"
export DIRcheckpoint=CHECKPOINT/$2
echo "pwd $(pwd)"
export EXPmar11="$(cd ../../../thogea10 ; pwd)/EXPmar11"
echo "EXPmar11= $EXPmar11"
export CACTUSexe="${EXPmar11}/build/Cactus/exe/cactus_EXPmar11"
export CACTUSexe=${LOCALSRM}/CACTUS/exe/WHISKYexp
export LOCALexe=${CACTUSexe}
export executionDIR="${EXPmar11}/run/$2/$1";
umask 007
echo "======================================"
echo "** mkdir -p ${executionDIR}"
echo "** cd ${executionDIR}"
mkdir -p ${executionDIR}
cd ${executionDIR}
echo "======================================"
echo "========= JOB executions ==============="
echo "We will run the exe file: ${EXECUTABLE}"
echo "Copied from: ${LOCALexe}"
echo "Running command is: ${EXEcommand}"
echo "Recover DIR: ${LOCALSRM}/${DIRcheckpoint} "
echo "========================================"
ln -s /gpfs/gpfshds/csn4home/thogea10/EXPmar11/run/$2/output-0000/CHECKPOINT CHECKPOINT_RECOVER
echo "========================================"
echo "===================================="
sed "s/IO::recover_dir = \"CHECKPOINT\"/IO::recover_dir = \"CHECKPOINT_RECOVER\"/" ${DIRpar}/$2.par > $2.par
cat $2.par
echo "===================================="
## ------------------------------------
## GET and create The MPI nodelist
## ------------------------------------
NODEFILE1=/tmp/hf1.$(date +"%m%d%y%H%M%S")
NODEFILE2=/tmp/hf2.$(date +"%m%d%y%H%M%S")
RANKFILE=/tmp/hf3.$(date +"%m%d%y%H%M%S")
AWKcommand=/tmp/hf4.$(date +"%m%d%y%H%M%S")
touch $NODEFILE1
touch $NODEFILE2
touch $AWKcommand
# echo "{print \"rank \" i++ \"=\" \$1 \" slot=0-3\"}" >> $AWKcommand
# echo "{print \"rank \" i++ \"=\" \$1 \" slot=4-7\"}" >> $AWKcommand
### -----------------------------------------------------------------
### The following setting is for a run with one thread per processor
### Nt is the number of threads per processor
### -----------------------------------------------------------------
NT=1
echo "{print \"rank \" i++ \"=\" \$1 \" slot=0\"}" >> $AWKcommand
echo "{print \"rank \" i++ \"=\" \$1 \" slot=1\"}" >> $AWKcommand
echo "{print \"rank \" i++ \"=\" \$1 \" slot=2\"}" >> $AWKcommand
echo "{print \"rank \" i++ \"=\" \$1 \" slot=3\"}" >> $AWKcommand
echo "{print \"rank \" i++ \"=\" \$1 \" slot=4\"}" >> $AWKcommand
echo "{print \"rank \" i++ \"=\" \$1 \" slot=5\"}" >> $AWKcommand
echo "{print \"rank \" i++ \"=\" \$1 \" slot=6\"}" >> $AWKcommand
echo "{print \"rank \" i++ \"=\" \$1 \" slot=7\"}" >> $AWKcommand
for host in $LSB_HOSTS; do echo $host >> $NODEFILE1; done;
sort -u ${NODEFILE1} > ${NODEFILE2}
awk -f $AWKcommand ${NODEFILE2} > ${RANKFILE}
NCPU=$[`cat ${NODEFILE1} | wc --lines`]
NNODES=$[`cat ${NODEFILE2} | wc --lines`]
NP=$[`cat ${RANKFILE} | wc --lines`]
MPIRUN="mpirun -np $NP -x OMP_NUM_THREADS=$NT -hostfile $NODEFILE2 --rankfile ${RANKFILE}"
echo "========= MPI PARAMETER ================"
echo "N cpus = ${NCPU}"
echo "N nodes = ${NNODES}"
echo "Np = ${NP}"
echo "Nt = ${NT}"
echo "mpirun = " $(which mpirun)
echo "MPI --> ${MPIRUN}"
echo "========================================"
echo "** cat $RANKFILE"
cat $RANKFILE
echo "========================================"
# echo "** cat $NODEFILE1"
# cat $NODEFILE1
# echo "========================================"
echo "** cat $NODEFILE2"
cat $NODEFILE2
echo "========================================"
# echo "** cat $AWKcommand"
# cat $AWKcommand
# echo "========================================"
echo "** cleaning previous open-mpi leftover"
echo "** mpirun --pernode --hostfile $NODEFILE2 orte-clean --verbose"
mpirun --pernode --hostfile $NODEFILE2 orte-clean --verbose
echo "========================================"
echo "======================================"
echo "I'm running here [$(pwd)]" | tee OUTPUT
echo "======================================"
ls -l
pwd
### -----------------------------------------------------------------
### First I copy the executable to be run on the working directory
### and change attribute in such a way that I can execute it
### -----------------------------------------------------------------
cp ${LOCALexe} ${EXECUTABLE}
chmod +x ${EXECUTABLE}
echo "**************************************************"
echo "**(TIME) START: " $(date)
echo "**************************************************"
### -----------------------------------------------------------------
### HERE GOES THE ACTUAL EXECUTIONS COMMAND
### -----------------------------------------------------------------
runCMD="${MPIRUN} ${EXEcommand} 2>&1 | tee -a RunTrace.out"
echo "----------------------------- Command ----------------"
echo "-- executing: ${runCMD}"
eval ${runCMD}
echo "$runCMD = ${runCMD}"
echo "----------------------------- Command END ------------"
chmod -R g+rwX data
chmod -R g+rwX CHECKPOINT
echo "**************************************************"
echo "**(TIME) STOP: " $(date)
echo "**************************************************"
## ------------------------------------
# Save the result file from the WN to the SE
## ------------------------------------
rm ${EXECUTABLE}
rm ${RANKFILE}
rm ${NODEFILE1}
rm ${NODEFILE2}
rm ${AWKcommand}
You may find it particularly involved, but that is because we want to share access to the data among more than one member of the group, together with the ability to run the various steps of the simulation.
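The core of the script is the construction of the rankfile, which pins one MPI rank to each core slot of each node. The script does this with a temporary awk program; the sketch below produces lines of the same form with a plain shell loop. The function name make_rankfile and the host names in the example are hypothetical.

```shell
# Print an Open MPI rankfile with SLOTS ranks per host, one per core slot.
# Usage: make_rankfile SLOTS HOST [HOST...]
# (make_rankfile is a hypothetical helper name used for this example.)
make_rankfile() {
    local slots=$1 rank=0 host slot
    shift
    for host in "$@"; do
        slot=0
        while [ "$slot" -lt "$slots" ]; do
            echo "rank $rank=$host slot=$slot"
            rank=$((rank + 1))
            slot=$((slot + 1))
        done
    done
}

# Example with two hypothetical 8-core nodes:
#   make_rankfile 8 nodeA nodeB > rankfile
```

As in the script above, ranks are numbered consecutively within a node before moving on to the next one.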
The mpi-start tool automatically takes care of mixed OpenMP/MPI parallelism and greatly simplifies MPI job submission.
RunET_PR.jdl
# RunET_PR.jdl
JobType = "Normal";
CPUNumber = 1;
Executable = "RunET_PR.sh";
Arguments = "static_tov NODE";
StdOutput = "std.out";
StdError = "std.err";
PerusalFileEnable = true;
PerusalTimeInterval = 10;
InputSandbox = {"RunET_PR.sh"};
OutputSandbox = {"std.out", "std.err"};
MyProxyServer = "myproxy.cnaf.infn.it";
Requirements=(other.GlueCEUniqueID=="cream-ce.pr.infn.it:8443/cream-pbs-parallel");
CeRequirements = "hostsmpsize==8 && wholenodes==\"true\" && hostnumber==1";
RunET_PR.sh
#!/bin/bash
###################################################################
## We can compile on a directory that would not be automatically
## DELETED once the Jobs end
###################################################################
umask 007
echo "pwd $(pwd)"
export BaseDir="$(cd ../../../theophys/IS_OG51 ; pwd)/RunET/$1/$2"
echo "BaseDir= $BaseDir"
mkdir -p ${BaseDir}
cd ${BaseDir}
## ----------------------------------------
## First we get the tar file we previously
## saved on the GRID storage
## But we can also use direct FS access
## ----------------------------------------
export LFN=lfn:/grid/theophys/IS_OG51/Parma/CorsoGrid
lcg-cp -v --vo theophys ${LFN}/Einstein.v2.PR file://$(pwd)/Einstein
lcg-cp -v --vo theophys ${LFN}/$1.par file://$(pwd)/$1.par
chmod +x Einstein
if [ "x${2}" = "xCORE" ] ;then
  echo CORE
  mpi-start -t openmpi -pcore -d MPI_USE_AFFINITY=1 -d MPI_USE_OMP=1 -vv -- ./Einstein $1.par
else
  if [ "x${2}" = "xNODE" ]; then
    echo NODE
    mpi-start -t openmpi -pnode -d MPI_USE_AFFINITY=1 -d MPI_USE_OMP=1 -vv -- ./Einstein $1.par
  else
    if [ "x${2}" = "xSOCKET" ]; then
      echo SOCKET
      mpi-start -t openmpi -psocket -d MPI_USE_AFFINITY=1 -d MPI_USE_OMP=1 -vv -- ./Einstein $1.par
    else
      echo NONE
      ./Einstein $1.par
    fi
  fi
fi
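The three branches of RunET_PR.sh differ only in the placement option passed to mpi-start, so the selection could also be written with a case statement. This is only a sketch of an alternative layout: the function name placement_flag is hypothetical, while the options themselves are the ones used in the script.

```shell
# Map the placement keyword (CORE, NODE, SOCKET) to the mpi-start option
# used in RunET_PR.sh; print nothing and return 1 for anything else.
# (placement_flag is a hypothetical helper name used for this example.)
placement_flag() {
    case "$1" in
        CORE)   echo "-pcore" ;;
        NODE)   echo "-pnode" ;;
        SOCKET) echo "-psocket" ;;
        *)      return 1 ;;
    esac
}

# Possible use inside the job script:
#   if flag=$(placement_flag "$2"); then
#       mpi-start -t openmpi "$flag" -d MPI_USE_AFFINITY=1 -d MPI_USE_OMP=1 -vv -- ./Einstein "$1.par"
#   else
#       ./Einstein "$1.par"
#   fi
```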
This allows us to check which kind of parallelization should be preferred.
Np 1 Nt 8 N. iterations in 1 hour 11776
Np 2 Nt 4 N. iterations in 1 hour 16256
Np 8 Nt 1 N. iterations in 1 hour 10464
Doing the same at double resolution (the size of the problem is 8 times bigger, since we are doing a 3D simulation):
8xSIZE Np 1 Nt 8 N. iterations in 1 hour 2624
8xSIZE Np 2 Nt 4 N. iterations in 1 hour 3072
8xSIZE Np 8 Nt 1 N. iterations in 1 hour 3072
The scaling is never perfect!
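To make the comparison quantitative, one can compute how many times fewer iterations per hour are obtained at double resolution for each configuration. The sketch below does this arithmetic with awk on the figures quoted above; the function name slowdown is hypothetical. A ratio of exactly 8 would mean that the cost of one iteration grows in proportion to the number of grid points.

```shell
# Ratio of two iteration counts, printed with two decimals.
# (slowdown is a hypothetical helper name used for this example.)
slowdown() {
    LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

slowdown 11776 2624   # Np 1, Nt 8
slowdown 16256 3072   # Np 2, Nt 4
slowdown 10464 3072   # Np 8, Nt 1
```

With the numbers above the three ratios come out at about 4.49, 5.29 and 3.41, so the ratio depends noticeably on how the 8 cores are split between MPI processes and OpenMP threads.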