====== EINSTEIN TOOLKIT ======

We will first have a look at the various step we do have to do in order to
have a running executable on a local machine and that we look on how to
transpose the various steps on the TRAMONTANA grid cluster 


===== Basic downloads =====

The web page of the project is [[http://einsteintoolkit.org/|einsteintoolkit.org]] and 
the tutorial for new user of the //Einstein Toolkit// is located at the following web page  [[http://docs.einsteintoolkit.org/et-docs/Tutorial_for_New_Users|TUTORIAL]].

The main prerequisite for downloading the source code is that the revision control
software tools cvs, svn and git should be installed on the main node  

The first step is indeed to have the source of our source code available on our 
machine. We should first create a directory where to save our compilation tree:

  mkdir EinsteinToolkit
  cd EinsteinToolkit/

We now get the download script and make it executable

  wget --no-check-certificate https://github.com/gridaphobe/CRL/raw/ET_2011_05/GetComponents
  chmod 755 GetComponents

and we proceed on getting all the components

  ./GetComponents -a http://svn.einsteintoolkit.org/manifest/branches/ET_2011_05/einsteintoolkit.th

At this point we have a complete source tree ready to be compiled. To appreciate the size of
the problem we have (to compile and run our program) we note that the size of the source tree
is 567MBytes and contains 55451 files.

In fact we give the commands

   tar czf Cactus_tree.tgz Cactus/ 
   du -hs Cactus* 

we get the following output:

   567M    Cactus
   145M    Cactus_tree.tgz


===== COMPILING =====

  make Einstein-config F77=gfortran F90=gfortran GSL=yes JPEG=yes SSL=yes  \
                       MPI=OpenMPI OPENMPI_DIR=/usr/lib64/openmpi/1.4-gcc/ \
                       OPENMP=yes


and it is required to answer "//yes//" to allow for a creation of a new executable configuration.

The next step is to do an actual compilation to a running executable out of our source tree.
The way in which a Cactus executable is created of a source tree is selecting components
providing a "Thornlist" file. Another possibility is editing the list of all the components present
in the source tree or accepting the complete list of available components. The latter is achived
running the command:

  make Einstein

and answer "//no//". In this way a full compilation off all the available modules is obtained.

At the end we obtain a file called "//''cactus_Einstein''//" in the exe subdirectory. Ready to be
lunched.

We should note that the only real configuration needed on a Linux machine was to provide
the flavor of the MPI implementation available and directory where it is located. In the case of
the grid-ui user interface present here in Parma that is the same that it is available on TRAMONTANA.
 

===== EXECUTING =====


The executable is quite easy to use since it accept a parameter 
file and has MPI compiled and linked. To run the simulation
one typically gives the command

  mpirun -np 4 -x OMP_NUM_THREADS=1 Einstein static_tov.par

and an example of a parameter file is:

  ## mclachlan tov_static
  ActiveThorns = "Time MoL"
  ActiveThorns = "Coordbase CartGrid3d Boundary StaticConformal"
  ActiveThorns = "SymBase ADMBase TmunuBase HydroBase InitBase ADMCoupling ADMMacros"
  ActiveThorns = "IOUtil Formaline"
  ActiveThorns = "SpaceMask CoordGauge Constants LocalReduce aeilocalinterp LoopControl"
  ActiveThorns = "Carpet CarpetLib CarpetReduce CarpetRegrid2 CarpetInterp"
  ActiveThorns = "CarpetIOASCII CarpetIOScalar CarpetIOHDF5 CarpetIOBasic"
  ActiveThorns = "ADMConstraints NaNChecker"
  ActiveThorns = "TerminationTrigger"
  # grid parameters
  CartGrid3D::type         = "coordbase"
  CartGrid3D::domain       = "full"
  CartGrid3D::avoid_origin = "no"
  CoordBase::xmin =   0.0
  CoordBase::ymin =   0.0
  CoordBase::zmin =   0.0
  CoordBase::xmax = 240.0
  CoordBase::ymax = 240.0
  CoordBase::zmax = 240.0
  CoordBase::dx   =   8
  CoordBase::dy   =   8
  CoordBase::dz   =   8
  CoordBase::boundary_size_x_lower        = 3
  CoordBase::boundary_size_y_lower        = 3
  CoordBase::boundary_size_z_lower        = 3
  CoordBase::boundary_size_x_upper        = 3
  CoordBase::boundary_size_y_upper        = 3
  CoordBase::boundary_size_z_upper        = 3
  CoordBase::boundary_shiftout_x_lower    = 1
  CoordBase::boundary_shiftout_y_lower    = 1
  CoordBase::boundary_shiftout_z_lower    = 1
  CoordBase::boundary_shiftout_x_upper    = 0
  CoordBase::boundary_shiftout_y_upper    = 0
  CoordBase::boundary_shiftout_z_upper    = 0
  ActiveThorns = "ReflectionSymmetry"
  ReflectionSymmetry::reflection_x = "yes"
  ReflectionSymmetry::reflection_y = "yes"
  ReflectionSymmetry::reflection_z = "yes"
  ReflectionSymmetry::avoid_origin_x = "no"
  ReflectionSymmetry::avoid_origin_y = "no"
  ReflectionSymmetry::avoid_origin_z = "no"
  TmunuBase::stress_energy_storage = yes
  TmunuBase::stress_energy_at_RHS  = yes
  TmunuBase::timelevels            =  1
  TmunuBase::prolongation_type     = none
  HydroBase::timelevels            = 3
  ADMMacros::spatial_order = 4
  ADMBase::metric_type     = "physical"
  ADMConstraints::bound            = "static"
  ADMConstraints::constraints_timelevels = 3
  ADMConstraints::constraints_persist    = yes
  SpaceMask::use_mask      = "yes"
  Cactus::terminate           = "time"
  Cactus::cctk_final_time     = 1000
  Carpet::domain_from_coordbase = "yes"
  Carpet::enable_all_storage       = no
  Carpet::use_buffer_zones         = "yes"
  Carpet::poison_new_timelevels    = "yes"
  Carpet::check_for_poison         = "no"
  Carpet::poison_value             = 113
  Carpet::init_3_timelevels        = no
  Carpet::init_fill_timelevels     = "yes"
  CarpetLib::poison_new_memory = "yes"
  CarpetLib::poison_value      = 114
  # system specific Carpet paramters
  Carpet::max_refinement_levels    = 10
  driver::ghost_size               = 3
  Carpet::prolongation_order_space = 3
  Carpet::prolongation_order_time  = 2
  CarpetRegrid2::regrid_every = 0
  CarpetRegrid2::num_centres  = 1
  CarpetRegrid2::num_levels_1 = 5
  CarpetRegrid2::radius_1[1]  =120.0
  CarpetRegrid2::radius_1[2]  = 60.0
  CarpetRegrid2::radius_1[3]  = 30.0
  CarpetRegrid2::radius_1[4]  = 15.0
  time::dtfac = 0.25
  MoL::ODE_Method             = "rk4"
  MoL::MoL_Intermediate_Steps = 4
  MoL::MoL_Num_Scratch_Levels = 1
  # check all physical variables for NaNs
  NaNChecker::check_every = 1
  NaNChecker::action_if_found = "just warn" #"terminate", "just warn", "abort"
  NaNChecker::check_vars = "ADMBase::metric ADMBase::lapse ADMBase::shift HydroBase::rho HydroBase::eps HydroBase::press HydroBase::vel"
  ## Lapse Condition:  \partial_t alpha = - alpha K
  ## Shift Condition:  \partial_t beta^i = 0
  # Hydro paramters
  ActiveThorns = "EOS_Omni"
  ActiveThorns = "GRHydro"
  HydroBase::evolution_method      = "GRHydro"
  GRHydro::riemann_solver            = "Marquina"
  GRHydro::GRHydro_eos_type           = "Polytype"
  GRHydro::GRHydro_eos_table          = "2D_Polytrope"
  GRHydro::recon_method              = "ppm"
  GRHydro::GRHydro_stencil            = 3
  GRHydro::bound                     = "none"
  GRHydro::rho_abs_min               = 1.e-10
  #GRHydro::GRHydro = 18 # Tmunu(10), rho,press,eps,w_lorentz,vel, tau
  #GRHydro::GRHydro    = 10    # gij(6), alpha, beta(3)
  ActiveThorns = "GenericFD NewRad"
  ActiveThorns = "ML_BSSN ML_BSSN_Helper"
  ADMBase::evolution_method        = "ML_BSSN"
  ADMBase::lapse_evolution_method  = "ML_BSSN"
  ADMBase::shift_evolution_method  = "ML_BSSN"
  ADMBase::dtlapse_evolution_method= "ML_BSSN"
  ADMBase::dtshift_evolution_method= "ML_BSSN"
  TmunuBase::support_old_CalcTmunu_mechanism = "no"
  ML_BSSN::timelevels = 3
  ML_BSSN::harmonicN           = 1      # 1+log
  ML_BSSN::harmonicF           = 1.0    # 1+log
  ML_BSSN::LapseACoeff         = 1.0
  ML_BSSN::ShiftBCoeff         = 1.0
  ML_BSSN::ShiftGammaCoeff     = 0.0
  ML_BSSN::AlphaDriver         = 0.0
  ML_BSSN::BetaDriver          = 0.0
  ML_BSSN::LapseAdvectionCoeff = 0.0
  ML_BSSN::ShiftAdvectionCoeff = 0.0
  ML_BSSN::MinimumLapse        = 1.0e-8
  ML_BSSN::my_initial_boundary_condition = "extrapolate-gammas"
  ML_BSSN::my_rhs_boundary_condition     = "NewRad"
  ML_BSSN::ML_log_confac_bound = "none"
  ML_BSSN::ML_metric_bound     = "none"
  ML_BSSN::ML_Gamma_bound      = "none"
  ML_BSSN::ML_trace_curv_bound = "none"
  ML_BSSN::ML_curv_bound       = "none"
  ML_BSSN::ML_lapse_bound      = "none"
  ML_BSSN::ML_dtlapse_bound    = "none"
  ML_BSSN::ML_shift_bound      = "none"
  ML_BSSN::ML_dtshift_bound    = "none"
  # init parameters
  InitBase::initial_data_setup_method = "init_some_levels"
  ActiveThorns = "TOVSolver"
  ADMBase::initial_data            = "tov"
  ADMBase::initial_lapse           = "tov"
  ADMBase::initial_shift           = "tov"
  ADMBase::initial_dtlapse         = "zero"
  ADMBase::initial_dtshift         = "zero"
  TOVSolver::TOV_Rho_Central[0] = 1.28e-3
  TOVSolver::TOV_Gamma[0]       = 2.0
  TOVSolver::TOV_K[0]           = 100.0
  IOBasic::outInfo_every              = 1
  IOBasic::outInfo_vars               = "HydroBase::rho ADMBase::lapse"
  IO::out_dir      = $parfile
  IOScalar::outScalar_every = 32
  IOScalar::one_file_per_group = yes
  IOScalar::outScalar_vars  = "
   HydroBase::rho
   HydroBase::press
   HydroBase::eps
   HydroBase::vel
   ADMBase::lapse
   ADMBase::metric
   ADMBase::curv
   ADMConstraints::ham
   ADMConstraints::momentum
  "
  IOASCII::out1D_every     = 128
  IOASCII::one_file_per_group = yes
  IOASCII::output_symmetry_points = no
  IOASCII::out3D_ghosts           = no
  IOASCII::out3D_outer_ghosts     = no
  IOASCII::out1D_vars      = "
   HydroBase::rho
   HydroBase::press
   HydroBase::eps
   HydroBase::vel
   ADMBase::lapse
   ADMBase::metric
   ADMBase::curv
   ADMConstraints::ham
   ADMConstraints::momentum
  "
  iohdf5::out_every    = 256
  iohdf5::out_vars     = "
      hydrobase::rho
      hydrobase::press
      hydrobase::eps
      hydrobase::vel
      ADMBase::lapse
      ADMBase::shift
      ADMBase::curv
      ADMBase::metric
  "
  IO::out_mode         = "proc"
  IO::out_unchunked    = "no"
  #==================================
  # Checkpoint parameters
  #==================================
    IO::checkpoint_dir   = "CHECKPOINT"
  IO::recover_dir      = "CHECKPOINT"
  IO::checkpoint_every = 512
  IO::checkpoint_keep  = 2
  IO::recover          = "autoprobe"
  IO::checkpoint_on_terminate                = "yes"
  IO::recover_file                          = "checkpoint.chkpt"
  IOHDF5::checkpoint                        = "yes"
  IOHDF5::use_reflevels_from_checkpoint     = "yes"
  #--------------------------------------------------
  TerminationTrigger::on_remaining_walltime = 5
  TerminationTrigger::max_walltime = 1
  TerminationTrigger::create_termination_file = yes
  TerminationTrigger::termination_file = "cactus_terminate"
  TerminationTrigger::check_file_every = 1
  TerminationTrigger::termination_from_file = yes

One of the nice features of this program it is automatically
produce CheckPoint every so iteration. Moreover if it finds
a CheckPoint automatically restart from it. Also check for walltime
and stop accordingly.


====== EINSTEIN TOOLKIT on the GRID ======


This is a very different application with respect to the "CHROMA" lattice qcd one. 
The main difference is that this is a very data-intense application and parallelism
is not used to speed up execution but to allow to simulate bigger problem. Where bigger
should be indented using increased resolution. This is also an application that would 
require an huge amount of data do be saved and transfer once the run is finished.
Moreover we will need to restart the simulation more than once in order to overcome
the queue wall time limits. That means that we need to save checkpoints !

For the above reason the creation of "tar" file with all the data to be moved out
the storage elements is not an effective approach.  


===== Compiling EINSTEIN TOOLKIT on the GRID =====

The compilation procedure can be performed with this simple
shell script

  #!/bin/bash
  ###################################################################
  ##  We can compile on a directory that would not be automatically
  ##  DELETED once the Jobs end
  ###################################################################
  umask 007
  echo "pwd $(pwd)"
  export BaseDir="$(cd ../../../thogea10 ; pwd)/CompileET"
  echo "BaseDir= $BaseDir"
  mkdir -p ${BaseDir}
  cd ${BaseDir}
  ## ----------------------------------------
  ## First we get the tra file we previously
  ## saved on the GRID storage 
  ## But we can also use direct FS access
  ## ----------------------------------------
  export BASESRM=/gpfs/gpfshds/srm/theophys/IS_OG51/Parma/CorsoGrid
  tar xzvf ${BASESRM}/Cactus_tree.tgz  
  cd Cactus
  echo "yes" | \
  make Einstein-config F77=gfortran F90=gfortran GSL=yes JPEG=yes SSL=yes  \
                     MPI=OpenMPI OPENMPI_DIR=/usr/lib64/openmpi/1.4-gcc/ \
                     OPENMP=yes
  echo "no" | \
  make -j 8 Einstein
  lcg-cr -v --vo theophys -d srm://gridsrm.pi.infn.it/theophys/IS_OG51/Parma/CorsoGrid/Einstein.v2  \
    -l lfn:/grid/theophys/IS_OG51/Parma/CorsoGrid/Einstein.v2 file://$(pwd)/exe/cactus_Einstein
 

where should be noted the "make -j 8 Einstein" that would allow us to use all the 8 core we allocated 
to compile. A lot of caution should used using this trick because "self made" make sometimes do
not always explicitly encode in a proper way all the dependencies.

===== Executing the EINSTEIN TOOLKIT on the GRID =====

===== A typical script for General Relativistic Hydro Jobs =====

But first we may want to have a look to our typical 
script ....

  !/bin/bash
  ### -----------------------------------------------------------------
  ###
  ###  These are the settings on Tramontana
  ###
  ### -----------------------------------------------------------------
  export LOCALSRM=/gpfs/gpfshds/srm/theophys/IS_OG51/Parma
  export CATALOG=lfn:/grid/theophys/IS_OG51/Parma
  export SRM=srm://gridsrm.pi.infn.it/theophys/IS_OG51/Parma
  export EXECUTABLE=Whisky.exe
  ### -----------------------------------------------------------------
  ###
  ###  Setting to Run Whisky:
  ###
  ###  We assume that the precompiled exe are already stored on 
  ###  Tramonta. The same is assumed for the parameters files
  ###
  ###  We also needs an area where JOBS are executed and results saved
  ### -----------------------------------------------------------------
  export DIRpar=${LOCALSRM}/CACTUS/par
  export EXEcommand="${EXECUTABLE} $2.par" #${DIRpar}/$2.par"
  export DIRcheckpoint=CHECKPOINT/$2
  echo "pwd $(pwd)"
  export EXPmar11="$(cd ../../../thogea10 ; pwd)/EXPmar11"
  echo "EXPmar11= $EXPmar11"
  export CACTUSexe="${EXPmar11}/build/Cactus/exe/cactus_EXPmar11"
  export CACTUSexe=${LOCALSRM}/CACTUS/exe/WHISKYexp
  export LOCALexe=${CACTUSexe}
  export executionDIR="${EXPmar11}/run/$2/$1";
  umask 007
  echo "======================================"
  echo "** mkdir -p ${executionDIR}"
  echo "** cd ${executionDIR}"
  mkdir -p ${executionDIR}
  cd ${executionDIR}
  echo "======================================"
  echo "========= JOB executions ==============="
  echo "We will run the exe file: ${EXECUTABLE}"
  echo "Copied from: ${LOCALexe}" 
  echo "Running command is: ${EXEcommand}"
  echo "Recover DIR: ${LOCALSRM}/${DIRcheckpoint} "
  echo "========================================"
  ln -s /gpfs/gpfshds/csn4home/thogea10/EXPmar11/run/$2/output-0000/CHECKPOINT CHECKPOINT_RECOVER
  echo "========================================"
  echo "===================================="
  sed "s/IO::recover_dir      = \"CHECKPOINT\"/IO::recover_dir      = \"CHECKPOINT_RECOVER\"/" ${DIRpar}/$2.par > $2.par 
  cat $2.par
  echo "===================================="
  ## ------------------------------------
  ##  GET and create The MPI nodelist
  ## ------------------------------------
  NODEFILE1=/tmp/hf1.$(date +"%m%d%y%H%M%S")
  NODEFILE2=/tmp/hf2.$(date +"%m%d%y%H%M%S")
  RANKFILE=/tmp/hf3.$(date +"%m%d%y%H%M%S")
  AWKcommand=/tmp/hf4.$(date +"%m%d%y%H%M%S")
  touch $NODEFILE1
  touch $NODEFILE2
  touch $AWKcommand
  # echo "{print \"rank \" i++ \"=\" \$1 \" slot=0-3\"}"   >> $AWKcommand
  # echo "{print \"rank \" i++ \"=\" \$1 \" slot=4-7\"}"   >> $AWKcommand
  ### -----------------------------------------------------------------
  ###  The following setting is for a run with one thread for processor 
  ###  Nt is the numer of threads for processor
  ### -----------------------------------------------------------------
  NT=1
  echo "{print \"rank \" i++ \"=\" \$1 \" slot=0\"}"   >> $AWKcommand
  echo "{print \"rank \" i++ \"=\" \$1 \" slot=1\"}"   >> $AWKcommand
  echo "{print \"rank \" i++ \"=\" \$1 \" slot=2\"}"   >> $AWKcommand
  echo "{print \"rank \" i++ \"=\" \$1 \" slot=3\"}"   >> $AWKcommand
  echo "{print \"rank \" i++ \"=\" \$1 \" slot=4\"}"   >> $AWKcommand
  echo "{print \"rank \" i++ \"=\" \$1 \" slot=5\"}"   >> $AWKcommand
  echo "{print \"rank \" i++ \"=\" \$1 \" slot=6\"}"   >> $AWKcommand
  echo "{print \"rank \" i++ \"=\" \$1 \" slot=7\"}"   >> $AWKcommand
  for host in $LSB_HOSTS; do echo $host >> $NODEFILE1; done;
  sort -u  ${NODEFILE1} > ${NODEFILE2}
  awk -f $AWKcommand  ${NODEFILE2} > ${RANKFILE}
  NCPU=$[`cat ${NODEFILE1} | wc --lines`]
  NNODES=$[`cat ${NODEFILE2} | wc --lines`]
  NP=$[`cat ${RANKFILE} | wc --lines`]
  MPIRUN="mpirun -np $NP -x OMP_NUM_THREADS=$NT -hostfile $NODEFILE2 --rankfile ${RANKFILE}"
  echo "========= MPI PARAMETER ================"
  echo "N cpus    = ${NCPU}"
  echo "N nodes   = ${NNODES}"
  echo "Np  = ${NP}"
  echo "Nt  = ${NT}"
  echo "mpirun = " $(which mpirun)
  echo "MPI --> ${MPIRUN}"
  echo "========================================"
  echo "** cat  $RANKFILE"
  cat   $RANKFILE
  echo "========================================"
  # echo "** cat  $NODEFILE1"
  # cat   $NODEFILE1
  # echo "========================================"
  echo "** cat  $NODEFILE2"
  cat   $NODEFILE2
  echo "========================================"
  # echo "** cat  $AWKcommand"
  # cat   $AWKcommand
  # echo "========================================"
  echo "** cleaning previous open-mpi leftover"
  echo "** mpirun --pernode  --hostfile $NODEFILE2 orte-clean --verbose"
  mpirun --pernode  --hostfile $NODEFILE2 orte-clean --verbose
  echo "========================================"
  echo "======================================"
  echo "I'm running here [$(pwd)]" | tee OUTPUT
  echo "======================================"
  ls -l
  pwd
  ### -----------------------------------------------------------------
  ### First I copy the executable to be run on the working directory
  ### and change attribute in such a way that I can execute it
  ### -----------------------------------------------------------------
  cp ${LOCALexe} ${EXECUTABLE}
  chmod +x ${EXECUTABLE}
  echo "**************************************************"
  echo "**(TIME)  START: " $(date)
  echo "**************************************************"
  ### -----------------------------------------------------------------
  ### HERE GOES THE ACTUAL EXECUTIONS COMMAND 
  ### -----------------------------------------------------------------
  runCMD="${MPIRUN} ${EXEcommand} 2>&1 | tee -a RunTrace.out"
  echo "----------------------------- Command ----------------"
  echo "-- executing: ${runCMD}"
  eval ${runCMD}
  echo "$runCMD = ${runCMD}"
  echo "----------------------------- Command END ------------"
  chmod -R g+rwX data
  chmod -R g+rwX CHECKPOINT
  echo "**************************************************"
  echo "**(TIME)   STOP: " $(date)
  echo "**************************************************"
  ## ------------------------------------
  # Save the result file from the WN  to the SE
  ## ------------------------------------
  rm  ${EXECUTABLE}
  rm  ${RANKFILE}
  rm  ${NODEFILE1}
  rm  ${NODEFILE2}
  rm  ${AWKcommand}

And you may find it particularly involved. But that is becouse we like
to share acces to more that one member of the group to the data and
the ability to run various step of the simulation.


====== EINSTEIN TOOLKIT on the GRID (mpi-start)======

The mpi-start automatically takes care of mixed OMP/MPI parallelism
and will greatly simplify MPI job submission.    


===== Execution on the PARMA cluster (no INFINIBAND) =====

RunET_PR.jdl

  # RunET_PR.jdl
  JobType = "Normal";
  CPUNumber = 1;
  Executable    = "RunET_PR.sh";
  Arguments     = "static_tov NODE";
  StdOutput     = "std.out";
  StdError      = "std.err";
  PerusalFileEnable = true;
  PerusalTimeInterval = 10;
  InputSandbox  = {"RunET_PR.sh"}; 
  OutputSandbox = {"std.out", "std.err"};
  MyProxyServer = "myproxy.cnaf.infn.it";
  Requirements=(other.GlueCEUniqueID=="cream-ce.pr.infn.it:8443/cream-pbs-parallel");
  CeRequirements = "hostsmpsize==8 &&  wholenodes==\"true\"  && hostnumber==1";

RunET_PR.sh

  #!/bin/bash
  ###################################################################
  ##  We can compile on a directory that would not be automatically
  ##  DELETED once the Jobs end
  ###################################################################
  umask 007
  echo "pwd $(pwd)"
  export BaseDir="$(cd ../../../theophys/IS_OG51 ; pwd)/RunET/$1/$2"
  echo "BaseDir= $BaseDir"
  mkdir -p ${BaseDir}
  cd ${BaseDir}
  ## ----------------------------------------
  ## First we get the tra file we previously
  ## saved on the GRID storage 
  ## But we can also use direct FS access
  ## ----------------------------------------
  export LFN=lfn:/grid/theophys/IS_OG51/Parma/CorsoGrid
  lcg-cp -v --vo theophys ${LFN}/Einstein.v2.PR file://$(pwd)/Einstein
  lcg-cp -v --vo theophys ${LFN}/$1.par file://$(pwd)/$1.par
  chmod +x Einstein
  if [  "x${2}" = "xCORE" ] ;then 
    echo CORE  
    mpi-start -t openmpi -pcore -d MPI_USE_AFFINITY=1 -d MPI_USE_OMP=1  -vv -- ./Einstein $1.par 
  else if [ "x${2}" = "xNODE" ]; then  
    echo NODE
    mpi-start -t openmpi -pnode -d MPI_USE_AFFINITY=1 -d MPI_USE_OMP=1  -vv -- ./Einstein $1.par 
  else if [ "x${2}" = "xSOCKET" ]; then  
    echo SOCKET
    mpi-start -t openmpi -psocket -d MPI_USE_AFFINITY=1 -d MPI_USE_OMP=1  -vv -- ./Einstein $1.par 
  else
    echo NONE
    ./Einstein $1.par 
  fi
  fi
  fi

This allow to check which kind of parallelization should
be preferred.

  Np 1 Nt 8 N. iterations in 1 hour 11776
  Np 2 Nt 4 N. iterations in 1 hour 16256
  Np 8 Nt 1 N. iterations in 1 hour 10464

Doing the same at double resolution (the size of the problem is
8 times bigger since we are doing a 3D simulation) 

  8xSIZE Np 1 Nt 8 N. iterations in 1 hour 2624
  8xSIZE Np 2 Nt 4 N. iterations in 1 hour 3072
  8xSIZE Np 8 Nt 1 N. iterations in 1 hour 3072

The scaling is never perfect !