
MPI/theophys Project

WorkerNodes

Software environment
  • SL5.x x86_64
  • openMPI >= 1.3 (1.4 would be better)
  • MPICH2
  • GNU C, C++, Fortran (gfortran, g77, g90??)
  • Support for commercial compilers?
  • Scientific libraries:
    • OpenMP: multithreading support (libgomp runtime)
    • HDF5: data storage and management library
    • BLAS: Basic Linear Algebra Subprograms
    • LAPACK: Linear Algebra PACKage
    • GSL: GNU Scientific Library
    • GMP: GNU Multiple Precision arithmetic library
    • GLPK: GNU Linear Programming Kit
    • FFTW3: Fast Fourier Transform library
    • Octave: high-level language for numerical computations
Installation example
 yum install -y yum-conf-epel.noarch
 yum install -y octave hdf5-devel glpk fftw3
 yum install -y libgomp blas-devel gsl-devel gmp-devel
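
As a quick sanity check that the scientific libraries above are usable, a minimal C sketch like the following (file name and compile line are illustrative; it assumes the gsl-devel package installed above) can be built against GSL:

 #include <stdio.h>
 #include <gsl/gsl_sf_bessel.h>

 int main(void)
 {
     /* Evaluate the Bessel function J0 at x = 5.0 using GSL */
     printf("J0(5.0) = %g\n", gsl_sf_bessel_J0(5.0));
     return 0;
 }

and compiled, for example, with:

 gcc gsl_check.c -o gsl_check -lgsl -lgslcblas -lm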
theophys TAG

The following is a possible TAG to be published by theophys-compliant sites:

GlueHostApplicationSoftwareRunTimeEnvironment: VO-theophys-gcc41 ??

Cluster

Published TAGs for MPI

Mpi-start is the way to start MPI jobs:

 MPI-START

At least openMPI should be installed:

 MPI_OPENMPI
 MPI_OPENMPI_VERSION="x.y.z"

Shared home is recommended, but file distribution is supported by MPI-start:

 MPI_SHARED_HOME | MPI_NO_SHARED_HOME  

Remote start-up of MPI jobs can be achieved via password-less SSH:

 MPI_SSH_HOST_BASED_AUTH

Infiniband is recommended, but Gbit (or 10Gb) Ethernet can be used:

  MPI-Infiniband | MPI-Ethernet 
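
For reference, a job submitted through mpi-start is an ordinary MPI executable; a minimal C sketch (program and file names are illustrative) looks like this:

 #include <stdio.h>
 #include <mpi.h>

 int main(int argc, char *argv[])
 {
     int rank, size;

     MPI_Init(&argc, &argv);
     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
     MPI_Comm_size(MPI_COMM_WORLD, &size);

     /* Each MPI process reports its rank */
     printf("Hello from rank %d of %d\n", rank, size);

     MPI_Finalize();
     return 0;
 }

It would typically be compiled on the WN with mpicc (e.g. "mpicc hello_mpi.c -o hello_mpi") and launched by mpi-start with the number of processes requested in the JDL.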
Open Issues
  • Is it possible to publish the actual number of free CPUs per queue?
  • How is CpuNumber used in the match-making process?
    • At the moment CpuNumber is not used at all for match-making. A temporary solution in the JDL:
 CPUNumber=n
 other.GlueCEInfoTotalCPUs >= CPUNumber

JDL

Typical parallel JDL:

 JobType = "Normal";
 CpuNumber = 8;
 // multithread support
 SMPGranularity = 8;
 WholeNodes = True;

Multithread support is desirable and it should be integrated in the middleware as soon as possible.
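
As an illustration of what such a multithread job body could look like, the following OpenMP sketch in C (a hypothetical example matching SMPGranularity = 8; file name and compile line are illustrative) runs one thread per core of the allocated node:

 #include <stdio.h>
 #include <omp.h>

 int main(void)
 {
     /* One OpenMP thread per core granted on the node */
     #pragma omp parallel
     {
         printf("Thread %d of %d\n",
                omp_get_thread_num(), omp_get_num_threads());
     }
     return 0;
 }

It can be compiled with "gcc -fopenmp omp_job.c -o omp_job", assuming the libgomp runtime from the installation example above.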

Open Issues
  • Is it possible to integrate Granularity/WholeNodes directly in InfnGrid?
    • CREAM and BLAH: see https://twiki.cern.ch/twiki/bin/view/EGEE/ParameterPassing ??
    • WMS: included in WMS 3.3

Parallel and sequential jobs

VOMS Roles can be used to limit access to parallel queues.

 VOMS Role = "parallel"

The Role is assigned by the VO manager and released by VOMS only on explicit request.

Setup example
site-info.def:
 PARALLEL_GROUP_ENABLE="/infngrid/ROLE=parallel"

/opt/glite/yaim/defaults/ig-site.pre:
 FQANVOVIEWS=yes

groups.conf:
 "/infngrid/ROLE=parallel"::::

The Role is then requested explicitly when creating the proxy:

 voms-proxy-init -voms infngrid:/infngrid/Role=parallel
 voms-proxy-info -all
 >....
 >attribute : /infngrid/Role=parallel/Capability=NULL
 >attribute : /infngrid/Role=NULL/Capability=NULL
 >...

MPI multi-thread jobs

MPI and multi-thread programs can be combined to exploit the upcoming multicore architectures. Hybrid multithread/MPI programming leads to a request for N CPUs with a smaller number of MPI processes (N/thread_num). Currently this programming model is not supported in EGEE. A possible solution is to change the value type of WholeNodes from boolean to integer. Example:

 SMPGranularity = 8;
 WholeNodes = 4;

This syntax would lead to

 qsub -l nodes=4:ppn=GlueHostArchitectureSMPSize

where ppn is a number >= 8. The WholeNodes value should be passed to mpi-start as the number of MPI processes, and mpi-start should be modified accordingly.

Mixed MPI/multithread programs require a thread-safe MPI implementation. Thread safety can easily be verified:

 int prov;
 MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &prov);  /* request full thread safety */
 printf("MPI_Init_thread provided: %d\n", prov);

The third argument (MPI_THREAD_MULTIPLE) requests full thread-safety support. If the value returned in prov is MPI_THREAD_SINGLE, thread support is not provided.
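
For completeness, a self-contained hybrid sketch (a minimal, hypothetical example assuming openMPI with thread support and the GCC OpenMP runtime; names are illustrative) combining the check above with OpenMP threads:

 #include <stdio.h>
 #include <omp.h>
 #include <mpi.h>

 int main(int argc, char *argv[])
 {
     int prov, rank;

     /* Request full thread safety for the hybrid MPI/OpenMP model */
     MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &prov);
     MPI_Comm_rank(MPI_COMM_WORLD, &rank);

     if (prov < MPI_THREAD_MULTIPLE)
         printf("Rank %d: full thread safety not available (level %d)\n", rank, prov);

     /* Each MPI process spawns its own team of OpenMP threads */
     #pragma omp parallel
     {
         printf("Rank %d, thread %d of %d\n",
                rank, omp_get_thread_num(), omp_get_num_threads());
     }

     MPI_Finalize();
     return 0;
 }

With the JDL example above (WholeNodes = 4, SMPGranularity = 8) this would correspond to 4 MPI processes, each running up to 8 OpenMP threads. A typical compile line is "mpicc -fopenmp hybrid.c -o hybrid".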

Scheduling

Objectives
  • Minimize job starvation
  • Maximize resource exploitation
Possible scenario

MPI sites with at least 2 queues sharing the same pool of WNs:

  • High-priority parallel queue
    • accessible only with a special Role (Role=parallel ?)
  • Low-priority sequential queue
    • preemptable (renice or requeue ?)
    • short WallClockTime (less than 6 hours?)
    • accessible only with a special Role (Role=short ?).

Revision history

  • 20100225 - R. DePietri, F. DiRenzo - User's required libraries
  • 20100210 - C. Aiftimiei, R. Alfieri, M. Bencivenni, T. Ferrari - First Version