===== MPI/theophys Project =====
==== WorkerNodes ====
== Software environment ==
* SL5.x x86_64
* Open MPI >= 1.3 (1.4 preferred)
* MPICH2
* GNU C, C++, Fortran compilers (gfortran, g77, g90??)
* Support for commercial compilers?
* Scientific libraries:
  * OpenMP: multi-threading API (GNU runtime: libgomp)
  * HDF5: data storage and management library
  * BLAS: Basic Linear Algebra Subprograms
  * LAPACK: Linear Algebra PACKage
  * GSL: GNU Scientific Library
  * GMP: GNU Multiprecision Library
  * GLPK: GNU Linear Programming Kit
  * FFTW3: Fast Fourier Transform library
  * Octave: high-level language for numerical computations
==Installation example==
yum install -y yum-conf-epel.noarch
yum install -y octave hdf5-devel glpk fftw3
yum install -y libgomp blas-devel gsl-devel gmp-devel
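A quick way to verify the installed libraries on a WN is to compile and run a small test program. A minimal sketch using GSL (file name and values are only illustrative):

  /* gsl_check.c - minimal GSL sanity check */
  #include <stdio.h>
  #include <gsl/gsl_sf_bessel.h>

  int main(void)
  {
      double x = 5.0;
      /* regular cylindrical Bessel function J0 evaluated at x */
      printf("J0(%g) = %.18e\n", x, gsl_sf_bessel_J0(x));
      return 0;
  }

Compile and run on a node with gsl-devel installed:

  gcc gsl_check.c -o gsl_check -lgsl -lgslcblas -lm
  ./gsl_check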
== theophys TAG ==
The following is a possible TAG to be published by theophys-compliant sites:
GlueHostApplicationSoftwareRunTimeEnvironment: VO-theophys-gcc41 ??
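Users could then steer jobs to sites publishing the tag with a JDL Requirements clause; a possible sketch, assuming the tag name above is adopted:

  Requirements = Member("VO-theophys-gcc41",
                        other.GlueHostApplicationSoftwareRunTimeEnvironment);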
==== Cluster ====
== Published TAGs for MPI ==
Mpi-start is the way to start MPI jobs:
MPI-START
At least Open MPI should be installed:
MPI_OPENMPI
MPI_OPENMPI_VERSION="x.y.z"
Shared home is recommended, but file distribution is supported by MPI-start:
MPI_SHARED_HOME | MPI_NO_SHARED_HOME
Remote start-up of MPI jobs can be achieved via password-less SSH:
MPI_SSH_HOST_BASED_AUTH
Infiniband is recommended, but Gbit (or 10Gb) Ethernet can be used:
MPI-Infiniband | MPI-Ethernet
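A job needing Open MPI through mpi-start could then select suitable sites with a Requirements expression such as the following sketch (using the tag names above):

  Requirements = Member("MPI-START", other.GlueHostApplicationSoftwareRunTimeEnvironment)
              && Member("MPI_OPENMPI", other.GlueHostApplicationSoftwareRunTimeEnvironment);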
==Open Issues==
* Is it possible to publish the actual number of free CPUs per queue?
* How is CpuNumber used in the match-making process?
At the moment CpuNumber is not used at all in the match-making process.
A temporary workaround in the JDL is to request the CPUs explicitly:
CpuNumber = n;
Requirements = other.GlueCEInfoTotalCPUs >= CpuNumber;
==== JDL ====
== Typical parallel JDL ==
JobType = "Normal" ;
CpuNumber = 8 ;
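A more complete JDL for an mpi-start based parallel job could look like the following sketch; the executable, sandbox files and application name are purely illustrative:

  JobType       = "Normal";
  CpuNumber     = 8;
  Executable    = "mpi-start-wrapper.sh";
  Arguments     = "my_mpi_app OPENMPI";
  InputSandbox  = {"mpi-start-wrapper.sh", "my_mpi_app"};
  OutputSandbox = {"std.out", "std.err"};
  StdOutput     = "std.out";
  StdError      = "std.err";
  Requirements  = Member("MPI-START", other.GlueHostApplicationSoftwareRunTimeEnvironment)
               && Member("MPI_OPENMPI", other.GlueHostApplicationSoftwareRunTimeEnvironment);

Here mpi-start-wrapper.sh stands for a user-provided script that sets up and invokes mpi-start for the application.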
== Multi-thread support ==
SMPGranularity = 8;
WholeNodes = True;
Multi-thread support is desirable and should be integrated
into the middleware as soon as possible.
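For reference, a minimal OpenMP test that such a multi-thread job could run on the allocated cores (illustrative; compile with gcc -fopenmp):

  /* omp_check.c - minimal OpenMP check */
  #include <stdio.h>
  #include <omp.h>

  int main(void)
  {
      /* the number of threads is controlled e.g. via OMP_NUM_THREADS */
      #pragma omp parallel
      {
          printf("thread %d of %d\n",
                 omp_get_thread_num(), omp_get_num_threads());
      }
      return 0;
  }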
== Open Issues ==
* Is it possible to integrate Granularity/WholeNodes directly in InfnGrid?
  * CREAM and BLAH: see https://twiki.cern.ch/twiki/bin/view/EGEE/ParameterPassing ??
  * WMS: included in WMS 3.3
==== Parallel and sequential jobs ====
VOMS Roles can be used to limit access to parallel queues.
== VOMS Role = "parallel" ==
The [[https://voms.cnaf.infn.it:8443/voms/infngrid/SearchRoles.do | Role]] is assigned by the VO manager
and released by [[https://voms.cnaf.infn.it:8443/voms/theophys/Siblings.do |VOMS]] only on explicit request.
==Setup example==
site-info.def:
PARALLEL_GROUP_ENABLE="/infngrid/ROLE=parallel"
/opt/glite/yaim/defaults/ig-site.pre:
FQANVOVIEWS=yes
groups.conf:
"/infngrid/ROLE=parallel"::::
voms-proxy-init -voms infngrid:/infngrid/Role=parallel
voms-proxy-info -all
>....
>attribute : /infngrid/Role=parallel/Capability=NULL
>attribute : /infngrid/Role=NULL/Capability=NULL
>...
==== MPI multi-thread jobs ====
MPI and multi-thread programs can be combined to exploit
the upcoming multi-core architectures.
Hybrid multi-thread/MPI programming leads to a request
for N CPUs with a smaller number of MPI processes (N/thread_num).
At present this programming model is not supported in EGEE.
Possible solution:
modify the value type of WholeNodes from boolean to integer.
Example:
SMPGranularity = 8;
WholeNodes = 4;
This syntax would lead to
qsub -l nodes=4:ppn=GlueHostArchitectureSMPSize
where ppn (= GlueHostArchitectureSMPSize) is a number >= 8 in this example.
The WholeNodes value should be passed to mpi-start
as the number of MPI processes, and mpi-start should be modified accordingly.
Mixed MPI/multi-thread programs require a thread-safe
MPI implementation.
Thread-safety support can be checked at run time:
MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
printf("MPI_Init_thread provided: %d\n", provided);
The third parameter (MPI_THREAD_MULTIPLE, value 3) requests
full thread-safety support.
If the value returned in provided is 0 (MPI_THREAD_SINGLE),
thread support is not available.
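A minimal hybrid MPI/OpenMP sketch of the pattern described above (illustrative; requires a thread-safe MPI implementation, compile e.g. with mpicc -fopenmp):

  /* hybrid_check.c - minimal MPI + OpenMP example */
  #include <stdio.h>
  #include <mpi.h>
  #include <omp.h>

  int main(int argc, char **argv)
  {
      int provided, rank, size;

      /* request full thread safety; the library reports what it grants */
      MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);

      /* each MPI process spawns OpenMP threads (OMP_NUM_THREADS) */
      #pragma omp parallel
      {
          printf("rank %d/%d thread %d/%d provided=%d\n",
                 rank, size, omp_get_thread_num(),
                 omp_get_num_threads(), provided);
      }

      MPI_Finalize();
      return 0;
  }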
==== Scheduling ====
== Objectives ==
* Minimize job starvation
* Maximize resource exploitation
==Possible scenario==
MPI sites with at least 2 queues sharing the same pool of WNs
(a possible batch-system configuration sketch follows the list):
* **High priority parallel queue**
  * accessible only with a special Role (Role=parallel ?)
* **Low priority sequential queue**
  * preemptable (renice or requeue ?)
  * short WallClockTime (less than 6 hours?)
  * accessible only with a special Role (Role=short ?)
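On a Torque/Maui site this scenario could be sketched in maui.cfg roughly as follows; the class names, priorities, limits and preemption policy are only an assumption and must be adapted to the local configuration:

  # maui.cfg fragment (illustrative, to be adapted to the local site)
  PREEMPTPOLICY       REQUEUE
  CLASSCFG[parallel]  PRIORITY=1000 QFLAGS=PREEMPTOR
  CLASSCFG[short]     PRIORITY=10   QFLAGS=PREEMPTEE MAX.WCLIMIT=6:00:00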
==== Revision history ====
* 20100225 - R. De Pietri, F. Di Renzo - Users' required libraries
* 20100210 - C. Aiftimiei, R. Alfieri, M. Bencivenni, T. Ferrari - First version