28 July 2015

618. Modifying ECCE to work with slurm

UPDATE: ecce stops monitoring the job after 10-20 seconds. The job continues fine though. Working on fixing the monitoring issue. This message will be removed once that's fixed. It was due to $q needing to be lowercase (i.e. 'slurm', not 'Slurm') in eccejobmonitor.



Sun Gridengine has been removed from debian jessie (it's in wheezy and sid). This has given me a good excuse to explore setting up SLURM on my debian cluster. So I did: http://verahill.blogspot.com.au/2015/07/617-slurm-on-debian-jessie-and.html

My setup is very simple, with each node having it's own working directory that they export via NFS to the main node. Also, I never run jobs across several nodes. Because of that, each node has it's own queue. Not how beowulf clusters were supposed to work, but it's the best solution for me (e.g. ROCKS does the opposite -- exports the user dir from the main node, but that makes reading and writing slow where it counts i.e. on the work nodes).

I've currently got this slurm.conf:
ControlMachine=beryllium ControlAddr=192.168.1.1 MpiDefault=none ProctrackType=proctrack/pgid ReturnToService=2 SlurmctldPidFile=/var/run/slurm-llnl/slurmctld.pid SlurmdPidFile=/var/run/slurm-llnl/slurmd.pid SlurmdSpoolDir=/var/lib/slurm/slurmd SlurmUser=slurm StateSaveLocation=/var/lib/slurm/slurmctld SwitchType=switch/none TaskPlugin=task/none FastSchedule=1 SchedulerType=sched/backfill SelectType=select/linear AccountingStorageType=accounting_storage/filetxt AccountingStorageLoc=/var/log/slurm/accounting ClusterName=rupert JobAcctGatherType=jobacct_gather/none SlurmctldLogFile=/var/log/slurm/slurmctld.log SlurmdLogFile=/var/log/slurm/slurmd.log NodeName=beryllium NodeAddr=192.168.1.1 NodeName=neon NodeAddr=192.168.1.120 state=unknown NodeName=tantalum NodeAddr=192.168.1.150 state=unknown NodeName=magnesium NodeAddr=192.168.1.200 state=unknown NodeName=carbon NodeAddr=192.168.1.190 state=unknown NodeName=oxygen NodeAddr=192.168.1.180 state=unknown PartitionName=All Nodes=neon,beryllium,tantalum,oxygen,magnesium,carbon default=yes maxtime=infinite state=up PartitionName=mpi4 Nodes=tantalum maxtime=infinite state=up PartitionName=mpi12 Nodes=carbon maxtime=infinite state=up PartitionName=mpi8 Nodes=neon maxtime=infinite state=up PartitionName=mpix8 Nodes=oxygen maxtime=infinite state=up PartitionName=mpix12 Nodes=magnesium maxtime=infinite state=up PartitionName=mpi1 Nodes=beryllium maxtime=infinite state=up
and sinfo returns
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST All* up infinite 6 idle beryllium,carbon,magnesium,neon,oxygen,tantalum mpi4 up infinite 1 idle tantalum mpi12 up infinite 1 idle carbon mpi8 up infinite 1 idle neon mpix8 up infinite 1 idle oxygen mpix12 up infinite 1 idle magnesium mpi1 up infinite 1 idle beryllium

The first step was to figure out what files to edit:

grep -rs "qsub"
apps/siteconfig/QueueManagers:SGE|submitCommand: qsub ##script##
grep -rs "SGE"
apps/scripts/eccejobmonitor: &MsgSendUp("SGE job id '$id' in state '$state'"); [..] apps/siteconfig/Queues:magnesium|queueMgrName: SGE

Here are the files I edited:

apps/siteconfig/QueueManagers:
12 QueueManagers: LoadLeveler \ 13 Maui \ 14 EASY \ 15 PBS \ 16 LSF \ 17 Moab \ 18 SGE \ 19 Shell\ 20 Slurm
185 Shell|jobIdParseExpression: \ [0-9]+ 186 187 ############################################################################### 188 # SLURM 189 # Simple Linux Utility for Resource Management 190 # 191 # 192 Slurm|submitCommand: sbatch ##script## 193 Slurm|cancelCommand: scancel ##id## 194 Slurm|queryJobCommand: squeue 195 Slurm|queryMachineCommand: sinfo -p ##queue## 196 Slurm|queryQueueCommand: squeue -a 197 Slurm|queryDiskUsageCommand: df -k 198 Slurm|jobIdParseExpression: .* 199 Slurm|jobIdParseLeadingText: job
apps/scripts/eccejobmonitor:
2124 LogMsg "Globus status from eccejobstore: $state"; 2125 } 2126 elsif ($q eq 'slurm') 2127 { 2128 $cmd = "squeue 2>&1"; 2129 if (open(QUERY, "$cmd |")) 2130 { 2131 $gotState = 0; 2132 while () 2133 { 2134 LogMsg "JobCheck: Slurm qstat line: $_"; 2135 if (/^\s*$id/) 2136 { 2137 my $state = (split)[5]; 2138 2139 &MsgSendUp("Slurm job id '$id' in state '$state'"); 2140 2141 if (grep {$state eq $_} qw{R 2142 t}) 2143 { 2144 $status = $JOB_STATE_RUNNING; 2145 } 2146 elsif (grep {$state eq $_} qw{PD}) 2147 { 2148 $status = $JOB_STATE_PENDING; 2149 } 2150 $gotState = 1; 2151 last; 2152 } 2153 } 2154 if ($gotState == 0) 2155 { 2156 if ($gJobCheckState != $JOB_STATE_NONE) 2157 { 2158 $status = $JOB_STATE_DONE; 2159 } 2160 } 2161 close QUERY;

Next set up a new machine (or queue) using ecce -admin. Set up a queue -- you won't be able to select Slurm, so select e.g. PBS. Edit the apps/siteconfig/CONFIG.machinename file to e.g.
1 NWChem: /opt/nwchem/Nwchem/bin/LINUX64/nwchem 2 Gaussian-03: /opt/gaussian/g09d/g09/g09 3 perlPath: /usr/bin/ 4 qmgrPath: /usr/bin/ 5 xappsPath: /usr/bin/ 6 7 Slurm { 8 #!/bin/csh 9 #SBATCH -p mpi8 10 #SBATCH --time=$walltime 11 #SBATCH --output=slurm.out 12 #SBATCH --job-name=$submitFile 13 } 14 15 NWChemEnvironment { 16 PYTHONPATH /opt/nwchem/Nwchem/contrib/python 17 } 18 19 NWChemFilesToRemove{ core *.aoints.* *.gridpts.* } 20 21 NWChemCommand { 22 setenv PATH "/bin:/usr/bin:/sbin:/usr/sbin" 23 setenv LD_LIBRARY_PATH "/usr/lib/openmpi/lib:/opt/openblas/lib:/opt/acml/acml5.3.1/gfortran64_fma4_int64/lib:/opt/acml/acml5.3.1/gfortran64_int64/lib:/opt/intel/mkl/lib/intel64" 24 hostname 25 mpirun -n $totalprocs /opt/nwchem/Nwchem/bin/LINUX64/nwchem $infile > $outfile 26 } 27 28 Gaussian-03FilesToRemove{ core *.rwf } 29 30 Gaussian-03Command{ 31 set path = ( /opt/nbo6/bin $path ) 32 setenv GAUSS_SCRDIR /home/me/scratch 33 setenv GAUSS_EXEDIR /opt/gaussian/g09d/g09/bsd:/opt/gaussian/g09d/g09/local:/opt/gaussian/g09d/g09/extras:/opt/gaussian/g09d/g09 34 /opt/gaussian/g09d/g09/g09< $infile > $outfile 35 echo 0 36 } 37 38 Wrapup{ 39 dmesg|tail 40 find ~/scratch/* -name "*" -user me|xargs -I {} rm {} -rf 41 }
Next, edit apps/siteconfig/Queues -- in my case the machine I created is called neon-slurm:
neon-slurm|queueMgrName: Slurm neon-slurm|queueMgrVersion: 2.0~ neon-slurm|prefFile: neon-slurm.Q
And that's all. You should now be able to submit jobs via slurm. There's obviously a lot more than can be done and configured with SLURM, but this was enough to get me up and running, so that I'm now 'future-proofed' in case SGE never comes back into debian stable.

And here's what it looks like when my ecce-submitted jobs are running:

squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON) 30 mpi8 In_monom me PD 0:00 1 (Resources) 29 mpi8 b_monome me R 16:12 1 neon 31 mpix12 tl_dimer me R 34:28 1 magnesium

No comments:

Post a Comment