Memory Requests
Starting in September 2010, all jobs will require a memory request at submit time. This gives the scheduler more flexibility in finding a place for your job. For example, suppose job A needs 6 GB of memory but only 1 processor. Currently it would have to request 3 processors to be sure of getting 6 GB (2 GB per processor). Suppose job B needs 10 processors but only 1 GB per process. Currently it could not run on the same m6 node as job A, because the total of 13 processors is greater than the number available. With memory requests, job A asks for 1 processor and 6 GB, so the scheduler sees that with job A running the node still has 11 processors and 18 GB of memory free. That is enough for job B to run alongside job A.
If you don't know how much memory your jobs use, here is a simple way to find out.
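The packing arithmetic in this example can be checked with a short shell sketch. The node size (12 processors, 24 GB) is an assumption inferred from the numbers above, and the helper name fits_on_node is hypothetical. The real scheduler weighs more than these two limits, so treat this only as an illustration of the reasoning.

```shell
# Hypothetical helper: succeeds if a job needing $3 processors and $4 GB
# fits into $1 free processors and $2 GB of free memory.
fits_on_node() {
    local free_procs=$1 free_mem_gb=$2 need_procs=$3 need_mem_gb=$4
    [ "$need_procs" -le "$free_procs" ] && [ "$need_mem_gb" -le "$free_mem_gb" ]
}

# Assumed m6 node size, inferred from the example: 12 processors, 24 GB.
node_procs=12
node_mem_gb=24

# Job B needs 10 processors at 1 GB each, so 10 GB in total.
b_procs=10
b_mem_gb=10

# Without memory requests, job A holds 3 processors to secure its 6 GB.
if fits_on_node $((node_procs - 3)) $((node_mem_gb - 6)) "$b_procs" "$b_mem_gb"; then
    without_request=yes
else
    without_request=no
fi

# With memory requests, job A holds 1 processor and 6 GB.
if fits_on_node $((node_procs - 1)) $((node_mem_gb - 6)) "$b_procs" "$b_mem_gb"; then
    with_request=yes
else
    with_request=no
fi

echo "job B fits without memory requests: $without_request"
echo "job B fits with memory requests: $with_request"
```

In both cases the node has 18 GB of memory free; only the processor count changes, which is why the memory request is what lets job B through.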
Launch Test Job
Launch a job using the test qos.
Example
bash-3.2$ qsub -l qos=test mpijob
2487566.fslsched.fsl.byu.edu
Find out which nodes the job is running on
Run `checkjob <jobid>` and look at which nodes have been allocated to the job. They are listed in the form [-node-:-procs-], where -node- is the hostname of the assigned node and -procs- is the number of processors allocated on that node. Other similar forms may appear; for example, "m5-1-[1-4]*8" means that nodes m5-1-1, m5-1-2, m5-1-3, and m5-1-4 each have 8 processors allocated.
Example
bash-3.2$ checkjob 2487566
job 2487566
AName: Primes
State: Running
Creds: user:hamiltop group:hamiltop account:staff class:batch qos:test
WallTime: 00:00:03 of 1:00:00
SubmitTime: Tue Jul 6 11:41:31
(Time Queued Total: 00:00:03 Eligible: 00:00:03)
StartTime: Tue Jul 6 11:41:34
NodeMatchPolicy: EXACTNODE
Total Requested Tasks: 1
Req[0] TaskCount: 1 Partition: base
Dedicated Resources Per Task: PROCS: 1 MEM: 2048M
Allocated Nodes:
[m5-1-1:1]
StartCount: 1
Flags: PREEMPTOR,FSVIOLATION
Attr: FSVIOLATION,checkpoint
StartPriority: 21725
Reservation '2487566' (-00:00:44 -> 00:59:16 Duration: 1:00:00)
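For jobs that span many nodes, it can help to expand the Allocated Nodes entry into one node per line. The following is a minimal sketch that handles only the two forms described in this section, [node:procs] and prefix[lo-hi]*procs. The function name expand_nodes is hypothetical, and other forms that checkjob may print are not covered.

```shell
# Hypothetical helper: print "node procs" lines for one Allocated Nodes entry.
expand_nodes() {
    local spec=$1
    case $spec in
        *'*'*)
            # Range form, e.g. m5-1-[1-4]*8
            local procs=${spec##*\*}      # text after the "*"   -> 8
            local body=${spec%\**}        # text before the "*"  -> m5-1-[1-4]
            local prefix=${body%%\[*}     # text before the "["  -> m5-1-
            local range=${body#*\[}       # text after the "["   -> 1-4]
            range=${range%\]}             # drop the closing "]" -> 1-4
            local lo=${range%-*}
            local hi=${range#*-}
            local i=$lo
            while [ "$i" -le "$hi" ]; do
                echo "$prefix$i $procs"
                i=$((i + 1))
            done
            ;;
        *)
            # Bracket form, e.g. [m5-1-1:1] or several entries back to back
            printf '%s\n' "$spec" | tr -d '[' | tr ']' '\n' |
                awk -F: 'NF == 2 { print $1, $2 }'
            ;;
    esac
}

expand_nodes 'm5-1-[1-4]*8'
expand_nodes '[m5-1-1:1]'
```

The first node printed is the one to log on to in the next step.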
SSH to an allocated node
Log on to the first node in the Allocated Nodes List.
Example
bash-3.2$ ssh m5-1-1
hamiltop@m5-1-1's password:
Last login: Tue Jul 6 11:44:55 2010 from m6int02.fsl.byu.edu
Fulton Supercomputing Lab
System Administrators: Tom Raisor, Lloyd Brown, and Ryan Cox.
For support issues please email fslsupport@byu.edu
By using this system, you agree to abide by the Supercomputing usage
policy. See http://marylou.byu.edu/policy.php for details.
-bash-3.2$
Run top
Run the program top. The RES column shows the amount of physical memory (RAM) each process is using; values without a unit suffix are in kilobytes. Your pmem request should cover the largest RES value among your job's processes.
Example
top - 15:40:15 up 20 days, 22:21, 1 user, load average: 13.41, 11.47, 6.27
Tasks: 224 total, 11 running, 213 sleeping, 0 stopped, 0 zombie
Cpu(s): 1.7%us, 93.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.1%hi, 5.2%si, 0.0%
Mem: 24679044k total, 11530628k used, 13148416k free, 355968k buffers
Swap: 4096564k total, 27844k used, 4068720k free, 10828984k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
6465 hamiltop 25 0 153m 4076 2440 S 100.4 0.0 9:56.11 a.out
6458 hamiltop 25 0 153m 4072 2436 R 100.1 0.0 9:56.04 a.out
6461 hamiltop 25 0 153m 4080 2448 R 100.1 0.0 9:56.09 a.out
6462 hamiltop 25 0 153m 4072 2440 R 100.1 0.0 9:56.05 a.out
6463 hamiltop 25 0 153m 4076 2440 R 100.1 0.0 9:55.43 a.out
6464 hamiltop 25 0 153m 4080 2440 R 100.1 0.0 9:55.66 a.out
6466 hamiltop 25 0 153m 4072 2440 S 100.1 0.0 9:56.00 a.out
6459 hamiltop 25 0 153m 4076 2440 R 99.8 0.0 9:55.89 a.out
6460 hamiltop 25 0 153m 4072 2436 S 99.8 0.0 9:55.98 a.out
6467 hamiltop 25 0 153m 4080 2444 R 99.8 0.0 9:55.99 a.out
6468 hamiltop 25 0 153m 4072 2436 R 99.8 0.0 9:56.09 a.out
6457 hamiltop 25 0 154m 4676 2828 R 78.8 0.0 7:42.05 a.out
6456 hamiltop 15 0 56776 2384 1632 S 21.0 0.0 2:13.80 mpiexec
6852 hamiltop 15 0 12848 1196 820 R 0.3 0.0 0:00.04 top
1 root 15 0 10348 80 52 S 0.0 0.0 1:06.19 init
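Reading RES by eye gets tedious when many processes are running. As a minimal sketch, the values can be piped through awk to report the largest single process (a candidate for pmem) and the per-node total. The function name rss_summary is hypothetical. The sample numbers are the RES values of the a.out and mpiexec processes in the top output above. On a real node the input could come from `ps -u "$USER" -o rss=`, which prints resident memory in kilobytes but includes all of your processes on that node, not only those of the job.

```shell
# Hypothetical helper: read one resident-memory value (in KB) per line on
# stdin and print the process count, the largest value, and the total,
# with both sizes rounded up to whole MB.
rss_summary() {
    awk '
        NF { if ($1 > max) max = $1; sum += $1; n++ }
        END {
            printf "processes: %d\n", n
            printf "largest process: %d MB\n", int((max + 1023) / 1024)
            printf "node total: %d MB\n", int((sum + 1023) / 1024)
        }'
}

# RES values (KB) copied from the a.out and mpiexec rows of the top example.
printf '%s\n' 4076 4072 4080 4072 4076 4080 4072 4076 4072 4080 4072 4676 2384 |
    rss_summary
```

This is a snapshot of a single instant. Memory use often grows during a run, so it is worth sampling several times before settling on a request.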
Check Job Statistics
After the job has completed, its runtime statistics are available on our website in the Account Manager under Job Stats. Memory usage there is calculated on a per-node basis, so it should equal the sum of RES over all the processes shown under your name in top.
Example
[Screenshot: Job Stats]
Submit job with memory request
Using the information obtained from job statistics and top, we can request the amount of memory the job needs to run. We can request either the memory required per process (pmem) or the total memory the job will need (mem). pmem is usually the easier of the two to calculate. Memory can be specified in mb or gb.
Example
bash-3.2$ qsub -l pmem=5mb mpijob
2487667.fslsched.fsl.byu.edu
or
bash-3.2$ qsub -l mem=240mb mpijob
2487668.fslsched.fsl.byu.edu
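The two requests in this example are related by the process count: the job runs 48 processes, so 5 MB per process comes to 240 MB in total. A small sketch of that arithmetic follows. The helper name mem_request is hypothetical, and the calculation assumes every process needs roughly the same amount of memory.

```shell
# Hypothetical helper: given per-process memory in MB and the number of
# processes, print the equivalent pmem and mem request strings.
mem_request() {
    local pmem_mb=$1 nprocs=$2
    echo "pmem=${pmem_mb}mb"
    echo "mem=$((pmem_mb * nprocs))mb"
}

mem_request 5 48
```

Either printed string can be passed to qsub after -l, as in the example above.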
Verify that job has memory request
Run checkjob on the new job ID and check that the proper resource request was made. The Dedicated Resources Per Task line should show the memory you requested.
Example
bash-3.2$ checkjob 2487667
job 2487667
AName: Primes
State: Running
Creds: user:hamiltop group:hamiltop account:staff class:batch qos:default
WallTime: 00:00:54 of 1:00:00
SubmitTime: Tue Jul 6 16:43:48
(Time Queued Total: 00:00:51 Eligible: 00:00:51)
StartTime: Tue Jul 6 16:44:39
NodeMatchPolicy: EXACTNODE
Total Requested Tasks: 48
Req[0] TaskCount: 48 Partition: base
Memory >= 5M Disk >= 0 Swap >= 0
Dedicated Resources Per Task: PROCS: 1 MEM: 5M
Allocated Nodes:
[m6-9-14:12][m6-18-13:12][m6-18-16:12][m6-19-2:12]
StartCount: 1
Flags: PREEMPTOR,FSVIOLATION
Attr: FSVIOLATION,checkpoint
StartPriority: 25345
Reservation '2487667' (-00:01:39 -> 00:58:21 Duration: 1:00:00)
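When checking several jobs, the memory request can be pulled out of the checkjob output instead of being read by eye. This is a minimal sketch that assumes the "Dedicated Resources Per Task" line looks like the one shown above. The function name requested_mem is hypothetical, and the sample line stands in for real checkjob output.

```shell
# Hypothetical helper: read checkjob output on stdin and print the MEM
# value from the "Dedicated Resources Per Task" line.
requested_mem() {
    sed -n 's/.*Dedicated Resources Per Task:.*MEM: *\([0-9][0-9]*[A-Za-z]*\).*/\1/p'
}

# Sample line copied from the example output. On the cluster, use:
#   checkjob <jobid> | requested_mem
echo "Dedicated Resources Per Task: PROCS: 1 MEM: 5M" | requested_mem
```

If nothing is printed, the line was not found or carried no MEM value, which suggests the job was submitted without a memory request.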
Last changed on Mon Feb 23 13:06:14 2015