BYU

Office of Research Computing

Memory Requests

Starting in September 2010, all jobs will require a memory request at submit time. This gives the scheduler greater flexibility in finding a place for your job. For example, suppose job A needs 6 GB of memory but only 1 processor. Currently it would have to request 3 processors to be sure of getting 6 GB of memory (2 GB per processor). Now suppose job B needs 10 processors but only 1 GB per process. Currently it could not run on the same m6 node as job A, because the combined total of 13 processors exceeds the 12 that the node has. With memory requests, the scheduler sees that with job A running, the node still has 11 processors and 18 GB of memory free. That is enough for job B to run alongside job A.
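The arithmetic in this example can be checked with a short shell sketch. The node size (12 processors, 24 GB) is inferred from the figures above rather than taken from a hardware specification, and the function name fits_on_node is made up for illustration.

```shell
#!/bin/sh
# Assumed m6 node capacity, inferred from the example (not an official spec).
node_procs=12
node_mem_gb=24

# fits_on_node USED_PROCS USED_MEM_GB REQ_PROCS REQ_MEM_GB
# Succeeds when the request fits in what is still free on the node.
fits_on_node() {
    free_procs=$(( node_procs - $1 ))
    free_mem=$(( node_mem_gb - $2 ))
    [ "$3" -le "$free_procs" ] && [ "$4" -le "$free_mem" ]
}

# Without memory requests: job A holds 3 processors to reserve 6 GB,
# so job B (10 processors, 10 GB) is refused: 3 + 10 = 13 > 12.
if fits_on_node 3 6 10 10; then echo "old scheme: fits"; else echo "old scheme: does not fit"; fi

# With memory requests: job A holds 1 processor and 6 GB,
# leaving 11 processors and 18 GB for job B.
if fits_on_node 1 6 10 10; then echo "memory requests: fits"; else echo "memory requests: does not fit"; fi
```

The real scheduler weighs many more factors (priority, reservations, fairshare); this only reproduces the processor and memory bookkeeping described above.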

If you don't know how much memory your jobs use, here is a simple way to find out.

Launch Test Job

Launch a job using the test qos.

Example


bash-3.2$   qsub -l qos=test mpijob
2487566.fslsched.fsl.byu.edu
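The later steps need only the numeric part of the job ID. As a small sketch, the ID printed by qsub can be trimmed with shell parameter expansion; the variable names here are illustrative, and the sample string is the output shown above.

```shell
#!/bin/sh
# Sample qsub output from the example above. On the cluster you would
# capture it with: full_id=$(qsub -l qos=test mpijob)
full_id="2487566.fslsched.fsl.byu.edu"

# Remove everything from the first dot onward, leaving the numeric job ID.
jobid=${full_id%%.*}
echo "$jobid"    # prints 2487566

# On the cluster, the next step would then be: checkjob "$jobid"
```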

Find out which nodes the job is running on.

Run `checkjob <jobid>` and see which nodes have been allocated to the job. They are listed in the form [-node-:-procs-], where -node- is the hostname of an assigned node and -procs- is the number of processors allocated on that node. The list may also appear in a similar compact form; for example, "m5-1-[1-4]*8" means that nodes m5-1-1, m5-1-2, m5-1-3, and m5-1-4 each have 8 processors allocated.

Example


bash-3.2$  checkjob 2487566
job 2487566

AName: Primes
State: Running 
Creds:  user:hamiltop  group:hamiltop  account:staff  class:batch  qos:test
WallTime:   00:00:03 of 1:00:00
SubmitTime: Tue Jul  6 11:41:31
  (Time Queued  Total: 00:00:03  Eligible: 00:00:03)

StartTime: Tue Jul  6 11:41:34
NodeMatchPolicy: EXACTNODE
Total Requested Tasks: 1

Req[0]  TaskCount: 1  Partition: base
Dedicated Resources Per Task: PROCS: 1  MEM: 2048M


Allocated Nodes:
[m5-1-1:1]



StartCount:     1
Flags:          PREEMPTOR,FSVIOLATION
Attr:           FSVIOLATION,checkpoint
StartPriority:  21725
Reservation '2487566' (-00:00:44 -> 00:59:16  Duration: 1:00:00)
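If the node list appears in the compact form mentioned earlier ("m5-1-[1-4]*8"), it can be expanded into one node:procs entry per line. The function below is a hypothetical helper, not part of checkjob, and it handles only that single pattern (prefix, bracketed numeric range, processor count).

```shell
#!/bin/sh
# expand_nodes SPEC: expand "prefix[first-last]*procs" into node:procs lines.
expand_nodes() {
    # Split the spec into four space-separated fields: prefix first last procs.
    fields=$(printf '%s\n' "$1" |
        sed -n 's/^\(.*\)\[\([0-9][0-9]*\)-\([0-9][0-9]*\)]\*\([0-9][0-9]*\)$/\1 \2 \3 \4/p')
    if [ -z "$fields" ]; then
        echo "unrecognized node spec: $1" >&2
        return 1
    fi
    # Reuse the positional parameters: $1=prefix $2=first $3=last $4=procs.
    set -- $fields
    i=$2
    while [ "$i" -le "$3" ]; do
        echo "$1$i:$4"
        i=$(( i + 1 ))
    done
}

expand_nodes 'm5-1-[1-4]*8'
```

For the sample spec this prints m5-1-1:8 through m5-1-4:8, one per line; the first line is the node to log on to in the next step.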

SSH to an allocated node.

Log on to the first node in the Allocated Nodes List.

Example


bash-3.2$   ssh m5-1-1
hamiltop@m5-1-1's password: 
Last login: Tue Jul  6 11:44:55 2010 from m6int02.fsl.byu.edu
Fulton Supercomputing Lab

System Administrators: Tom Raisor, Lloyd Brown, and Ryan Cox.

For support issues please email fslsupport@byu.edu

By using this system, you agree to abide by the Supercomputing usage
policy.  See http://marylou.byu.edu/policy.php for details.
-bash-3.2$ 

Run top

Run the program top. The values under RES show the amount of physical memory (RAM) being used by each process; a value without a unit suffix is in kilobytes. This per-process figure is the amount to request as pmem in your job.

Example


top - 15:40:15 up 20 days, 22:21,  1 user,  load average: 13.41, 11.47, 6.27
Tasks: 224 total,  11 running, 213 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.7%us, 93.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.1%hi,  5.2%si,  0.0%
Mem:  24679044k total, 11530628k used, 13148416k free,   355968k buffers
Swap:  4096564k total,    27844k used,  4068720k free, 10828984k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND        
 6465 hamiltop  25   0  153m 4076 2440 S 100.4  0.0   9:56.11 a.out         
 6458 hamiltop  25   0  153m 4072 2436 R 100.1  0.0   9:56.04 a.out         
 6461 hamiltop  25   0  153m 4080 2448 R 100.1  0.0   9:56.09 a.out         
 6462 hamiltop  25   0  153m 4072 2440 R 100.1  0.0   9:56.05 a.out         
 6463 hamiltop  25   0  153m 4076 2440 R 100.1  0.0   9:55.43 a.out         
 6464 hamiltop  25   0  153m 4080 2440 R 100.1  0.0   9:55.66 a.out         
 6466 hamiltop  25   0  153m 4072 2440 S 100.1  0.0   9:56.00 a.out         
 6459 hamiltop  25   0  153m 4076 2440 R  99.8  0.0   9:55.89 a.out
 6460 hamiltop  25   0  153m 4072 2436 S  99.8  0.0   9:55.98 a.out
 6467 hamiltop  25   0  153m 4080 2444 R  99.8  0.0   9:55.99 a.out
 6468 hamiltop  25   0  153m 4072 2436 R  99.8  0.0   9:56.09 a.out
 6457 hamiltop  25   0  154m 4676 2828 R  78.8  0.0   7:42.05 a.out
 6456 hamiltop  15   0 56776 2384 1632 S  21.0  0.0   2:13.80 mpiexec
 6852 hamiltop  15   0 12848 1196  820 R   0.3  0.0   0:00.04 top
    1 root      15   0 10348   80   52 S   0.0  0.0   1:06.19 init
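Reading the largest RES value by eye works for a short listing. As a rough alternative, the sketch below takes resident sizes in kilobytes and prints the largest one rounded up to whole megabytes, which is a reasonable starting point for pmem. The function name max_rss_mb is made up for illustration, and the sample values are copied from three of the a.out rows above.

```shell
#!/bin/sh
# max_rss_mb: read "RSS_KB COMMAND" lines on stdin and print the largest
# RSS, rounded up to whole megabytes.
max_rss_mb() {
    awk '$1 + 0 > max + 0 { max = $1 }
         END { print int((max + 1023) / 1024) }'
}

# Sample data: RES values (in KB) for three a.out processes from the listing.
printf '%s\n' '4076 a.out' '4080 a.out' '4676 a.out' | max_rss_mb

# On an allocated node, assuming a procps-style ps, the input could come from:
#   ps -u "$USER" -o rss= -o comm= | max_rss_mb
```

For the sample data the largest value is 4676 KB, which rounds up to 5 MB. Memory use can grow over the life of a job, so it is worth sampling more than once before settling on a value.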

Check Job Statistics

After the job has completed, its runtime statistics are available on our website in the Account Manager under Job Stats. Memory usage there is calculated per node, so it should match the sum of the RES values of all processes shown under your user name in top.

Example

[Screenshot: Job Stats]
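To compare the per-node figure in Job Stats against what top showed, the resident sizes of all of one user's processes on a node can be added up. This is only a sketch: sum_rss_kb is a hypothetical helper, and the sample rows reuse a few RES values from the top listing.

```shell
#!/bin/sh
# sum_rss_kb USER: read "USER RSS_KB" lines on stdin and total the RSS
# (in KB) of the lines that belong to USER.
sum_rss_kb() {
    awk -v user="$1" '$1 == user { total += $2 }
                      END { print total + 0 }'
}

# Sample data: two a.out processes, mpiexec, and one process of another user.
printf '%s\n' 'hamiltop 4076' 'hamiltop 4676' 'hamiltop 2384' 'root 80' |
    sum_rss_kb hamiltop    # prints 11136

# On an allocated node, assuming a procps-style ps:
#   ps -e -o user= -o rss= | sum_rss_kb "$USER"
```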

Submit job with memory request  

Using the information from job statistics and top, you can request the amount of memory your job needs. You can request either the memory required per process (pmem) or the total memory for the whole job (mem); pmem is usually easier to calculate. Memory can be requested in mb or gb.

Example


bash-3.2$  qsub -l pmem=5mb mpijob 
2487667.fslsched.fsl.byu.edu
                
                or

bash-3.2$   qsub -l mem=240mb mpijob 
2487668.fslsched.fsl.byu.edu
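The two requests in this example are related by the number of tasks. Assuming both commands describe the same 48-task job (the task count reported by checkjob in the next step), the total is the per-process amount multiplied by the task count. The variable names are illustrative.

```shell
#!/bin/sh
pmem_mb=5    # per-process request, from the pmem example above
tasks=48     # assumed task count for the same job

# Total memory for the whole job is the per-process amount times the tasks.
mem_mb=$(( pmem_mb * tasks ))

# Print the two equivalent submit commands instead of running them.
echo "qsub -l pmem=${pmem_mb}mb mpijob"
echo "qsub -l mem=${mem_mb}mb mpijob"
```

This prints the pmem=5mb and mem=240mb commands shown above.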

Verify that the job has a memory request

Run checkjob on the new job ID and confirm that the memory request appears in the Dedicated Resources Per Task line (MEM: 5M in the example below).

Example


bash-3.2$  checkjob 2487667
job 2487667

AName: Primes
State: Running 
Creds:  user:hamiltop  group:hamiltop  account:staff  class:batch  qos:default
WallTime:   00:00:54 of 1:00:00
SubmitTime: Tue Jul  6 16:43:48
  (Time Queued  Total: 00:00:51  Eligible: 00:00:51)

StartTime: Tue Jul  6 16:44:39
NodeMatchPolicy: EXACTNODE
Total Requested Tasks: 48

Req[0]  TaskCount: 48  Partition: base
Memory >= 5M  Disk >= 0  Swap >= 0
Dedicated Resources Per Task: PROCS: 1  MEM: 5M

Allocated Nodes:
[m6-9-14:12][m6-18-13:12][m6-18-16:12][m6-19-2:12]




StartCount:     1
Flags:          PREEMPTOR,FSVIOLATION
Attr:           FSVIOLATION,checkpoint
StartPriority:  25345
Reservation '2487667' (-00:01:39 -> 00:58:21  Duration: 1:00:00)
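The same check can be scripted by pulling the MEM value out of the checkjob output. The sketch below assumes the output format shown above; requested_mem is a hypothetical helper, and the sample text is copied from the example.

```shell
#!/bin/sh
# requested_mem: read checkjob output on stdin and print the per-task MEM
# value from the "Dedicated Resources Per Task" line.
requested_mem() {
    sed -n 's/^Dedicated Resources Per Task:.*MEM: *\([0-9][0-9]*[A-Za-z]*\).*$/\1/p'
}

# Sample lines copied from the checkjob output above.
sample='Total Requested Tasks: 48
Memory >= 5M  Disk >= 0  Swap >= 0
Dedicated Resources Per Task: PROCS: 1  MEM: 5M'

printf '%s\n' "$sample" | requested_mem    # prints 5M

# On the cluster: checkjob 2487667 | requested_mem
```

If nothing is printed, the line was not found, which suggests the output format differs from the one assumed here.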