General Batch Information
Overview / Jobs
Like many high-performance computing systems, BYU's HPC systems are managed by a batch scheduling system. This means that in order to use the systems, users must encapsulate the workload into a non-interactive job and submit that job to the scheduling system. The job can specify a number of parameters to the scheduling system, including the following:
An asterisk (*) marks a required attribute.
- Resources Needed
  - Either number of nodes and processors per node, or total number of processors
  - Memory/RAM needed*
  - Expected running time (or "walltime")*
  - Specific node features/attributes (GPUs, processor types, etc.)
  - Local disk space needed
- Events to notify on (job abort, begin, end)
- Email address to send notifications
- Job Name and output file
Jobs are created by writing a job script, usually in Bash, Perl, or Python, that specifies the necessary parameters described above and then contains the commands that launch the program that will do the job's work.
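As an illustration, a minimal Bash job script might look like the sketch below. The #SBATCH options shown are standard Slurm options, but every value (time limit, task count, memory, job name, email address) is a placeholder chosen for this example, not a site recommendation.

```shell
#!/bin/bash
# Example values only; adjust every setting to your own workload.
#SBATCH --time=01:00:00            # expected running time (walltime), required
#SBATCH --ntasks=4                 # total number of processors (tasks)
#SBATCH --mem-per-cpu=2G           # memory per processor, required
#SBATCH --job-name=example_job     # placeholder job name
#SBATCH --mail-type=BEGIN,END,FAIL # events to notify on
#SBATCH --mail-user=user@example.com  # placeholder address

# Slurm sets SLURM_NTASKS inside a job; fall back to 1 when the
# script is run by hand outside the scheduler.
ntasks="${SLURM_NTASKS:-1}"

# Replace this line with the commands that launch your program.
echo "Job started with ${ntasks} task(s) on $(uname -n)"
```

Because the #SBATCH lines are ordinary comments to Bash, the script can also be run directly for a quick syntax check before it is submitted with sbatch.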
The scheduling system keeps track of the current state of all resources and jobs and decides, based on conditions and policies we have configured, where and when to start jobs. Jobs are started in priority order until no further jobs are eligible to start or the scheduler runs out of appropriate resources. The factors included in the priority calculation are:
- Historical Usage Patterns (e.g. how much has been recently used)
  - Per user
  - Per research group
- Total time queued
When the scheduling system chooses to start a job, it assigns the job one or more nodes/processors, as requested, and launches the provided job script on the first node in the list of those assigned. Making use of all the requested resources is the job script's responsibility. Please do not request more resources than you know you can use.
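Since the script body runs only on the first assigned node, a multi-node job needs a launcher to spread work across the allocation. The sketch below uses Slurm's srun for this; the node and task counts are example values, and the echo command stands in for a real program. It falls back to a plain local run when it is not inside a Slurm job.

```shell
#!/bin/bash
# Example values only.
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --time=00:10:00
#SBATCH --mem-per-cpu=1G

# Use srun only when running inside a Slurm allocation; otherwise
# leave the launcher empty so the command runs once, locally.
launcher=()
if [ -n "${SLURM_JOB_ID:-}" ] && command -v srun >/dev/null 2>&1; then
    launcher=(srun)
fi

# Under srun this command runs once per allocated task, on every
# assigned node. Without srun it runs a single time on this node.
"${launcher[@]}" bash -c 'echo "task running on $(uname -n)"'
```

Without the srun prefix, the other nodes and processors in the allocation would sit idle for the whole job.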
By default, the output (both stdout and stderr) is sent to the file slurm-JOBID.out, where JOBID is the unique job ID assigned when the job is submitted. The output file is created in the directory from which the job was submitted, and output begins appearing as soon as the job starts running.
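The naming rule above can be captured in a small helper, sketched here. The function name slurm_output_file and the job ID 123456 are made up for illustration.

```shell
#!/bin/bash
# Build the default output file name for a given job ID.
slurm_output_file() {
    local jobid="$1"
    printf 'slurm-%s.out\n' "$jobid"
}

# Example with a made-up job ID.
example_file="$(slurm_output_file 123456)"
echo "$example_file"

# On a login node, one might submit a script and follow its output.
# job.sh is a hypothetical script name; --parsable makes sbatch print
# the job ID, possibly followed by ";cluster", which is stripped here.
#   jobid="$(sbatch --parsable job.sh)"
#   tail -F "$(slurm_output_file "${jobid%%;*}")"
```

The file does not exist until the job starts, which is why the commented example uses tail -F (retry until the file appears) rather than tail -f.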