Preemption Scheduling

What is Preemption?

Preemption is a scheduling mechanism in which some running jobs (preemptees) can be interrupted to make room for other jobs (preemptors). In our current implementation, a job may be either a preemptor (the default) or a preemptee (optional), but not both.

When would I want to use preemption? When would I not want to use it?

When a job is designated as a preemptee, we increase its priority and raise several limits, including the maximum number of running processors or jobs per user and the maximum running time per job. These raised limits apply only to the preemptable job. This allows preemptable jobs to potentially run on more resources, and for longer, than normal jobs.

However, when a job is a preemptee, you are not guaranteed the full amount of walltime you request. How long the job actually runs depends on the overall workload of the scheduler and on how many jobs are designated as preemptees versus preemptors.

Is my preemptable job guaranteed any minimum amount of running time before it is preempted?

Currently there are no guarantees for minimum running time. Any time there is an eligible preemptor job, the preemptee job may be interrupted. We are evaluating this, and it may change in the future.

What happens to my job when it is preempted? Do I have any choices?

Currently, you can request one of two things to happen when your job is preempted:


Preempt/Cancel

With this option, a preempted job is canceled, just as if you had used scancel to cancel it yourself. If you want to start the job again, you will need to submit it again.
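As a rough sketch of how you might check whether a finished job was canceled by preemption before resubmitting it, the helper below asks Slurm accounting for the job's final state. The function name was_preempted, the job ID 12345, and the script name job.sh are hypothetical, and this assumes job accounting (sacct) is available on the cluster and reports preempted jobs with the PREEMPTED state.

```shell
# Hypothetical helper: succeeds if accounting reports the job as PREEMPTED.
# Assumes sacct is available; -X limits output to the job allocation,
# -n drops the header, and -o State prints only the state column.
was_preempted() {
    local state
    state=$(sacct -j "$1" -X -n -o State | head -n 1 | tr -d ' ')
    [ "$state" = "PREEMPTED" ]
}

# Usage (12345 and job.sh are placeholders):
#   was_preempted 12345 && sbatch job.sh
```

Resubmitting creates a new job with a new job ID, so any work the canceled job had completed is only reused if the job script itself saves and restores it.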


Preempt/Requeue

This option allows you to have a job requeued when it is preempted, instead of being canceled. This means that the job will go back into the queue and will be scheduled again. The job's walltime will start over: if, for example, you requested 30 minutes of time and the job was preempted after 10 minutes, the requeued job will have 30 minutes as its walltime, not 20 minutes.

In order to use the requeue option effectively, your job must be able to pick up where it left off, or at the very least, be able to deal with being interrupted and restarted.
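One common way to make a job restartable is to record progress in a checkpoint file and skip completed work on the next start. The following is a minimal sketch of that idea, not a prescribed method: the file name progress.ckpt, the variables CHECKPOINT and TOTAL_STEPS, and the do_step placeholder are all assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical checkpoint file; in practice, place it on storage that
# survives the job being requeued.
CHECKPOINT="${CHECKPOINT:-progress.ckpt}"
TOTAL_STEPS="${TOTAL_STEPS:-5}"

# Print the first step that still needs to run (1 if no checkpoint exists).
next_step() {
    if [ -s "$CHECKPOINT" ]; then
        echo $(( $(cat "$CHECKPOINT") + 1 ))
    else
        echo 1
    fi
}

# Placeholder for one unit of real work.
do_step() {
    echo "running step $1"
}

# Run all remaining steps, recording each one after it finishes.
run_job() {
    local step
    step=$(next_step)
    while [ "$step" -le "$TOTAL_STEPS" ]; do
        do_step "$step"
        # Write to a temporary file and rename it, so an interruption
        # cannot leave a half-written checkpoint behind.
        echo "$step" > "$CHECKPOINT.tmp" && mv "$CHECKPOINT.tmp" "$CHECKPOINT"
        step=$(( step + 1 ))
    done
}

run_job
```

If the job is preempted during a step, that step is not recorded, so it is repeated when the job restarts. Each step should therefore be safe to run more than once.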

How do I use Preemption?

With the current preemption model, all jobs are preemptors by default. If you want to use the Preempt/Cancel or Preempt/Requeue policies, follow these instructions:


Preempt/Cancel

To use this mode, you have to designate your job as a preemptee. To do this, add one parameter to your resource request. For example, if your original request looked like this:

#SBATCH --nodes=4 --ntasks-per-node=8
#SBATCH --mem-per-cpu=2G
#SBATCH --time=10:00:00

You would change it to this:

#SBATCH --nodes=4 --ntasks-per-node=8
#SBATCH --mem-per-cpu=2G
#SBATCH --time=10:00:00
#SBATCH --qos=standby

You can also simply add --qos=standby as a parameter to sbatch or salloc when you submit the job.
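For the command-line form, a small wrapper function can save typing. This is only a sketch: the function name submit_standby and the script name job.sh are hypothetical, and the only option it relies on is the --qos=standby parameter described above.

```shell
# Hypothetical helper: submit any job script as a preemptee by passing
# --qos=standby on the sbatch command line, followed by the caller's arguments.
submit_standby() {
    sbatch --qos=standby "$@"
}

# Usage (job.sh is a placeholder script name):
#   submit_standby job.sh
```

Because the option is given on the command line, the job script itself does not need to be edited.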


Preempt/Requeue

To use this mode, you must designate your job both as a preemptee and as a restartable job. To do this, add two parameters to your job's resource request. The preemptee flag is exactly the same as the one used for Preempt/Cancel above. The restartable flag can be added inside your job script like this:

#SBATCH --requeue

Alternatively, it can be added using the --requeue parameter to sbatch or salloc when you submit the job. However, don't forget that you will also need the preemptee flag (--qos=standby) to make this work at all. Therefore, if your original job resource request looks like this:

#SBATCH --nodes=4 --ntasks-per-node=8
#SBATCH --mem-per-cpu=2G
#SBATCH --time=10:00:00

Then to use the Preempt/Requeue approach, you would use something like this:

#SBATCH --nodes=4 --ntasks-per-node=8
#SBATCH --mem-per-cpu=2G
#SBATCH --time=10:00:00
#SBATCH --qos=standby --requeue
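Inside a requeueable job script, it can be useful to know whether the current run is the first one or a restart. The sketch below assumes that Slurm sets the SLURM_RESTART_COUNT environment variable for a job that has been requeued, and that it is unset or 0 on the first run. The function name describe_run is hypothetical.

```shell
#!/bin/bash
#SBATCH --qos=standby --requeue

# Describe the current run based on SLURM_RESTART_COUNT.
# The variable is treated as 0 when it is not set.
describe_run() {
    local restarts="${SLURM_RESTART_COUNT:-0}"
    if [ "$restarts" -gt 0 ]; then
        echo "restart $restarts"
    else
        echo "first run"
    fi
}

echo "Job ${SLURM_JOB_ID:-unknown}: $(describe_run)"
```

A job script could use this check to decide whether to initialize its working files or to load previously saved state.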

How does preemption affect job priority?

To get preemptable jobs started quickly, they are given a modest priority boost for being preemptable. Otherwise, the job priority is calculated in the same way as for every other job. However, if you have a requeueable job, the accrued queue time (the time the job has been waiting to run) is reset when the job is preempted, and the priority based on this queue time is calculated as if the job were newly submitted. For example, if you have a job that waited for 30 minutes, ran for 10 minutes, and was then preempted and requeued, the accrued queue time would be reset to 0, rather than representing the 40 minutes since the job was originally submitted.