Scheduler Configuration

This page is very out of date and is kept here for historical reasons.

Each job run in the BYU Supercomputing facilities must be submitted to the scheduling and resource management software, which decides the order in which jobs run and the nodes they run on. Each job is in one of three states, as shown below:

Job States

Running: The job is currently running.
Eligible: Jobs in this state are being considered for scheduling. Usually this means that the job would run if resources were available. When resources become available, jobs are scheduled in priority order.
Blocked: Jobs in this state are not considered for scheduling. Jobs are put in this state because they are being deliberately held or because of a policy violation. A user can place a hold on their own jobs, for example. Some examples of policy violations that will hold jobs are listed below.

Credential Policies

A number of per-user and per-research-group policies help balance use of the system and keep any one user or group from overwhelming the scheduling system. If a user or group violates these policies, the remaining jobs are placed in the blocked state. Each policy can have both a soft and a hard limit. The scheduler first assigns resources to jobs that do not violate the soft limit. If resources are still available, it then assigns them to jobs that violate the soft limit but not the hard limit. These values are adjusted often based on usage in the queue, so there is no guarantee that the values listed below are still correct. They were current at the time of writing.

Per User Policies

Max Jobs Running: soft limit of 400, hard limit of 525
Max Processors Running: soft limit of 440, hard limit of 550
Max Jobs Eligible: hard limit of 768
Max Processors Eligible: hard limit of 1600
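
The soft and hard limit behavior described above can be sketched in a few lines of Python. This is only an illustration, not the scheduler's actual implementation: the `Limit` class, the `classify` function, and the category names are hypothetical, and it assumes that a limit is "violated" when the total strictly exceeds it. The numbers are the per-user Max Processors Running values quoted on this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Limit:
    soft: Optional[int]  # None when a policy has only a hard limit
    hard: int


# Per-user Max Processors Running, as listed on this page (may be out of date).
MAX_PROCS_RUNNING = Limit(soft=440, hard=550)


def classify(running_procs: int, job_procs: int, limit: Limit) -> str:
    """Classify a job by what starting it would do to the user's total.

    Assumption: a limit is violated when the total strictly exceeds it.
    """
    total = running_procs + job_procs
    if total > limit.hard:
        return "blocked"          # would violate the hard limit
    if limit.soft is not None and total > limit.soft:
        return "if_resources_remain"  # violates only the soft limit
    return "preferred"            # violates neither limit
```

For example, a user already running 400 processors could start a 32-processor job in the first pass, a 100-processor job only if resources are left over, and a 200-processor job not at all.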

Per Research Group Policies

Max Processors Running: soft limit of 512, hard limit of 630

Per Job Policies

In addition to the other policies, each job is subject to the following limitations:

Max Total Running Time Requested: No job will be allowed to request more than 16 days of total running time. NOTE: Most high-performance computing facilities limit this to between 24 and 72 hours.
Max CPU Time Requested: CPU time is the product of the processor count and the total running time requested. Currently, the limit is the equivalent of 128 processors for 14 days, or 1792 processor-days. For example, a job could use 256 processors for 7 days, or 384 processors for 112 hours.
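
The two per-job limits above can be checked with simple arithmetic. The following Python sketch assumes the values quoted on this page (16 days of running time, 1792 processor-days of CPU time); the function name `job_within_limits` is hypothetical. It works in hours so that the arithmetic stays exact for whole-hour requests.

```python
MAX_WALLTIME_HOURS = 16 * 24            # 16 days of total running time
MAX_PROCESSOR_HOURS = 128 * 14 * 24     # 1792 processor-days, in processor-hours


def job_within_limits(processors: int, walltime_hours: int) -> bool:
    """Return True if a request satisfies both per-job limits on this page."""
    if walltime_hours > MAX_WALLTIME_HOURS:
        return False
    # CPU time is the product of processor count and requested running time.
    return processors * walltime_hours <= MAX_PROCESSOR_HOURS
```

Both examples from the text sit exactly on the limit: 256 processors for 7 days and 384 processors for 112 hours each come to 43008 processor-hours, which is 1792 processor-days.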

Priority Calculation

When jobs are in the eligible state, they are started in priority order, with the highest priority jobs starting first. This priority comes from several factors, as shown below. The calculations are somewhat complicated, but interested users can contact us for more information. Note that priority only matters in the eligible state.

Fairshare

Fairshare is utilized to adjust priority based on historical usage. This is done both on a per-user and per-research group basis. With Fairshare, each user or group is given a fairshare target based on the number of processor-seconds utilized. The priority is adjusted based on the difference between the target and the actual utilization. A user or group that is below the target will see an increase in priority. A user or group that is above the target will see a decrease in priority.
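
As a rough illustration of the idea, the adjustment can be pictured as a value proportional to the gap between target and actual usage. The Python sketch below is an assumption made for illustration only: the function name, the `weight` parameter, and the choice to express usage as a fraction of all processor-seconds are hypothetical, and the real calculation is more involved.

```python
def fairshare_adjustment(target_share: float,
                         actual_share: float,
                         weight: float = 100.0) -> float:
    """Return a priority adjustment from fairshare usage.

    target_share and actual_share are fractions of total processor-seconds
    (an assumption for this sketch). The result is positive when usage is
    below the target and negative when usage is above it.
    """
    return weight * (target_share - actual_share)
```

With a target of 10%, a user who has consumed 5% of recent processor-seconds would see a positive adjustment, and a user who has consumed 20% would see a negative one.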

Wall Clock Accuracy

The scheduler keeps a historical record of each user's wallclock accuracy. For example, if you request 10 hours for a job but use only 4 hours, that corresponds to 40% accuracy for that job. The higher a user's historical accuracy, the greater the priority will be.
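
The per-job figure is simply time used divided by time requested. The sketch below shows that calculation in Python. How the per-job figures are combined into a historical value is not stated on this page, so `historical_accuracy` assumes total time used over total time requested; both function names are hypothetical.

```python
from typing import Iterable, Tuple


def wallclock_accuracy(requested_hours: float, used_hours: float) -> float:
    """Accuracy of one job: fraction of the requested time actually used."""
    if requested_hours <= 0:
        raise ValueError("requested_hours must be positive")
    # A job cannot usefully exceed its request, so cap the result at 1.0.
    return min(used_hours / requested_hours, 1.0)


def historical_accuracy(jobs: Iterable[Tuple[float, float]]) -> float:
    """Assumed aggregate: total used time over total requested time.

    jobs is an iterable of (requested_hours, used_hours) pairs.
    """
    jobs = list(jobs)
    total_requested = sum(requested for requested, _ in jobs)
    total_used = sum(used for _, used in jobs)
    return wallclock_accuracy(total_requested, total_used)
```

Under this assumption, requesting running times close to what a job really needs raises the historical figure, while padding every request lowers it.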

Processors Per Job

In an effort to encourage more multi-processor jobs, a priority increase is given relative to the number of processors in the job. The greater the number of processors requested in a job, the greater that job's priority.

Queue Time

The amount of time that a job has been queued without running is used in the priority calculation, so the longer a job waits, the higher its priority becomes. That way, every job in the queue that can run will eventually run. No job will be starved completely.
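
To make the interaction of the four factors concrete, the following Python sketch combines them as a weighted sum and sorts eligible jobs by the result. This is an assumption for illustration: the page does not give the real formula, and the `EligibleJob` class, the weights, and the function names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EligibleJob:
    name: str
    fairshare_delta: float     # fairshare target minus actual usage; positive means under target
    wallclock_accuracy: float  # owner's historical accuracy, from 0.0 to 1.0
    processors: int            # processors requested by the job
    queued_hours: float        # time spent waiting in the eligible state


# Hypothetical weights, chosen only so the example produces readable numbers.
W_FAIRSHARE = 1000.0
W_ACCURACY = 100.0
W_PROCESSORS = 1.0
W_QUEUE_HOUR = 10.0


def priority(job: EligibleJob) -> float:
    """Weighted sum of the four factors described on this page."""
    return (W_FAIRSHARE * job.fairshare_delta
            + W_ACCURACY * job.wallclock_accuracy
            + W_PROCESSORS * job.processors
            + W_QUEUE_HOUR * job.queued_hours)


def start_order(jobs: List[EligibleJob]) -> List[EligibleJob]:
    """Eligible jobs sorted so the highest priority job starts first."""
    return sorted(jobs, key=priority, reverse=True)
```

In this sketch the queue time term keeps growing while a job waits, whereas the other terms stay bounded, so a waiting job eventually overtakes newer submissions. That is the sense in which no job is starved.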

Last changed on Fri Feb 23 08:30:47 2018