Feature-Based Scheduling
Feature-based scheduling is a scheduling approach that allows for greater utilization and improved access to resources. Users specify exactly what resources they need, which gives the scheduler software flexibility in determining the best resources to run on.
In the past, for example, users without any special resource requests were still required to choose a particular queue, which meant that the job was committed to a particular type of hardware, even when the user did not really care about the specific hardware. The queue structure created hard boundaries between resource groups, and jobs could not cross those boundaries. Often, however, users simply didn't care what resources they ran on, as long as they ran. The feature-based scheduling approach therefore allows users to specify exactly which resources or features they need, and to leave out anything they don't care about.
Similarly, some users needed only a few processors but a great deal of memory. Because all scheduling was done by processors and time, the best we could recommend was to request more processors to get the associated memory. This is no longer the case: you can now request memory separately from the number of processors.
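The difference can be sketched with some shell arithmetic. The numbers here are purely illustrative assumptions (a job needing 2 processors and 64 GB, on hypothetical nodes that provide 4 GB per processor); they are not taken from the lab's actual hardware.

```shell
# Illustrative numbers only: a job that needs 2 processors and 64 GB of memory
# on hypothetical nodes that provide 4 GB of memory per processor.
needed_cpus=2
needed_mem_gb=64
gb_per_cpu=4

# Old approach: pad the processor count until the attached memory is enough.
old_cpus=$((needed_mem_gb / gb_per_cpu))
echo "old: --ntasks=${old_cpus}"

# Current approach: request processors and memory independently.
mem_per_cpu_gb=$((needed_mem_gb / needed_cpus))
echo "new: --ntasks=${needed_cpus} --mem-per-cpu=${mem_per_cpu_gb}G"
```

Under these assumptions the old approach would tie up 16 processors to obtain the memory, while the current approach asks for only the 2 processors the job actually uses.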
As a general rule, users should request the hardware or features that they need, and nothing else. The more resources a job requests, the fewer placement options the scheduler has, and the longer the job will take to schedule. Just for clarity, we'll say this again:
Request the fewest possible resources for your job, in order to minimize the waiting time for the job to start.
What features are available for scheduling, and how do I use them?
Features that can be used for scheduling fall into two major categories: node-specific features and job-specific features. Each category has its own syntax. Note that not all combinations of features are available at this time (for example, requesting Omni-Path together with Infiniband), since the lab does not currently have the hardware to fulfill such a request.
Resources and features that are node-specific are listed as a part of the resource request. Most users are already familiar with this from the rhel7 (Red Hat Enterprise Linux 7) request. You may add as many of the following features to your request as you wish.
|Feature name||Description||Syntax example|
|avx||Requests processors that support the AVX instruction set.||-C avx|
|avx2||Requests processors that support the AVX2 instruction set.||-C avx2|
|avx512||Requests processors that support the AVX-512 instruction set.||-C avx512|
|fma||Requests processors that support the FMA instruction set.||-C fma|
|rhel7||Requests nodes that are running the Red Hat Enterprise Linux 7 operating system.||-C rhel7|
|skylake||Requests nodes that have Skylake processors.||-C skylake|
|knl||Requests nodes that have Knights Landing (Xeon Phi) processors.||-C knl|
|ib||Requests that the nodes have access to an Infiniband interconnect.||-C ib|
|kepler||Requests nodes that have Kepler GPUs.||-C kepler|
|opa||Requests that the nodes have access to an Omni-Path interconnect.||-C opa|
|pascal||Requests nodes that have Pascal GPUs.||-C pascal|
Example Usage of Node-specific features
There are several ways to combine features in a single request. An ampersand (&) joins features that must all be present. A bar symbol (|) joins features where either one is acceptable. An exclamation mark (!) in front of a feature excludes nodes that have that feature. Because these symbols (&, | and !) have special meaning in the shell environment, the expression must be quoted as shown here to have the desired effect.
|Description||Syntax example|
|Requests nodes that have Skylake processors and Infiniband.||-C 'skylake&ib'|
|Requests nodes that have either Kepler or Pascal GPUs (doesn't matter which).||-C 'kepler|pascal'|
|Requests nodes that have Knights Landing processors and do NOT have Infiniband (ib).||-C 'knl&!ib'|
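The quoting matters because an unquoted `&` would put the command in the background and an unquoted `|` would start a pipeline. The sketch below builds feature expressions in a script and keeps them quoted; `join_features` is a hypothetical helper written for this illustration, not something the scheduler provides.

```shell
# Hypothetical helper: join feature names with an operator
# ('&' for "all of these", '|' for "any one of these").
join_features() {
    op=$1
    shift
    joined=$1
    shift
    for feature in "$@"; do
        joined="${joined}${op}${feature}"
    done
    printf '%s\n' "$joined"
}

both=$(join_features '&' skylake ib)
either=$(join_features '|' kepler pascal)
# Single quotes keep the shell from treating & and ! specially.
without_ib='knl&!ib'

# Print the options as they would be typed on the command line.
printf '%s\n' "-C '${both}'"
printf '%s\n' "-C '${either}'"
printf '%s\n' "-C '${without_ib}'"
```

When passing such a variable to a command, keep it inside double quotes (for example `-C "$both"`) so the expression reaches the scheduler as a single argument.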
Job-specific features apply to the job as a whole, and are listed in the resource request:
|Feature name||Description||Syntax example|
|nodes||Requests a specific number of nodes||--nodes=4|
|ntasks||Requests a total number of processors||--ntasks=4|
|time||Requests a total amount of running time for the job||--time=03:00:00|
|mem||Requests an amount of memory per node, shared by all of the job's processes on that node||--mem=16G|
|mem-per-cpu||Requests a specific amount of memory per processor||--mem-per-cpu=4G|
|qos||Requests a specific QOS, or quality of service. This is used primarily for preemption.||--qos=standby|
The following examples combine several of the above requests. Each is preceded by a description of what it asks for.
Request 4 processes on 1 node, with Infiniband and Intel processors, for 4.5 hours:
--nodes=1 --ntasks=4 --mem-per-cpu=2G -C 'ib&intel' --time=4:30:00
Request 4 nodes with 8 AVX-capable processors each, with Infiniband and FMA, for 24 hours, on the preemptee QOS:
--nodes=4 --ntasks-per-node=8 --mem-per-cpu=2G -C 'avx&ib&fma' --time=24:00:00 --qos=standby
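The same request can also be written into a batch script instead of being typed on the command line. The sketch below assumes the scheduler is Slurm (which the option syntax in this document suggests) and uses its `#SBATCH` directive lines; the script name `job.sh` and the variables in the body are hypothetical and only restate the directives to show the totals they imply.

```shell
#!/bin/bash
# Sketch of a batch script; submit with: sbatch job.sh (file name is hypothetical)
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=8
#SBATCH --mem-per-cpu=2G
#SBATCH -C 'avx&ib&fma'
#SBATCH --time=24:00:00
#SBATCH --qos=standby

# Totals implied by the directives above (values repeated here for illustration).
nodes=4
tasks_per_node=8
mem_per_cpu_gb=2
total_tasks=$((nodes * tasks_per_node))
total_mem_gb=$((total_tasks * mem_per_cpu_gb))
echo "${total_tasks} tasks, ${total_mem_gb} GB of memory in total"
```

Because the directive lines begin with `#`, the shell treats them as comments; only the scheduler reads them.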
What resource requests are required?
The purpose of the feature-based scheduling approach is to allow users to request only the resources or features they care about, and not specify the features they don't care about. However, we require that users specify at least walltime and memory, and recommend that they specify ntasks (number of CPUs) and nodes (number of nodes), as shown below. If ntasks and nodes are omitted, one CPU on one node is provided by default.
--nodes=1 --ntasks=8 --mem-per-cpu=2G --time=24:00:00 # 8 CPUs and 16 GB memory on one node for a day
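One way to catch a missing walltime or memory request before submitting is a small check in a wrapper script. The function below is a hypothetical helper, not part of the scheduler, and it only recognizes the `--option=value` spelling used in this document.

```shell
# Hypothetical pre-submission check: succeeds only if the option list
# contains both a walltime request and a memory request.
has_required_requests() {
    has_time=no
    has_mem=no
    for opt in "$@"; do
        case $opt in
            --time=*) has_time=yes ;;
            --mem=*|--mem-per-cpu=*) has_mem=yes ;;
        esac
    done
    [ "$has_time" = yes ] && [ "$has_mem" = yes ]
}

if has_required_requests --nodes=1 --ntasks=8 --mem-per-cpu=2G --time=24:00:00; then
    echo "request includes walltime and memory"
else
    echo "request is missing walltime or memory"
fi
```

A wrapper could refuse to submit when the check fails, which gives the user an immediate message instead of a rejected job.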