You request a compute node for your code through a queue system, in this case PBS (Portable Batch System). Our clusters provide several queues, which differ in the maximum time limit they allow for your code. The queue name must be specified in the PBS script file, as shown on the Useful Scripts page. To list the queues available on the clusters, along with their details, use the following command:

qstat -q

The main columns in the output are:

  • Memory: Maximum memory available on the queue.
  • Walltime: Maximum time your code can run on the queue; the job is killed once this limit is reached. Make sure your code checkpoints regularly to prevent data loss.
  • Run, Que, and Lm: The number of running jobs, the number of queued jobs, and the maximum number of jobs allowed in the queue, respectively.
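
As a rough illustration, the columns above can be pulled out of the qstat -q output with a short shell filter. This is a minimal sketch that assumes the usual Torque column order (Queue, Memory, CPU Time, Walltime, Node, Run, Que, Lm, State). The function names and the sample output, including the queue names and numbers in it, are invented so that the snippet runs without a cluster.

```shell
#!/bin/sh
# Sketch: summarize `qstat -q` output as "queue walltime running queued".
# Assumes the usual Torque column order:
#   Queue Memory CPU-Time Walltime Node Run Que Lm State
# Adjust the field numbers if the output on your cluster differs.
summarize_queues() {
    # Queue rows have at least 9 fields; skip the header and the dashed rule.
    awk 'NF >= 9 && $1 != "Queue" && $1 !~ /^-+$/ { print $1, $4, $6, $7 }'
}

# Hypothetical sample output (invented names and numbers), included only so
# the sketch can be run anywhere.
sample_qstat_output() {
    cat <<'EOF'
server: example-head-node

Queue            Memory CPU Time Walltime Node  Run Que Lm  State
---------------- ------ -------- -------- ----  --- --- --  -----
short              --      --    24:00:00   --   12   3 --   E R
long               --      --    168:00:00  --    4   0 10   E R
                                               ----- -----
                                                  16     3
EOF
}

sample_qstat_output | summarize_queues
```

On the cluster, the last line would instead be qstat -q | summarize_queues.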

Note that private queues starting with “priv” may require specific access. For example, “privse” is for the School of Engineering. Contact the administrator for access to these queues.
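
Once a queue has been chosen from the list, it is named in the job script with the -q directive. The following is a minimal sketch of such a script, not a copy of the ones on the Useful Scripts page. The job name, the queue name short, and the resource requests are placeholder assumptions to be replaced with values that are valid on your cluster.

```shell
#!/bin/bash
# Sketch of a PBS job script. Every value in the #PBS lines below is a
# placeholder; the requested walltime must not exceed the queue's limit.
#PBS -N example_job
#PBS -q short
#PBS -l walltime=01:00:00
#PBS -l nodes=1:ppn=1

# PBS sets PBS_O_WORKDIR to the directory qsub was run from; fall back to
# the current directory when the script is run by hand.
cd "${PBS_O_WORKDIR:-$PWD}" || exit 1

# PBS_QUEUE holds the name of the queue the job is running in.
job_queue="${PBS_QUEUE:-not set (running outside PBS)}"
echo "Queue: $job_queue"
echo "Host:  $(uname -n)"
```

The script would be submitted with qsub, for example qsub job.sh, where the file name is arbitrary.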

Figure: A snapshot of qstat -q on the CPUHPC cluster

For the detailed queue configuration, use qmgr -c "p s" (short for "print server").

Please note that a low-priority queue, lowpr, has been set up on an experimental basis to make use of idle private nodes. When owners submit jobs to their private queues, jobs running on those private nodes through lowpr are suspended and resume only after the higher-priority jobs have completed, so a job in lowpr may take a very long time to finish. If, for any reason, jobs in lowpr interfere with the operation of the private queues, lowpr will be deleted without prior notice. Submit jobs to this queue at your own risk.
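
Because a job in lowpr can be suspended for long periods, and any job is killed at its queue's walltime limit, it helps if the job can resume from where it stopped. The sketch below shows one simple checkpointing pattern in shell, assuming the work can be split into numbered steps. The names run_with_checkpoint, do_step, and progress.chk are hypothetical and chosen only for illustration.

```shell
#!/bin/sh
# Sketch: resume a multi-step job from a checkpoint file.

# Placeholder for one unit of real work (hypothetical).
do_step() {
    echo "step $1 done"
}

# Usage: run_with_checkpoint TOTAL_STEPS CHECKPOINT_FILE
run_with_checkpoint() {
    total_steps=$1
    checkpoint_file=$2
    next_step=1
    # Resume after the last completed step, if a checkpoint exists.
    if [ -f "$checkpoint_file" ]; then
        next_step=$(( $(cat "$checkpoint_file") + 1 ))
    fi
    while [ "$next_step" -le "$total_steps" ]; do
        do_step "$next_step"
        # Write to a temporary file and rename it, so an interruption
        # cannot leave a half-written checkpoint behind.
        echo "$next_step" > "$checkpoint_file.tmp"
        mv "$checkpoint_file.tmp" "$checkpoint_file"
        next_step=$(( next_step + 1 ))
    done
}

run_with_checkpoint 5 "${CHECKPOINT_FILE:-progress.chk}"
```

If the script is interrupted and run again, it continues with the first step that is not yet recorded in the checkpoint file.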

See the qstat manual page (man qstat) and Torque official documentation for further details.

Figure: qstat -q run on the GPUHPC cluster