Starting and managing jobs with PBS

Batch processing is a method of running programs, called jobs, in batches without manual intervention. Users must submit the jobs, but no further interaction is needed to process the batch. Batches can run automatically at scheduled times and on whichever compute resources are available. For additional background, see Batch Computing Overview. The GSDC TEM computing cluster uses the Portable Batch System (PBS), as implemented in Altair's PBS Pro, to schedule jobs across shared resources.

URL

PBS Pro (Community Edition)

PBS Pro (Commercial)


Job scripts

Job scripts form the basis of batch jobs. A job script is simply a text file containing instructions for the work to execute. Job scripts are usually written in bash and thus mimic the commands a user would execute interactively in a shell; the difference is that they run on specific resources allocated by the scheduler once those resources become available. Scripts can also be written in other languages, most commonly Python. See our job scripts page for a detailed discussion of job scripts and examples.
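As a quick orientation, a minimal bash job script might look like the sketch below. The job name example_job and the resource values are illustrative assumptions, not site recommendations; cpuQ is the queue name used elsewhere on this page. Lines starting with #PBS are read by qsub as directives and are ordinary comments to bash.

```shell
#!/bin/bash
#PBS -N example_job                  # job name (hypothetical)
#PBS -q cpuQ                         # target queue
#PBS -l select=1:ncpus=4:mem=32GB    # one chunk: 4 cores, 32 GB memory
#PBS -l walltime=01:00:00            # one-hour wall-clock limit
#PBS -j oe                           # merge stderr into stdout

# PBS sets PBS_O_WORKDIR to the directory qsub was run from.
# Fall back to the current directory when the script runs outside PBS.
cd "${PBS_O_WORKDIR:-$PWD}" || exit 1

# Record where the job ran; PBS_JOBID is only set inside a PBS job.
run_host=$(uname -n)
echo "Job ${PBS_JOBID:-(not under PBS)} running on ${run_host}"
```

Because the directives are comments, the same file can be run directly with bash for a quick syntax check before it is submitted.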

Submitting jobs

In the examples that follow, job.pbs, script_name, etc. represent job script files submitted for batch execution. PBS Pro can be used to schedule both interactive jobs and batch compute jobs.

To submit a batch job, use the qsub command followed by the name of your PBS batch script file.

$> qsub job.pbs
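On success, qsub prints the identifier of the new job on standard output, which makes it easy to capture for later qstat or qdel calls. The helper below is a small sketch of that idea; the function name submit_job is hypothetical.

```shell
# Submit a job script and print the numeric part of its job ID.
# qsub prints the full identifier (for example 838.tem-ce-al9.sdfarm.kr).
submit_job() {
    local script=$1 job_id
    job_id=$(qsub "$script") || return 1   # propagate a failed submission
    printf '%s\n' "${job_id%%.*}"          # strip everything after the first dot
}

# Usage (on a login node):
#   id=$(submit_job job.pbs) && echo "submitted job $id"
```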

Propagating environment settings

Some users find it useful to set environment variables in their login environment that can be temporarily used for multiple batch jobs without modifying the job script. This practice can be particularly useful during iterative development and debugging work.

PBS Pro has two approaches to propagation:

  1. Specific variables can be forwarded to the job upon request.
  2. The entire environment can be forwarded to the job.

In general, the first approach is preferred because the second may have unintended consequences.

These settings are controlled by qsub arguments that can be used at the command line or as directives within job scripts. Here are examples of both approaches:

# Selectively forward runtime variables to the job (lower-case v)
$> qsub -v DEBUG=true,CASE_NAME job.pbs

When you use the selective option (lower-case v), you can either specify only the variable name to propagate the current value (as in CASE_NAME in the example), or you can explicitly set it to a given value at submission time (as in DEBUG).

# Forward the entire environment to the job (upper-case V)
$> qsub -V job.pbs
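The same options can be written as directives inside the job script. The sketch below assumes the selective form and reads the forwarded variables with defaults, so the script still behaves sensibly when a variable was not set at submission time. The function name report_settings and the default values are hypothetical.

```shell
#!/bin/bash
#PBS -v DEBUG=true,CASE_NAME   # directive form of: qsub -v DEBUG=true,CASE_NAME

# Print the effective settings, using defaults for anything not forwarded.
report_settings() {
    local debug=${DEBUG:-false}
    local case_name=${CASE_NAME:-default_case}
    echo "case=${case_name} debug=${debug}"
}

report_settings
```

Options given on the qsub command line generally take precedence over directives in the script, so a directive can act as a default that is overridden per submission.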

Managing jobs

Here are some of the most useful commands for managing and monitoring jobs that have been launched with PBS. Most of these commands will only modify or query data from jobs that are active on the same system.

qdel

Canceling a single job

Run qdel with the job ID to kill a pending or running job.

$> qdel jobID

Stopping all of your own jobs

Kill all of your own pending or running jobs. (Be sure to use backticks as shown.)

$> qdel `qselect -u $USER`
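If you have no jobs, qselect prints nothing and qdel is then called without arguments, which produces an error. A slightly more defensive variant is sketched below; the function name cancel_my_jobs is hypothetical, and it uses $(...) in place of backticks, which behaves the same way here.

```shell
# Cancel all of the caller's pending or running jobs, if there are any.
cancel_my_jobs() {
    local ids
    ids=$(qselect -u "${USER:-$(id -un)}") || return 1
    if [ -z "$ids" ]; then
        echo "No jobs to cancel."
        return 0
    fi
    # Intentionally unquoted: each job ID becomes a separate qdel argument.
    qdel $ids
}
```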

qstat

Status of all your own jobs

Run this to see the status of all of your own unfinished jobs.

$> qstat -u $USER

Your output will be similar to what is shown just below. Most column headings are self-explanatory – NDS for nodes, TSK for tasks, and so on.

In the status (S) column, most jobs are either queued (Q) or running (R). Sometimes jobs are held (H), which might mean they are dependent on the completion of another job.

tem-ce-al9.sdfarm.kr:
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
838.tem-ce-al9* USERID   cpuQ     cryosparc*  95965   1   1  8000m   --  R 00:01
840.tem-ce-al9* USERID   gpuQ     cryosparc*  84478   1   6   16gb   --  R 00:00
841.tem-ce-al9* USERID   gpuQ     cryosparc*  84594   1   6   16gb   --  R 00:00
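When many jobs are listed, a per-state tally can be easier to read than the full table. The sketch below assumes the column layout shown above, where the state is the second-to-last field of each job row; the function name count_states is hypothetical.

```shell
# Read qstat -u output on stdin and print "<state> <count>" per job state.
count_states() {
    # Job rows start with a numeric job ID followed by a dot;
    # header and separator lines do not match and are skipped.
    awk '$1 ~ /^[0-9]+\./ { n[$(NF-1)]++ }
         END { for (s in n) print s, n[s] }'
}

# Usage:
#   qstat -u "$USER" | count_states
```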

Following are examples of qstat with some other commonly used options and arguments.

Status of an unfinished job

Get a long-form summary of the status of an unfinished job.

$> qstat -f jobID

Warning

Use the above command sparingly; it places a high load on PBS Pro.

Status of a recently completed job

Get a single-line summary of the status of a job that is unfinished or that completed within the last 72 hours.

$> qstat -x jobID

Status of jobs on a specified queue

Get information about unfinished jobs in a specified execution queue.

$> qstat queue_name

Status of jobs by queue

See job activity by queue (e.g., pending, running) in terms of numbers of jobs.

$> qstat -Q

Status of all of your jobs

Display information for all of your pending, running, and finished jobs.

$> qstat -x -u $USER

Status of all your own jobs with comments

Display information for all of your unfinished jobs with exec_host and any scheduler_comment below the basic information.

$> qstat -n -s -u $USER

Status of all jobs

Display information for all jobs, including other users' jobs.

$> qstat -a

Interactive jobs

Interactive jobs provide an interactive session on a compute node, useful for debugging, testing code, and running short tasks that require user interaction.

Users can start an interactive job from the GSDC TEM login nodes with the qsub -I command, where the -I flag requests an interactive session. The following example starts an interactive job with specified resources on cpuQ:

$> qsub -I -q cpuQ -l select=1:ncpus=4:mem=32GB -l walltime=01:00:00

The output of the above command looks like this:

qsub: waiting for job 850.tem-ce-al9.sdfarm.kr to start
qsub: job 850.tem-ce-al9.sdfarm.kr ready

[USERID@tem-cpu00-al9 ~]$ Do something

...

[USERID@tem-cpu00-al9 ~]$ exit
logout
qsub: job 850.tem-ce-al9.sdfarm.kr completed
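A script that may be run both interactively and in batch sometimes needs to know which context it is in. Inside a job, PBS sets the PBS_ENVIRONMENT variable to PBS_INTERACTIVE or PBS_BATCH; the sketch below relies on that, and the function name job_context is hypothetical.

```shell
# Report whether the current shell is inside a PBS job, and of which kind.
job_context() {
    case "${PBS_ENVIRONMENT:-}" in
        PBS_INTERACTIVE) echo "interactive job" ;;
        PBS_BATCH)       echo "batch job" ;;
        *)               echo "not inside a PBS job" ;;
    esac
}
```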

Interactive jobs with GUI (X11)-based applications

Users can also start an interactive job that supports GUI (X11)-based applications with the qsub -X -V -I command.

The following example starts an interactive job with specified resources on cpuQ, forwarding the login environment (-V) and enabling X11 forwarding (-X):

$> qsub -X -V -I -q cpuQ -l select=1:ncpus=1:mem=16GB -l walltime=01:00:00

The output of the above command looks like this:

qsub: waiting for job 900.tem-ce-al9.sdfarm.kr to start
qsub: job 900.tem-ce-al9.sdfarm.kr ready

[USERID@tem-cpu00-al9 ~]$ echo $DISPLAY
localhost:50.0
...
[USERID@tem-cpu00-al9 ~]$ xclock
[USERID@tem-cpu00-al9 ~]$ exit
logout
qsub: job 900.tem-ce-al9.sdfarm.kr completed
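GUI programs fail with a display error when X11 forwarding is not active, for example if the session was started without -X. As a sketch, a small wrapper can check DISPLAY first, as the echo $DISPLAY step above does by hand; the function name run_gui is hypothetical.

```shell
# Run a GUI command only if an X11 display is available.
run_gui() {
    if [ -z "${DISPLAY:-}" ]; then
        echo "DISPLAY is not set; start the job with qsub -X -I" >&2
        return 1
    fi
    "$@"
}

# Usage (inside the interactive job):
#   run_gui xclock
```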