Platform LSF Version 6.0 - Running Jobs with Platform LSF - Viewing Information About Jobs
Use the bjobs and bhist commands to view information about jobs:
- bjobs reports the status of jobs, and its options allow you to display specific information.
- bhist reports the history of one or more jobs in the system.
You can also find jobs on specific queues or hosts, find jobs submitted by specific projects, and check the status of specific jobs using their job IDs or names.
- Viewing Job Information (bjobs)
- Viewing Job Pend and Suspend Reasons (bjobs -p)
- Viewing Detailed Job Information (bjobs -l)
- Viewing Job Resource Usage (bjobs -l)
- Viewing Job History (bhist)
- Viewing Job Output (bpeek)
- Viewing Information about SLAs and Service Classes
- Viewing Jobs in Job Groups
- Viewing Information about Resource Allocation Limits
Viewing Job Information (bjobs)
The bjobs command has options to display the status of jobs in the LSF system. For more details on these or other bjobs options, see the bjobs command in the Platform LSF Reference.
Unfinished current jobs
The bjobs command reports the status of LSF jobs.
When no options are specified, bjobs displays information about jobs in the PEND, RUN, USUSP, PSUSP, and SSUSP states for the current user.
For example:
% bjobs
JOBID  USER   STAT   QUEUE     FROM_HOST  EXEC_HOST  JOB_NAME    SUBMIT_TIME
3926   user1  RUN    priority  hostf      hostc      verilog     Oct 22 13:51
605    user1  SSUSP  idle      hostq      hostc      Test4       Oct 17 18:07
1480   user1  PEND   priority  hostd                 generator   Oct 19 18:13
7678   user1  PEND   priority  hostd                 verilog     Oct 28 13:08
7679   user1  PEND   priority  hosta                 coreHunter  Oct 28 13:12
7680   user1  PEND   priority  hostb                 myjob       Oct 28 13:17
All jobs
bjobs -a displays the same information as bjobs and, in addition, displays information about recently finished jobs (PEND, RUN, USUSP, PSUSP, SSUSP, DONE, and EXIT statuses).
All your jobs that are still in the system and jobs that have recently finished are displayed.
Running jobs
bjobs -r displays information only for running jobs (RUN state).
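When a script consumes the default bjobs listing, a per-state count is often the first thing it needs. The following is a minimal sketch, assuming the output has been captured as text and that STAT is always the third whitespace-separated field, as in the example earlier in this section. The function name count_by_state is hypothetical and not part of LSF.

```python
from collections import Counter

# Sample text copied from the bjobs example in this section.
SAMPLE = """\
JOBID  USER   STAT   QUEUE     FROM_HOST  EXEC_HOST  JOB_NAME    SUBMIT_TIME
3926   user1  RUN    priority  hostf      hostc      verilog     Oct 22 13:51
605    user1  SSUSP  idle      hostq      hostc      Test4       Oct 17 18:07
1480   user1  PEND   priority  hostd                 generator   Oct 19 18:13
7678   user1  PEND   priority  hostd                 verilog     Oct 28 13:08
7679   user1  PEND   priority  hosta                 coreHunter  Oct 28 13:12
7680   user1  PEND   priority  hostb                 myjob       Oct 28 13:17
"""


def count_by_state(text):
    """Count jobs per STAT value in captured bjobs output.

    Assumption: data rows start with a numeric job ID and STAT is the
    third field. Pending jobs have an empty EXEC_HOST column, so later
    fields are not safe to index by position.
    """
    counts = Counter()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3 or not fields[0].isdigit():
            continue  # skip the header and any blank lines
        counts[fields[2]] += 1
    return counts


if __name__ == "__main__":
    for state, number in sorted(count_by_state(SAMPLE).items()):
        print(state, number)
```

Only the first three columns are read because the blank EXEC_HOST value for pending jobs shifts every later field when the line is split on whitespace.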
Viewing Job Pend and Suspend Reasons (bjobs -p)
When you submit a job, it may be held in the queue before it starts running, and it may be suspended while running. You can find out why jobs are pending or suspended with the bjobs -p option.
You can combine bjobs options to tailor the output. For more details on these or other bjobs options, see the bjobs command in the Platform LSF Reference.
- Pending jobs and reasons
- Viewing pending and suspend reasons with host names
- Viewing suspend reasons only
Pending jobs and reasons
bjobs -p displays information for pending jobs (PEND state) and their reasons. There can be more than one reason why the job is pending.
For example:
% bjobs -p
JOBID  USER   STAT  QUEUE     FROM_HOST  JOB_NAME  SUBMIT_TIME
7678   user1  PEND  priority  hostD      verilog   Oct 28 13:08
Queue's resource requirements not satisfied: 3 hosts;
Unable to reach slave lsbatch server: 1 host;
Not enough job slots: 1 host;
The pending reasons also mention the number of hosts for each condition.
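The reason lines above follow a "reason: N host(s);" pattern, which makes them easy to tally. The sketch below is illustrative only: it assumes that pattern holds for every reason line, and the function name parse_pending_reasons is hypothetical.

```python
import re

# Reason text copied from the bjobs -p example above.
REASONS = (
    "Queue's resource requirements not satisfied: 3 hosts; "
    "Unable to reach slave lsbatch server: 1 host; "
    "Not enough job slots: 1 host;"
)


def parse_pending_reasons(text):
    """Return {reason: host_count} for entries shaped like 'reason: N host(s);'.

    Assumption: each reason ends with a host count followed by a semicolon.
    """
    reasons = {}
    for reason, count in re.findall(r"\s*([^:;]+):\s*(\d+)\s+hosts?;", text):
        reasons[reason.strip()] = int(count)
    return reasons


if __name__ == "__main__":
    for reason, hosts in parse_pending_reasons(REASONS).items():
        print(f"{hosts} host(s): {reason}")
```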
You can view reasons why a job is pending or in suspension for all users by combining the -p and -u all options.
Viewing pending and suspend reasons with host names
To get specific host names along with pending reasons, use the -p and -l options with the bjobs command.
For example:
% bjobs -lp
Job Id <7678>, User <user1>, Project <default>, Status <PEND>, Queue <priority>, Command <verilog>
Mon Oct 28 13:08:11: Submitted from host <hostD>, CWD <$HOME>, Requested Resources <type==any && swp>35>;

PENDING REASONS:
Queue's resource requirements not satisfied: hostb, hostk, hostv;
Unable to reach slave lsbatch server: hostH;
Not enough job slots: hostF;

SCHEDULING PARAMETERS:
           r15s   r1m  r15m    ut    pg    io    ls    it   tmp   swp   mem
loadSched     -   0.7   1.0     -   4.0     -     -     -     -     -     -
loadStop      -   1.5   2.5     -   8.0     -     -     -     -     -     -
Viewing suspend reasons only
The -s option of bjobs displays reasons for suspended jobs only. For example:
% bjobs -s
JOBID  USER   STAT   QUEUE  FROM_HOST  EXEC_HOST  JOB_NAME  SUBMIT_TIME
605    user1  SSUSP  idle   hosta      hostc      Test4     Oct 17 18:07
The host load exceeded the following threshold(s):
Paging rate: pg;
Idle time: it;
Viewing Detailed Job Information (bjobs -l)
The -l option of bjobs displays detailed information about job status and parameters, such as the job's current working directory, parameters specified when the job was submitted, and the time when the job started running. For more details on bjobs options, see the bjobs command in the Platform LSF Reference.
bjobs -l with a job ID displays all the information about a job. For example:
% bjobs -l 7678
Job Id <7678>, User <user1>, Project <default>, Status <PEND>, Queue <priority>, Command <verilog>
Mon Oct 28 13:08:11: Submitted from host <hostD>, CWD <$HOME>, Requested Resources <type==any && swp>35>;

PENDING REASONS:
Queue's resource requirements not satisfied: 3 hosts;
Unable to reach slave lsbatch server: 1 host;
Not enough job slots: 1 host;

SCHEDULING PARAMETERS:
           r15s   r1m  r15m    ut    pg    io    ls    it   tmp   swp   mem
loadSched     -   0.7   1.0     -   4.0     -     -     -     -     -     -
loadStop      -   1.5   2.5     -   8.0     -     -     -     -     -     -
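The SCHEDULING PARAMETERS block in the bjobs -l output above is a small table of load indices against the loadSched and loadStop thresholds. As a rough sketch, it can be turned into a lookup structure. The code assumes the header row starts with r15s and that a dash means no threshold is set. The function name parse_thresholds is hypothetical.

```python
# Block copied from the bjobs -l example above.
SCHED = """\
SCHEDULING PARAMETERS:
           r15s   r1m  r15m    ut    pg    io    ls    it   tmp   swp   mem
loadSched     -   0.7   1.0     -   4.0     -     -     -     -     -     -
loadStop      -   1.5   2.5     -   8.0     -     -     -     -     -     -
"""


def parse_thresholds(text):
    """Return {'loadSched': {index: value}, 'loadStop': {index: value}}.

    Assumptions: the index names appear on the row beginning with 'r15s',
    and '-' stands for an unset threshold (stored here as None).
    """
    names = None
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "r15s":
            names = fields
        elif fields[0] in ("loadSched", "loadStop") and names is not None:
            values = [None if v == "-" else float(v) for v in fields[1:]]
            table[fields[0]] = dict(zip(names, values))
    return table


if __name__ == "__main__":
    thresholds = parse_thresholds(SCHED)
    print(thresholds["loadSched"]["r1m"], thresholds["loadStop"]["r1m"])
```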
Viewing Job Resource Usage (bjobs -l)
LSF monitors the resources jobs consume while they are running. The -l option of the bjobs command displays the current resource usage of the job.
For more details on bjobs options, see the bjobs command in the Platform LSF Reference.
Job-level information
Job-level information includes:
- Total CPU time consumed by all processes of a job
- Total resident memory usage in KB of all currently running processes of a job
- Total virtual memory usage in KB of all currently running processes of a job
- Currently active process group ID of a job
- Currently active processes of a job
Update interval
The job-level resource usage information is updated at most once every SBD_SLEEP_TIME seconds. See the Platform LSF Reference for the value of SBD_SLEEP_TIME.
The update is done only if the value for the CPU time, resident memory usage, or virtual memory usage has changed by more than 10 percent from the previous update or if a new process or process group has been created.
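The update condition can be written down as a small predicate. This is a sketch of the rule as stated in the text, not LSF's implementation. It assumes "changed by more than 10 percent" means a relative change in either direction measured against the previously reported value, and that any change from a zero baseline counts. The function and field names are hypothetical.

```python
def should_update(previous, current, new_process=False, threshold=0.10):
    """Decide whether a resource usage update would be reported.

    previous, current: dicts with 'cpu_time', 'mem', and 'swap' values.
    new_process: True if a new process or process group was created.
    threshold: relative change that triggers an update (10 percent).
    """
    if new_process:
        return True
    for key in ("cpu_time", "mem", "swap"):
        old, new = previous[key], current[key]
        if old == 0:
            # Assumption: any usage appearing from a zero baseline counts.
            if new != 0:
                return True
            continue
        if abs(new - old) / old > threshold:
            return True
    return False


if __name__ == "__main__":
    before = {"cpu_time": 100, "mem": 147, "swap": 201}
    after = {"cpu_time": 105, "mem": 147, "swap": 201}
    print(should_update(before, after))  # a 5 percent CPU change alone
```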
Viewing job resource usage
To view resource usage for a specific job, specify bjobs -l with the job ID:
% bjobs -l 1531
Job Id <1531>, User <user1>, Project <default>, Status <RUN>, Queue <priority>, Command <example 200>
Fri Dec 27 13:04:14: Submitted from host <hostA>, CWD <$HOME>, SpecifiedHosts <hostD>;
Fri Dec 27 13:04:19: Started on <hostD>, Execution Home </home/user1>, Execution CWD </home/user1>;
Fri Dec 27 13:05:00: Resource usage collected.
The CPU time used is 2 seconds.
MEM: 147 Kbytes; SWAP: 201 Kbytes
PGID: 8920; PIDs: 8920 8921 8922

SCHEDULING PARAMETERS:
           r15s   r1m  r15m    ut    pg    io    ls    it   tmp   swp   mem
loadSched     -     -     -     -     -     -     -     -     -     -     -
loadStop      -     -     -     -     -     -     -     -     -     -     -
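The resource usage lines in the output above map onto the job-level items listed earlier in this section (CPU time, resident memory, virtual memory, process group ID, and process IDs). A minimal sketch of extracting them from captured text follows. The regular expressions are assumptions based only on the wording of this one example, and parse_usage is a hypothetical name.

```python
import re

# Usage text copied from the bjobs -l 1531 example above.
USAGE = (
    "Fri Dec 27 13:05:00: Resource usage collected. "
    "The CPU time used is 2 seconds. "
    "MEM: 147 Kbytes; SWAP: 201 Kbytes "
    "PGID: 8920; PIDs: 8920 8921 8922"
)


def parse_usage(text):
    """Extract resource usage fields; a field is None if it is not found."""

    def first(pattern, cast):
        match = re.search(pattern, text)
        return cast(match.group(1)) if match else None

    return {
        "cpu_seconds": first(r"CPU time used is ([\d.]+) seconds", float),
        "mem_kb": first(r"MEM:\s*(\d+)\s*Kbytes", int),
        "swap_kb": first(r"SWAP:\s*(\d+)\s*Kbytes", int),
        "pgid": first(r"PGID:\s*(\d+)", int),
        "pids": first(r"PIDs:\s*([\d\s]+)", lambda s: [int(p) for p in s.split()]),
    }


if __name__ == "__main__":
    print(parse_usage(USAGE))
```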
Viewing Job History (bhist)
Sometimes you want to know what has happened to your job since it was submitted. The bhist command displays a summary of the pending, suspended, and running time of jobs for the user who invoked the command. Use bhist -u all to display a summary for all users in the cluster.
For more details on bhist options, see the bhist command in the Platform LSF Reference.
- Viewing detailed job history
- Viewing history of jobs not listed in active event log
- Viewing chronological history of jobs
Viewing detailed job history
The -l option of bhist displays the time information and a complete history of scheduling events for each job.
% bhist -l 1531
JobId <1531>, User <user1>, Project <default>, Command <example 200>
Fri Dec 27 13:04:14: Submitted from host <hostA> to Queue <priority>, CWD <$HOME>, Specified Hosts <hostD>;
Fri Dec 27 13:04:19: Dispatched to <hostD>;
Fri Dec 27 13:04:19: Starting (Pid 8920);
Fri Dec 27 13:04:20: Running with execution home </home/user1>, Execution CWD </home/user1>, Execution Pid <8920>;
Fri Dec 27 13:05:49: Suspended by the user or administrator;
Fri Dec 27 13:05:56: Suspended: Waiting for re-scheduling after being resumed by user;
Fri Dec 27 13:05:57: Running;
Fri Dec 27 13:07:52: Done successfully. The CPU time used is 28.3 seconds.

Summary of time in seconds spent in various states by Sat Dec 27 13:07:52 1997
PEND  PSUSP  RUN  USUSP  SSUSP  UNKWN  TOTAL
5     0      205  7      1      0      218
Viewing history of jobs not listed in active event log
LSF periodically backs up and prunes the job history log. By default, bhist only displays job history from the current event log file. You can use bhist -n num_logfiles to display the history for jobs that completed some time ago and are no longer listed in the active event log.
bhist -n num_logfiles
The -n num_logfiles option tells the bhist command to search through the specified number of log files instead of only searching the current log file.
Log files are searched in reverse time order. For example, the command bhist -n 3 searches the current event log file and then the two most recent backup files.
- bhist -n 1 searches the current event log file lsb.events
- bhist -n 2 searches lsb.events and lsb.events.1
- bhist -n 3 searches lsb.events, lsb.events.1, lsb.events.2
- bhist -n 0 searches all event log files in LSB_SHAREDIR
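The mapping from the -n value to the files searched can be modeled in a few lines. This sketch only mirrors the list above. It assumes backups are named lsb.events.1, lsb.events.2, and so on with no gaps, and the function name and the available_backups parameter are hypothetical.

```python
def event_logs_to_search(num_logfiles, available_backups):
    """Return the event log file names bhist -n would cover, newest first.

    num_logfiles: the value passed to -n (0 means all files).
    available_backups: how many backup files lsb.events.1 .. lsb.events.N
        exist (an assumed input; bhist discovers this itself).
    """
    if num_logfiles < 0:
        raise ValueError("num_logfiles must be 0 or greater")
    names = ["lsb.events"]
    names += [f"lsb.events.{i}" for i in range(1, available_backups + 1)]
    if num_logfiles == 0:
        return names
    return names[:num_logfiles]


if __name__ == "__main__":
    print(event_logs_to_search(3, available_backups=5))
```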
Viewing chronological history of jobs
By default, the bhist command displays information from the job event history file, lsb.events, on a per-job basis.
The -t option of bhist can be used to display the events chronologically instead of grouping all events for each job.
The -T option allows you to select only those events within a given time range.
For example, the following displays all events which occurred between 14:00 and 14:30 on a given day:
% bhist -t -T 14:00,14:30
Wed Oct 22 14:01:25: Job <1574> done successfully;
Wed Oct 22 14:03:09: Job <1575> submitted from host to Queue , CWD , User , Project , Command , Requested Resources ;
Wed Oct 22 14:03:18: Job <1575> dispatched to ;
Wed Oct 22 14:03:18: Job <1575> starting (Pid 210);
Wed Oct 22 14:03:18: Job <1575> running with execution home , Execution CWD , Execution Pid <210>;
Wed Oct 22 14:05:06: Job <1577> submitted from host to Queue, CWD , User , Project , Command , Requested Resources ;
Wed Oct 22 14:05:11: Job <1577> dispatched to ;
Wed Oct 22 14:05:11: Job <1577> starting (Pid 429);
Wed Oct 22 14:05:12: Job <1577> running with execution home, Execution CWD , Execution Pid <429>;
Wed Oct 22 14:08:26: Job <1578> submitted from host to Queue, CWD , User , Project , Command;
Wed Oct 22 14:10:55: Job <1577> done successfully;
Wed Oct 22 14:16:55: Job <1578> exited;
Wed Oct 22 14:17:04: Job <1575> done successfully;
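Chronological output interleaves events from different jobs. If a script needs them regrouped per job, the job ID in each line is enough to do it. The sketch below uses a few complete lines from the example above. The line pattern is an assumption drawn from that example, and group_events_by_job is a hypothetical name.

```python
import re
from collections import defaultdict

# A few complete lines copied from the bhist -t example above.
EVENTS = """\
Wed Oct 22 14:01:25: Job <1574> done successfully;
Wed Oct 22 14:03:18: Job <1575> starting (Pid 210);
Wed Oct 22 14:05:11: Job <1577> starting (Pid 429);
Wed Oct 22 14:10:55: Job <1577> done successfully;
Wed Oct 22 14:16:55: Job <1578> exited;
Wed Oct 22 14:17:04: Job <1575> done successfully;
"""

# Assumed shape: "<weekday> <month> <day> <HH:MM:SS>: Job <id> <event>;"
LINE = re.compile(r"^\w{3} \w{3} +\d+ (\d\d:\d\d:\d\d): Job <(\d+)> (.*?);?$")


def group_events_by_job(text):
    """Return {job_id: [(time, event), ...]} preserving chronological order."""
    grouped = defaultdict(list)
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            time, job_id, event = match.groups()
            grouped[job_id].append((time, event))
    return dict(grouped)


if __name__ == "__main__":
    for job_id, events in group_events_by_job(EVENTS).items():
        print(job_id, events)
```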
Viewing Job Output (bpeek)
The output from a job is normally not available until the job is finished. However, LSF provides the bpeek command for you to look at the output the job has produced so far.
By default, bpeek shows the output from the most recently submitted job. You can also select the job by queue or execution host, or specify the job ID or job name on the command line.
For more details on bpeek options, see the bpeek command in the Platform LSF Reference.
Viewing output of a running job
Only the job owner can use bpeek to see job output. The bpeek command will not work on a job running under a different user account.
To save time, you can use this command to check if your job is behaving as you expected and kill the job if it is running away or producing unusable results.
For example:
% bpeek 1234
<< output from stdout >>
Starting phase 1
Phase 1 done
Calculating new parameters
...
Viewing Information about SLAs and Service Classes
Monitoring the progress of an SLA (bsla)
Use bsla to display the properties of service classes configured in lsb.serviceclasses and dynamic state information for each service class.
- One velocity goal of service class Tofino is active and on time. The other configured velocity goal is inactive.
% bsla
SERVICE CLASS NAME: Tofino -- day and night velocity
PRIORITY: 20

GOAL: VELOCITY
ACTIVE WINDOW: (9:00-17:00)
STATUS: Active:On time
VELOCITY: 10
CURRENT VELOCITY: 10

GOAL: VELOCITY
ACTIVE WINDOW: (17:30-8:30)
STATUS: Inactive
VELOCITY: 30
CURRENT VELOCITY: 0

NJOBS  PEND  RUN  SSUSP  USUSP  FINISH
360    300   10   2      0      48
- The deadline goal of service class Uclulet is not being met, and bsla displays status Active:Delayed:
% bsla
SERVICE CLASS NAME: Uclulet -- working hours
PRIORITY: 20

GOAL: DEADLINE
ACTIVE WINDOW: (8:30-16:00)
DEADLINE: (Tue Jun 24 16:00)
ESTIMATED FINISH TIME: (Wed Jun 25 14:30)
OPTIMUM NUMBER OF RUNNING JOBS: 2
STATUS: Active:Delayed

NJOBS  PEND  RUN  SSUSP  USUSP  FINISH
13     0     0    0      0      13
- The configured velocity goal of the service class Kyuquot is active and on time. The configured deadline goal of the service class is inactive.
% bsla Kyuquot
SERVICE CLASS NAME: Kyuquot -- Daytime/Nighttime SLA
PRIORITY: 23
USER_GROUP: user1 user2

GOAL: VELOCITY
ACTIVE WINDOW: (9:00-17:30)
STATUS: Active:On time
VELOCITY: 8
CURRENT VELOCITY: 0

GOAL: DEADLINE
ACTIVE WINDOW: (17:30-9:00)
STATUS: Inactive

NJOBS  PEND  RUN  SSUSP  USUSP  FINISH
0      0     0    0      0      0
- The throughput goal of service class Inuvik is always active. bsla displays:
  - Status as active and on time
  - An optimum number of 5 running jobs to meet the goal
  - Actual throughput of 10 jobs per hour based on the last CLEAN_PERIOD
% bsla Inuvik
SERVICE CLASS NAME: Inuvik -- constant throughput
PRIORITY: 20

GOAL: THROUGHPUT
ACTIVE WINDOW: Always Open
STATUS: Active:On time
SLA THROUGHPUT: 10.00 JOBs/CLEAN_PERIOD
THROUGHPUT: 6
OPTIMUM NUMBER OF RUNNING JOBS: 5

NJOBS  PEND  RUN  SSUSP  USUSP  FINISH
110    95    5    0      0      10
Tracking historical behavior of an SLA (bacct)
Use bacct to display historical performance of a service class. For example, service classes Inuvik and Tuktoyaktuk configure throughput goals.
% bsla
SERVICE CLASS NAME: Inuvik -- throughput 6
PRIORITY: 20

GOAL: THROUGHPUT
ACTIVE WINDOW: Always Open
STATUS: Active:On time
SLA THROUGHPUT: 10.00 JOBs/CLEAN_PERIOD
THROUGHPUT: 6
OPTIMUM NUMBER OF RUNNING JOBS: 5

NJOBS  PEND  RUN  SSUSP  USUSP  FINISH
111    94    5    0      0      12
--------------------------------------------------------------
SERVICE CLASS NAME: Tuktoyaktuk -- throughput 3
PRIORITY: 15

GOAL: THROUGHPUT
ACTIVE WINDOW: Always Open
STATUS: Active:On time
SLA THROUGHPUT: 4.00 JOBs/CLEAN_PERIOD
THROUGHPUT: 3
OPTIMUM NUMBER OF RUNNING JOBS: 4

NJOBS  PEND  RUN  SSUSP  USUSP  FINISH
104    96    4    0      0      4
These two service classes have the following historical performance. For SLA Inuvik, bacct shows a total throughput of 8.94 jobs per hour over a period of 20.58 hours:
% bacct -sla Inuvik
Accounting information about jobs that are:
  - submitted by users user1,
  - accounted on all projects.
  - completed normally or exited
  - executed on all hosts.
  - submitted to all queues.
  - accounted on service classes Inuvik,
------------------------------------------------------------------------------
SUMMARY: ( time unit: second )
Total number of done jobs: 183
Total number of exited jobs: 1
Total CPU time consumed: 40.0
Average CPU time consumed: 0.2
Maximum CPU time of a job: 0.3
Minimum CPU time of a job: 0.1
Total wait time in queues: 1947454.0
Average wait time in queue: 10584.0
Maximum wait time in queue: 18912.0
Minimum wait time in queue: 7.0
Average turnaround time: 12268 (seconds/job)
Maximum turnaround time: 22079
Minimum turnaround time: 1713
Average hog factor of a job: 0.00 ( cpu time / turnaround time )
Maximum hog factor of a job: 0.00
Minimum hog factor of a job: 0.00
Total throughput: 8.94 (jobs/hour) during 20.58 hours
Beginning time: Oct 11 20:23
Ending time: Oct 12 16:58
For SLA Tuktoyaktuk, bacct shows a total throughput of 4.36 jobs per hour over a period of 19.95 hours:
% bacct -sla Tuktoyaktuk
Accounting information about jobs that are:
  - submitted by users user1,
  - accounted on all projects.
  - completed normally or exited
  - executed on all hosts.
  - submitted to all queues.
  - accounted on service classes Tuktoyaktuk,
------------------------------------------------------------------------------
SUMMARY: ( time unit: second )
Total number of done jobs: 87
Total number of exited jobs: 0
Total CPU time consumed: 18.0
Average CPU time consumed: 0.2
Maximum CPU time of a job: 0.3
Minimum CPU time of a job: 0.1
Total wait time in queues: 2371955.0
Average wait time in queue: 27263.8
Maximum wait time in queue: 39125.0
Minimum wait time in queue: 7.0
Average turnaround time: 30596 (seconds/job)
Maximum turnaround time: 44778
Minimum turnaround time: 3355
Average hog factor of a job: 0.00 ( cpu time / turnaround time )
Maximum hog factor of a job: 0.00
Minimum hog factor of a job: 0.00
Total throughput: 4.36 (jobs/hour) during 19.95 hours
Beginning time: Oct 11 20:50
Ending time: Oct 12 16:47
Because the run times are not uniform, both service classes actually achieve higher throughput than configured.
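The throughput figures in the two bacct summaries can be reproduced from the other numbers shown. The sketch below assumes throughput is the count of done plus exited jobs divided by the hours between the beginning and ending times. That formula is inferred from the sample figures, not taken from the LSF documentation. The year passed to datetime is arbitrary because bacct does not print one here.

```python
from datetime import datetime


def throughput(done, exited, begin, end):
    """Return (jobs_per_hour, hours) for an accounting period.

    Assumption: throughput = (done + exited) / elapsed hours.
    """
    hours = (end - begin).total_seconds() / 3600
    return (done + exited) / hours, hours


if __name__ == "__main__":
    # Values copied from the bacct -sla output above; the year is arbitrary.
    inuvik = throughput(183, 1,
                        datetime(2003, 10, 11, 20, 23),
                        datetime(2003, 10, 12, 16, 58))
    tuktoyaktuk = throughput(87, 0,
                             datetime(2003, 10, 11, 20, 50),
                             datetime(2003, 10, 12, 16, 47))
    for name, (rate, hours) in (("Inuvik", inuvik), ("Tuktoyaktuk", tuktoyaktuk)):
        print(f"{name}: {rate:.2f} jobs/hour during {hours:.2f} hours")
```

Under this assumption both results agree with the totals bacct reports, and both exceed the configured goals of 6 and 3 jobs per hour.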
For more information
See Administering Platform LSF for more information about service classes and goal-oriented SLA driven scheduling.
Viewing Jobs in Job Groups
Viewing job group information (bjgroup)
Use the bjgroup command to see information about jobs in specific job groups.
% bjgroup
GROUP_NAME  NJOBS  PEND  RUN  SSUSP  USUSP  FINISH
/fund1_grp  5      4     0    1      0      0
/fund2_grp  11     2     5    0      0      4
/bond_grp   2      2     0    0      0      0
/risk_grp   2      1     1    0      0      0
/admi_grp   4      4     0    0      0      0
Viewing jobs by job group (bjobs)
Use the -g option of bjobs and specify a job group path to view jobs attached to the specified group.
% bjobs -g /risk_group
JOBID  USER   STAT  QUEUE   FROM_HOST  EXEC_HOST  JOB_NAME  SUBMIT_TIME
113    user1  PEND  normal  hostA                 myjob     Jun 17 16:15
111    user2  RUN   normal  hostA      hostA      myjob     Jun 14 15:13
110    user1  RUN   normal  hostB      hostA      myjob     Jun 12 05:03
104    user3  RUN   normal  hostA      hostC      myjob     Jun 11 13:18
bjobs -l displays the full path to the group to which a job is attached:
% bjobs -l -g /risk_group
Job <101>, User <user1>, Project <default>, Job Group </risk_group>, Status <RUN>, Queue <normal>, Command <myjob>
Tue Jun 17 16:21:49: Submitted from host <hostA>, CWD </home/user1>;
Tue Jun 17 16:22:01: Started on <hostA>;
...
For more information
See Administering Platform LSF for more information about using job groups.
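In the bjgroup example earlier in this section, NJOBS for each group equals the sum of its PEND, RUN, SSUSP, USUSP, and FINISH counts. The sketch below parses that table and checks the relationship. That NJOBS is always this sum is an assumption inferred from the sample, and the function names are hypothetical.

```python
# Table copied from the bjgroup example in this section.
BJGROUP_OUTPUT = """\
GROUP_NAME  NJOBS  PEND  RUN  SSUSP  USUSP  FINISH
/fund1_grp  5      4     0    1      0      0
/fund2_grp  11     2     5    0      0      4
/bond_grp   2      2     0    0      0      0
/risk_grp   2      1     1    0      0      0
/admi_grp   4      4     0    0      0      0
"""

STATES = ("PEND", "RUN", "SSUSP", "USUSP", "FINISH")


def parse_bjgroup(text):
    """Return {group_name: {column: count}} from captured bjgroup output."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    columns = lines[0][1:]  # drop GROUP_NAME, keep the counter columns
    return {
        fields[0]: dict(zip(columns, map(int, fields[1:])))
        for fields in lines[1:]
    }


def mismatched_groups(groups):
    """List groups whose NJOBS differs from the sum of the state counts."""
    return [
        name for name, counts in groups.items()
        if counts["NJOBS"] != sum(counts[state] for state in STATES)
    ]


if __name__ == "__main__":
    groups = parse_bjgroup(BJGROUP_OUTPUT)
    print(mismatched_groups(groups) or "all groups are consistent")
```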
Viewing Information about Resource Allocation Limits
Your job may be pending because some configured resource allocation limit has been reached. Use the blimits command to show the dynamic counters of resource allocation limits configured in Limit sections in lsb.resources. blimits displays the current resource usage to show what limits may be blocking your job.
blimits command
The blimits command displays:
- Configured limit policy name
- Users (-u option)
- Queues (-q option)
- Hosts (-m option)
- Project names (-p option)
Resources that have no configured limits or no limit usage are indicated by a dash (-). Limits are displayed in a USED/LIMIT format. For example, if a limit of 10 slots is configured and 3 slots are in use, then blimits displays the limit for SLOTS as 3/10.
If limits MEM, SWP, or TMP are configured as percentages, both the limit and the amount used are displayed in MB. For example, lshosts displays maximum memory (maxmem) of 249 MB, and MEM is limited to 10% of available memory. If 10 MB are used, blimits displays the limit for MEM as 10/25 (10 MB USED from a 25 MB LIMIT).
Configured limits and resource usage for builtin resources (slots, mem, tmp, and swp load indices) are displayed as INTERNAL RESOURCE LIMITS separately from custom external resources, which are shown as EXTERNAL RESOURCE LIMITS.
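The percentage example above comes down to simple arithmetic: 10% of 249 MB is 24.9 MB, shown as a 25 MB limit. The sketch below reproduces that figure and the USED/LIMIT notation. Rounding to the nearest MB is an assumption that happens to match the example, and treating a missing value as "no limit or no usage" is also an assumption. The function names are hypothetical.

```python
def limit_in_mb(max_mb, percent):
    """Convert a percentage limit to MB.

    Assumption: the result is rounded to the nearest MB, which matches
    the 249 MB at 10% example (24.9 MB shown as 25).
    """
    return round(max_mb * percent / 100)


def format_usage(used, limit):
    """Render a limit cell in USED/LIMIT form, or '-' if either is absent.

    Assumption: None stands for 'no configured limit' or 'no limit usage'.
    """
    if used is None or limit is None:
        return "-"
    return f"{used}/{limit}"


if __name__ == "__main__":
    mem_limit = limit_in_mb(249, 10)
    print(format_usage(10, mem_limit))  # memory example from the text
    print(format_usage(3, 10))          # slots example from the text
    print(format_usage(None, None))     # nothing configured
```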
Limits are displayed for both the vertical tabular format and the horizontal format for Limit sections. Since a vertical format Limit section has no name, blimits displays NONAMEnnn under the NAME column for these limits, where the unnamed limits are numbered in the order the vertical-format Limit sections appear in the lsb.resources file.
If a resource consumer is configured as all, the limit usage for that consumer is indicated by a dash (-).
PER_HOST slot limits are not displayed. The bhosts command displays these as MXJ limits.
In MultiCluster, blimits returns the information about all limits in the local cluster.
Examples
For the following limit definitions:
Begin Limit
NAME = limit1
USERS = user1
PER_QUEUE = all
PER_HOST = hostA hostC
TMP = 30%
SWP = 50%
MEM = 10%
End Limit

Begin Limit
NAME = limit_ext1
PER_HOST = all
RESOURCE = ([user1_num, 30] [hc_num, 20])
End Limit
blimits displays the following:
% blimits
INTERNAL RESOURCE LIMITS:

NAME    USERS  QUEUES  HOSTS  PROJECTS  SLOTS  MEM    TMP      SWP
limit1  user1  q2      hostA  -         -      10/25  -        10/258
limit1  user1  q3      hostA  -         -      -      30/2953  -
limit1  user1  q4      hostC  -         -      -      40/590   -

EXTERNAL RESOURCE LIMITS:

NAME        USERS  QUEUES  HOSTS  PROJECTS  user1_num  hc_num  HC_num
limit_ext1  -      -       hostA  -         -          1/20    -
limit_ext1  -      -       hostC  -         1/30       1/20    -
- In limit policy limit1, user1 submitting jobs to q2, q3, or q4 on hostA or hostC is limited to 30% tmp space, 50% swap space, and 10% available memory. No limits have been reached, so the jobs from user1 should run. For example, on hostA for jobs from q2, 10 MB of memory are used from a 25 MB limit and 10 MB of swap space are used from a 258 MB limit.
- In limit policy limit_ext1, external resource user1_num is limited to 30 per host and external resource hc_num is limited to 20 per host. Again, no limits have been reached, so the jobs requesting those resources should run.
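The conclusion that no limits have been reached can be checked mechanically from the INTERNAL RESOURCE LIMITS rows above. The sketch below parses that table and reports any cell where USED has caught up with LIMIT. Treating USED >= LIMIT as "reached" is an assumption, and the function names are hypothetical.

```python
# Rows copied from the INTERNAL RESOURCE LIMITS table in the example above.
INTERNAL_LIMITS = """\
NAME    USERS  QUEUES  HOSTS  PROJECTS  SLOTS  MEM    TMP      SWP
limit1  user1  q2      hostA  -         -      10/25  -        10/258
limit1  user1  q3      hostA  -         -      -      30/2953  -
limit1  user1  q4      hostC  -         -      -      40/590   -
"""

RESOURCES = ("SLOTS", "MEM", "TMP", "SWP")


def parse_limits(text):
    """Return one dict per data row, keyed by the header names."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, row)) for row in rows]


def reached_limits(rows, resources=RESOURCES):
    """Return (name, queue, host, resource) for each cell with USED >= LIMIT.

    Cells shown as '-' carry no limit or no usage and are skipped.
    """
    hits = []
    for row in rows:
        for resource in resources:
            cell = row[resource]
            if cell == "-":
                continue
            used, limit = (int(part) for part in cell.split("/"))
            if used >= limit:
                hits.append((row["NAME"], row["QUEUES"], row["HOSTS"], resource))
    return hits


if __name__ == "__main__":
    print(reached_limits(parse_limits(INTERNAL_LIMITS)) or "no limits reached")
```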
Date Modified: November 21, 2003
Platform Computing: www.platform.com
Platform Support: support@platform.com
Platform Information Development: doc@platform.com
Copyright © 1994-2003 Platform Computing Corporation. All rights reserved.