Issue: In an HPC environment it is sometimes necessary to send a signal to a job before it reaches its walltime. The mechanism in Moab and TORQUE for this is "-l signal=15@00:30", where 15 is the signal number and 00:30 is the amount of time before the job reaches its walltime. In this case, signal 15 is sent thirty seconds before the job ends. The job script should have a catch or trap for the signal, plus code to handle any checkpointing or cleanup in the remaining time.
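The trap-and-cleanup pattern can be sketched as follows. The script body, variable names, and the self-delivered signal are illustrative only; in production the batch system delivers the signal for you.

```shell
#!/bin/bash
# Hypothetical TORQUE job script: request SIGTERM (15) 30 seconds
# before the walltime limit, then catch it and checkpoint.
#PBS -l walltime=01:00:00
#PBS -l signal=15@00:30

caught=0
on_term() {
    caught=1
    echo "caught SIGTERM: checkpoint/cleanup would run here"
}
trap on_term TERM

# For demonstration only: simulate the batch system delivering the
# signal one second in; on a real cluster pbs_mom sends it.
( sleep 1; kill -TERM $$ ) &

# Main work loop polls the flag, as a long-running job might.
while [ "$caught" -eq 0 ]; do
    sleep 1
done
echo "exiting cleanly after checkpoint"
```

Note that the shell defers the trap handler until the currently running command (here, a `sleep` iteration) finishes, so the handler should be reached well inside the lead time requested with `-l signal`.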
Solution:
Moab TORQUE
When Moab manages TORQUE, signaling is much more seamless. Adding "-l signal=nn@tt:tt" to the submission allows signals to be interpreted correctly and seamlessly between Moab and TORQUE. Note that the signaling happens via Moab.
Note in the following checkjob output the trigger associated with a job that was submitted with "-l signal=15@00:30".
[jbooth@support-mpi ~]$ checkjob 83513
job 83513
AName: STDIN
State: Running
Creds: user:jbooth group:jbooth class:batch
WallTime: 00:00:00 of 00:00:30
SubmitTime: Thu Jul 23 09:54:31
(Time Queued Total: 00:00:07 Eligible: 00:00:00)
StartTime: Thu Jul 23 09:54:38
TemplateSets: DEFAULT
NodeMatchPolicy: EXACTNODE
Total Requested Tasks: 1
Req[0] TaskCount: 1 Partition: pbs
NodeCount: 1
Allocated Nodes:
[support-mpi2:1]
SystemID: Moab
SystemJID: 503
Notification Events: JobFail
IWD: $HOME
SubmitDir: $HOME
Executable: /opt/moab/spool/moab.job.7g1VeQ
StartCount: 1
Flags: GLOBALQUEUE
Attr: checkpoint
StartPriority: 1
Reservation '83513' (-00:00:10 -> 00:00:20 Duration: 00:00:30)
[jbooth@support-mpi ~]$ qstat -f
Job Id: 83513.support-mpi
Job_Name = STDIN
Job_Owner = jbooth@support-mpi
resources_used.cput = 00:00:00
resources_used.mem = 0kb
resources_used.vmem = 0kb
resources_used.walltime = 00:00:12
job_state = E
queue = batch
server = support-mpi
Checkpoint = u
ctime = Thu Jul 23 09:54:31 2015
Error_Path = support-mpi:/home/jbooth/STDIN.e83513
exec_host = support-mpi2/0
exec_port = 15003
Hold_Types = n
Join_Path = n
Keep_Files = n
Mail_Points = a
mtime = Thu Jul 23 09:54:50 2015
Output_Path = support-mpi:/home/jbooth/STDIN.o83513
Priority = 0
qtime = Thu Jul 23 09:54:31 2015
Rerunable = False
Resource_List.nodect = 1
Resource_List.nodes = 1
Resource_List.signal = 15@00:20
Resource_List.walltime = 00:00:30
session_id = 5579
Variable_List = PBS_O_QUEUE=batch,PBS_O_HOME=/home/jbooth,
PBS_O_LOGNAME=jbooth,
PBS_O_PATH=/opt/moab-7.2-1518952b-b-Unknown12/bin:/opt/8.0.1/bin:/opt
/8.1.0/bin:/usr/lib64/qt-3.3/bin:/opt/mam/bin:/usr/local/sbin:/usr/loc
al/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin:/opt/moab/bin:/opt/moab
/sbin,PBS_O_SHELL=/bin/bash,PBS_O_WORKDIR=/home/jbooth,
PBS_O_HOST=support-mpi,PBS_O_SERVER=support-mpi
etime = Thu Jul 23 09:54:31 2015
exit_status = 271
start_time = Thu Jul 23 09:54:38 2015
start_count = 1
checkpoint_dir = /tmp
fault_tolerant = False
job_radix = 0
submit_host = support-mpi
nppcu = 1
x = signal:15@00:20;SID:Moab;SJID:503;SRMJID:503
Also note the qstat output, as it contains x args with the signal (x = signal:15@00:20;SID:Moab;SJID:503;SRMJID:503). In the event Moab loses track of the job, it will re-learn it from TORQUE and use the "x" arguments to re-create the trigger.
Moab SLURM
When Moab manages SLURM, the HPC admin needs to decide whether the resource manager will manage signals or the scheduler will send the signal instruction. Although SLURM supports sending signals outside of Moab's knowledge, it is suggested that Moab control this mechanism. Currently, when you submit with "sbatch --signal=<sig_num>[@<sig_time>]" the signal is not passed up to Moab. Likewise, the signal with Moab is sent down to SLURM as a comment and not with --signal.
Comment='SIGNAL=sigurg@30??SJID:2123?SID:Moab'
To submit via sbatch and have Moab manage the signal, submit as follows:
sbatch -t 30 --comment='SIGNAL=sigurg@30' cmdl
Moab will create a trigger for the signal when it learns about the job from SLURM. Moab consumes the comment, sees the instruction to create a trigger for this job, and sends the signal down.
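The comment encoding can be pulled apart with plain shell parameter expansion. The field layout below is inferred from the example comment shown above and is only a sketch, not a documented stable format:

```shell
#!/bin/bash
# Decode the signal specification Moab embeds in the SLURM job comment.
# Layout inferred from the example above: the SIGNAL=<name>@<secs>
# field comes first, with further '?'-separated fields after it.
comment='SIGNAL=sigurg@30??SJID:2123?SID:Moab'

sig_spec=${comment%%\?*}          # 'SIGNAL=sigurg@30'
sig_name=${sig_spec#SIGNAL=}      # 'sigurg@30'
sig_name=${sig_name%@*}           # 'sigurg'
sig_time=${sig_spec##*@}          # '30' (seconds before walltime)

echo "signal=$sig_name lead_time=${sig_time}s"
# -> signal=sigurg lead_time=30s
```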
Logging: In the following log excerpt, note that it is Moab sending the instruction to SLURM to signal the job.
2015-07-23T09:24:14.166-0600 32392 TRACE1 MRMJob.c:MRMJobSignal:1867 0 MRMJobSignal(2125,-1,15,EMsg,SC)
2015-07-23T09:24:14.166-0600 32392 TRACE1 MWikiJob.c:MWikiJobSignal:110 0 MWikiJobSignal(2125,slurm,-1,15,EMsg,SC)
2015-07-23T09:24:14.166-0600 32392 TRACE1 MWikiI.c:MWikiDoCommand:887 0 MWikiDoCommand(slurm,jobsignal,P,FALSE,CHECKSUM,FALSE,DataP,DataSizeP,CMD=SIGNALJOB ARG=2125 ACTION=SIGNAL VALUE=15,EMsg,SC)
...
2015-07-23T09:24:14.167-0600 32392 TRACE1 MWikiI.c:MWikiDoCommand:1122 0 message sent: 'CMD=SIGNALJOB ARG=2125 ACTION=SIGNAL VALUE=15'
...
2015-07-23T09:24:14.167-0600 32392 INFO MWikiJob.c:MWikiJobSignal:216 0x1002973 job:2125 Job '2125' successfully signaled.
Moab Cray
Integration between Moab and Cray is done via TORQUE; see the TORQUE section above for signal behavior. The only difference here is that Moab sends the signal via a trigger to the MOM node where aprun was run. aprun then needs to translate the signal provided and pass it down to the Cray compute nodes.
Note: Some Cray sites have reported that only signals 23 and 28 work on the Cray with Moab/Torque.
Additional information:
Note the trigger associated with the job in the following checkjob output. You can use checkjob to validate that a trigger has been created and associated with the job.
[root@support-slurm moab]# checkjob ALL
job 2125
AName: trap.sh
State: Running
Creds: user:jbooth group:jbooth class:extra
WallTime: 00:00:00 of 00:01:00
SubmitTime: Thu Jul 23 09:23:24
(Time Queued Total: 00:00:20 Eligible: 00:00:00)
StartTime: Thu Jul 23 09:23:44
TemplateSets: DEFAULT
Triggers: [email protected]:internal@job:-:signal:15:TRUE
Total Requested Tasks: 1
Req[0] TaskCount: 1 Partition: slurm
NodeCount: 1
Allocated Nodes:
[support-sn1:1]
SystemID: Moab
SystemJID: 2125
IWD: /home/jbooth
SubmitDir: $HOME
Executable: /opt/moab/spool/moab.job.EKFula
StartCount: 1
Partition List: slurm
Flags: GLOBALQUEUE
StartPriority: 1
Reservation '2125' (-00:01:00 -> 00:00:00 Duration: 00:01:00)
Likewise, you can use mdiag -T to check the triggers on a system to see whether the signal trigger was created and the signal sent and received:
[root@support-slurm moab]# mdiag -T
TrigID Object ID Event AType ActionDate State
--------------------- -------------------- -------- ------ -------------------- -----------
termsignal.8* job:2125 end intern -00:01:03 Successful
* indicates trigger has completed
Tags: signal