Why is there not a checkpoint file for my job submitted from another host?

Issue: Missing checkpoint (.cp) file for job submitted from a submit host.

Affected Version: All versions, Moab with Torque

Symptom: A job is submitted from a submit host, but no checkpoint file appears in the Moab spool directory. That file often contains data that scripts need.

Solution: Set the FULLCP flag on the resource manager configuration in Moab. This flag causes all of the job information to be copied before the job is run. It can be configured like this:

RMCFG[pbs]      TYPE=PBS SUBMITCMD=/usr/local/bin/qsub FLAGS=FULLCP

This became a problem during the DataWarp project, which gathers information from the checkpoint files. The DataWarp submit filter creates additional storage jobs. Their checkpoint files showed up on the Moab server, but the checkpoint file for the original job did not.
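
To confirm that FULLCP is actually set on a resource manager, the RMCFG lines in the Moab configuration can be scanned with a short script. The following is only a sketch: the function name rms_with_fullcp is hypothetical, and it assumes that RMCFG attributes are whitespace-separated KEY=VALUE pairs, that multiple flags are comma-separated, and that "#" starts a comment. Adjust it if your configuration is laid out differently.

```python
import re

# Matches lines such as: RMCFG[pbs]  TYPE=PBS FLAGS=FULLCP
RMCFG_LINE = re.compile(r"^\s*RMCFG\[(?P<name>[^\]]+)\]\s+(?P<attrs>.*)$")


def rms_with_fullcp(config_text):
    """Map each resource manager name to True if FULLCP appears in its FLAGS.

    Assumption: attributes are whitespace-separated KEY=VALUE pairs and
    flags are comma-separated. A resource manager configured across several
    RMCFG lines is reported as True if any of those lines sets FULLCP.
    """
    result = {}
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0]  # assumption: '#' begins a comment
        match = RMCFG_LINE.match(line)
        if not match:
            continue
        name = match.group("name")
        result.setdefault(name, False)
        for attr in match.group("attrs").split():
            key, sep, value = attr.partition("=")
            if sep and key.upper() == "FLAGS":
                flags = {flag.strip().upper() for flag in value.split(",")}
                if "FULLCP" in flags:
                    result[name] = True
    return result


if __name__ == "__main__":
    sample = "RMCFG[pbs]      TYPE=PBS SUBMITCMD=/usr/local/bin/qsub FLAGS=FULLCP\n"
    print(rms_with_fullcp(sample))
```

To check a real installation, read the contents of your moab.cfg (its location depends on the site) and pass the text to the function. Any resource manager reported as False does not have the flag set.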


Last update:
2017-08-15 22:33
Rob Greenbank

