Cannot release hold on job


Issue: Attempts to release a hold with qrls return no message and have no effect, and attempts to release it with mjobctl -u report:

"holds not modified for job 123456 (hold still in place)"

However, checkjob reports that the job can run.

 

Solution: One possible cause is an array slot limit. When an array job is submitted with a slot limit (e.g., qsub -t 0-10%2), Torque will not let more than that number of sub-jobs run at the same time (two, in this example), and the holds on the remaining sub-jobs cannot be released manually.

As of this writing (September 2015, Torque 5.0.2 and 5.1.1), the only way to confirm this cause is to look for the slot limit indicator (e.g., -t 0-99%1) on the submit_args line of the qstat -f output for the array job. However, the indicator does not appear there if the -t option was specified via a #PBS directive in the job script. These issues have been reported to Engineering, to be addressed in future releases.

Update: starting with Torque versions 6.0.3 and 6.1.2, qstat -f includes the slot limit indicator on the "job_array_request" line.
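
As a rough illustration of the check described above, the following shell sketch pulls the slot limit out of qstat -f output. The function name slot_limit_from_qstat is hypothetical, and the pattern assumes the attribute appears on a single unwrapped line of the form "submit_args = ..." or "job_array_request = ..."; adjust it if your qstat output is formatted differently.

```shell
# Hypothetical helper: reads `qstat -f` output on stdin and prints the slot
# limit (the number after "%") found on the submit_args or job_array_request
# line. Prints nothing if neither line carries a slot limit.
# Assumption: the attribute is not wrapped across several lines.
slot_limit_from_qstat() {
  sed -n -E 's/^[[:space:]]*(submit_args|job_array_request)[[:space:]]*=.*%([0-9]+).*$/\2/p' |
    head -n 1
}

# Usage (replace ARRAY_JOB_ID with the ID of your array job):
#   qstat -f ARRAY_JOB_ID | slot_limit_from_qstat
```

Under these assumptions, a job submitted with -t 0-10%2 would print 2, and a job with no visible slot limit would print nothing. An empty result on the older versions does not rule out a slot limit, since a limit set via a #PBS directive is not shown on the submit_args line.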

 

JIRA issues:

TRQ-3332

TRQ-3228

Last update: 2017-01-10 22:59
Author: Rick McKay
Revision: 1.2