Why are licenses being overallocated for array jobs?

Issue: When an array job is submitted, all of its sub-jobs start at once, even if the generic resources they request are not available. For resources such as licenses managed by FlexLM, this can cause jobs to fail because there may not be enough licenses available.

This problem was encountered with license resources, but it can affect any generic resource that relies on a delayed start to give Moab time to account for lag in resource allocation.

Affected Version:  All

Symptom: An array job is submitted in which each sub-job requests a software license. The GRES configuration setting STARTDELAY has been set to an appropriate interval. The expected behavior is that one sub-job starts per interval, or at the next scheduling iteration after the interval elapses. Instead, all jobs in the array go to the running state immediately, and if there are too few licenses for all of them to run at once, some fail.

Solution: Unfortunately, there is no good solution for this situation. STARTDELAY was never designed to support job arrays, so the only real alternative is to submit the jobs individually. If users frequently attempt this, a submit filter can be used to reject the job and explain why.
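
As an illustration of the submit filter idea, here is a minimal Python sketch. It rests on several assumptions that should be checked against your site's Moab/Torque documentation: that the filter receives the job script on standard input and any submit options as arguments, that it must echo the script to standard output to accept the job, that a non-zero exit code rejects it, that array jobs are requested with -t, and that generic resources are requested with gres=. The function names and the rejection message are hypothetical.

```python
import re
import sys

# Assumption: directives embedded in the script start with #MSUB or #PBS.
DIRECTIVE = re.compile(r"^\s*#(?:MSUB|PBS)\s+(.*)$")

REJECT_MESSAGE = (
    "Job rejected: array jobs that request generic resources (gres) are not "
    "supported, because STARTDELAY does not apply to job arrays. "
    "Please submit each job individually."
)


def collect_tokens(script_text, argv=()):
    """Gather option tokens from the arguments and from script directives."""
    tokens = list(argv)
    for line in script_text.splitlines():
        match = DIRECTIVE.match(line)
        if match:
            tokens.extend(match.group(1).split())
    return tokens


def rejection_reason(script_text, argv=()):
    """Return a message if the job is an array job requesting a gres, else None."""
    tokens = collect_tokens(script_text, argv)
    is_array = "-t" in tokens                          # assumed array flag
    wants_gres = any("gres=" in tok for tok in tokens)  # assumed gres syntax
    if is_array and wants_gres:
        return REJECT_MESSAGE
    return None


def main():
    """Read the script from stdin; reject it or pass it through unchanged."""
    script = sys.stdin.read()
    reason = rejection_reason(script, sys.argv[1:])
    if reason:
        sys.stderr.write(reason + "\n")
        return 1  # assumed: non-zero exit rejects the submission
    sys.stdout.write(script)
    return 0

# When installed as the filter, end the file with: raise SystemExit(main())
```

The check is deliberately coarse: it rejects any array job that requests any generic resource. A site that only needs to protect specific licenses could compare the gres names against a list instead.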



Last update:
2018-01-19 18:02
Rob Greenbank