Why do I see "could not locate requested gpu resources" when the job requested none?



Issue: pbs_server occasionally reports the following in the server logs: "could not locate requested gpu resources".

Symptom: When jobs are submitted to TORQUE, some appear not to run, and pbs_server logs an error related to gpu resources. In most cases, either gpus are not configured on the cluster or the job requested no gpus.

Solution: This is a false positive. The job most likely did not start because of other resource constraints, such as nodes or proc. In some cases, such as on Cray systems, the site may need to raise "JOBMAXNODECOUNT" to a higher value and/or modify the jobs it submits. For example:

 When submitting OpenMP jobs on a Cray, note that Cray has its own implementation, so the suggested approach is to submit with just mppwidth and do the rest of the job launch through aprun with $PBS_NODEFILE.
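
A job script along these lines might look like the sketch below. It is only an illustration, not a verified site configuration: the mppwidth and walltime values, the default thread count, the helper functions count_slots and aprun_args, and the executable name ./my_openmp_app are all hypothetical, and it assumes $PBS_NODEFILE lists one line per requested slot, which may not hold on every Cray installation.

```shell
#!/bin/sh
#PBS -l mppwidth=48
#PBS -l walltime=00:10:00

# Count the slots listed in a node file (one non-empty line per slot).
# Assumption: the node file has one line per requested slot.
count_slots() {
    grep -c . "$1"
}

# Print aprun arguments: $1 = total slots, $2 = OpenMP threads per process.
# -n is the number of processes, -d the depth (threads) reserved for each.
aprun_args() {
    echo "-n $(( $1 / $2 )) -d $2"
}

# Launch only inside a job, where TORQUE sets PBS_NODEFILE.
if [ -n "${PBS_NODEFILE:-}" ]; then
    threads="${OMP_NUM_THREADS:-4}"
    export OMP_NUM_THREADS="$threads"
    cd "$PBS_O_WORKDIR" || exit 1
    # ./my_openmp_app is a hypothetical executable name.
    aprun $(aprun_args "$(count_slots "$PBS_NODEFILE")" "$threads") ./my_openmp_app
fi
```

Only mppwidth is requested from TORQUE here; the split between processes and threads is left to aprun, so the script does not ask for nodes or ppn.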

Tags: could not locate requested gpu resources, mppwidth
Last update:
2016-04-29 19:34
Author:
Jason Booth
Revision:
1.0