TORQUE appears to be unresponsive with error (Pbs Server is currently too busy to service this request. Please retry this request.)


Issue:

TORQUE appears to be unresponsive, and client commands fail with the error (Pbs Server is currently too busy to service this request. Please retry this request.)

Example:

-00:02:44 'job submit failed - qsub: submit error (Pbs Server is currently too busy to service this request. Please retry this request.)' 
clusterquery (34 of 737 failed) 
-00:01:57 'cannot load cluster info - pbs_errno=15033' 
queuequery (40 of 735 failed) 
-00:01:57 'cannot get queue info - no data available' 


Solution:

Check how many threads pbs_server is currently running:

cat /proc/$(pgrep pbs_server)/status | grep -i threads

If the reported count is over 400 (in this case, Threads: 516), consider setting a "max_threads" value for pbs_server in qmgr.

http://docs.adaptivecomputing.com/torque/6-0-2/adminGuide/help.htm#topics/torque/13-appendices/serverParameters.htm#max_threads
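
The thread check described above could be scripted roughly as follows. This is only a sketch: the function names thread_count and check_threads are hypothetical helpers introduced here, and the default threshold of 400 is simply the figure quoted in this entry.

```shell
#!/bin/sh
# Hypothetical helper: print the thread count recorded in a
# /proc/<pid>/status style file.
thread_count() {
    # $1: path to a status file, e.g. /proc/1234/status
    awk '/^Threads:/ { print $2 }' "$1"
}

# Hypothetical helper: report whether a thread count is above a threshold.
# The default of 400 is the figure used in this entry, not a Torque limit.
check_threads() {
    # $1: thread count, $2: optional threshold
    if [ "$1" -gt "${2:-400}" ]; then
        echo "over"
    else
        echo "ok"
    fi
}
```

On the server host, something like check_threads "$(thread_count /proc/$(pgrep -x pbs_server)/status)" would print "over" for the 516-thread case shown above, assuming a single pbs_server process is running.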

 

A good number for this value can be calculated with the following formula:

(2 * the number of procs listed in /proc/cpuinfo) + 1

If Torque is unable to read /proc/cpuinfo, the default is 10.
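
As a rough illustration, the formula can be evaluated from the shell. The function names count_procs and suggested_threads are hypothetical, and counting lines that begin with "processor" is an assumption about the layout of /proc/cpuinfo on the host; the fallback of 10 mirrors the default mentioned above.

```shell
#!/bin/sh
# Hypothetical helper: count "processor" entries in a cpuinfo-style file.
count_procs() {
    # $1: optional path, defaults to /proc/cpuinfo
    grep -c '^processor' "${1:-/proc/cpuinfo}"
}

# Hypothetical helper: apply (2 * procs) + 1, falling back to 10
# when the file cannot be read.
suggested_threads() {
    file="${1:-/proc/cpuinfo}"
    if [ -r "$file" ]; then
        procs=$(count_procs "$file")
        echo $(( 2 * procs + 1 ))
    else
        echo 10
    fi
}
```

Running suggested_threads with no arguments prints the value for the local host; a host with 16 listed processors would give 33.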

For most sites a value of 270-350 is sufficient. Anything over 400 will most likely cause the OS to thrash between threads.
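
Setting the value is done through qmgr's "set server" syntax. The sketch below only prints the command rather than running it, so it can be reviewed first; max_threads_cmd is a hypothetical helper, and 300 is just an example from the range given above.

```shell
#!/bin/sh
# Hypothetical helper: print the qmgr command that would set max_threads.
# It does not run qmgr itself.
max_threads_cmd() {
    # $1: desired value; anything that is not a non-negative integer is rejected
    case "$1" in
        ''|*[!0-9]*) echo "invalid value: $1" >&2; return 1 ;;
    esac
    printf "qmgr -c 'set server max_threads = %s'\n" "$1"
}
```

For example, max_threads_cmd 300 prints qmgr -c 'set server max_threads = 300'. After running that command as a Torque manager, qmgr -c 'list server max_threads' should show the stored value.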

Tags: currently too busy to service this request
Last update: 2016-08-08 17:21
Author: Jason Booth
Revision: 1.0