Issue:
TORQUE appears to be unresponsive and returns the error 'Pbs Server is currently too busy to service this request. Please retry this request.'
Example:
-00:02:44 'job submit failed - qsub: submit error (Pbs Server is currently too busy to service this request. Please retry this request.)'
clusterquery (34 of 737 failed)
-00:01:57 'cannot load cluster info - pbs_errno=15033'
queuequery (40 of 735 failed)
-00:01:57 'cannot get queue info - no data available'
Solution:
Check how many threads pbs_server is currently running:
cat /proc/`pgrep pbs_server`/status | grep -i threads
If the reported thread count is over 400 (in this case, Threads: 516), consider setting a "max_threads" value for pbs_server in qmgr.
http://docs.adaptivecomputing.com/torque/6-0-2/adminGuide/help.htm#topics/torque/13-appendices/serverParameters.htm#max_threads
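The thread check can also be scripted. The following is a minimal sketch, assuming a Linux /proc filesystem and a single pbs_server process. The function names thread_count and over_limit are hypothetical helpers, not part of TORQUE, and the limit of 400 is simply the threshold used in this article.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative and not part of TORQUE.

# Print the value of the "Threads:" field from /proc/<pid>/status text on stdin.
thread_count() {
    awk '/^Threads:/ { print $2 }'
}

# Succeed (exit 0) when the given count exceeds the limit (default 400).
over_limit() {
    [ "$1" -gt "${2:-400}" ]
}

# Example use on a live server (assumes exactly one pbs_server process):
#   n=$(thread_count < "/proc/$(pgrep -o pbs_server)/status")
#   if over_limit "$n"; then
#       echo "pbs_server has $n threads; consider capping max_threads"
#       # 300 is only an example value; pick one that suits the site:
#       # qmgr -c "set server max_threads = 300"
#   fi
```

The qmgr line is left commented out on purpose, since the right value depends on the site and should be chosen before it is applied.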
A good number for this value can be calculated with the following formula:
(2 * the number of procs listed in /proc/cpuinfo) + 1. If Torque is unable to read /proc/cpuinfo, the default is 10.
For most sites, a value of 270-350 is sufficient. Anything over 400 will most likely cause the OS to thrash between threads.
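As a worked instance of the formula above, here is a small sketch that counts processor entries and applies (2 * procs) + 1, falling back to 10 when the file cannot be read. The function name suggested_threads and its optional path argument are hypothetical, and counting lines that start with "processor" is an assumption about how the procs are tallied.

```shell
#!/bin/sh
# Hypothetical helper; not a TORQUE command.
# Prints (2 * number of "processor" entries) + 1 for the given cpuinfo file,
# or 10 when the file is not readable.
suggested_threads() {
    cpuinfo="${1:-/proc/cpuinfo}"
    if [ ! -r "$cpuinfo" ]; then
        echo 10
        return
    fi
    # grep -c exits non-zero when the count is 0, so ignore its exit status.
    procs=$(grep -c '^processor' "$cpuinfo" || true)
    echo $(( 2 * procs + 1 ))
}

# Example: run against the local machine.
#   suggested_threads
```

For a cpuinfo file listing 8 processor entries, the function prints 17.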
Tags: currently too busy to service this request