Why does TORQUE leave many defunct moms behind?


 

Issue: Over time, defunct pbs_mom processes build up on compute nodes or Cray login nodes.

 

Symptom: 

On the compute nodes or Cray login nodes, many pbs_mom processes appear in a defunct (zombie) state. The per-node counts below range from a handful to several thousand.

 

mom60 "ps -elf | grep pbs | grep defunct | wc -l"
mom9: 10
mom30: 1
mom4: 3
mom34: 8
mom63: 1
mom32: 11
mom31: 6
mom43: 18
mom29: 21
mom47: 9
mom61: 17
mom44: 4
mom33: 10
mom53: 2
mom36: 2
mom28: 2186
mom23: 2264
mom26: 2189
mom40: 2361
mom41: 2352
mom58: 5946
mom5: 8247
mom60: 7217
mom1: 8290
mom6: 8250
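
The per-node count above can be reproduced with a small helper. This is a minimal sketch, not part of TORQUE: the function name count_defunct_moms is hypothetical, and it assumes a Linux procps-style ps that accepts "-eo stat=,comm=" and reports zombie processes with a state beginning with Z.

```shell
#!/bin/sh
# Count defunct (zombie) pbs_mom processes.
# Reads "STAT COMMAND" lines on stdin, as printed by: ps -eo stat=,comm=
# (hypothetical helper; assumes procps-style ps output)
count_defunct_moms() {
    # Field 1 is the process state (Z, Zs, Z+ ... for zombies),
    # field 2 is the command name.
    awk '$1 ~ /^Z/ && $2 ~ /^pbs_mom/ { n++ } END { print n + 0 }'
}

# Usage on a node:
#   ps -eo stat=,comm= | count_defunct_moms
```

Matching on the process state rather than grepping for "pbs" avoids counting unrelated processes whose command line happens to contain that string.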

 

Solution:

Upgrade TORQUE to 4.2.4 or later.
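
To check whether a node already meets that minimum, a version comparison along these lines may help. It is only a sketch: version_at_least is a hypothetical helper, it relies on GNU sort with the -V option, and the installed version string is assumed to be obtained separately (for example from your package manager).

```shell
#!/bin/sh
# Succeed if version $1 is at least version $2.
# (hypothetical helper; requires GNU sort -V for version ordering)
version_at_least() {
    # The smaller of the two versions sorts first; if that is $2,
    # then $1 is equal to or newer than the required version.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage, with the installed version supplied by the caller:
#   version_at_least "$installed_version" 4.2.4 && echo "no upgrade needed"
```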

 

Jira: 

TRQ-2015

Tags: defunct, mom, pbs_mom
Last update: 2015-08-10 16:41
Author: Jason Booth
Revision: 1.0