Moab/TORQUE dies silently with no core and no logging


Issue: Moab/TORQUE dies silently with no core and no logging

Affected Versions: All

Symptom: Sometimes Moab or TORQUE will die silently without generating a core file.

Solution: In most cases, Adaptive Computing needs a core file to address a crash. However, in rare cases, a core file does not get created. To solve this, do the following:

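  1. Set the core file size limit to unlimited by putting "ulimit -c unlimited" in the init script, or by running it in the shell from which you then execute /opt/moab/sbin/moab. The limit applies only to processes started from that script or shell. If you start Moab via the service or systemctl command on a systemd host, the limit comes from the unit file instead (for example, LimitCORE=infinity), not from your shell.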
  2. Make sure you have enough disk space to write out a core file (i.e., about the same size as the Moab process memory footprint).
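  3. Check the permissions of $MOABHOMEDIR and verify that the user Moab runs as can write to that directory.
  4. Check the limits of the running process with "cat /proc/$(pgrep moab)/limits" and look at the "Max core file size" row. If pgrep returns more than one PID, substitute the PID of the parent Moab process.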
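
The checks in steps 2 through 4 can be scripted. The following is a minimal sketch, not an Adaptive Computing tool: the function names are hypothetical, and it assumes a Linux host with /proc, awk, and a POSIX df.

```shell
#!/bin/sh
# Sketch only: these helper names are hypothetical, not part of Moab or TORQUE.

# Print the soft "Max core file size" limit from a /proc/<PID>/limits file.
# Fields on that row: Max core file size <soft> <hard> <units>.
core_soft_limit() {
    awk '/^Max core file size/ { print $5 }' "$1"
}

# Succeed if the current user can create files in directory $1.
# Run this as the user that Moab runs as.
dir_is_writable() {
    [ -d "$1" ] && [ -w "$1" ]
}

# Print the free space, in kilobytes, on the file system that holds $1.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Example usage (substitute the PID of the parent Moab process):
#   core_soft_limit "/proc/<PID>/limits"    # should print "unlimited"
#   dir_is_writable "$MOABHOMEDIR" && echo "writable"
#   free_kb "$MOABHOMEDIR"                  # compare with Moab's memory footprint
```

A soft limit of 0 means the kernel will not write a core file for that process, even if the hard limit is unlimited.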

If the above looks correct, but you still have no core file, you can attach to the running process with gdb, let it continue, and generate a core file from gdb when the crash happens.

  1. Attach gdb to the Moab process: "gdb -p <PID>". Note: you may have more than one Moab process. This is normal, as Moab forks from time to time, so "$(pgrep moab)" can expand to several PIDs. Watch top or ps to determine the long-lived Moab process, and then substitute its PID where needed. Example: run "ps -ef | grep -v grep | grep -E ' PPID |sbin/moab'". You want the process whose PID is the PPID of the others.
  2. At the "(gdb)" prompt, type "c". This will allow the Moab process to continue. You can run this in tmux, screen, or a terminal session while you wait for the crash.
  3. When Moab crashes, gdb will stop the process and return to the "(gdb)" prompt. Execute the following commands:
    1. (gdb) bt
    2. (gdb) generate-core-file
  4. Copy the output of the "bt" command and send that as an update in the support case.
  5. "generate-core-file" will dump a core file to your current working directory (i.e., where you invoked gdb). Create a tar.gz archive containing the core file and the Moab or TORQUE binary. Please include the Salesforce case ID in the file name.
  6. Upload the tar.gz archive to:
    1. scp <file>.tar.gz [email protected]:/home/guest/
    2. password: hello
  7. Send the file name to support in a case update.
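
Two parts of the procedure above lend themselves to scripting: picking the parent Moab process and building the archive. The following is a rough sketch under stated assumptions: the function names, the archive naming scheme, and the case ID shown are hypothetical, and the process is assumed to appear as "moab" in the ps command column.

```shell
#!/bin/sh
# Sketch only: helper names and the archive naming scheme are hypothetical.

# Read "PID PPID COMMAND" lines on stdin and print the PID of each moab
# process whose parent is not itself a moab process (the long-lived one).
parent_moab_pid() {
    awk '$3 == "moab" { ppid[$1] = $2 }
         END { for (p in ppid) if (!(ppid[p] in ppid)) print p }'
}

# Bundle a core file and the binary into <case_id>-core.tar.gz
# in the current directory. Arguments: case ID, core file, binary.
bundle_core() {
    tar -czf "$1-core.tar.gz" "$2" "$3"
}

# Example usage:
#   pid=$(ps -eo pid=,ppid=,comm= | parent_moab_pid)
#   gdb -p "$pid"          # then "c", and after the crash "bt" and
#                          # "generate-core-file" as described above
#   bundle_core <case_id> core."$pid" /opt/moab/sbin/moab
#
# A non-interactive variant (an alternative, not from the steps above):
#   gdb -p "$pid" -batch -ex continue -ex bt -ex generate-core-file
```

By default, gdb names the file written by "generate-core-file" core.<PID>, which is why the usage example passes core."$pid" to the archive step.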

 

See also:

http://docs.adaptivecomputing.com/torque/6-0-2/adminGuide/help.htm#topics/torque/12-troubleshooting/debugging.htm

(Torque Admin Guide: 13 Troubleshooting > Debugging)

Tags: backtrace, core, crash, gdb
Last update: 2017-02-24 02:20
Author: Jason Booth
Revision: 1.1