How can I remove jobs from Moab that won't clean up with conventional methods?


Issue:  Moab reports a job that the resource manager does not report, and the job can't be canceled.

Symptom:  Occasionally a job fails to migrate from Moab to the resource manager and gets stuck, with no way to remove it through normal means.  This article describes some steps to try first:

http://kb.adaptivecomputing.com/phpmyfaq/index.php?action=artikel&cat=1&id=22&artlang=en

If the steps outlined in that article don't work, you can remove the entry directly from Moab's checkpoint file.

Solution:  When editing Moab's checkpoint file, always start by making a backup.  Accidentally removing more from this file than you should can result in lost jobs that will have to be re-submitted.
1.  Navigate to your Moab home directory.
2.  Stop Moab before editing the file.  Moab writes its current information to the checkpoint file when it shuts down, so changes made while it is running would be overwritten.
3.  Make a backup copy of the checkpoint file.
4.  Edit the checkpoint file and look for an entry with the job id(s) you're trying to remove.  If the job is showing up in an active state, the line should start with 'JOB'; if it's in a completed state, it will start with 'CJOB'.  The job id should follow the 'JOB' or 'CJOB'.
5.  Delete the lines for those jobs and save your changes.
6.  Start Moab again.
7.  Review the list of jobs to see that the stuck jobs are gone and that you aren't missing other jobs that should still be there.
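
Steps 3 through 5 can also be scripted.  The following is a minimal Python sketch, not an Adaptive Computing tool.  It assumes that the fields on each checkpoint line are separated by whitespace, so that the job id is the second field after 'JOB' or 'CJOB'; check this against your own file before relying on it.  The function names and the '.bak' backup suffix are made up for this example.  Moab must already be stopped (step 2) before running it.

```python
import shutil
from pathlib import Path


def remove_job_lines(lines, job_ids):
    """Split lines into (kept, removed).

    A line is removed only if its first field is JOB or CJOB and its
    second field is one of job_ids. Whitespace-separated fields are an
    assumption about the checkpoint format, not a documented guarantee.
    """
    targets = set(job_ids)
    kept, removed = [], []
    for line in lines:
        fields = line.split()
        is_job_line = len(fields) >= 2 and fields[0] in ("JOB", "CJOB")
        if is_job_line and fields[1] in targets:
            removed.append(line)
        else:
            kept.append(line)
    return kept, removed


def clean_checkpoint(path, job_ids):
    """Back up the checkpoint file, then rewrite it without the given jobs.

    Returns the backup path and the list of removed lines so they can be
    reviewed before Moab is started again.
    """
    path = Path(path)
    backup = path.with_name(path.name + ".bak")  # hypothetical backup name
    shutil.copy2(path, backup)                   # step 3: make a backup
    lines = path.read_text().splitlines(keepends=True)
    kept, removed = remove_job_lines(lines, job_ids)  # step 4: find entries
    path.write_text("".join(kept))               # step 5: save the changes
    return backup, removed
```

For example, clean_checkpoint(checkpoint_path, ["12345"]) would back up the file at checkpoint_path and drop the 'JOB' or 'CJOB' line for a job with the placeholder id 12345.  Print the returned list of removed lines and confirm it contains only the stuck jobs before starting Moab again.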

If you find that you've removed more than you should have, stop Moab, move the backup file back into place, and start Moab again.

Tags: checkpoint, clean, remove, stuck
Last update: 2016-12-20 00:01
Author: Ben Roberts
Revision: 1.0