How to shut down Moab and Torque for a planned outage


Question:

How does Adaptive Computing recommend shutting down Moab and Torque for a scheduled maintenance window?

 

Answer: 

To let as many jobs as possible finish, pause Moab scheduling ahead of the maintenance window so that the cluster can drain (or mostly drain):

$ mschedctl -p

While scheduling is paused, Moab continues to receive job updates from Torque but does not launch any queued jobs.
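The pause-and-drain step can be scripted. The following is a minimal sketch, not an Adaptive Computing procedure: it assumes Torque's qselect is on the PATH and counts the job IDs it lists for state R. The function names and the deadline and poll-interval arguments are hypothetical.

```shell
#!/bin/sh
# Sketch: pause Moab scheduling, then wait for running jobs to drain.
# Function names and default timings are illustrative assumptions.

# Count the jobs Torque reports in the Running state
# (qselect prints one job ID per line).
count_running_jobs() {
    qselect -s R | wc -l | tr -d '[:space:]'
}

# Pause scheduling, then poll until no jobs are running or the deadline passes.
#   $1 = deadline in seconds (default 3600), $2 = poll interval (default 60)
# Returns 0 if drained, 1 if jobs remain at the deadline, 2 if the pause failed.
drain_cluster() {
    deadline_secs=${1:-3600}
    poll_secs=${2:-60}
    waited=0
    mschedctl -p || return 2
    while [ "$(count_running_jobs)" -gt 0 ]; do
        if [ "$waited" -ge "$deadline_secs" ]; then
            return 1
        fi
        sleep "$poll_secs"
        waited=$((waited + poll_secs))
    done
    return 0
}
```

For example, "drain_cluster 7200 120" would wait up to two hours, checking every two minutes. Note that if qselect cannot reach pbs_server it prints nothing, which this sketch would read as an empty cluster.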

When it is time to cancel the jobs that are still running, run this command as an ADMIN1 user:

$ mjobctl -c -w state=Running

Alternatively, you may attempt to requeue jobs first with:

$ mjobctl -R -w state=Running

This will cause Moab to cancel and requeue any jobs with the "RESTARTABLE" flag.
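The two commands can be combined so that restartable jobs are requeued before the rest are cancelled. This is only a sketch: the function name and the settle delay between the two calls are assumptions, and it relies on requeued jobs no longer matching state=Running by the time the cancel runs.

```shell
#!/bin/sh
# Sketch: requeue what can be requeued, then cancel whatever is still running.
# Run as an ADMIN1 user. The name and the settle delay are illustrative.
#   $1 = seconds to wait between requeue and cancel (default 30)
clear_running_jobs() {
    settle_secs=${1:-30}
    # Requeue first; per the text above this affects jobs flagged RESTARTABLE.
    mjobctl -R -w state=Running
    # Give Moab time to update job states before cancelling the remainder.
    sleep "$settle_secs"
    mjobctl -c -w state=Running
}
```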

To shut the systems down, stop Moab and then pbs_server in the normal manner: "service moab stop" and "service pbs_server stop", or, on systemd hosts, "systemctl stop moab" and "systemctl stop pbs_server". When Torque is restarted, pbs_server reloads the saved job files for the jobs that were queued at shutdown.
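A small wrapper can pick the right init tool and keep the stop order given above. This is a sketch under assumptions: the function names are hypothetical, and the service names "moab" and "pbs_server" are taken from the commands quoted in this answer and may differ on your installation.

```shell
#!/bin/sh
# Sketch: stop Moab, then pbs_server, using systemctl when present
# and falling back to the service command otherwise.

# Stop a single service by name.
stop_service() {
    if command -v systemctl >/dev/null 2>&1; then
        systemctl stop "$1"
    else
        service "$1" stop
    fi
}

# Stop Moab first, and stop pbs_server only if Moab stopped cleanly.
stop_moab_and_torque() {
    stop_service moab && stop_service pbs_server
}
```

Run "stop_moab_and_torque" as root once the running jobs have been cancelled or requeued.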

Last update: 2017-07-27 21:28
Author: Rick McKay
Revision: 1.2