How does Adaptive Computing recommend shutting down Moab and Torque for a scheduled maintenance window?
To let as many jobs as possible finish, pause Moab scheduling so that the cluster drains (or mostly drains) before the window begins:
$ mschedctl -p
While scheduling is paused, Moab continues to receive job updates from Torque but does not launch newly queued jobs.
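As a rough sketch of how the drain period could be monitored, the helper below polls Torque until no jobs are running or a time limit passes. The function names (count_running, wait_for_drain) and the timing values are illustrative assumptions, not part of Moab or Torque; the sketch also assumes the Torque client command qselect is on the PATH of the user running it.

```shell
#!/bin/sh
# Hypothetical helpers for watching the cluster drain after "mschedctl -p".
# Assumption: Torque's qselect is available; "-s R" selects jobs in the Running state.

# Print the number of jobs Torque currently reports as running.
count_running() {
    qselect -s R | wc -l | tr -d ' '
}

# wait_for_drain MAX_SECONDS POLL_SECONDS
# Returns 0 once no jobs are running, or 1 if MAX_SECONDS elapses first.
wait_for_drain() {
    max_wait=$1
    poll=$2
    waited=0
    while [ "$(count_running)" -gt 0 ]; do
        if [ "$waited" -ge "$max_wait" ]; then
            return 1
        fi
        sleep "$poll"
        waited=$((waited + poll))
    done
    return 0
}
```

For example, after pausing scheduling, "wait_for_drain 3600 60" would check once a minute for up to an hour. Note that if qselect itself fails, this sketch reports zero running jobs, so verify the result by hand before continuing.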
When the time comes to cancel the jobs that are still running, execute this command as a user with ADMIN1 privileges:
$ mjobctl -c -w state=Running
Alternatively, you may attempt to requeue jobs first with:
$ mjobctl -R -w state=Running
This causes Moab to cancel and requeue any jobs that have the "RESTARTABLE" flag set.
To shut the systems down, stop Moab and pbs_server in the normal manner: "service moab stop" and "service pbs_server stop", or, on systemd-based systems, "systemctl stop moab" and "systemctl stop pbs_server". When Torque is restarted, pbs_server reloads the saved job files for the previously queued jobs.
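The steps above could be collected into a small wrapper such as the following sketch. The function names (run, maintenance_shutdown) and the DRY_RUN variable are hypothetical conveniences introduced here; the sketch assumes a systemd-based host and a caller with ADMIN1 and root privileges, and it uses only the commands already shown in this article.

```shell
#!/bin/sh
# Hypothetical wrapper around the shutdown sequence described above.
# Set DRY_RUN=1 to print each command instead of executing it.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

maintenance_shutdown() {
    # Pause scheduling so that no newly queued jobs are launched.
    run mschedctl -p || return 1
    # Try to requeue running jobs first; continue even if this step fails.
    run mjobctl -R -w state=Running || true
    # Cancel whatever is still running.
    run mjobctl -c -w state=Running || return 1
    # Stop Moab, then pbs_server (assumes systemd; use "service" otherwise).
    run systemctl stop moab || return 1
    run systemctl stop pbs_server
}
```

Running "DRY_RUN=1 maintenance_shutdown" first prints the five commands in order, which is a cheap way to review the sequence before the maintenance window.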