H
- How do I exclude a credential from fairshare
Issue: How do I exclude a credential from fairshare Affected Version: All Symptom: Some sites would like to reward users by not affecting their overall job priority for ... - How does Showq -r "EFFIC" work?
Issue: How is EFFIC reported and how often does it update? Symptom: Showq -r can help determine the usage of a system and how efficient it is ... - How can I keep Moab from provisioning a different OS when one is available
Issue: How can I keep Moab from provisioning a different OS when one is available Symptom: I have two systems configured. One with RHEL 5 and one with ... - How can I disable a queue / class without losing jobs?
Issue: How can I disable a queue / class without losing jobs? Symptom: Some sites wish to disable queues for a number of days before retiring the queue/class. Solution: To ... - How can I sync my Moab and TORQUE batch job IDs?
Issue: How can I sync my Moab and TORQUE batch job IDs? Affected Versions: All Symptom: When submitting jobs, you see different job IDs for Moab and ... - How does ALPS integrate with TORQUE and what does a job life-cycle look like?
Issue: How does ALPS integrate with TORQUE and what does a job life-cycle look like? Affected Versions: 4.x and greater. Solution: Moab and TORQUE sit over ALPS as outlined below. TORQUE ... - How do I download TORQUE with wget?
Issue: How do I download TORQUE with wget? Affected Versions: All Symptom: When downloading Torque from Adaptive Computing you can see that you click the link and it ... - How to fix Mass Job cancellation creates DDOS on pbs_server (around 4.2.x)
Issue: How to fix Mass Job cancellation creates DDOS on pbs_server (around 4.2.x) Affected Versions: 4.2.5, 4.2.x Symptom: In some of the lower to mid versions of Torque ... - How can I configure Moab to be aware of my file-system failures?
Issue: How can I configure Moab to be aware of my file-system failures? Affected Versions: All Symptom: When a filesystem goes down Moab will continue to start jobs ... - How can I troubleshoot reports in MCM, and from where does MCM get its data?
Question: How can I troubleshoot reports in MCM, and where does MCM get its data? Affected Versions: All Issue: When running reports, it can be helpful to ... - How can I generate a report of processor hours used by group per month?
Issue: How can I generate a report of processor hours used by group per month? Affected Versions: All Symptom: Some customers would like to generate a report of processor hours used by ... - How does Matlab work with Moab and TORQUE?
Issue: How does Matlab work with Moab and TORQUE? Affected Versions: All Symptom: As a customer I would like to understand how Moab and TORQUE integrate with ... - How can I use a job template to limit a job's walltime based on the number of processors requested?
Issue: How can I use a job template to limit a job's walltime based on the number of processors requested? Affected Versions: All Symptom: Some sites wish to ... - How can I tell Moab to ignore SLURM options on the execution line?
Issue: How can I tell Moab to ignore SLURM options on the execution line? Affected Versions: ALL Solution: To have Moab ignore options and pass them down ... - How can I send OS signals to jobs
Issue: In an HPC environment it is sometimes necessary to send a signal to a job before it reaches its walltime. The mechanism in Moab and ... - How can I keep Moab from multiplying my requested disk space by the number of procs I request?
If you are trying to request nodes that have a certain amount of disk space available you can do that with the file=X flag during ... - How can I add a dependency via MWS
Issue: I would like to add job dependencies when I submit a job into MWS. Solution: When you submit the job use type or resourceManagerExtension":"x=depend:afterok:Moab.74" Represents the type ... - How can I change a class/queue on a job?
Issue: How can I change a class/queue on a job? Symptom: At times it might be necessary to change a job's queue. So long as the job is ... - How can I have Moab cancel a job if a node fails?
Issue: A job continues to run long after a node it was using fails. Symptom: After a node fails with a job on it the job is unable ... - How can I modify all array jobs submitted from TORQUE?
Issue: How can I modify all TORQUE array sub-jobs with mjobctl -m Symptom: Submitting TORQUE jobs with qsub -t 1-10 will cause Moab to create individual jobs in ... - How can I set the end time of an allocation to today?
Issue: I'm getting an error when trying to set the end time of an allocation to the current date. Symptom: # gchalloc -e 2015-08-04 -i 12 The allocation ... - How can I run two different sets of TORQUE client commands on one system?
Issue: How can I run two different sets of TORQUE client commands on one system? Symptom: Starting in TORQUE 4.x trqauthd was introduced. In the past it ... - How can I graph the Moab scheduling cycle?
Issue: You want to graph the Moab scheduling cycle, to spot trends and potential issues. Solution: Install Munin, out of scope for this ... - How does Torque calculate and report memory usage?
Torque pulls information about memory usage for jobs from /proc/<pid>/stat. It pulls information on the Resident Set Size (RSS) to populate the mem field and ... - How can we clean out old records from the MAM database?
Issue: I would like to be able to remove old records from the database. Solution: Since version 7.2.0, you can use the System Prune action in ... - How is Moab calculating my job's priority in checkjob?
Issue: How is Moab calculating my job's priority in checkjob? Symptom: In this example I am using job 1111111111 from your support-diag output. The job received the -54.5 ... - How can I see the account balances Moab knows about when using the fast-allocation method?
Issue: When using the fast-allocation method Moab has a cached copy of the account balances that it periodically refreshes. How can I see the current ... - How do I prevent users from seeing sensitive information in the moab.cfg?
Issue: Users need to have read access to the moab.cfg to be able to run client commands. The moab.cfg contains information that users shouldn't see ... - How can I set up flexlm for testing?
Issue: How can I set up flexlm for testing? Solution: Configuration (moab.cfg): RMCFG[flexlm] RESOURCETYPE=LICENSE RMCFG[flexlm] TYPE=NATIVE RMCFG[flexlm] CLUSTERQUERYURL=file:///opt/moab/flexlm.state Configuration (flexlm.state): GLOBAL UPDATETIME=1104688300 STATE=idle ARES=procct:100000,stata:2,testflag:2,comsolscript:2,comsolckl:30,macromodel:1000,extmdcs:32,osumdcs:32,ansys:2,fluent:25,abaqus:200,cfdnd:20,ansysnd:20,abaquscae:14 CRES=procct:100000,stata:2,testflag:2,comsolscript:2,comsolckl:30,macromodel:1000,extmdcs:32,osumdcs:32,ansys:2,fluent:25,abaqus:200,cfdnd:20,ansysnd:20,abaquscae:14 Note that mdiag -n -v now shows the test ... - How can I verify that mom to mom communication is happening on the right interface?
Question: I have moms with infiniband and ethernet interfaces. How can I verify that the communication between moms is happening on the right interface? Answer: You ... - How can I specify a default account for each partition with a fairshare tree?
Issue: How can I specify a default account for each partition with a fairshare tree? Solution: Moab supports a detailed credential configuration for each partition it knows ... - How can I create a queue in MAM that does not require allocation?
Issue: How can I create a queue in MAM that does not require allocation? Solution: This depends on how you want to configure policy on your cluster. If ... - How can I remove a reservation with a special character?
Issue: How can I remove a reservation with a special character? Symptom: In some cases an admin may create an administrative reservation with a special character such as *,%,#,$,&. ... - How do you create custom Torque RPMs from source?
Issue: We'd like to create RPMs from (possibly customized) Torque source. Solution: 1. Check out the Torque source (search this KB for instructions, if needed), or download the ... - How to migrate queued jobs to a new Torque server with a different host name
Issue: We need to migrate our queued Torque jobs from one server to another one with a new host name. Solution: Disclaimer: the steps provided here have not ... - How can I query the compute node processes that belong to a job?
Issue: How can I query the compute node processes that belong to a job? Symptom: I would like to know what process numbers are associated with a ... - How can I isolate which compute nodes show up in a specific CLASS?
Issue: How can I isolate which compute nodes show up in a specific CLASS? Symptom: All of my nodes show up in every class. Solution: By default, each ... - How to get all cgroup cpuset cores with singlejob policy job submissions (without specifying procs)
Issue: After upgrading to Torque 6.0.x with cgroups enabled, jobs submitted with "-l nodes=1,naccesspolicy=singlejob" get restricted to one core because that's what gets assigned to the ... - How will Moab act if I add and over subscribe nodes over my license moab.lic limitations
I had a site recently try this, and they claimed it caused major problems because Moab would try to start jobs there. I tested against this ... - How can I set up a job to request a specific set of nodes and then if those nodes are not available use other nodes?
Issue: How can I set up a job to request a specific set of nodes and then if those nodes are not available use other nodes? Solution: One option ... - How to keep Moab statistical data when migrating to a new server
Question: We have plans to migrate Moab to a new host server, but we would like to avoid losing our historical stats. What's required to bring ... - How can I run two applications in one job request in the Cray environment with one on Haswell compute nodes and the other on KNL nodes?
Question: How can I run two applications in one job request in the Cray environment with one on Haswell compute nodes and the other on KNL ... - How can I create a reservation to prevent jobs from running with a specific job attribute?
Question: How can I create a reservation to prevent jobs from running with a specific job attribute? Solution: Reservations support a JOBATTRLIST both in a STRCFG and via ... - How can I see which gpu is being used by a job?
Question: There are multiple gpus on our nodes. How can I see which gpu is being used by a job? Answer: Torque reports the gpu that ... - How can I remove jobs from Moab that won't clean up with conventional methods?
Issue: Moab reports a job that isn't being reported by the resource manager and can't be canceled. Symptom: There are occasionally problems migrating a job from ... - How do I adjust the wallclock time on an entire job array?
Problem: How do you adjust the walltime on all of the sub-elements of a job array? Solution: You can use mjobctl -m to modify the array, and ONLY provide ... - Hotfix Releases for Viewpoint 9.0.x
The hotfixes listed below are not meant for general consumption. If you believe one may be applicable to your environment, please open a support case ... - Hotfix Releases for TORQUE 6.0.x
The hotfixes listed below are not meant for general consumption. If you believe one may be applicable to your environment, please open a support case ... - Hotfix Releases for TORQUE 6.1.x
The hotfixes listed below are not meant for general consumption. If you believe one may be applicable to your environment, please open a support case ... - Hotfix Releases for 9.0.x
The hotfixes listed below are not meant for general consumption. If you believe one may be applicable to your environment, please open a support case ... - Hotfix Releases for 9.1.x
The hotfixes listed below are not meant for general consumption. If you believe one may be applicable to your environment, please open a support case ... - How to configure scheduling leeway for transient load
Problem: Moab indicates that nodes are Idle in mdiag -n, checknode, showq, etc., but in reality they have higher load, and jobs will not run ... - How to improve readability of MWS JSON query outputs
Problem: when logged in to MWS as an administrator, the example queries (for instance, jobs & nodes) provide JSON output that runs together and is difficult ... - How to see array subjobs in showq
Problem: qstat -t provides a way to see all sub-elements of an array job, but how does showq allow you to see that? Solution: showq --blocking will ... - How do I change the log level for pbs_mom?
Question: I would like to know how to change the log level for the pbs_mom. Solution: To change the log level on a permanent basis you ... - How to shut down Moab and Torque for a planned outage
Question: How does Adaptive Computing recommend shutting down Moab and Torque for a scheduled maintenance window? Answer: To allow as many jobs as possible to finish, you may ... - How to see the scripts of completed jobs
Question: We'd like to see the script a job ran after it has already finished, and/or the full path to the original script Solution/s: 1) For msub ... - How can I provide users with more than job ID on submission?
Issue: Need to provide relevant information to user from submit filter Affected Version: Versions 8.13, 9.0.1 and later Problem: Admins would like to have more information ... - How to do an "offline" Install of the Moab suite
Problem: We have several customers who don't have their cluster directly connected to the Internet. They need a way to easily get all the required dependencies ... - How to troubleshoot Reporting and Analytics
Reporting/Analytics is a huge thing. The "No data for reports" could be a failure in ANY number of connectors. Here is my troubleshooting process: Moab/Torque Headnodes: Make ... - How do I migrate server settings to a new server, and retain the job numbering?
Issue: Need to migrate Torque to a new server, yet continue the job numbering Affected Version: All Versions Problem: The server running Torque is being replaced, ... - How to set maximum gpus per user?
Assuming we have a user named "bambam", the following will set the maximum number of GPUs that bambam can use to 6: USERCFG[bambam] MAXGRES[gpus]=6 ... - How can I make sure jobs run within a single infiniband switch?
Problem: MPI jobs should run within a single infiniband switch if at all possible, to maximize bandwidth between nodes. Solution: Using Moab nodesets, this can be implemented ... - How can I give remote access to my cluster, without creating new accounts and sharing passwords?
Issue: I need to give remote access to a support engineer or a developer to troubleshoot an issue, but I don't want to create a ... - How to connect to a web based program on a compute node? I.E. Port Forwarding
Normally USERS do not have direct access to compute nodes. Assuming the USER's node has access to another node (say headnode) which then has access to computenodeXX, ... - How do NODEAVAILABILITYPOLICY and NODEALLOCATIONPOLICY work?
Here is a tutorial showing how some common settings of the following Moab parameters work: JOBNODEMATCHPOLICY EXACTNODE JOBNODEMATCHPOLICY AUTO NODEAVAILABILITYPOLICY DEDICATED:PROC NODEAVAILABILITYPOLICY COMBINED:PROC NODEALLOCATIONPOLICY CPULOAD For all the ... - How can I oversubscribe processors?
Question: How can I oversubscribe processors? Answer: The information presented here applies to Moab 9.1.1/9.1.2 and Torque 6.1.1.1/6.1.2. One can enter more processors in the nodes table than ... - How do I use a submit filter to enforce submission rules?
Issue: Need to force users to follow site rules. Affected Version: All versions of Moab (some variations in releases) Symptom: A site has job submission requirements that ... - How do we calculate Average Utilized Procs and Average Utilized Memory in showstats?
Average Utilized Procs is calculated as: J->PSUtilized / J->AWallTime Average Utilized Memory is calculated as: J->MSUtilized / J->AWallTime Internal definitions: double PSUtilized; /**< procseconds utilized by job */ double ... - How can I get debug information without a core file?
Issue: Not all sites generate core files, yet source code information is needed by developers. Solution: Adaptive Computing has a script that will get the relevant binary offsets ... - How can I get Torque to recognize additional memory added to a node?
Issue: Torque caches a lot of node information, including memory. After adding memory to a node, the server may not see that additional memory. Solution: There ... - How do I script job submissions with dependencies?
Issue: When scripting the submission of jobs, the job ID output by Moab will contain a newline character, which will cause problems if that output ... - How do I use the support-diag.py script to upload site information?
Problem: When Adaptive Computing needs information to diagnose a problem, it's important support and development personnel have configuration and logs available to investigate the problem(s). This ... - How can I temporarily elevate log levels when requested?
There are three different types of logs that you may be asked to provide with elevated log levels: Moab, Torque server, and the compute node ... - How do I change the viewpoint-admin password?
The viewpoint-admin user is the out-of-the-box administrative user that allows you to log into Viewpoint to set up everything else. To change it run the ... - How can I query the Insight Mongo database using SQL?
This article will help you write SQL queries against the INSIGHT mongo database using Apache Drill. Before you start you need to connect Viewpoint reports ... - How do I encrypt the connection between MWS and Viewpoint?
Tested on CentOS 7.6 using Tomcat 7.0, MWS 9.1.3, Viewpoint 9.1.3, Java 1.8.0 Generate a self-signed certificate and private key: First generate a certificate and private ... - How to backup and restore the Insight database
Insight database backups and restorations can be performed using mongodump and mongorestore. These should have automatically been installed when you installed the mongo database. Creating ... - How do I set up MWS to authenticate to PAM
Here are steps that explain how to switch MWS from using LDAP authentication to PAM authentication. First stop Tomcat: [root]# systemctl stop tomcat Then figure out where the ... - How do I test Viewpoint with a Postgres 9.6 database without disturbing my production database?
This document will show you how to test a Postgres database upgrade for Viewpoint. It will explain how to install Postgres on a separate host ... - How do I test MAM with a Postgres 9.6 database without disturbing my production database?
This document will show you how to test a Postgres database upgrade for Moab Accounting Manager (MAM). It will explain how to install Postgres on ... - How can I prevent users who forget to log out of interactive sessions from tying up resources?
Issue: Interactive users sometimes forget to log out. The resources they requested for their jobs are tied up until they discover they've done that, or ... - How do I increase the stack size for my jobs?
Issue: The stack size for compute jobs may be limited to the system's default stack size, which is sometimes inadequate for a job. Since the ... - How do I use checkjob to diagnose job issues?
Issue: I’m trying to understand why my job is not running. The checkjob output can be confusing. Affected Versions: All Solution: The best way to analyze why ...