Why do my Moab/SLURM Jobs report "Invalid job credential" with srun?


Issue: Jobs that are launched with srun fail with:

[jbooth@support-slurm ~]$ srun -N2 -l -t 30 /bin/hostname
srun: Job is in held state, pending scheduler release
srun: job 500007 queued and waiting for resources
srun: job 500007 has been allocated resources
srun: error: Task launch for 500007.0 failed on node support-sn1: Invalid job credential
srun: error: Task launch for 500007.0 failed on node support-sn2: Invalid job credential
srun: error: Application launch failed: Invalid job credential
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
srun: error: Timed out waiting for job step to complete

Symptom: Jobs are allocated resources, but the task launch then fails with "Invalid job credential".

Affected Version: All

Solution:

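You will need to set up the job credential keys across the cluster. The controller signs each job credential with a private key, and every compute node must be able to verify that signature with the matching public certificate; if a node cannot verify it, the launch fails with "Invalid job credential".

See JobCredentialPrivateKey and JobCredentialPublicCertificate in the slurm.conf man page: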

JobCredentialPrivateKey
Fully qualified pathname of a file containing a private key used for authentication by Slurm
daemons. This parameter is ignored if CryptoType=crypto/munge.

JobCredentialPublicCertificate
Fully qualified pathname of a file containing a public key used for authentication by Slurm
daemons. This parameter is ignored if CryptoType=crypto/munge.
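
As a minimal sketch, assuming CryptoType=crypto/openssl on a Slurm version that still offers it, the key pair named by these two parameters can be generated with openssl. The helper name make_cred_keys and the default directory /etc/slurm are hypothetical choices, not from this article; the function also prints the matching slurm.conf lines. Afterwards, the private key should be owned by SlurmUser, and the certificate copied to the same path on every compute node.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original article): generate a job
# credential key pair for CryptoType=crypto/openssl and print the matching
# slurm.conf settings. Assumes openssl is installed; /etc/slurm is only a default.
make_cred_keys() {
  local dir=${1:-/etc/slurm}
  # Private key: signs job credentials on the controller
  openssl genrsa -out "$dir/slurm.key" 2048 2>/dev/null || return 1
  # Public certificate: verifies those credentials on the compute nodes
  openssl rsa -in "$dir/slurm.key" -pubout -out "$dir/slurm.cert" 2>/dev/null || return 1
  # Private key readable only by its owner (chown it to SlurmUser afterwards)
  chmod 600 "$dir/slurm.key"
  chmod 644 "$dir/slurm.cert"
  # slurm.conf lines pointing at the new files
  printf 'CryptoType=crypto/openssl\n'
  printf 'JobCredentialPrivateKey=%s/slurm.key\n' "$dir"
  printf 'JobCredentialPublicCertificate=%s/slurm.cert\n' "$dir"
}
```

Run it once on the controller, for example `make_cred_keys /etc/slurm`, then restart slurmctld and the slurmd daemons so that they pick up the new files.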

If you are using "CryptoType=crypto/munge", make sure that munged is running on all the compute nodes and that munge.key is the same across the cluster.
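
A minimal sketch of that check, assuming passwordless ssh to each node and the key at the common default path /etc/munge/munge.key (both are assumptions, as are the function names compare_key_sums and check_nodes). It reports nodes where munged is not running and nodes whose munge.key checksum differs from the first node's.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not from the original article). Assumes passwordless
# ssh to every node, sha256sum on each node, and /etc/munge/munge.key as the key path.

# Reads "node checksum" pairs on stdin, prints each node whose checksum
# differs from the first node's, and returns 1 if any mismatch was found.
compare_key_sums() {
  awk '
    NR == 1 { ref = $2; refnode = $1; next }
    $2 != ref { print $1 ": munge.key differs from " refnode; bad = 1 }
    END { exit bad }
  '
}

# Checks munged and collects the key checksum on each node given as an argument.
check_nodes() {
  local node
  for node in "$@"; do
    # Warn on stderr when munged is not running on the node
    ssh -n "$node" pgrep -x munged >/dev/null || echo "$node: munged is not running" >&2
    # Emit "node checksum" for compare_key_sums
    echo "$node $(ssh -n "$node" sha256sum /etc/munge/munge.key | awk '{print $1}')"
  done | compare_key_sums
}
```

For the nodes in the example above, this would be `check_nodes support-sn1 support-sn2`. The MUNGE documentation also suggests `munge -n | ssh <node> unmunge`, which encodes a credential locally and decodes it on the remote node; that fails when munged is down or the keys differ.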

Tags: Invalid job credential, srun
Last update: 2017-02-24 01:31
Author: Jason Booth
Revision: 1.3