X11 Forwarding Failing on qsub submission


PROBLEM

We've had X11 forwarding working for a long time for interactive jobs started with qsub's -X option. Now it fails with the following error messages:

Failed to allocate internet-domain X11 display socket. 
PBS: X11 forwarding init failed 

I've confirmed that ssh -X works properly between the login nodes and the compute nodes. The failure occurs only when forwarding is requested through qsub -IX.

I've checked that the files /var/spool/torque/torque.cfg and /var/spool/torque/mom_priv/config have not changed. 

RESOLUTION


While going over strace output from pbs_mom, I was able to determine that pbs_mom was trying to open the X11 display socket over IPv6. Because IPv6 had been disabled on the cluster systems, that attempt failed. Instead of falling back to IPv4, pbs_mom aborted the attempt to open the X11 display channel and printed only the message that it had failed to allocate the socket.
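The fallback that pbs_mom skipped can be sketched in Python. This is a hypothetical illustration, not pbs_mom's actual code: the helper name allocate_x11_display, the display range, and binding to loopback only are assumptions. It relies on the X11 convention that display N listens on TCP port 6000 + N. For each display number it tries ::1 first and then 127.0.0.1, moves to the next display only when the port is already in use, and treats any other bind error (such as EADDRNOTAVAIL when IPv6 is disabled) as a reason to try the next address family rather than to give up.

```python
import errno
import socket

X11_BASE_PORT = 6000  # X11 display N listens on TCP port 6000 + N


# Hypothetical helper for illustration only; this is NOT pbs_mom's code.
def allocate_x11_display(first_display=10, max_displays=1000):
    """Return (display, family, listening_socket) for the first free display.

    For each display number, try the IPv6 loopback address first and fall
    back to IPv4 loopback instead of aborting when IPv6 is unavailable.
    """
    candidates = [(socket.AF_INET6, "::1"), (socket.AF_INET, "127.0.0.1")]
    for display in range(first_display, first_display + max_displays):
        port = X11_BASE_PORT + display
        for family, address in candidates:
            try:
                sock = socket.socket(family, socket.SOCK_STREAM)
            except OSError:
                continue  # this address family is not supported at all
            try:
                sock.bind((address, port))
            except OSError as exc:
                sock.close()
                if exc.errno == errno.EADDRINUSE:
                    break  # display already taken: try the next display number
                continue  # family unusable (e.g. IPv6 disabled): try the next family
            sock.listen(5)
            return display, family, sock
    raise RuntimeError("Failed to allocate internet-domain X11 display socket")


if __name__ == "__main__":
    display, family, sock = allocate_x11_display()
    print(f"display {display} via {socket.AddressFamily(family).name}")
    sock.close()
```

The default first_display=10 mirrors OpenSSH's default X11DisplayOffset. A production implementation would usually listen on every usable family for the chosen display; the sketch stops at the first one that binds to stay short.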

Since IPv6 had been turned off only through sysctl, the IPv6 kernel stack itself was still loaded. I found two possible solutions. To mitigate the problem on live systems, run the following:

sysctl -w net.ipv6.conf.lo.disable_ipv6=0 

This re-enables IPv6 on the loopback interface only, which is enough to let the X11 display setup continue and finally use IPv4. A value set with sysctl -w lasts only until the next reboot.
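To check a node's state, a small Python sketch can read the per-interface disable_ipv6 flags that sysctl exposes under /proc/sys/net/ipv6/conf and test whether a TCP socket can be bound to the IPv6 loopback address ::1. The helper names ipv6_sysctl_state and ipv6_loopback_bindable are hypothetical, and treating a missing /proc/sys/net/ipv6 directory as "IPv6 stack not loaded" is an assumption.

```python
import socket
from pathlib import Path


# Hypothetical helper: reports the disable_ipv6 sysctl flag per interface.
def ipv6_sysctl_state(proc_root="/proc/sys/net/ipv6"):
    """Map each interface name to True if its disable_ipv6 flag is set.

    Returns None when proc_root/conf is missing, which is ASSUMED here to
    mean the IPv6 stack is not loaded in this kernel.
    """
    conf_dir = Path(proc_root) / "conf"
    if not conf_dir.is_dir():
        return None
    state = {}
    for iface_dir in sorted(conf_dir.iterdir()):
        flag = iface_dir / "disable_ipv6"
        if flag.is_file():
            state[iface_dir.name] = flag.read_text().strip() == "1"
    return state


# Hypothetical helper: probes whether ::1 is usable for a listening socket.
def ipv6_loopback_bindable():
    """Return True if a TCP socket can be bound to ::1 (port 0 = any free port)."""
    try:
        with socket.socket(socket.AF_INET6, socket.SOCK_STREAM) as sock:
            sock.bind(("::1", 0))
        return True
    except OSError:
        return False


if __name__ == "__main__":
    state = ipv6_sysctl_state()
    if state is None:
        print("IPv6 stack not loaded")
    else:
        print("lo disable_ipv6 set:", state.get("lo"))
    print("::1 bindable:", ipv6_loopback_bindable())
```

After running the sysctl command above on a live node, you would expect lo to report False and ::1 to be bindable.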

The second option is to create the file /etc/modprobe.d/ipv6.conf with the following contents and then reboot:

options ipv6 disable=1 

This prevents the IPv6 kernel stack from loading at boot, which also lets pbs_mom use IPv4 only.


Last update: 2018-04-12 23:00
Author: Shawn Hoopes
Revision: 1.0