Problem: STAR-CCM+ jobs launched under PBS fail to start with "Cannot initialize RDMA protocol"
Affected versions: All
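Symptoms:
Every MPI rank fails in MPI_Init because ibv_create_cq() (creation of an InfiniBand completion queue) fails, and the STAR-CCM+ server exits with return code 1.
Full error: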
Starting local server: /cd-adapco/STAR-CCM+9.04.009/star/bin/starccm+ -fabric IBV -np 24 -machinefile /eecfd/ccairoli/ClusterTest/Test_1Blades_03/host.txt -rsh "ssh" -server /eecfd/ccairoli/ClusterTest/Test_1Blades_03/KCS_D10p8_V10p0.sim
Starting STAR-CCM+ parallel server
starccm+: Rank 0:5: MPI_Init: ibv_create_cq() failed
starccm+: Rank 0:5: MPI_Init: Can't initialize RDMA device
starccm+: Rank 0:5: MPI_Init: Internal Error: Cannot initialize RDMA protocol
starccm+: Rank 0:19: MPI_Init: ibv_create_cq() failed
starccm+: Rank 0:19: MPI_Init: Can't initialize RDMA device
starccm+: Rank 0:19: MPI_Init: Internal Error: Cannot initialize RDMA protocol
starccm+: Rank 0:4: MPI_Init: ibv_create_cq() failed
starccm+: Rank 0:4: MPI_Init: Can't initialize RDMA device
starccm+: Rank 0:4: MPI_Init: Internal Error: Cannot initialize RDMA protocol
starccm+: Rank 0:13: MPI_Init: ibv_create_cq() failed
starccm+: Rank 0:13: MPI_Init: Can't initialize RDMA device
starccm+: Rank 0:13: MPI_Init: Internal Error: Cannot initialize RDMA protocol
starccm+: Rank 0:6: MPI_Init: ibv_create_cq() failed
starccm+: Rank 0:6: MPI_Init: Can't initialize RDMA device
starccm+: Rank 0:6: MPI_Init: Internal Error: Cannot initialize RDMA protocol
starccm+: Rank 0:16: MPI_Init: ibv_create_cq() failed
starccm+: Rank 0:16: MPI_Init: Can't initialize RDMA device
starccm+: Rank 0:16: MPI_Init: Internal Error: Cannot initialize RDMA protocol
starccm+: Rank 0:17: MPI_Init: ibv_create_cq() failed
starccm+: Rank 0:17: MPI_Init: Can't initialize RDMA device
starccm+: Rank 0:17: MPI_Init: Internal Error: Cannot initialize RDMA protocol
starccm+: Rank 0:21: MPI_Init: ibv_create_cq() failed
starccm+: Rank 0:21: MPI_Init: Can't initialize RDMA device
starccm+: Rank 0:21: MPI_Init: Internal Error: Cannot initialize RDMA protocol
starccm+: Rank 0:3: MPI_Init: ibv_create_cq() failed
starccm+: Rank 0:3: MPI_Init: Can't initialize RDMA device
starccm+: Rank 0:3: MPI_Init: Internal Error: Cannot initialize RDMA protocol
starccm+: Rank 0:18: MPI_Init: ibv_create_cq() failed
starccm+: Rank 0:18: MPI_Init: Can't initialize RDMA device
starccm+: Rank 0:18: MPI_Init: Internal Error: Cannot initialize RDMA protocol
starccm+: Rank 0:2: MPI_Init: ibv_create_cq() failed
starccm+: Rank 0:2: MPI_Init: Can't initialize RDMA device
starccm+: Rank 0:2: MPI_Init: Internal Error: Cannot initialize RDMA protocol
starccm+: Rank 0:7: MPI_Init: ibv_create_cq() failed
starccm+: Rank 0:7: MPI_Init: Can't initialize RDMA device
starccm+: Rank 0:7: MPI_Init: Internal Error: Cannot initialize RDMA protocol
starccm+: Rank 0:9: MPI_Init: ibv_create_cq() failed
starccm+: Rank 0:9: MPI_Init: Can't initialize RDMA device
starccm+: Rank 0:9: MPI_Init: Internal Error: Cannot initialize RDMA protocol
starccm+: Rank 0:10: MPI_Init: ibv_create_cq() failed
starccm+: Rank 0:10: MPI_Init: Can't initialize RDMA device
starccm+: Rank 0:10: MPI_Init: Internal Error: Cannot initialize RDMA protocol
starccm+: Rank 0:15: MPI_Init: ibv_create_cq() failed
starccm+: Rank 0:15: MPI_Init: Can't initialize RDMA device
starccm+: Rank 0:15: MPI_Init: Internal Error: Cannot initialize RDMA protocol
starccm+: Rank 0:11: MPI_Init: ibv_create_cq() failed
starccm+: Rank 0:11: MPI_Init: Can't initialize RDMA device
starccm+: Rank 0:11: MPI_Init: Internal Error: Cannot initialize RDMA protocol
starccm+: Rank 0:1: MPI_Init: ibv_create_cq() failed
starccm+: Rank 0:1: MPI_Init: Can't initialize RDMA device
starccm+: Rank 0:23: MPI_Init: ibv_create_cq() failed
starccm+: Rank 0:1: MPI_Init: Internal Error: Cannot initialize RDMA protocol
starccm+: Rank 0:23: MPI_Init: Can't initialize RDMA device
starccm+: Rank 0:23: MPI_Init: Internal Error: Cannot initialize RDMA protocol
starccm+: Rank 0:20: MPI_Init: ibv_create_cq() failed
starccm+: Rank 0:20: MPI_Init: Can't initialize RDMA device
starccm+: Rank 0:20: MPI_Init: Internal Error: Cannot initialize RDMA protocol
starccm+: Rank 0:12: MPI_Init: ibv_create_cq() failed
starccm+: Rank 0:12: MPI_Init: Can't initialize RDMA device
starccm+: Rank 0:12: MPI_Init: Internal Error: Cannot initialize RDMA protocol
starccm+: Rank 0:0: MPI_Init: ibv_create_cq() failed
starccm+: Rank 0:0: MPI_Init: Can't initialize RDMA device
starccm+: Rank 0:0: MPI_Init: Internal Error: Cannot initialize RDMA protocol
starccm+: Rank 0:22: MPI_Init: ibv_create_cq() failed
starccm+: Rank 0:22: MPI_Init: Can't initialize RDMA device
starccm+: Rank 0:22: MPI_Init: Internal Error: Cannot initialize RDMA protocol
starccm+: Rank 0:8: MPI_Init: ibv_create_cq() failed
starccm+: Rank 0:8: MPI_Init: Can't initialize RDMA device
starccm+: Rank 0:8: MPI_Init: Internal Error: Cannot initialize RDMA protocol
starccm+: Rank 0:14: MPI_Init: ibv_create_cq() failed
starccm+: Rank 0:14: MPI_Init: Can't initialize RDMA device
starccm+: Rank 0:14: MPI_Init: Internal Error: Cannot initialize RDMA protocol
MPI Application rank 5 exited before MPI_Init() with status 1
starccm+ Terminated
starccm+ Terminated
starccm+ Terminated
starccm+ Terminated
starccm+ Terminated
starccm+ Terminated
starccm+ Terminated
starccm+ Terminated
starccm+ Terminated
starccm+ Terminated
starccm+ Terminated
starccm+ Terminated
starccm+ Terminated
starccm+ Terminated
starccm+ Terminated
starccm+ Terminated
starccm+ Terminated
starccm+ Terminated
starccm+ Terminated
starccm+ Terminated
starccm+ Terminated
starccm+ Terminated
mpirun: Broken pipe
error: Server process ended unexpectedly (return code 1)
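Solution: When pbs_mom starts, it runs with "max locked memory (kbytes, -l) 64", i.e. it may lock only 64 KB of memory. Every job it launches, including the starccm+ MPI ranks, inherits that limit. InfiniBand verbs must pin (lock) memory for their queues and buffers, so with only 64 KB available ibv_create_cq() fails and MPI cannot initialize the RDMA device.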
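To confirm this diagnosis on a compute node, you can read the limit of the running pbs_mom process from /proc. The sketch below assumes a Linux node and Python 3; the function names are hypothetical helpers, not part of PBS or STAR-CCM+. Note that /proc/<pid>/limits reports bytes, while "ulimit -l" reports kilobytes.

```python
from typing import Optional, Tuple

# Row label used by Linux in /proc/<pid>/limits for RLIMIT_MEMLOCK.
_PREFIX = "Max locked memory"


def _to_bytes(value: str) -> Optional[int]:
    """Convert one column of /proc/<pid>/limits; None stands for 'unlimited'."""
    return None if value == "unlimited" else int(value)


def parse_memlock(limits_text: str) -> Tuple[Optional[int], Optional[int]]:
    """Return the (soft, hard) locked-memory limit in bytes from /proc/<pid>/limits text."""
    for line in limits_text.splitlines():
        if line.startswith(_PREFIX):
            # After the limit name the columns are: soft, hard, units ("bytes").
            soft, hard = line[len(_PREFIX):].split()[:2]
            return _to_bytes(soft), _to_bytes(hard)
    raise ValueError("no 'Max locked memory' line found")


def memlock_of_pid(pid: int) -> Tuple[Optional[int], Optional[int]]:
    """Read the locked-memory limit of a running process (Linux /proc only)."""
    with open(f"/proc/{pid}/limits") as f:
        return parse_memlock(f.read())
```

For example, pass the PID reported by "pgrep pbs_mom" to memlock_of_pid(); a result of (65536, 65536) corresponds to the 64 KB limit described above, while (None, None) means unlimited.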
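Add "ulimit -l unlimited" to the pbs_mom init script, before the command that starts pbs_mom: the daemon inherits the limit of the shell that starts it. Then restart pbs_mom on each compute node, since an already running daemon (and any job it has already launched) keeps the old limit.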
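The inheritance rule is what makes the placement of "ulimit -l" matter. The sketch below demonstrates it with Python's standard resource module on Linux (Python 3.7+ assumed): it temporarily lowers its own soft locked-memory limit to 64 KB, mimicking pbs_mom, spawns a child process, and checks that the child sees the same limit. The helper names are hypothetical and only illustrate the mechanism.

```python
import resource
import subprocess
import sys

# One-liner run in a fresh interpreter to report the soft RLIMIT_MEMLOCK it started with.
_CHILD = "import resource; print(resource.getrlimit(resource.RLIMIT_MEMLOCK)[0])"


def child_memlock_soft() -> int:
    """Spawn a child process and return the soft RLIMIT_MEMLOCK (bytes) it inherited."""
    result = subprocess.run([sys.executable, "-c", _CHILD],
                            capture_output=True, text=True, check=True)
    return int(result.stdout.strip())


def demo_inheritance(soft_bytes: int = 64 * 1024) -> bool:
    """Lower our soft limit (default 64 KB, like pbs_mom above), spawn a child, restore.

    Returns True if the child saw the lowered limit, i.e. it inherited it.
    """
    old_soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    if hard != resource.RLIM_INFINITY:
        soft_bytes = min(soft_bytes, hard)  # the soft limit may never exceed the hard limit
    resource.setrlimit(resource.RLIMIT_MEMLOCK, (soft_bytes, hard))
    try:
        return child_memlock_soft() == soft_bytes
    finally:
        # Raising the soft limit back up to (at most) the hard limit needs no privileges.
        resource.setrlimit(resource.RLIMIT_MEMLOCK, (old_soft, hard))


def memlock_is_unlimited() -> bool:
    """True if this process may lock unlimited memory (what 'ulimit -l unlimited' sets)."""
    soft, _ = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    return soft == resource.RLIM_INFINITY
```

After restarting pbs_mom, calling memlock_is_unlimited() from inside a test job, before launching starccm+, is one way to check that jobs now inherit the raised limit.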
Tags: mpi, RDMA, starccm, ulimit