STAR-CCM+ jobs fail with "Cannot initialize RDMA protocol"


 

Problem: STAR-CCM+ jobs fail to start with "Cannot initialize RDMA protocol"

 

Affected versions: All

 

Symptoms:

Full error

Starting local server: /cd-adapco/STAR-CCM+9.04.009/star/bin/starccm+ -fabric IBV -np 24 -machinefile /eecfd/ccairoli/ClusterTest/Test_1Blades_03/host.txt -rsh "ssh" -server /eecfd/ccairoli/ClusterTest/Test_1Blades_03/KCS_D10p8_V10p0.sim
Starting STAR-CCM+ parallel server
starccm+: Rank 0:5: MPI_Init: ibv_create_cq() failed
starccm+: Rank 0:5: MPI_Init: Can't initialize RDMA device
starccm+: Rank 0:5: MPI_Init: Internal Error: Cannot initialize RDMA protocol
starccm+: Rank 0:19: MPI_Init: ibv_create_cq() failed
starccm+: Rank 0:19: MPI_Init: Can't initialize RDMA device
starccm+: Rank 0:19: MPI_Init: Internal Error: Cannot initialize RDMA protocol
starccm+: Rank 0:4: MPI_Init: ibv_create_cq() failed
starccm+: Rank 0:4: MPI_Init: Can't initialize RDMA device
starccm+: Rank 0:4: MPI_Init: Internal Error: Cannot initialize RDMA protocol
starccm+: Rank 0:13: MPI_Init: ibv_create_cq() failed
starccm+: Rank 0:13: MPI_Init: Can't initialize RDMA device
starccm+: Rank 0:13: MPI_Init: Internal Error: Cannot initialize RDMA protocol
starccm+: Rank 0:6: MPI_Init: ibv_create_cq() failed
starccm+: Rank 0:6: MPI_Init: Can't initialize RDMA device
starccm+: Rank 0:6: MPI_Init: Internal Error: Cannot initialize RDMA protocol
starccm+: Rank 0:16: MPI_Init: ibv_create_cq() failed
starccm+: Rank 0:16: MPI_Init: Can't initialize RDMA device
starccm+: Rank 0:16: MPI_Init: Internal Error: Cannot initialize RDMA protocol
starccm+: Rank 0:17: MPI_Init: ibv_create_cq() failed
starccm+: Rank 0:17: MPI_Init: Can't initialize RDMA device
starccm+: Rank 0:17: MPI_Init: Internal Error: Cannot initialize RDMA protocol
starccm+: Rank 0:21: MPI_Init: ibv_create_cq() failed
starccm+: Rank 0:21: MPI_Init: Can't initialize RDMA device
starccm+: Rank 0:21: MPI_Init: Internal Error: Cannot initialize RDMA protocol
starccm+: Rank 0:3: MPI_Init: ibv_create_cq() failed
starccm+: Rank 0:3: MPI_Init: Can't initialize RDMA device
starccm+: Rank 0:3: MPI_Init: Internal Error: Cannot initialize RDMA protocol
starccm+: Rank 0:18: MPI_Init: ibv_create_cq() failed
starccm+: Rank 0:18: MPI_Init: Can't initialize RDMA device
starccm+: Rank 0:18: MPI_Init: Internal Error: Cannot initialize RDMA protocol
starccm+: Rank 0:2: MPI_Init: ibv_create_cq() failed
starccm+: Rank 0:2: MPI_Init: Can't initialize RDMA device
starccm+: Rank 0:2: MPI_Init: Internal Error: Cannot initialize RDMA protocol
starccm+: Rank 0:7: MPI_Init: ibv_create_cq() failed
starccm+: Rank 0:7: MPI_Init: Can't initialize RDMA device
starccm+: Rank 0:7: MPI_Init: Internal Error: Cannot initialize RDMA protocol
starccm+: Rank 0:9: MPI_Init: ibv_create_cq() failed
starccm+: Rank 0:9: MPI_Init: Can't initialize RDMA device
starccm+: Rank 0:9: MPI_Init: Internal Error: Cannot initialize RDMA protocol
starccm+: Rank 0:10: MPI_Init: ibv_create_cq() failed
starccm+: Rank 0:10: MPI_Init: Can't initialize RDMA device
starccm+: Rank 0:10: MPI_Init: Internal Error: Cannot initialize RDMA protocol
starccm+: Rank 0:15: MPI_Init: ibv_create_cq() failed
starccm+: Rank 0:15: MPI_Init: Can't initialize RDMA device
starccm+: Rank 0:15: MPI_Init: Internal Error: Cannot initialize RDMA protocol
starccm+: Rank 0:11: MPI_Init: ibv_create_cq() failed
starccm+: Rank 0:11: MPI_Init: Can't initialize RDMA device
starccm+: Rank 0:11: MPI_Init: Internal Error: Cannot initialize RDMA protocol
starccm+: Rank 0:1: MPI_Init: ibv_create_cq() failed
starccm+: Rank 0:1: MPI_Init: Can't initialize RDMA device
starccm+: Rank 0:23: MPI_Init: ibv_create_cq() failed
starccm+: Rank 0:1: MPI_Init: Internal Error: Cannot initialize RDMA protocol
starccm+: Rank 0:23: MPI_Init: Can't initialize RDMA device
starccm+: Rank 0:23: MPI_Init: Internal Error: Cannot initialize RDMA protocol
starccm+: Rank 0:20: MPI_Init: ibv_create_cq() failed
starccm+: Rank 0:20: MPI_Init: Can't initialize RDMA device
starccm+: Rank 0:20: MPI_Init: Internal Error: Cannot initialize RDMA protocol
starccm+: Rank 0:12: MPI_Init: ibv_create_cq() failed
starccm+: Rank 0:12: MPI_Init: Can't initialize RDMA device
starccm+: Rank 0:12: MPI_Init: Internal Error: Cannot initialize RDMA protocol
starccm+: Rank 0:0: MPI_Init: ibv_create_cq() failed
starccm+: Rank 0:0: MPI_Init: Can't initialize RDMA device
starccm+: Rank 0:0: MPI_Init: Internal Error: Cannot initialize RDMA protocol
starccm+: Rank 0:22: MPI_Init: ibv_create_cq() failed
starccm+: Rank 0:22: MPI_Init: Can't initialize RDMA device
starccm+: Rank 0:22: MPI_Init: Internal Error: Cannot initialize RDMA protocol
starccm+: Rank 0:8: MPI_Init: ibv_create_cq() failed
starccm+: Rank 0:8: MPI_Init: Can't initialize RDMA device
starccm+: Rank 0:8: MPI_Init: Internal Error: Cannot initialize RDMA protocol
starccm+: Rank 0:14: MPI_Init: ibv_create_cq() failed
starccm+: Rank 0:14: MPI_Init: Can't initialize RDMA device
starccm+: Rank 0:14: MPI_Init: Internal Error: Cannot initialize RDMA protocol
MPI Application rank 5 exited before MPI_Init() with status 1
starccm+ Terminated
starccm+ Terminated
starccm+ Terminated
starccm+ Terminated
starccm+ Terminated
starccm+ Terminated
starccm+ Terminated
starccm+ Terminated
starccm+ Terminated
starccm+ Terminated
starccm+ Terminated
starccm+ Terminated
starccm+ Terminated
starccm+ Terminated
starccm+ Terminated
starccm+ Terminated
starccm+ Terminated
starccm+ Terminated
starccm+ Terminated
starccm+ Terminated
starccm+ Terminated
starccm+ Terminated
mpirun: Broken pipe
error: Server process ended unexpectedly (return code 1)

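Solution: pbs_mom starts with a low locked-memory limit, which "ulimit -a" reports as "max locked memory       (kbytes, -l) 64". Jobs launched by pbs_mom inherit this limit. RDMA has to lock (pin) memory for its queues and buffers, and 64 KB is not enough, so ibv_create_cq() fails in every rank.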
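To confirm the limit that a running pbs_mom actually has, its entry under /proc can be inspected. The following is a minimal sketch, assuming a Linux compute node where /proc/<pid>/limits is readable; the helper names memlock_soft_limit and memlock_is_unlimited are illustrative and not part of PBS or STAR-CCM+.

```shell
#!/bin/sh
# Print the soft "Max locked memory" limit from a /proc/<pid>/limits style file.
# The kernel reports this value in bytes (65536 corresponds to 64 KB).
memlock_soft_limit() {
    awk '/^Max locked memory/ { print $4 }' "$1"
}

# Succeed only if the soft limit in the given file is "unlimited".
memlock_is_unlimited() {
    [ "$(memlock_soft_limit "$1")" = "unlimited" ]
}

# Usage on a compute node (assumes pgrep is installed and pbs_mom is running):
#   pid=$(pgrep -o -x pbs_mom)
#   memlock_soft_limit "/proc/$pid/limits"
```

A value of 65536 here matches the 64 KB limit quoted above; "unlimited" is what the daemon should report once the limit has been raised and pbs_mom restarted.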

Add "ulimit -l unlimited" to the pbs_mom init script, on a line before pbs_mom is started. pbs_mom inherits the limits of the shell that starts it, so setting the limit after the daemon is already running has no effect. Restart pbs_mom for the change to apply.
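The inheritance described above can be seen with a small experiment. This is only a sketch: run_with_memlock is a hypothetical helper, and it lowers the soft limit (which needs no root privileges) purely to show that a child process sees whatever limit the starting shell set. In the real init script the line would be "ulimit -l unlimited", run as root, immediately before the command that starts pbs_mom.

```shell
#!/bin/sh
# Run a command with the given soft locked-memory limit (in KB, or "unlimited").
# The limit is set in a subshell first, then the command is exec'd from it,
# which mirrors an init script setting the limit before starting a daemon.
run_with_memlock() {
    limit=$1
    shift
    (
        ulimit -S -l "$limit" || exit 1
        exec "$@"
    )
}

# The child shell prints the locked-memory limit it inherited.
run_with_memlock 8 sh -c 'ulimit -l'
```

The same mechanism is why the order matters in the init script: a ulimit call placed after the daemon has started changes only the script's own shell, not the already running pbs_mom.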

 

Tags: mpi, RDMA, starccm, ulimit
Last update: 2017-02-24 01:39
Author: Jason Booth
Revision: 1.2