After some period of inactivity, my FC servers with InfiniBand cards in them seem to go to sleep or something. If I log in to a machine and ping other machines over their IB interfaces, the first request takes much longer than the rest. This doesn't happen over Ethernet. Why? I'm just about out of ideas.
compute-4-12 ~]# ping ib-s0-1 (this is a Solaris machine)
PING ib-s0-1.local (192.168.1.19) 56(84) bytes of data.
64 bytes from ib-s0-1.168.192.in-addr.arpa (192.168.1.19): icmp_req=1 ttl=255 time=3.64 ms
64 bytes from ib-s0-1.168.192.in-addr.arpa (192.168.1.19): icmp_req=2 ttl=255 time=0.131 ms
64 bytes from ib-s0-1.168.192.in-addr.arpa (192.168.1.19): icmp_req=3 ttl=255 time=0.222 ms
compute-4-08 ~]# ping c4-7 (identical neighboring FC machine)
PING compute-4-07.local.local (10.255.255.187) 56(84) bytes of data.
64 bytes from compute-4-07.local.255.10.in-addr.arpa (10.255.255.187): icmp_req=1 ttl=64 time=1.28 ms
64 bytes from compute-4-07.local.255.10.in-addr.arpa (10.255.255.187): icmp_req=2 ttl=64 time=0.108 ms
64 bytes from compute-4-07.local.255.10.in-addr.arpa (10.255.255.187): icmp_req=3 ttl=64 time=0.115 ms
64 bytes from compute-4-07.local.255.10.in-addr.arpa (10.255.255.187): icmp_req=4 ttl=64 time=0.119 ms
64 bytes from compute-4-07.local.255.10.in-addr.arpa (10.255.255.187): icmp_req=5 ttl=64 time=0.180 ms
This issue doesn't only affect ping. I created a user whose home directory is mounted over NFS over InfiniBand, and ssh'ing in as that user also takes longer the first time. If I ssh with verbosity turned all the way up, I get a brief hang at "we sent a hostbased packet, wait for reply". This is the primary symptom I'm trying to get rid of, since we want home directories over NFS over IB without the initial delay.
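For anyone who wants to put a number on the ping side of this, here is a rough sketch that compares the first reply's round-trip time with the average of the rest. It assumes ping output in the format pasted above (each reply line carrying time=<value> ms); first_vs_rest is a made-up helper name, not part of any package.

```shell
#!/bin/sh
# first_vs_rest (hypothetical helper): read ping output on stdin and print
# the first reply's RTT next to the mean RTT of all later replies, in ms.
first_vs_rest() {
    awk '
        # Reply lines carry the round-trip time as time=<value> ms.
        match($0, /time=[0-9.]+/) {
            # Skip the 5 characters of "time=" and keep the number.
            rtt = substr($0, RSTART + 5, RLENGTH - 5) + 0
            n++
            if (n == 1)
                first = rtt
            else
                sum += rtt
        }
        END {
            if (n == 0) {
                print "no replies"
                exit 1
            } else if (n == 1) {
                printf "first=%.3f ms rest_avg=n/a\n", first
            } else {
                printf "first=%.3f ms rest_avg=%.3f ms\n", first, sum / (n - 1)
            }
        }
    '
}
```

It could be run after the machine has been idle for a while, for example as: ping -c 5 c4-7 | first_vs_rest. A large gap between the two numbers on the IB interface, and no such gap on Ethernet, would match what I'm describing.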