Network error (yb/util/net/socket.cc:575): recvmsg error: Connection refused (error 111)

Hi,

I installed Yugabyte 1.1.12.0. My landscape contains 12 servers across different IDCs, 3 servers from each IDC.

It is a single cluster. Of the 12 servers, 3 are masters, one from each IDC, and all servers, including the masters, are tablet servers.

It worked fine for about 12 hours, after which all Yugabyte processes stopped. The logs show the following error line:

Error getting permanent uuid from config peer [10.141.91.35:7100]: Network error (yb/util/net/socket.cc:575): recvmsg error: Connection refused (error 111)

Nothing is accessible, and I could not find any running Yugabyte master or tablet server process.
Is this a network error?

I tried to find the root cause in the server logs but could not identify it.

W0220 20:26:58.281448 68267 consensus_peers.cc:607] Error getting permanent uuid from config peer [10.141.91.35:7100]: Network error (yb/util/net/socket.cc:575): recvmsg error: Connection refused (error 111)
I0220 20:27:08.281651 68267 consensus_peers.cc:620] Retrying to get permanent uuid for remote peer: [10.141.91.35:7100] attempt: 18
W0220 20:27:08.299177 68267 consensus_peers.cc:607] Error getting permanent uuid from config peer [10.141.91.35:7100]: Network error (yb/util/net/socket.cc:575): recvmsg error: Connection refused (error 111)
I0220 20:27:18.299412 68267 consensus_peers.cc:620] Retrying to get permanent uuid for remote peer: [10.141.91.35:7100] attempt: 19
W0220 20:27:18.315446 68267 consensus_peers.cc:607] Error getting permanent uuid from config peer [10.141.91.35:7100]: Network error (yb/util/net/socket.cc:575): recvmsg error: Connection refused (error 111)
I0220 20:27:28.315675 68267 consensus_peers.cc:620] Retrying to get permanent uuid for remote peer: [10.141.91.35:7100] attempt: 20
W0220 20:27:28.331583 68267 consensus_peers.cc:607] Error getting permanent uuid from config peer [10.141.91.35:7100]: Network error (yb/util/net/socket.cc:575): recvmsg error: Connection refused (error 111)
I0220 20:27:38.331809 68267 consensus_peers.cc:620] Retrying to get permanent uuid for remote peer: [10.141.91.35:7100] attempt: 21
W0220 20:27:38.349864 68267 consensus_peers.cc:607] Error getting permanent uuid from config peer [10.141.91.35:7100]: Network error (yb/util/net/socket.cc:575): recvmsg error: Connection refused (error 111)
I0220 20:27:48.350052 68267 consensus_peers.cc:620] Retrying to get permanent uuid for remote peer: [10.141.91.35:7100] attempt: 22
W0220 20:27:48.367148 68267 consensus_peers.cc:607] Error getting permanent uuid from config peer [10.141.91.35:7100]: Network error (yb/util/net/socket.cc:575):

Regards,
Sadesh Jayraj

Sadesh,
There should be a yb-master process running on 10.141.91.35. If there is one, then your problem is most likely firewall/security group related. A list of ports that need to be open on the network can be found here: Ports reference.
If there is no yb-master process on that particular host, then you should check the logs for the master process (usually located in the first directory specified by --fs_data_dirs, under logs/master). A common reason the yb-master process fails to start is that it cannot bind to port 7000 or 7100.
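
To tell the two cases above apart, a quick probe can help. Below is a minimal Python sketch, not an official Yugabyte tool: the helper names probe_port and can_bind are made up for illustration, and the sketch uses only the standard socket module. Roughly speaking, "refused" usually means the host answered but nothing is listening on that port (or a firewall actively rejects the connection), while "timeout" is more typical of a firewall silently dropping packets.

```python
import errno
import socket


def probe_port(host, port, timeout=3.0):
    """Attempt a TCP connect and classify the outcome (hypothetical helper)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Same condition as "Connection refused (error 111)" in the logs.
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        # Any other failure, e.g. no route to host or name resolution error.
        return "error: " + errno.errorcode.get(exc.errno, str(exc))


def can_bind(port, host="0.0.0.0"):
    """Return True if a TCP socket can currently be bound to host:port locally."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        return True
    except OSError:
        # Typically EADDRINUSE (port already taken) or EACCES (not permitted).
        return False
    finally:
        sock.close()


# Example usage (addresses and ports taken from this thread):
#   probe_port("10.141.91.35", 7100)   # run from another node in the cluster
#   can_bind(7000), can_bind(7100)     # run on the master host itself
```

Run probe_port from one of the other nodes against the master address in the log, and can_bind on the master host while yb-master is stopped. If can_bind returns False for 7000 or 7100 there, some other process is probably holding the port.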

–Alan

hi @sadesh

  1. Could you please open a GitHub issue for this here: Sign in to GitHub · GitHub (and if you could share all the gflags options passed to yb-master/yb-tserver, it’ll be of help).

  2. Also, for live help or to talk to our engineering team members, feel free to ping us on Slack: Join yugabyte-db on Slack - Community Inviter

regards,
Kannan