Tserver processes not registering with master processes

Hi

I tried to deploy a YugabyteDB cluster based on the instructions at https://docs.yugabyte.com/latest/deploy/public-clouds/aws/manual-deployment/ . My problem is that the tserver processes are not registering with the master processes. From the tserver logs I can see that the tserver process is trying to register with 127.0.0.1:7100.
My masters are on 10.10.10.27 and 10.20.10.27.
The tserver process is started with: /opt/yugabyte-2.1.5.0/bin/yb-tserver --flagfile /opt/yugabyte-2.1.5.0/etc/tserver.conf
The tserver config file (/opt/yugabyte-2.1.5.0/etc/tserver.conf) looks like:

--master_addresses=10.10.10.27:7100,10.20.10.27:7100
--fs_data_dirs=/var/yugabytedb
#--rpc_bind_addresses=10.10.10.28:9100
--rpc_bind_addresses=0.0.0.0:9101
--cql_proxy_bind_address=10.10.10.28:9042
--redis_proxy_bind_address=10.10.10.28:6379
--webserver_interface=10.10.10.28
--pgsql_proxy_bind_address=10.10.10.28:5433
--placement_cloud=ZTM
--placement_region=HR
--placement_zone=Planinska

log entry looks like:

W0508 06:06:53.817953 30750 heartbeater.cc:519] P 0e4d0cc02731465fa972d4206ed66ed4: Failed to heartbeat to 127.0.0.1:7100: Network error (yb/util/net/socket.cc:537): Failed to ping master at 127.0.0.1:7100: recvmsg error: Connection refused (system error 111) tries=62180, num=1, masters=0x0000000001a78340 -> [[127.0.0.1:7100]], code=Network error

Is there something wrong with my setup?

YugabyteDB uses Raft distributed consensus for both data replication and leader election (on a per-shard basis). This means that the replication factor of the cluster has to be an odd number like 3 or 5. And the number of yb-master servers has to match this replication factor since that is how the yb-master service is made continuously available.
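To make the odd-number point concrete, here is a minimal sketch of the standard Raft majority arithmetic. The function names are made up for illustration and are not part of any YugabyteDB tool. It shows that a 2-member group tolerates no failures, the same as a single member, which is why an even count adds nothing.

```python
def raft_majority(members: int) -> int:
    """Smallest number of members that forms a strict majority."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can be lost while a majority is still reachable."""
    return members - raft_majority(members)


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5):
        print(f"members={n} majority={raft_majority(n)} "
              f"tolerated_failures={tolerated_failures(n)}")
```

Going from 3 to 4 members likewise leaves the number of tolerated failures unchanged at 1.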

In your conf file above, you have only 2 yb-master addresses. That is an invalid configuration. The AWS manual deployment doc and the regular manual deployment doc (https://docs.yugabyte.com/latest/deploy/manual-deployment/) highlight this aspect along with the steps. Hoping you can try setting up your cluster as per the documented steps and let us know on this thread how it goes.

I changed the setup to 3 masters, but the problem is the same.

New master config:

--master_addresses=10.10.10.27:7100,10.20.10.27:7100,10.20.10.29:7100
--fs_data_dirs=/var/yugabytedb
--placement_cloud=ZTM
--placement_region=HR
--placement_zone=Planinska

new tserver config:

--master_addresses=10.10.10.27:7100,10.20.10.27:7100,10.20.10.29:7100
--fs_data_dirs=/var/yugabytedb
--rpc_bind_addresses=0.0.0.0:9101
--cql_proxy_bind_address=10.10.10.28:9042
--redis_proxy_bind_address=10.10.10.28:6379
--webserver_interface=10.10.10.28
--pgsql_proxy_bind_address=10.10.10.28:5433
--placement_cloud=ZTM
--placement_region=HR
--placement_zone=Planinska

TServer log:

W0511 05:14:31.070868 80550 heartbeater.cc:519] P c7fc711e5a7b4f2eb19d3b483eebc743: Failed to heartbeat to 127.0.0.1:7100: Network error (yb/util/net/socket.cc:537): Failed to ping master at 127.0.0.1:7100: recvmsg error: Connection refused (system error 111) tries=238135, num=1, masters=0x000000000272e310 -> [[127.0.0.1:7100]], code=Network error

As the documentation states, the --rpc_bind_addresses value above (0.0.0.0:9101) should be <host-ip>:9100 so that the servers can find each other correctly. You should provide a similar value of <host-ip>:7100 for the yb-master servers. Is there a reason you are changing these values?

Also, I assume you are running a 3-node cluster where every host has 1 yb-tserver and 1 yb-master. But the configuration below shows yb-tserver running on a different host. Can you please confirm whether this is a typo?

yb-master hosts
10.10.10.27
10.20.10.27
10.20.10.29

yb-tserver hosts
10.10.10.28
?
?

@sid.choudhury and @puska: it would be good to know the command line with which the masters are being brought up.

For both master and tserver, here are the guidelines regarding the use of rpc_bind_addresses. Either of the two modes below should be used:

  1. Set --rpc_bind_addresses to the IP of the host. So a yb-master's command line would include --rpc_bind_addresses=10.10.10.27:7100 and a yb-tserver's would include --rpc_bind_addresses=10.10.10.28:9100
  2. If --rpc_bind_addresses is either not set at all or is set to 0.0.0.0, the parameter --server_broadcast_addresses should be set to the host IP. For a yb-master, that would be something like --server_broadcast_addresses=10.10.10.27:7100 and for a yb-tserver something like --server_broadcast_addresses=10.10.10.28:9100. This allows the servers to advertise the right IP on which they can be reached.
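The two modes above can be checked mechanically before starting a server. The following is only a sketch: `parse_flagfile` and `check_bind_and_broadcast` are hypothetical helpers written for this thread, not part of YugabyteDB, and they assume a flagfile with one `--name=value` entry per line and `#` for comments, as in the configs posted here.

```python
def parse_flagfile(text: str) -> dict:
    """Parse '--name=value' lines into a dict, skipping blanks and '#' comments."""
    flags = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.lstrip("-").partition("=")
        flags[name] = value
    return flags


def check_bind_and_broadcast(flags: dict) -> list:
    """Return warnings when neither of the two modes above is satisfied."""
    bind = flags.get("rpc_bind_addresses", "")
    # Strip an optional ':port' suffix to get the bind host.
    bind_host = bind.rsplit(":", 1)[0] if bind else ""
    wildcard = bind_host in ("", "0.0.0.0")
    if wildcard and not flags.get("server_broadcast_addresses"):
        return ["rpc_bind_addresses is unset or 0.0.0.0; "
                "set server_broadcast_addresses to the host IP"]
    return []


if __name__ == "__main__":
    sample = "--rpc_bind_addresses=0.0.0.0:9101\n--fs_data_dirs=/var/yugabytedb\n"
    for warning in check_bind_and_broadcast(parse_flagfile(sample)):
        print(warning)
```

Run against the tserver config posted earlier in this thread, the check reports the missing --server_broadcast_addresses.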

This is documented at https://docs.yugabyte.com/latest/reference/configuration/yb-tserver/#rpc-bind-addresses

I found the problem. I should have used the option --tserver_master_addrs instead of --master_addresses in tserver.conf.
The working configuration looks like:

master.conf:

--master_addresses=10.10.10.27:7100,10.20.10.27:7100,10.20.10.29:7100
--fs_data_dirs=/var/yugabytedb
--server_broadcast_addresses=10.10.10.27:7100
--placement_cloud=ZTM
--placement_region=HR
--placement_zone=Planinska

tserver.conf:

--tserver_master_addrs=10.10.10.27:7100,10.20.10.27:7100,10.20.10.29:7100
--fs_data_dirs=/var/yugabytedb
--rpc_bind_addresses=0.0.0.0:9101
--server_broadcast_addresses=10.10.10.27:9101
--use_private_ip=never
--cql_proxy_bind_address=10.10.10.27:9042
--redis_proxy_bind_address=10.10.10.27:6379
--webserver_port=9000
--pgsql_proxy_bind_address=10.10.10.27:5433
--placement_cloud=ZTM
--placement_region=HR
--placement_zone=Planinska
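For anyone hitting the same mix-up, a small pre-flight check along these lines can catch it. This is a sketch only: `tserver_master_flag_problems` is a hypothetical helper, and it assumes the same one-flag-per-line `--name=value` format as the files above. It relies only on what this thread showed, namely that the tserver takes its master list from --tserver_master_addrs and kept heartbeating to 127.0.0.1:7100 when only --master_addresses was given.

```python
def tserver_master_flag_problems(flagfile_text: str) -> list:
    """Return a list of problems with the master-address flags in a tserver flagfile."""
    names = set()
    for raw in flagfile_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and commented-out flags
        names.add(line.lstrip("-").partition("=")[0])

    problems = []
    if "tserver_master_addrs" not in names:
        problems.append("tserver_master_addrs is not set")
    if "master_addresses" in names:
        problems.append("master_addresses found; in this thread the tserver "
                        "needed tserver_master_addrs instead")
    return problems


if __name__ == "__main__":
    old_conf = "--master_addresses=10.10.10.27:7100,10.20.10.27:7100,10.20.10.29:7100\n"
    for problem in tserver_master_flag_problems(old_conf):
        print(problem)
```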

Thanks for your help.