Hi, I am trying to run the latest version of YugabyteDB with YSQL enabled on a three-node cluster using Docker containers.
It does not initialize correctly and gives me this error in yb-tserver.ERROR:
E1017 22:17:37.732748 4689 async_initializer.cc:83] Failed to initialize client: Timed out (yb/rpc/rpc.cc:199): Could not locate the leader master: GetLeaderMasterRpc(addrs: [172.25.1.70:7100, 172.25.1.71:7100, 172.25.1.72:7100], num_attempts: 97) passed its deadline 15497.054s (passed: 5.067s): Not found (yb/master/master_rpc.cc:279): no leader found: GetLeaderMasterRpc(addrs: [172.25.1.70:7100, 172.25.1.71:7100, 172.25.1.72:7100], num_attempts: 1)
E1017 22:17:37.880820 4727 async_initializer.cc:83] Failed to initialize client: Timed out (yb/rpc/rpc.cc:199): Could not locate the leader master: GetLeaderMasterRpc(addrs: [172.25.1.70:7100, 172.25.1.71:7100, 172.25.1.72:7100], num_attempts: 98) passed its deadline 15497.185s (passed: 5.084s): Not found (yb/master/master_rpc.cc:279): no leader found: GetLeaderMasterRpc(addrs: [172.25.1.70:7100, 172.25.1.71:7100, 172.25.1.72:7100], num_attempts: 1)
E1017 22:17:43.834643 4689 async_initializer.cc:83] Failed to initialize client: Timed out (yb/rpc/rpc.cc:199): Could not locate the leader master: GetLeaderMasterRpc(addrs: [172.25.1.70:7100, 172.25.1.71:7100, 172.25.1.72:7100], num_attempts: 98) passed its deadline 15503.149s (passed: 5.074s): Not found (yb/master/master_rpc.cc:279): no leader found: GetLeaderMasterRpc(addrs: [172.25.1.70:7100, 172.25.1.71:7100, 172.25.1.72:7100], num_attempts: 1)
E1017 22:17:44.014653 4727 async_initializer.cc:83] Failed to initialize client: Timed out (yb/rpc/rpc.cc:199): Could not locate the leader master: GetLeaderMasterRpc(addrs: [172.25.1.70:7100, 172.25.1.71:7100, 172.25.1.72:7100], num_attempts: 98) passed its deadline 15503.334s (passed: 5.069s): Not found (yb/master/master_rpc.cc:279): no leader found: GetLeaderMasterRpc(addrs: [172.25.1.70:7100, 172.25.1.71:7100, 172.25.1.72:7100], num_attempts: 1)
I also found some possibly relevant messages in yb-master.INFO:
I1017 22:23:11.237989 4742 webserver.cc:278] Webserver: error reading: Resource temporarily unavailable
I1017 22:24:39.529036 4773 webserver.cc:278] Webserver: error reading: Resource temporarily unavailable
I1017 22:29:39.444651 4777 webserver.cc:278] Webserver: error reading: Resource temporarily unavailable
I1017 22:33:21.795174 4786 webserver.cc:278] Webserver: error reading: Resource temporarily unavailable
I1017 22:35:42.891626 4788 webserver.cc:278] Webserver: error reading: Resource temporarily unavailable
I also cannot access the admin UI through port 7000 or 13000.
Is the problem that the tserver stopped searching for the leader master after a few timeouts, or is my config file wrong?
If it stopped searching after a few timeouts, what should I do to get it running in my Docker containers, where I can't simply start all the nodes at the same time?
For reference, here is my config file:
master:
--master_addresses=172.25.1.70:7100,172.25.1.71:7100,172.25.1.72:7100
--rpc_bind_addresses=172.25.1.70
--fs_data_dirs=/opt/yugabyteDB/data
--replication_factor=3
--webserver_interface=172.25.1.70
tserver:
--tserver_master_addrs=172.25.1.70:7100,172.25.1.71:7100,172.25.1.72:7100
--rpc_bind_addresses=172.25.1.70
--cql_proxy_bind_address=172.25.1.70:9042
--fs_data_dirs=/opt/yugabyteDB/data
--webserver_interface=172.25.1.70
--enable_ysql=true
--pgsql_proxy_bind_address=172.25.1.70:5433
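To rule out a plain connectivity problem between the containers, I could run something like the following on each node before starting yb-tserver. This is only a sketch: the function names are my own (hypothetical), it assumes Python 3 is available in the container, and an open TCP port only shows that a master process is listening, not that a leader has been elected.

```python
import socket
import time

# Master RPC addresses copied from --master_addresses in the config above.
MASTER_ADDRESSES = "172.25.1.70:7100,172.25.1.71:7100,172.25.1.72:7100"


def parse_addresses(addresses):
    """Split a comma-separated 'host:port' list into (host, port) tuples."""
    result = []
    for item in addresses.split(","):
        host, _, port = item.strip().rpartition(":")
        result.append((host, int(port)))
    return result


def is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", unreachable hosts, and timeouts.
        return False


def wait_for_majority(addresses, deadline_s=120.0, interval_s=2.0,
                      probe=is_reachable):
    """Poll until more than half of the addresses accept TCP connections.

    Returns the list of reachable addresses, or raises TimeoutError once
    the deadline has passed without reaching a majority.
    """
    needed = len(addresses) // 2 + 1
    end = time.monotonic() + deadline_s
    while True:
        up = [addr for addr in addresses if probe(*addr)]
        if len(up) >= needed:
            return up
        if time.monotonic() >= end:
            raise TimeoutError(
                f"only {len(up)} of {len(addresses)} masters reachable")
        time.sleep(interval_s)
```

Calling `wait_for_majority(parse_addresses(MASTER_ADDRESSES))` before launching yb-tserver would at least tell me whether two of the three master RPC ports are accepting connections from this container.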
Thank you!
Update:
After the time frame covered by the errors in yb-tserver.ERROR, the following appears in yb-tserver.INFO:
W1017 22:17:45.590307 4695 heartbeater.cc:598] P 1ce0d8ac1b94423bb5db3ce49b9dba01: Failed to heartbeat to 172.25.1.71:7100: Service unavailable (yb/tserver/heartbeater.cc:479): master is no longer the leader tries=10, num=3, masters=0x00000000024be7f0 -> [[172.25.1.70:7100], [172.25.1.71:7100], [172.25.1.72:7100]], code=Service unavailable
I1017 22:17:46.500589 4686 tcp_stream.cc:293] { local: 172.25.1.70:42800 remote: 172.25.1.72:7100 }: Recv failed: Network error (yb/util/net/socket.cc:538): recvmsg error: Connection refused (system error 111)
I1017 22:17:47.063611 4687 tcp_stream.cc:293] { local: 172.25.1.70:46714 remote: 172.25.1.72:7100 }: Recv failed: Network error (yb/util/net/socket.cc:538): recvmsg error: Connection refused (system error 111)
W1017 22:17:47.096071 4701 master_rpc.cc:274] More than 500 ms has passed, choosing to heartbeat to follower master c1d2a60707e44234b41b8aca32ba17f3 after 28 iterations of all masters.
I1017 22:17:47.096638 4695 heartbeater.cc:301] P 1ce0d8ac1b94423bb5db3ce49b9dba01: Connected to a leader master server at 172.25.1.70:7100
I1017 22:17:47.096664 4695 heartbeater.cc:359] P 1ce0d8ac1b94423bb5db3ce49b9dba01: Registering TS with master...
I1017 22:17:47.096681 4695 server_base.cc:477] Using private ip address 172.25.1.70
I1017 22:17:47.096699 4695 heartbeater.cc:368] P 1ce0d8ac1b94423bb5db3ce49b9dba01: Sending a full tablet report to master...
W1017 22:17:47.097321 4695 heartbeater.cc:598] P 1ce0d8ac1b94423bb5db3ce49b9dba01: Failed to heartbeat to 172.25.1.70:7100: Service unavailable (yb/tserver/heartbeater.cc:479): master is no longer the leader tries=11, num=3, masters=0x00000000024be7f0 -> [[172.25.1.70:7100], [172.25.1.71:7100], [172.25.1.72:7100]], code=Service unavailable
I1017 22:17:48.214797 4684 tcp_stream.cc:293] { local: 172.25.1.70:53582 remote: 172.25.1.72:7100 }: Recv failed: Network error (yb/util/net/socket.cc:538): recvmsg error: Connection refused (system error 111)
W1017 22:17:48.617285 4701 master_rpc.cc:274] More than 500 ms has passed, choosing to heartbeat to follower master c1d2a60707e44234b41b8aca32ba17f3 after 28 iterations of all masters.
I1017 22:17:48.628082 4695 heartbeater.cc:301] P 1ce0d8ac1b94423bb5db3ce49b9dba01: Connected to a leader master server at 172.25.1.70:7100
I1017 22:17:48.642170 4695 heartbeater.cc:359] P 1ce0d8ac1b94423bb5db3ce49b9dba01: Registering TS with master...
I1017 22:17:48.642196 4695 server_base.cc:477] Using private ip address 172.25.1.70
I1017 22:17:48.642226 4695 heartbeater.cc:368] P 1ce0d8ac1b94423bb5db3ce49b9dba01: Sending a full tablet report to master...
W1017 22:17:48.644371 4695 heartbeater.cc:598] P 1ce0d8ac1b94423bb5db3ce49b9dba01: Failed to heartbeat to 172.25.1.70:7100: Service unavailable (yb/tserver/heartbeater.cc:479): master is no longer the leader tries=12, num=3, masters=0x00000000024be7f0 -> [[172.25.1.70:7100], [172.25.1.71:7100], [172.25.1.72:7100]], code=Service unavailable
I1017 22:17:49.611989 4727 async_initializer.cc:77] Successfully built ybclient
I1017 22:17:49.640787 4689 async_initializer.cc:77] Successfully built ybclient
I1017 22:17:49.647994 4695 heartbeater.cc:301] P 1ce0d8ac1b94423bb5db3ce49b9dba01: Connected to a leader master server at 172.25.1.71:7100
I1017 22:17:49.648052 4695 heartbeater.cc:359] P 1ce0d8ac1b94423bb5db3ce49b9dba01: Registering TS with master...
I1017 22:17:49.648067 4695 server_base.cc:477] Using private ip address 172.25.1.70
I1017 22:17:49.649050 4695 heartbeater.cc:368] P 1ce0d8ac1b94423bb5db3ce49b9dba01: Sending a full tablet report to master...
W1017 22:17:49.649288 4695 heartbeater.cc:598] P 1ce0d8ac1b94423bb5db3ce49b9dba01: Failed to heartbeat to 172.25.1.71:7100: Service unavailable (yb/tserver/heartbeater.cc:479): master is no longer the leader tries=13, num=3, masters=0x00000000024be7f0 -> [[172.25.1.70:7100], [172.25.1.71:7100], [172.25.1.72:7100]], code=Service unavailable
I1017 22:17:49.656942 4690 async_initializer.cc:77] Successfully built ybclient
I1017 22:17:50.651070 4695 heartbeater.cc:301] P 1ce0d8ac1b94423bb5db3ce49b9dba01: Connected to a leader master server at 172.25.1.71:7100
I1017 22:17:50.655741 4695 heartbeater.cc:359] P 1ce0d8ac1b94423bb5db3ce49b9dba01: Registering TS with master...
I1017 22:17:50.655763 4695 server_base.cc:477] Using private ip address 172.25.1.70
I1017 22:17:50.655781 4695 heartbeater.cc:368] P 1ce0d8ac1b94423bb5db3ce49b9dba01: Sending a full tablet report to master...
W1017 22:17:50.656415 4695 heartbeater.cc:598] P 1ce0d8ac1b94423bb5db3ce49b9dba01: Failed to heartbeat to 172.25.1.71:7100: Service unavailable (yb/tserver/heartbeater.cc:479): master is no longer the leader tries=14, num=3, masters=0x00000000024be7f0 -> [[172.25.1.70:7100], [172.25.1.71:7100], [172.25.1.72:7100]], code=Service unavailable
I1017 22:17:51.658004 4695 heartbeater.cc:301] P 1ce0d8ac1b94423bb5db3ce49b9dba01: Connected to a leader master server at 172.25.1.71:7100
I1017 22:17:51.658041 4695 heartbeater.cc:359] P 1ce0d8ac1b94423bb5db3ce49b9dba01: Registering TS with master...
I1017 22:17:51.658056 4695 server_base.cc:477] Using private ip address 172.25.1.70
I1017 22:17:51.658072 4695 heartbeater.cc:368] P 1ce0d8ac1b94423bb5db3ce49b9dba01: Sending a full tablet report to master...
It looks like the yb-tserver is able to connect to a master, but the admin UI is still not working.
Running ./ysqlsh -h 172.25.1.70 gives:
ysqlsh: FATAL: Not found: Error loading table with oid 1260 in database with oid 1: The object does not exist: table_id: "000000010000300080000000000004ec"