Initializing YSQL: Failed to initialize client: Timed out

Hi, I was trying to run the latest version of YugabyteDB with YSQL enabled on a three-node cluster using Docker containers.

It does not initialize correctly, and yb-tserver.ERROR shows this error:

E1017 22:17:37.732748  4689 async_initializer.cc:83] Failed to initialize client: Timed out (yb/rpc/rpc.cc:199): Could not locate the leader master: GetLeaderMasterRpc(addrs: [172.25.1.70:7100, 172.25.1.71:7100, 172.25.1.72:7100], num_attempts: 97) passed its deadline 15497.054s (passed: 5.067s): Not found (yb/master/master_rpc.cc:279): no leader found: GetLeaderMasterRpc(addrs: [172.25.1.70:7100, 172.25.1.71:7100, 172.25.1.72:7100], num_attempts: 1)
E1017 22:17:37.880820  4727 async_initializer.cc:83] Failed to initialize client: Timed out (yb/rpc/rpc.cc:199): Could not locate the leader master: GetLeaderMasterRpc(addrs: [172.25.1.70:7100, 172.25.1.71:7100, 172.25.1.72:7100], num_attempts: 98) passed its deadline 15497.185s (passed: 5.084s): Not found (yb/master/master_rpc.cc:279): no leader found: GetLeaderMasterRpc(addrs: [172.25.1.70:7100, 172.25.1.71:7100, 172.25.1.72:7100], num_attempts: 1)
E1017 22:17:43.834643  4689 async_initializer.cc:83] Failed to initialize client: Timed out (yb/rpc/rpc.cc:199): Could not locate the leader master: GetLeaderMasterRpc(addrs: [172.25.1.70:7100, 172.25.1.71:7100, 172.25.1.72:7100], num_attempts: 98) passed its deadline 15503.149s (passed: 5.074s): Not found (yb/master/master_rpc.cc:279): no leader found: GetLeaderMasterRpc(addrs: [172.25.1.70:7100, 172.25.1.71:7100, 172.25.1.72:7100], num_attempts: 1)
E1017 22:17:44.014653  4727 async_initializer.cc:83] Failed to initialize client: Timed out (yb/rpc/rpc.cc:199): Could not locate the leader master: GetLeaderMasterRpc(addrs: [172.25.1.70:7100, 172.25.1.71:7100, 172.25.1.72:7100], num_attempts: 98) passed its deadline 15503.334s (passed: 5.069s): Not found (yb/master/master_rpc.cc:279): no leader found: GetLeaderMasterRpc(addrs: [172.25.1.70:7100, 172.25.1.71:7100, 172.25.1.72:7100], num_attempts: 1)

I also found the following in yb-master.INFO:

I1017 22:23:11.237989  4742 webserver.cc:278] Webserver: error reading: Resource temporarily unavailable
I1017 22:24:39.529036  4773 webserver.cc:278] Webserver: error reading: Resource temporarily unavailable
I1017 22:29:39.444651  4777 webserver.cc:278] Webserver: error reading: Resource temporarily unavailable
I1017 22:33:21.795174  4786 webserver.cc:278] Webserver: error reading: Resource temporarily unavailable
I1017 22:35:42.891626  4788 webserver.cc:278] Webserver: error reading: Resource temporarily unavailable

In addition, I cannot access the admin UI on port 7000 or 13000.

Is the cause of this problem that the tserver stopped looking for the master after a few timeouts, or is my config file wrong?

If it does stop looking after a few timeouts, what should I do to get it running in my Docker containers, given that I can't start them all at the same time?
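Editor's note: in the yb-tserver.INFO excerpt posted later in this thread, the tserver eventually logs "Successfully built ybclient" and connects to a master, so the timeouts look transient rather than fatal. If you still want the containers to start in order, one option is to have each tserver container wait until the master RPC ports (7100) accept TCP connections before launching yb-tserver. The following is a minimal sketch, assuming Python 3 is available in the container; `port_open` and `wait_for_masters` are hypothetical helper names, not part of YugabyteDB. Note that an open port only means the yb-master process is listening; it does not prove a master leader has been elected.

```python
# Minimal sketch (hypothetical helpers, not a YugabyteDB tool): wait until every
# yb-master RPC address accepts a TCP connection. An open port does NOT prove
# that a master leader has been elected.
import socket
import time


def split_addr(addr):
    """Split 'host:port' into (host, int(port))."""
    host, port = addr.rsplit(":", 1)
    return host, int(port)


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_masters(addrs, deadline_s=120.0, interval_s=2.0):
    """Poll each 'host:port' in `addrs` until all are reachable or the deadline passes.

    Returns the addresses that are still unreachable (an empty list on success).
    """
    end = time.monotonic() + deadline_s
    pending = list(addrs)
    while True:
        pending = [a for a in pending if not port_open(*split_addr(a))]
        if not pending or time.monotonic() >= end:
            return pending
        time.sleep(interval_s)


# Example use in a tserver container entrypoint (addresses from the thread):
#   missing = wait_for_masters(
#       "172.25.1.70:7100,172.25.1.71:7100,172.25.1.72:7100".split(","))
#   if missing: exit with an error; otherwise start yb-tserver.
```

Polling with a deadline keeps the entrypoint from hanging forever if a master container never comes up; the caller decides whether to fail or start anyway.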

For reference, here are my config files:
master:

--master_addresses=172.25.1.70:7100,172.25.1.71:7100,172.25.1.72:7100
--rpc_bind_addresses=172.25.1.70
--fs_data_dirs=/opt/yugabyteDB/data
--replication_factor=3
--webserver_interface=172.25.1.70

tserver:

--tserver_master_addrs=172.25.1.70:7100,172.25.1.71:7100,172.25.1.72:7100
--rpc_bind_addresses=172.25.1.70
--cql_proxy_bind_address=172.25.1.70:9042
--fs_data_dirs=/opt/yugabyteDB/data
--webserver_interface=172.25.1.70
--enable_ysql=true
--pgsql_proxy_bind_address=172.25.1.70:5433

Thank you!

Update:

Right after the period in which yb-tserver.ERROR reports the error, yb-tserver.INFO shows the following:

W1017 22:17:45.590307  4695 heartbeater.cc:598] P 1ce0d8ac1b94423bb5db3ce49b9dba01: Failed to heartbeat to 172.25.1.71:7100: Service unavailable (yb/tserver/heartbeater.cc:479): master is no longer the leader tries=10, num=3, masters=0x00000000024be7f0 -> [[172.25.1.70:7100], [172.25.1.71:7100], [172.25.1.72:7100]], code=Service unavailable
I1017 22:17:46.500589  4686 tcp_stream.cc:293] { local: 172.25.1.70:42800 remote: 172.25.1.72:7100 }:  Recv failed: Network error (yb/util/net/socket.cc:538): recvmsg error: Connection refused (system error 111)
I1017 22:17:47.063611  4687 tcp_stream.cc:293] { local: 172.25.1.70:46714 remote: 172.25.1.72:7100 }:  Recv failed: Network error (yb/util/net/socket.cc:538): recvmsg error: Connection refused (system error 111)
W1017 22:17:47.096071  4701 master_rpc.cc:274] More than 500 ms has passed, choosing to heartbeat to follower master c1d2a60707e44234b41b8aca32ba17f3 after 28 iterations of all masters.
I1017 22:17:47.096638  4695 heartbeater.cc:301] P 1ce0d8ac1b94423bb5db3ce49b9dba01: Connected to a leader master server at 172.25.1.70:7100
I1017 22:17:47.096664  4695 heartbeater.cc:359] P 1ce0d8ac1b94423bb5db3ce49b9dba01: Registering TS with master...
I1017 22:17:47.096681  4695 server_base.cc:477] Using private ip address 172.25.1.70
I1017 22:17:47.096699  4695 heartbeater.cc:368] P 1ce0d8ac1b94423bb5db3ce49b9dba01: Sending a full tablet report to master...
W1017 22:17:47.097321  4695 heartbeater.cc:598] P 1ce0d8ac1b94423bb5db3ce49b9dba01: Failed to heartbeat to 172.25.1.70:7100: Service unavailable (yb/tserver/heartbeater.cc:479): master is no longer the leader tries=11, num=3, masters=0x00000000024be7f0 -> [[172.25.1.70:7100], [172.25.1.71:7100], [172.25.1.72:7100]], code=Service unavailable
I1017 22:17:48.214797  4684 tcp_stream.cc:293] { local: 172.25.1.70:53582 remote: 172.25.1.72:7100 }:  Recv failed: Network error (yb/util/net/socket.cc:538): recvmsg error: Connection refused (system error 111)
W1017 22:17:48.617285  4701 master_rpc.cc:274] More than 500 ms has passed, choosing to heartbeat to follower master c1d2a60707e44234b41b8aca32ba17f3 after 28 iterations of all masters.
I1017 22:17:48.628082  4695 heartbeater.cc:301] P 1ce0d8ac1b94423bb5db3ce49b9dba01: Connected to a leader master server at 172.25.1.70:7100
I1017 22:17:48.642170  4695 heartbeater.cc:359] P 1ce0d8ac1b94423bb5db3ce49b9dba01: Registering TS with master...
I1017 22:17:48.642196  4695 server_base.cc:477] Using private ip address 172.25.1.70
I1017 22:17:48.642226  4695 heartbeater.cc:368] P 1ce0d8ac1b94423bb5db3ce49b9dba01: Sending a full tablet report to master...
W1017 22:17:48.644371  4695 heartbeater.cc:598] P 1ce0d8ac1b94423bb5db3ce49b9dba01: Failed to heartbeat to 172.25.1.70:7100: Service unavailable (yb/tserver/heartbeater.cc:479): master is no longer the leader tries=12, num=3, masters=0x00000000024be7f0 -> [[172.25.1.70:7100], [172.25.1.71:7100], [172.25.1.72:7100]], code=Service unavailable
I1017 22:17:49.611989  4727 async_initializer.cc:77] Successfully built ybclient
I1017 22:17:49.640787  4689 async_initializer.cc:77] Successfully built ybclient
I1017 22:17:49.647994  4695 heartbeater.cc:301] P 1ce0d8ac1b94423bb5db3ce49b9dba01: Connected to a leader master server at 172.25.1.71:7100
I1017 22:17:49.648052  4695 heartbeater.cc:359] P 1ce0d8ac1b94423bb5db3ce49b9dba01: Registering TS with master...
I1017 22:17:49.648067  4695 server_base.cc:477] Using private ip address 172.25.1.70
I1017 22:17:49.649050  4695 heartbeater.cc:368] P 1ce0d8ac1b94423bb5db3ce49b9dba01: Sending a full tablet report to master...
W1017 22:17:49.649288  4695 heartbeater.cc:598] P 1ce0d8ac1b94423bb5db3ce49b9dba01: Failed to heartbeat to 172.25.1.71:7100: Service unavailable (yb/tserver/heartbeater.cc:479): master is no longer the leader tries=13, num=3, masters=0x00000000024be7f0 -> [[172.25.1.70:7100], [172.25.1.71:7100], [172.25.1.72:7100]], code=Service unavailable
I1017 22:17:49.656942  4690 async_initializer.cc:77] Successfully built ybclient
I1017 22:17:50.651070  4695 heartbeater.cc:301] P 1ce0d8ac1b94423bb5db3ce49b9dba01: Connected to a leader master server at 172.25.1.71:7100
I1017 22:17:50.655741  4695 heartbeater.cc:359] P 1ce0d8ac1b94423bb5db3ce49b9dba01: Registering TS with master...
I1017 22:17:50.655763  4695 server_base.cc:477] Using private ip address 172.25.1.70
I1017 22:17:50.655781  4695 heartbeater.cc:368] P 1ce0d8ac1b94423bb5db3ce49b9dba01: Sending a full tablet report to master...
W1017 22:17:50.656415  4695 heartbeater.cc:598] P 1ce0d8ac1b94423bb5db3ce49b9dba01: Failed to heartbeat to 172.25.1.71:7100: Service unavailable (yb/tserver/heartbeater.cc:479): master is no longer the leader tries=14, num=3, masters=0x00000000024be7f0 -> [[172.25.1.70:7100], [172.25.1.71:7100], [172.25.1.72:7100]], code=Service unavailable
I1017 22:17:51.658004  4695 heartbeater.cc:301] P 1ce0d8ac1b94423bb5db3ce49b9dba01: Connected to a leader master server at 172.25.1.71:7100
I1017 22:17:51.658041  4695 heartbeater.cc:359] P 1ce0d8ac1b94423bb5db3ce49b9dba01: Registering TS with master...
I1017 22:17:51.658056  4695 server_base.cc:477] Using private ip address 172.25.1.70
I1017 22:17:51.658072  4695 heartbeater.cc:368] P 1ce0d8ac1b94423bb5db3ce49b9dba01: Sending a full tablet report to master...

It looks like yb-tserver is able to connect to a master, but the admin UI is still not working.

Running ./ysqlsh -h 172.25.1.70 gives:

ysqlsh: FATAL: Not found: Error loading table with oid 1260 in database with oid 1: The object does not exist: table_id: "000000010000300080000000000004ec"

Hi @AndrewLiuRM,

Can you also set --enable_ysql=true when starting yb-master? Based on your example above, your master config file would be:

--master_addresses=172.25.1.70:7100,172.25.1.71:7100,172.25.1.72:7100
--rpc_bind_addresses=172.25.1.70
--fs_data_dirs=/opt/yugabyteDB/data
--replication_factor=3
--webserver_interface=172.25.1.70
--enable_ysql=true
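Editor's note: across the three containers these flagfiles differ only in the node's own IP, so generating them from one list can help keep them consistent (for example, making sure --enable_ysql=true ends up in both the master and tserver files on every node). The following is a minimal sketch; the IPs, data directory, and flag values are copied from this thread, while `master_flags` and `tserver_flags` are hypothetical helper names. It assumes, as in this thread, that the replication factor equals the number of masters. It only prints the flags; how you write and mount them into each container depends on your setup.

```python
# Minimal sketch (hypothetical helpers): build per-node yb-master and yb-tserver
# flag lists from the values used in this thread. Only the node IP varies.

NODE_IPS = ["172.25.1.70", "172.25.1.71", "172.25.1.72"]
DATA_DIR = "/opt/yugabyteDB/data"


def master_addresses(ips, port=7100):
    """Comma-separated host:port list shared by every master and tserver."""
    return ",".join(f"{ip}:{port}" for ip in ips)


def master_flags(ip, ips):
    """Flags for the yb-master on `ip`; includes --enable_ysql=true."""
    return [
        f"--master_addresses={master_addresses(ips)}",
        f"--rpc_bind_addresses={ip}",
        f"--fs_data_dirs={DATA_DIR}",
        f"--replication_factor={len(ips)}",  # assumption: RF == number of masters
        f"--webserver_interface={ip}",
        "--enable_ysql=true",
    ]


def tserver_flags(ip, ips):
    """Flags for the yb-tserver on `ip`, bound to that node's own address."""
    return [
        f"--tserver_master_addrs={master_addresses(ips)}",
        f"--rpc_bind_addresses={ip}",
        f"--cql_proxy_bind_address={ip}:9042",
        f"--fs_data_dirs={DATA_DIR}",
        f"--webserver_interface={ip}",
        "--enable_ysql=true",
        f"--pgsql_proxy_bind_address={ip}:5433",
    ]


if __name__ == "__main__":
    # Print each node's flags, one flag per line.
    for ip in NODE_IPS:
        print(f"yb-master on {ip}:")
        print("\n".join(master_flags(ip, NODE_IPS)))
        print(f"yb-tserver on {ip}:")
        print("\n".join(tserver_flags(ip, NODE_IPS)))
        print()
```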

Hi @neha,

Thank you for your reply! ysqlsh works now! I have two follow-up questions:

  1. Should I still keep --enable_ysql=true in the yb-tserver config file?

  2. The admin UI is still not working for me. Port 7000 takes forever to respond. I ran into a similar issue before and fixed it by adding the webserver_interface property to the config file. Are there any other properties I should add to the config file to solve this?

Thank you!

Great to hear that ysqlsh works now!
Yes, please keep --enable_ysql=true in yb-tserver as well. As a side note, from version 2.0.2 onwards YSQL will be enabled by default, so you won't need to specify these flags explicitly anymore.

@bogdan will be a better person to help with port 7000 issue.

Thank you! I’ll talk to @bogdan and I’ll update with the solution here!