Yugabyte Force Join a node to Master Leader

I have a 4-node Yugabyte cluster (version 2.20.5.0), where all services are started using yugabyted. The current status is as follows:

  • yugabyte02 is functioning as the master-leader, and the tserver is also running.
  • yugabyte01 is a master-follower, with its tserver running as expected.
  • yugabyte03 fails to start the master service due to the following error:

Failed to initialize client: Illegal state (yb/client/client-internal.cc:2622): Could not locate the leader master: Unable to determine master addresses

However, the tserver on this node is running.

  • yugabyte04 fails to start yugabyted services entirely, showing a “Bad Gateway” error in yugabyted.log (below). I assume it is trying to join the master on yugabyte03, which is down. I am aware that yugabyte04 will run only a tserver.

[yugabyted start] 2025-05-07 02:11:44,518 ERROR: | 0.4s | HTTP error occurred while checking for security of leader master: HTTP Error 502: Bad Gateway

Upon investigating the yugabyted.conf files:

  • On yugabyte03, the join parameter references yugabyte05, which was decommissioned months ago. However, the current_masters field correctly lists: yugabyte01:port, yugabyte02:port, yugabyte03:port.
  • On yugabyte04, the join parameter is set to yugabyte03, and its current_masters field is identical to that of yugabyte03.
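The comparison described in the two points above (the join value versus current_masters) can be sketched in a few lines of Python. This is only an illustration: it assumes yugabyted.conf is plain JSON with "join" and "current_masters" keys as in the files posted in this thread, and the function name find_stale_join is hypothetical.

```python
import json


def find_stale_join(conf_text):
    """Return the join host if it is not listed in current_masters, else None."""
    conf = json.loads(conf_text)
    join_value = conf.get("join") or ""
    # Drop an optional :port suffix so hosts compare by name only.
    join_host = join_value.split(":")[0]
    # current_masters is a comma-separated list of host:port entries.
    masters = [m.split(":")[0] for m in conf.get("current_masters", "").split(",") if m]
    if join_host and join_host not in masters:
        return join_host
    return None


sample = (
    '{"join": "yugabyte05", '
    '"current_masters": "yugabyte01:7100,yugabyte02:7100,yugabyte03:7100"}'
)
print(find_stale_join(sample))
```

A check like this only flags a join target that is absent from current_masters (the yugabyte05 case). It would not flag yugabyte04, whose join target is listed but whose master process is down.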

I’m trying to understand the following:

  1. Other than yugabyted.conf, where does Yugabyte store configuration data such that yugabyte03 still attempts to join the decommissioned yugabyte05?
  2. How can I force yugabyte03 to join yugabyte02 (the current master-leader) instead? Are there any issues I should expect in doing so?

In Yugabyte, you can force a node to join the master leader by using the yb-admin tool with the force flag, but be cautious as this could affect cluster stability. I found it helpful to ensure the node is fully synced before forcing the join, as sometimes it can cause issues if the node is out of sync.

Hi Parvez,

Can you please send the yugabyted.conf files from the leader master node (yugabyte02) and from yugabyte03? The conf file is in the <base-dir>/conf folder.

Thanks,
Nikhil

Hi @nmalladi, attaching the yugabyted.conf files for yugabyte02 and yugabyte03.

yugabyte02 - yugabyted.conf

{
"data_dir": "/storage/yugabyte_data",
"log_dir": "/var/log/yugabyte",
"gen_certs_dir": "/opt/yugabyte/yugabyte_base_dir/generated_certs",
"master_rpc_port": "7100",
"tserver_rpc_port": "9100",
"master_webserver_port": "7000",
"tserver_webserver_port": "9000",
"ysql_port": "5433",
"ycql_port": "9042",
"advertise_address": "yugabyte02",
"webserver_port": 7200,
"yugabyted_ui_port": 15433,
"universe_uuid": "e30e950b-80be-44b9-9e02-c3a3f53430f9",
"node_uuid": "f47c13c5-184b-4d5a-aebe-676725abe937",
"tserver_uuid": "b3f437411ff540ec9bd9e351c8d268e4",
"master_uuid": "311d038e4b624465b1689d9f1ddc82d7",
"placement_uuid": "3f22f39c-beb0-4876-a35c-cc1fe0db7ebb",
"polling_interval": "5",
"callhome": false,
"master_flags": "flagfile=/etc/conf/yugabyte/master.conf",
"tserver_flags": "flagfile=/etc/conf/yugabyte/tserver.conf",
"join": "yugabyte05",
"ysql_enable_auth": true,
"use_cassandra_authentication": true,
"cloud_provider": "cloud1",
"cloud_region": "datacenter1",
"cloud_zone": "rack1",
"fault_tolerance": "none",
"secure": true,
"insecure": false,
"certs_dir": "/etc/conf/yugabyte/certs",
"ca_cert_file_path": "/etc/conf/yugabyte/certs/ca.crt",
"database_password": null,
"current_masters": "yugabyte01:7100,yugabyte02:7100,yugabyte03:7100",
"ui": false,
"dns_enabled": true,
"read_replica": false,
"cluster_member": true
}

yugabyte03: yugabyted.conf

{
"data_dir": "/storage/yugabyte_data",
"log_dir": "/var/log/yugabyte",
"gen_certs_dir": "/opt/yugabyte/yugabyte_base_dir/generated_certs",
"master_rpc_port": "7100",
"tserver_rpc_port": "9100",
"master_webserver_port": "7000",
"tserver_webserver_port": "9000",
"ysql_port": "5433",
"ycql_port": "9042",
"advertise_address": "yugabyte03",
"webserver_port": 7200,
"yugabyted_ui_port": 15433,
"universe_uuid": "e30e950b-80be-44b9-9e02-c3a3f53430f9",
"node_uuid": "2bfaa2b2-54fe-473f-95f9-979eef1f54e5",
"tserver_uuid": "49f964ee71454a0a8fdbeb4af921c4fe",
"master_uuid": "04eaa5173690444ead12b8d5bcde439b",
"placement_uuid": "3f22f39c-beb0-4876-a35c-cc1fe0db7ebb",
"polling_interval": "5",
"callhome": false,
"master_flags": "flagfile=/etc/conf/yugabyte/master.conf",
"tserver_flags": "flagfile=/etc/conf/yugabyte/tserver.conf",
"join": "yugabyte05",
"ysql_enable_auth": true,
"use_cassandra_authentication": true,
"cloud_provider": "cloud1",
"cloud_region": "datacenter1",
"cloud_zone": "rack1",
"fault_tolerance": "none",
"secure": true,
"insecure": false,
"certs_dir": "/etc/conf/yugabyte/certs",
"ca_cert_file_path": "/etc/conf/yugabyte/certs/ca.crt",
"database_password": null,
"current_masters": "yugabyte01:7100,yugabyte02:7100,yugabyte03:7100",
"ui": false,
"dns_enabled": true,
"read_replica": false,
"cluster_member": true
}

In v2.20, yugabyted doesn’t have the fix where we update the master/tserver processes with the current set of masters using the yb-ts-cli command. Hence, during restart, the yugabyte03 node is not able to join because the leader master has changed; in your case, it has changed from yugabyte05 to yugabyte02. The workaround is to update the join address.

In yugabyted.conf on yugabyte03, can you please change the join value from “join”: “yugabyte05” to the current leader master, yugabyte02, and restart the node?
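For anyone who prefers to script this edit rather than make it by hand, here is a minimal Python sketch. It assumes yugabyted.conf is plain JSON as in the files posted above and that the node is stopped while the file is edited; the helper name set_join and the .bak backup suffix are hypothetical choices, not part of yugabyted.

```python
import json
import shutil


def set_join(conf_path, new_join):
    """Point the "join" field of a yugabyted.conf file at new_join.

    A copy of the original file is kept next to it with a .bak suffix.
    Returns the previous join value (or None if the key was absent).
    """
    shutil.copy2(conf_path, conf_path + ".bak")
    with open(conf_path) as f:
        conf = json.load(f)
    old_join = conf.get("join")
    conf["join"] = new_join
    with open(conf_path, "w") as f:
        json.dump(conf, f, indent=4)
    return old_join


# Example call (path is a placeholder, adjust to your base dir):
# set_join("<base-dir>/conf/yugabyted.conf", "yugabyte02")
```

All other keys are written back unchanged, so only the join value differs between the file and its backup.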

Thanks,
Nikhil

Hey @nmalladi,

So, I updated “join” on yugabyte03 to point to the leader master, yugabyte02, and restarted the node. The yugabyted.conf file on yugabyte03 now shows the current leader master, yugabyte02.

Now both the master and the tserver on yugabyte03 fail to start. The logs follow.

yugabyte03: yb-master.INFO:

Running on machine: yugabyte03
Application fingerprint: version 2.20.5.0 build 72 revision cebde5e50c0865614b4de917dd365e65d272499b build_type RELEASE built at 03 Jul 2024 22:41:00 UTC
Running duration (h:mm:ss): 0:00:00
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
E0507 21:48:12.464646 10039 async_initializer.cc:95] Failed to initialize client: Illegal state (yb/client/client-internal.cc:2622): Could not locate the leader master: Unable to determine master addresses
E0507 21:48:12.516268 10049 master.cc:370] Master@10.44.71.238:52012: Unable to init master catalog manager: IO error (yb/util/env_posix.cc:1562): Unable to initialize catalog manager: Failed to initialize sys tables async: Could not load Raft group metadata from /storage/yugabyte_data/yb-data/master/tablet-meta/00000000000000000000000000000000: /storage/yugabyte_data/yb-data/master/tablet-meta/00000000000000000000000000000000: Permission denied (system error 13)
F0507 21:48:12.516348 9990 master_main.cc:145] IO error (yb/util/env_posix.cc:1562): Unable to initialize catalog manager: Failed to initialize sys tables async: Could not load Raft group metadata from /storage/yugabyte_data/yb-data/master/tablet-meta/00000000000000000000000000000000: /storage/yugabyte_data/yb-data/master/tablet-meta/00000000000000000000000000000000: Permission denied (system error 13)
E0507 21:48:12.549072 10047 async_initializer.cc:95] Failed to initialize client: Illegal state (yb/client/client-internal.cc:2622): Could not locate the leader master: Unable to determine master addresses
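The log header above gives the line format as [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg. A small Python sketch for pulling out only the error (E) and fatal (F) lines from such a file might look like this; the regular expression and the function name errors_and_fatals are illustrative assumptions based on that header, not a YugabyteDB tool.

```python
import re

# Pattern follows the header: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
GLOG_LINE = re.compile(
    r"^(?P<severity>[IWEF])(?P<mmdd>\d{4}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<thread>\d+) (?P<source>[^\]]+)\] (?P<msg>.*)$"
)


def errors_and_fatals(lines):
    """Return (severity, source, message) for every E or F log line."""
    found = []
    for line in lines:
        match = GLOG_LINE.match(line.rstrip("\n"))
        # Header lines such as "Running on machine: ..." do not match and are skipped.
        if match and match.group("severity") in ("E", "F"):
            found.append(
                (match.group("severity"), match.group("source"), match.group("msg"))
            )
    return found
```

Passing an open log file, for example errors_and_fatals(open("yb-master.INFO")), would return tuples for the async_initializer.cc, master.cc and master_main.cc lines shown above.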

yugabyte03: yb-tserver.INFO

Running on machine: yugabyte03
Application fingerprint: version 2.20.5.0 build 72 revision cebde5e50c0865614b4de917dd365e65d272499b build_type RELEASE built at 03 Jul 2024 22:41:00 UTC
Running duration (h:mm:ss): 0:00:00
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
F0507 21:49:13.093456 11043 tablet_server_main_impl.cc:241] IO error (yb/util/env_posix.cc:1562): Could not init Tablet Manager: Failed to open tablet metadata for tablet: b31a247af96544eca27996ccfebde607: Failed to load tablet metadata for tablet id b31a247af96544eca27996ccfebde607: Could not load Raft group metadata from /storage/yugabyte_data/yb-data/tserver/tablet-meta/b31a247af96544eca27996ccfebde607: /storage/yugabyte_data/yb-data/tserver/tablet-meta/b31a247af96544eca27996ccfebde607: Permission denied (system error 13)

Can you please help me understand why it is unable to determine the master leader?
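Both fatal lines above end in “Permission denied (system error 13)” on files under /storage/yugabyte_data/yb-data, which is a separate failure from the master-address message. As a rough way to see which paths are affected, the following Python sketch lists everything under a directory that the current OS user cannot read. It assumes it is run as the same user that starts yugabyted; the function name unreadable_paths is hypothetical.

```python
import os


def unreadable_paths(root):
    """Return a sorted list of paths under root the current user cannot read."""
    bad = set()

    def on_error(err):
        # os.walk reports directories it could not list through this callback.
        if err.filename:
            bad.add(err.filename)

    for dirpath, dirnames, filenames in os.walk(root, onerror=on_error):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if not os.access(path, os.R_OK):
                bad.add(path)
    return sorted(bad)


# Example call (data_dir taken from the yugabyted.conf posted above):
# for p in unreadable_paths("/storage/yugabyte_data/yb-data"):
#     print(p)
```

An empty result would suggest the cause lies elsewhere (for example, write permission or a different user running the processes), so this is a first check rather than a diagnosis.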

Can you please send the logs from the yugabyte03 node? You can collect them with the following command:

yugabyted collect_logs --base_dir <path-to-base-dir>

Thanks.