Adding yb-master fails

Hi,

We are currently trying to migrate an existing Yugabyte cluster to a new one. Our instance is deployed on a Kubernetes cluster using the Yugabyte Helm chart (version 2024.2.5.1-b1).
It is a multi-zone setup, as described here: Deploy multi zone on EKS using Helm Chart | YugabyteDB Docs
To replace the current set of yb-masters, we deployed a new set of masters in the respective zones.
We are following the instructions as described here: Replace a failed YB-Master | YugabyteDB Docs
The new masters are started with an empty list of master_addresses.
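Roughly, the relevant part of each new master's command line looks like this; this is only a sketch, the actual invocation comes from the Helm chart templates and the data dir path is illustrative:

# Sketch: replacement master started with an empty master_addresses (shell mode)
/home/yugabyte/bin/yb-master \
  --fs_data_dirs=/mnt/disk0 \
  --master_addresses= \
  --rpc_bind_addresses=$(hostname -f):7100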
When we try to add the new master in zone-1, the yb-admin command fails:

# Old masters currently in the cluster, one per zone
export MASTER_PORT=7100
export OLD_MASTER_1=zone-1-yugabyte-yb-master-0.zone-1-yugabyte-yb-masters.zone-1.svc.cluster.local
export OLD_MASTER_2=zone-2-yugabyte-yb-master-0.zone-2-yugabyte-yb-masters.zone-2.svc.cluster.local
export OLD_MASTER_3=zone-3-yugabyte-yb-master-0.zone-3-yugabyte-yb-masters.zone-3.svc.cluster.local
export OLD_MASTERS=$OLD_MASTER_1:$MASTER_PORT,$OLD_MASTER_2:$MASTER_PORT,$OLD_MASTER_3:$MASTER_PORT

# New (replacement) masters, one per zone
export NEW_MASTER_1=new-zone-1-yugabyte-yb-master-0.new-zone-1-yugabyte-yb-masters.zone-1.svc.cluster.local
export NEW_MASTER_2=new-zone-2-yugabyte-yb-master-0.new-zone-2-yugabyte-yb-masters.zone-2.svc.cluster.local
export NEW_MASTER_3=new-zone-3-yugabyte-yb-master-0.new-zone-3-yugabyte-yb-masters.zone-3.svc.cluster.local
export NEW_MASTERS=$NEW_MASTER_1:$MASTER_PORT,$NEW_MASTER_2:$MASTER_PORT,$NEW_MASTER_3:$MASTER_PORT

export ALL_MASTERS=$OLD_MASTERS,$NEW_MASTERS

# Add the first new master to the existing master Raft configuration
yb-admin -master_addresses $OLD_MASTERS change_master_config ADD_SERVER $NEW_MASTER_1 $MASTER_PORT
Error running change_master_config: Service unavailable (yb/master/scoped_leader_shared_lock.cc:91): Unable to change master config: Catalog manager is not initialized. State: 1

It seems the cause is that the new master is not yet ready to be added to the existing cluster.
Removing and re-adding an existing master works without any problems.
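For comparison, the cycle that does succeed looks roughly like this (same variables as above, zone-1 as the example):

# Remove an existing master from the Raft config, then add it back
yb-admin -master_addresses $OLD_MASTERS change_master_config REMOVE_SERVER $OLD_MASTER_1 $MASTER_PORT
yb-admin -master_addresses $OLD_MASTERS change_master_config ADD_SERVER $OLD_MASTER_1 $MASTER_PORT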

We also tested other versions of Yugabyte.
• Version 2.20.12.0-b30 works without any issues.

• Version 2024.1.1.0-b137 partially works. It is possible to add the new master to the existing cluster, and it shows up in the master web UI and when listing all masters with the yb-admin CLI (the listing command is sketched below the log excerpt). However, the new master logs the following error:

I0115 13:41:05.609918  2067 client-internal.cc:2792] New master addresses: []
E0115 13:41:05.610303    50 async_client_initializer.cc:94] Failed to initialize client: Illegal state (yb/client/client-internal.cc:2795): Could not locate the leader master: Unable to determine master addresses
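The listing mentioned above is done roughly like this; list_all_masters prints each master's UUID, RPC address, state, and role:

yb-admin -master_addresses $ALL_MASTERS list_all_masters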

I have prepared Kubernetes manifests for versions 2024.2, 2024.1, and 2.20.12. You can find them here: GitHub - MichiganL/yugabyte-add-master-issue
The master logs can also be found there.

Could you please help us find a way to add the new master to the existing cluster?

@MichiganL : On the new masters, can you check whether the log line Starting master in shell mode appears? It should, if the data dir (pointed to by --fs_data_dirs) is empty or if master_join_existing_universe is set to true. This confirms that the new master is ready to be added. If you don't see it, try cleaning the data dir.
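Something like this should show it, with the pod and namespace names taken from your setup above (adjust if yours differ):

kubectl logs new-zone-1-yugabyte-yb-master-0 -n zone-1 | grep "Starting master in shell mode"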

If the new master is correctly in shell mode, can you then share the logs from the leader master after running change_master_config ADD_SERVER? It should report some Raft state changes, and we can try to identify where it gets stuck.
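As a sketch (assuming, for illustration only, that the zone-2 master is the current leader; list_all_masters shows the actual roles):

# Find the leader, then stream its logs while re-running ADD_SERVER elsewhere
yb-admin -master_addresses $OLD_MASTERS list_all_masters
kubectl logs -f zone-2-yugabyte-yb-master-0 -n zone-2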

The logs at yugabyte-add-master-issue/yugabyte-2024-2/logs/master-zone-3.logs at main · MichiganL/yugabyte-add-master-issue · GitHub only contain the stdout of the pods. Can you include the yb-master*INFO* files that appear in the logs folder under the data directory?
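As a sketch, assuming the chart's default data dir mount of /mnt/disk0 (adjust to whatever --fs_data_dirs points to):

# List the log files in a master pod, then copy the current INFO log out
kubectl exec -n zone-3 zone-3-yugabyte-yb-master-0 -- ls /mnt/disk0/yb-data/master/logs/
kubectl exec -n zone-3 zone-3-yugabyte-yb-master-0 -- cat /mnt/disk0/yb-data/master/logs/yb-master.INFO > master-zone-3.INFO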