How to remove a node from a multi-region cluster gracefully

Hello,
I cannot remove this replica node. I had already stopped yugabyte on this replica node (the .71 IP) and removed the directories, but when I try to add the node back in I get an error that the node already exists, and running the command shown in the attached screenshot does not remove it either.

Trying to add the node in again with the yugabyted command:
$ /opt/yugabyte/bin/yugabyted start --secure --certs_dir=/db/yugabyte/certs --cloud_location=ib.us-west-2.us-west_rr-1a --base_dir=/db/yugabyte --advertise_address=xx.xx.2.71 --join=xx.xx.202.95 --read_replica
Fetching configs from join IP…
ERROR: A node is already running on xx.xx.202.95, please specify a valid address.
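One way to confirm whether the masters still track the removed tserver is yb-admin's list_all_tablet_servers subcommand. A sketch only, reusing the certs path and a $MASTERS variable like the ones used later in this thread:

$ /opt/yugabyte/bin/yb-admin --certs_dir_name /db/yugabyte/certs \
    --master_addresses $MASTERS list_all_tablet_servers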

Hi @tryyuga

Can you explain what you’re trying to do by removing and re-adding the same node with the same IP address?

Yes, @dorian_yugabyte, I was testing this. I added something and got it messed up, so I removed the node, and now I want to remove it completely from the cluster so I can re-add it with the same IP. I notice the docs state that only a new server (assuming a new IP) can be added; can this be done with the same IP?

@tryyuga : It seems like you are hitting a safety check in yugabyted. To work around this, I would suggest the following steps:

  1. Stop the tserver running on IP 2.71, verify using ps -ef | egrep yb-tserver
  2. Wait for at least 1 minute so that this tserver is marked DEAD in the master leader UI on port 7000 on the “Tablet Servers” page.
  3. Stop and restart the leader master node so that leadership transitions to a different master. You can locate the leader master through the master UI on port 7000. Either kill the yb-master process on the leader master node (it should restart automatically) or use a yb-admin command to step the leader down (see the sketch below).
  4. Confirm that the new master leader UI does not show the tserver .2.71 in the “Tablet Servers” page.

At this point, you should be able to run the yugabyted command above to add the node back. If not, please share the output from yugabyted.
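For reference, a rough command-line sketch of steps 1 and 3. This is a sketch only: master_leader_stepdown is assumed to be the yb-admin command meant in step 3, and the certs path and $MASTERS variable are reused from the earlier commands:

$ ps -ef | egrep yb-tserver    # step 1: confirm the tserver on .2.71 is stopped
$ /opt/yugabyte/bin/yb-admin --certs_dir_name /db/yugabyte/certs \
    --master_addresses $MASTERS master_leader_stepdown    # step 3: force a leadership change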

@sanketh I just tried that. The new leader went to xx.xx.202.96, but it fails at step #4: the tserver still shows as DEAD. I can’t seem to get rid of it.

@tryyuga : Sorry that didn’t work. Can you set the flag --hide_dead_node_threshold_mins=1 on the leader master? You can set this via yb-ts-cli set_flag on just the new master leader (202.96) and then refresh the Tablet Servers page. Given you have certs enabled, note that you will need to run yb-ts-cli with the --certs_dir_name option pointing to the certs (the value is the same as --certs_dir on the master/tserver process). Otherwise, you can also restart all masters with this flag specified to yugabyted as usual.
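A sketch of that yb-ts-cli invocation, assuming the default master RPC port 7100 and the certs directory from your earlier commands:

$ /opt/yugabyte/bin/yb-ts-cli --server_address=xx.xx.202.96:7100 \
    --certs_dir_name=/db/yugabyte/certs \
    set_flag hide_dead_node_threshold_mins 1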

I have already set that flag; I set it as "master_flags": "hide_dead_node_threshold_mins=30" in the config. I think it’s still not removing it from the cluster, because I still cannot add the node in and I still see the replica. So what is the best way to remove it? The cmd window I attached earlier was what I ran on the master, but it still does not remove the node cleanly.
The us-west_rr-1a replica placement still shows up in get_universe_config:
/opt/yugabyte/bin/yb-admin --certs_dir_name /db/yugabyte/certs --master_addresses $MASTERS get_universe_config
{
  "version": 31,
  "replicationInfo": {
    "liveReplicas": {
      "numReplicas": 3,
      "placementBlocks": [
        {"cloudInfo": {"placementCloud": "ib", "placementRegion": "us-west-1", "placementZone": "us-west-1c"}, "minNumReplicas": 1},
        {"cloudInfo": {"placementCloud": "ib", "placementRegion": "us-west-1", "placementZone": "us-west-1b"}, "minNumReplicas": 1},
        {"cloudInfo": {"placementCloud": "ib", "placementRegion": "us-west-1", "placementZone": "us-west-1a"}, "minNumReplicas": 1}
      ],
      "placementUuid": "NzIzZDE4ZTAtNTIwNy00NDk4LWFkYTUtYTJjYTM3YTI0Yzg4"
    },
    "readReplicas": [
      {
        "numReplicas": 1,
        "placementBlocks": [
          {"cloudInfo": {"placementCloud": "ib", "placementRegion": "us-west-2", "placementZone": "us-west_rr-1a"}, "minNumReplicas": 1}
        ],
        "placementUuid": "NzY5YjE5YjItMmQ4OC00ODI5LWE5MGItM2ZlMzUxMGFlZTJi"
      }
    ],
    "multiAffinitizedLeaders": [
      {
        "zones": [
          {"placementCloud": "ib", "placementRegion": "us-west-1", "placementZone": "us-west-1a"},
          {"placementCloud": "ib", "placementRegion": "us-west-1", "placementZone": "us-west-1b"},
          {"placementCloud": "ib", "placementRegion": "us-west-1", "placementZone": "us-west-1c"}
        ]
      }
    ]
  },
  "serverBlacklist": {"initialReplicaLoad": 0},
  "clusterUuid": "68893765-9c10-47e7-b3ad-e0a4fa7ff8ee",
  "universeUuid": "08abacc6-5882-4929-bc45-803a91d6dc85"
}
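(Aside: if the intent were to drop the read-replica placement from the universe config entirely, yb-admin also has a delete_read_replica_placement_info subcommand. A sketch only, reusing $MASTERS and the certs path from above; note it removes the whole read-replica placement policy, not a single node:)

$ /opt/yugabyte/bin/yb-admin --certs_dir_name /db/yugabyte/certs \
    --master_addresses $MASTERS delete_read_replica_placement_info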

@tryyuga : We identified an issue with the yugabyted check ([yugabyted] Issue removing and adding a dead tserver back to the cluster · Issue #27675 · yugabyte/yugabyte-db on GitHub) and will improve this check. Thanks for finding this problem in recent builds. For now, the workaround is to comment out this check in yugabyted manually.

Can you please modify this line in the Python script yugabyte-db/bin/yugabyted (at commit cdb79c4ecc36de79e9184e512221bcae5f4739dd on GitHub) to ignore this error?

Replace the lines

                                Output.log_error_and_exit(Output.make_red("ERROR:") + " A node" +
                                    " is already running on {}, please ".format(join_ip) +
                                    "specify a valid address.")

with

                                Output.print_and_log("Ignoring existing tserver")

Just replacing this log_error_and_exit with print_and_log will allow the code to make progress. Let us know if you still run into any issues.
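If editing the script by hand, a minimal sketch (paths assumed from the commands earlier in this thread; back up the file first, then locate the block to edit):

$ cp /opt/yugabyte/bin/yugabyted /opt/yugabyte/bin/yugabyted.bak
$ grep -n "is already running on" /opt/yugabyte/bin/yugabyted
# open the file at the reported line and swap in the print_and_log call shown above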

Thank you for the response, though I would not have a clue how to modify it. Will this be fixed in the official 2.25.0.0 release?