YB-Master cluster balancer

We recently had an issue on our YugabyteDB cluster that looked like an election storm.
We saw far too many tablet elections happening, and yb-master's cluster balancer was also triggering a lot of leader rebalancing, which added even more elections and made the situation worse.
We were able to stabilise the cluster by identifying a tserver with a much larger number of threads than the other tservers and stopping it. [We don't know what started the election storm.]

As part of the action items to prevent this from happening again, we were wondering whether changing leader_balance_threshold from its default of 0 to a higher value would help.

YugabyteDB version: 2024.2.0.0

We tried this config change in our test environment.
Before: [user tablet-peers / leaders]
- tserver1: 445 / 148
- tserver2: 445 / 148
- tserver3: 445 / 149

  • Updated leader_balance_threshold to 3 on all yb-master VMs
  • Blacklisted tserver1 using the change_leader_blacklist command
    • This made tserver1: 445 / 0 and increased the leader count on the other two tservers
  • Removed the blacklist for tserver1
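For reference, the flag change and blacklist steps can be expressed as commands like the following (the flag and yb-admin subcommand names are real; the master addresses, the tserver address, and the config-file path are placeholders for our environment):

```shell
# 1. Set the flag on every yb-master via its gflags file, then restart each
#    master so it takes effect (file path is a placeholder).
echo "--leader_balance_threshold=3" >> /path/to/yb-master.conf

# 2. Blacklist tserver1's leaders so they drain to the other two tservers.
yb-admin -master_addresses m1:7100,m2:7100,m3:7100 \
    change_leader_blacklist ADD tserver1-ip:9100

# 3. Remove the blacklist so leaders can move back to tserver1.
yb-admin -master_addresses m1:7100,m2:7100,m3:7100 \
    change_leader_blacklist REMOVE tserver1-ip:9100
```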

The expected result was:
- tserver1: 445 / 147
- tserver2: 445 / 149
- tserver3: 445 / 149

What I actually got:
- tserver1: 445 / 40
- tserver2: 445 / 202
- tserver3: 445 / 203

I redid the whole exercise with leader_balance_threshold = 2: same result.
I redid it again with leader_balance_threshold = 1, and the result was:
- tserver1: 445 / 90
- tserver2: 445 / 177
- tserver3: 445 / 178

I am wondering whether the leader balancer works at the table level, balancing the tablets of each table and honouring the threshold per table, instead of at the cluster level. The documentation does not say anything about this.

Hi Manish,

Yes, leader_balance_threshold works at the table level. This is a gap in our documentation, as you pointed out. I have opened a PR to fix it.
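To see why per-table balancing produces cluster-level skew, here is a toy simulation (my own sketch, not YugabyteDB source code). It assumes the rule "for each table, move leaders only while some tserver holds more than leader_balance_threshold leaders of that table", starting from the post-unblacklist state where tserver1 leads nothing:

```python
# Toy model of per-table leader balancing. NOT YugabyteDB code: the balancing
# rule below is an assumption used to illustrate the observed skew.

def balance_table(leaders, threshold):
    """Balance ONE table's leader counts (dict: tserver -> leader count).

    Assumed rule for threshold > 0: keep moving a leader from the most loaded
    tserver to the least loaded one while some tserver holds more than
    `threshold` leaders of this table and the move actually helps.
    (threshold = 0 means "balance optimally" and is not modelled here.)
    """
    while max(leaders.values()) > threshold:
        src = max(leaders, key=leaders.get)
        dst = min(leaders, key=leaders.get)
        if leaders[src] - leaders[dst] <= 1:
            break  # moving would just shift the imbalance around
        leaders[src] -= 1
        leaders[dst] += 1
    return leaders

def cluster_totals(threshold):
    """Sum per-table results over a hypothetical mix of small and large
    tables, with ts1 starting at zero leaders everywhere (as after removing
    a leader blacklist)."""
    tables = (
        [{"ts1": 0, "ts2": 3, "ts3": 2} for _ in range(60)]    # small tables
        + [{"ts1": 0, "ts2": 5, "ts3": 4} for _ in range(20)]  # larger tables
    )
    totals = {"ts1": 0, "ts2": 0, "ts3": 0}
    for table in tables:
        for ts, count in balance_table(table, threshold).items():
            totals[ts] += count
    return totals

# Small tables never cross a threshold of 3, so ts1 stays starved for them;
# with a threshold of 1 more tables rebalance, but ts1 still ends well below
# the other two tservers.
print(cluster_totals(3))
print(cluster_totals(1))
```

Qualitatively this matches the numbers above: a higher threshold leaves tserver1 far below the other two (the 40 / 202 / 203 pattern), and a threshold of 1 recovers it only partially (the 90 / 177 / 178 pattern), because each table stops moving leaders once its own per-table counts are within the threshold, regardless of the cluster-wide totals.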

Anubhav Srivastava (YB)

> As part of Action Items to mitigate this from happening again we were wondering if changing the value for leader_balance_threshold from default 0 to a higher value would help.

@Anubhav_Srivastava, can you clarify this? My understanding is that it would help during an election storm: if we increase the value high enough, it would reduce the yb-master-triggered leader rebalancing, leading to fewer elections. Am I right about this?

Increasing the leader_balance_threshold flag might help here, but it wouldn’t fix the underlying cause of the election storm. If you can figure out what is causing that (e.g., a certain query causing high load on a tserver, an internal deadlock, etc.), that would be useful.

If you’re looking for more of a patch fix, a more typical approach is to lower the value of load_balancer_max_concurrent_moves to something like 5, so that the cluster balancer makes fewer concurrent leader moves in each iteration (the cluster balancer runs around once a second, so that is ~5 leader moves/s). Note that this would increase the amount of time for operations like leader blacklisting and moving leaders back to failed nodes.
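If you go this route, the change is a single yb-master flag (the flag name is real; where the gflags file lives is deployment-specific):

```
# Append to each yb-master's gflags file, then restart the master.
--load_balancer_max_concurrent_moves=5
```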