list_all_masters is very slow across k8s clusters joined by LoadBalancer IPs

Can anyone help me troubleshoot this issue?

I have 3 Kubernetes clusters, and in each one I deployed YugabyteDB with 1 master and 1 tserver, then connected all three into a single universe.

Each master is advertised through a Kubernetes Service LoadBalancer IP.

However, I am seeing extremely slow master RPC calls. For example, this command takes a long time to return:

yb-admin --master_addresses 10.96.196.75:7100,10.96.226.47:7100,10.91.193.215:7100 list_all_masters

Output:

Master UUID                             RPC Host/Port       State     Role      Broadcast Host/Port
91e8753df07b43bd8e0f5130e97ee56c        10.42.26.21:7100    ALIVE     FOLLOWER  10.96.196.75:7100
1ed16710628543d7b351fd4a0c2e3018        10.250.1.219:7100   ALIVE     FOLLOWER  10.96.226.47:7100
d08514bcae854809aeb9b5e9487e897d        10.250.1.135:7100   ALIVE     LEADER    10.91.193.215:7100

🔹 Additional details

  • Broadcast Host/Port values are routable private IPs (we are running on a private cloud).

  • RPC Host/Port values are pod IPs, which cannot be reached from the other clusters. Do they even need to be reachable from the external k8s clusters?

As a result:

  • list_all_masters and other yb-admin commands are extremely slow

  • psql connections disconnect constantly

  • General cross-cluster communication is unstable

🔹 One of my master configurations looks like this:

--fs_data_dirs=/mnt/disk0
--master_addresses=10.96.196.75:7100,10.96.226.47:7100,10.91.193.215:7100
--replication_factor=3
--enable_ysql=true
--master_enable_metrics_snapshotter=true
--metrics_snapshotter_tserver_metrics_whitelist=handler_latency_yb_tserver_TabletServerService_Read_count,handler_latency_yb_tserver_TabletServerService_Write_count,handler_latency_yb_tserver_TabletServerService_Read_sum,handler_latency_yb_tserver_TabletServerService_Write_sum,disk_usage,cpu_usage,node_up
--metric_node_name=${EXPORTED_INSTANCE}
--memory_limit_hard_bytes=1824522240
--stderrthreshold=0
--num_cpus=2
--max_log_size=256
--undefok=num_cpus,enable_ysql
--use_node_hostname_for_local_tserver=true
--rpc_bind_addresses=${HOSTNAME}.yb-masters.${NAMESPACE}.svc.cluster.local
--server_broadcast_addresses=${HOSTNAME}.yb-masters.${NAMESPACE}.svc.cluster.local:7100
--webserver_interface=0.0.0.0
--default_memory_limit_to_ram_ratio=0.85
--leader_failure_max_missed_heartbeat_periods=10
--max_clock_skew_usec=10000000
--placement_cloud=rancher
--placement_region=ca-west-1
--placement_zone=A
--rpc_bind_addresses=${POD_IP}
--server_broadcast_addresses=10.96.196.75:7100
--use_node_hostname_for_local_tserver=false
--use_private_ip=never

I suspect the combination of:

  • pod IPs being used as RPC Host/Port (not reachable across clusters),

  • LoadBalancer IPs being used for broadcast,

  • and --use_private_ip=never

is causing the extremely slow RPC behavior.
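
Related to this, I noticed that several address flags appear twice in the config above (--rpc_bind_addresses, --server_broadcast_addresses, --use_node_hostname_for_local_tserver). Assuming gflags keeps the last value when a flag is repeated, the effective address configuration on this master should be roughly:

--rpc_bind_addresses=${POD_IP}
--server_broadcast_addresses=10.96.196.75:7100
--use_node_hostname_for_local_tserver=false
--use_private_ip=never

which would match the output above: pod IPs under RPC Host/Port, LoadBalancer IPs under Broadcast Host/Port.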

Any suggestions on how to correctly configure cross-cluster masters/tservers or how to fix the RPC routing would be greatly appreciated.

Hi @yulintan

It’s probably these unreachable RPC addresses. Please configure the DBs so they can connect to each other directly.
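
As a quick check (just a sketch; I'm assuming nc is available inside the yb-master image, and yb-master-0 / <namespace> are placeholders for your pod and namespace names), you could test raw TCP reachability from one master pod to the leader's broadcast address and to its RPC (pod) address:

kubectl -n <namespace> exec yb-master-0 -- nc -zv -w 5 10.91.193.215 7100   # leader's Broadcast Host/Port (LoadBalancer IP)
kubectl -n <namespace> exec yb-master-0 -- nc -zv -w 5 10.250.1.135 7100    # leader's RPC Host/Port (pod IP)

If the second connect hangs or times out while the first succeeds, that would point at the pod-IP routing rather than YugabyteDB itself.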

What logs are you getting on the yb-masters & yb-tservers?

Thanks,
I also suspect it’s caused by the RPC IPs (pod IPs) not being reachable from each other.

There are no logs.

Unfortunately, our private cloud does not support exposing pod IPs across clusters. Is there any way to work around this limitation? I can expose each pod using a LoadBalancer IP, but the issue is that RPC cannot bind to a LoadBalancer IP because it’s not a real network interface.
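
In case it helps, this is the kind of per-pod exposure I had in mind (a rough sketch only; the Service name, namespace, and LoadBalancer IP are placeholders, and I'm assuming kubectl expose picks up the per-pod statefulset.kubernetes.io/pod-name label so the Service targets just that one pod):

kubectl -n <namespace> expose pod yb-master-0 --name=yb-master-0-lb --type=LoadBalancer --port=7100 --target-port=7100

and then on that master keep the bind on the pod IP and only broadcast the LoadBalancer IP:

--rpc_bind_addresses=${POD_IP}
--server_broadcast_addresses=<LB IP of yb-master-0-lb>:7100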