Multi-region deployment with a single cluster/universe

Hi There,

Here is my requirement for a multi-region deployment. It should give me balanced data across AZs and let me scale up by adding more nodes to individual AZs or by adding more AZs.

Requirement:

Create a tablespace in a 3x3 cluster (3 nodes per region across ms-1, mr-1, and st-1, totaling 9 nodes) with 5 replicas and a region-level survival goal:

  • 2 replicas in ms-1 across any 2 AZs of ms-1a, ms-1b, ms-1c.
  • 2 replicas in mr-1 across any 2 AZs of mr-1a, mr-1b, mr-1c.
  • 1 replica in st-1 across any 1 AZ of st-1a, st-1b, st-1c.
  • Balanced data distribution to avoid zones like ms-1c having minimal data.

I am getting an error with zone='*' or zone='AZ1,AZ2,AZ3' while creating the tablespace.

My requirement is a feature like the tablespace below, where Yugabyte takes care of distributing the data/tablets evenly across all AZs in each region. CockroachDB has this feature.

CREATE TABLESPACE geo_partitioned_leader_aware WITH (
  replica_placement='{
    "num_replicas": 5,
    "placement_blocks": [
      { "cloud": "aodc", "region": "ms-1", "zone": "*", "min_num_replicas": 2 },
      { "cloud": "aodc", "region": "mr-1", "zone": "*", "min_num_replicas": 2 },
      { "cloud": "aodc", "region": "st-1", "zone": "*", "min_num_replicas": 1 }
    ]
  }'
);

Even the following doesn't work; it looks like min_num_replicas can't be 0:

ms_testdb=# CREATE TABLESPACE geo_partitioned_leader_aware_3 WITH (
ms_testdb(# replica_placement='{
ms_testdb'# "num_replicas": 5,
ms_testdb'# "placement_blocks": [
ms_testdb'# {"cloud": "aws", "region": "ms-1", "zone": "ms-1a", "min_num_replicas": 1},
ms_testdb'# {"cloud": "aws", "region": "ms-1", "zone": "ms-1b", "min_num_replicas": 0},
ms_testdb'# {"cloud": "aws", "region": "ms-1", "zone": "ms-1c", "min_num_replicas": 0},
ms_testdb'# {"cloud": "aws", "region": "mr-1", "zone": "mr-1a", "min_num_replicas": 1},
ms_testdb'# {"cloud": "aws", "region": "mr-1", "zone": "mr-1b", "min_num_replicas": 0},
ms_testdb'# {"cloud": "aws", "region": "mr-1", "zone": "mr-1c", "min_num_replicas": 0},
ms_testdb'# {"cloud": "aws", "region": "st-1", "zone": "st-1a", "min_num_replicas": 1},
ms_testdb'# {"cloud": "aws", "region": "st-1", "zone": "st-1b", "min_num_replicas": 0},
ms_testdb'# {"cloud": "aws", "region": "st-1", "zone": "st-1c", "min_num_replicas": 0}
ms_testdb'# ],
ms_testdb'# "num_replicas_per_region": [
ms_testdb'# {"cloud": "aws", "region": "ms-1", "num_replicas": 2},
ms_testdb'# {"cloud": "aws", "region": "mr-1", "num_replicas": 2},
ms_testdb'# {"cloud": "aws", "region": "st-1", "num_replicas": 1}
ms_testdb'# ]
ms_testdb'# }'
ms_testdb(# );
ERROR: Invalid type/value for some key in placement block. Placement policy: {
"num_replicas": 5,
"placement_blocks": [
{"cloud": "aws", "region": "ms-1", "zone": "ms-1a", "min_num_replicas": 1},
{"cloud": "aws", "region": "ms-1", "zone": "ms-1b", "min_num_replicas": 0},
{"cloud": "aws", "region": "ms-1", "zone": "ms-1c", "min_num_replicas": 0},
{"cloud": "aws", "region": "mr-1", "zone": "mr-1a", "min_num_replicas": 1},
{"cloud": "aws", "region": "mr-1", "zone": "mr-1b", "min_num_replicas": 0},
{"cloud": "aws", "region": "mr-1", "zone": "mr-1c", "min_num_replicas": 0},
{"cloud": "aws", "region": "st-1", "zone": "st-1a", "min_num_replicas": 1},
{"cloud": "aws", "region": "st-1", "zone": "st-1b", "min_num_replicas": 0},
{"cloud": "aws", "region": "st-1", "zone": "st-1c", "min_num_replicas": 0}
],
"num_replicas_per_region": [
{"cloud": "aws", "region": "ms-1", "num_replicas": 2},
{"cloud": "aws", "region": "mr-1", "num_replicas": 2},
{"cloud": "aws", "region": "st-1", "num_replicas": 1}
]
}

Hi @ranjanp

Did you follow any guide on our docs regarding this?

Can you show a full page screenshot of http://yb-master:7000/tablet-servers ?

Also, what version are you using?

Regards,
Dorian

The issue is that num_replicas_per_region is not a supported parameter in our tablespace declaration. This should be sufficient:

CREATE TABLESPACE multi_region_tablespace WITH (
  replica_placement='{"num_replicas": 5, "placement_blocks":
  [{"cloud":"aws","region":"ms-1","zone":"ms-1a","min_num_replicas":1},
   {"cloud":"aws","region":"ms-1","zone":"ms-1b","min_num_replicas":1},
   {"cloud":"aws","region":"mr-1","zone":"mr-1a","min_num_replicas":1},
   {"cloud":"aws","region":"mr-1","zone":"mr-1b","min_num_replicas":1},
   {"cloud":"aws","region":"st-1","zone":"st-1a","min_num_replicas":1}]}'
);
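
A table can then be placed in this tablespace with the usual TABLESPACE clause; for example (the table name and columns here are just illustrative):

CREATE TABLE orders (
  order_id bigint PRIMARY KEY,
  details  jsonb
) TABLESPACE multi_region_tablespace;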

version 2.25.1.0

Thanks Dorian. I can create the tablespace with min_num_replicas=1.

As for the requirement I described above: is it something you plan to add, where Yugabyte takes care of distributing the tablets across the AZs in a region (3 AZs in this case, with 2 replica copies)? That way I can add more nodes in case I want to scale up.

Hi @ranjanp -

This improvement is being tracked under GitHub issue GH# 26671. It is currently undergoing code review and further testing. Once it has landed, I would expect it to be available in an upcoming drop of 2.25 in the next few weeks.

Thanks a lot, this helps.

And I hope YugabyteDB would choose the AZs evenly. In this case, with 2 zones chosen out of the 3 AZs available, will all 3 nodes end up with even data?

Hi @ranjanp

By adding this feature to CREATE TABLESPACE, the idea is for the tablespace to follow the zones where there are nodes in the cluster. The cloud=aws, region=ms-1, zone=* designation says "match any zone in that region that has nodes present in the cluster."
The placement choice is actually not ours to make if you are the one designating where the nodes are, logically or physically. If there is no node present in a given AZ, we cannot place any data there, in full or in part.
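
If it helps, one quick way to confirm which zones actually have nodes (and therefore where a wildcard placement can land) is to list the servers from ysqlsh; the http://yb-master:7000/tablet-servers page mentioned earlier shows the same placement information:

-- Lists each node in the cluster with its cloud, region, and zone placement.
SELECT * FROM yb_servers();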

Hope this clears things up.
–Alan

I think I understand your point, but my question is a little different.

Say I create a geo-partitioned table on the tablespace below (a sketch of such a table follows the tablespace definition). As you mentioned above, for region "us-east-1" you would choose 2 out of the 3 AZs available (us-east-1a, us-east-1b, and us-east-1c). This part you clarified.

Since you are choosing 2 out of 3 AZs randomly, there is a possibility of one or more nodes being overused in terms of tablets. My question is: do you plan to distribute the tablets evenly across the 3 AZs, or is it completely random?

CREATE TABLESPACE geo_partitioned_auto_zone WITH (
  replica_placement='{
    "num_replicas": 5,
    "placement_blocks": [
      { "cloud": "aws", "region": "us-east-1", "zone": "*", "min_num_replicas": 2 },
      { "cloud": "aws", "region": "eu-west-1", "zone": "*", "min_num_replicas": 2 },
      { "cloud": "aws", "region": "ap-southeast-2", "zone": "*", "min_num_replicas": 1 }
    ]
  }'
);
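
For context, this is roughly the kind of geo-partitioned table I have in mind (the table, column, and partition names are made up; in practice each region's partition would point at its own tablespace):

CREATE TABLE user_accounts (
  user_id    bigint NOT NULL,
  geo_region text   NOT NULL,
  payload    jsonb,
  PRIMARY KEY (user_id, geo_region)
) PARTITION BY LIST (geo_region);

-- One partition per region; shown here attached to the tablespace above.
CREATE TABLE user_accounts_us PARTITION OF user_accounts
  FOR VALUES IN ('us-east-1') TABLESPACE geo_partitioned_auto_zone;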

Hi @ranjanp

Each node has its own storage for data (whether that be local NVME/SSD or EBS-style storage). A node can only exist in 1 zone. Since you have called for 2 “copies” to reside in us-east-1, you will only have 2 zones. The cluster load balancer will balance the number of leaders across the nodes in the zone, so there is not much possibility of a node being overused in terms of tablets from that perspective. Suboptimal data modeling choices may result in some tablets getting too much activity (hot shards), but there are methods to identify that condition when it occurs.

Hope this helps.
–Alan

@ranjanp: There is one caveat with using wildcard zones like this:

CREATE TABLESPACE wildcard_zone_rf5 WITH (
  replica_placement='{
    "num_replicas": 5,
    "placement_blocks": [
      { "cloud": "cloud1", "region": "r1", "zone": "*", "min_num_replicas": 2 },
      { "cloud": "cloud1", "region": "r2", "zone": "*", "min_num_replicas": 2 },
      { "cloud": "cloud1", "region": "r3", "zone": "*", "min_num_replicas": 1 }
    ]
  }'
);

when we have servers in

r1 - z1, z2, z3
r2 - z1, z2, z3
r3 - z1, z2, z3

If zone r1.z1 fails, the placement allows all of these tablets to be automatically moved to the remaining two AZs in r1 - r1.z2 and r1.z3. These two AZs will now see ~33% more load in terms of both disk usage and CPU workload from these tablets, and they should be "over-provisioned" to handle this outage event. Note that if we had instead pinned the tablet copies to specific AZs like the following (a tablespace sketch for this pinned placement follows the list):

r1.z1 : min_num_replicas=1,
r1.z2 : min_num_replicas=1,
r2.z1 : min_num_replicas=1,
r2.z2 : min_num_replicas=1,
r3.z1 : min_num_replicas=1
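
Expressed as a tablespace, the pinned placement above would look roughly like this (the tablespace name is just for illustration):

CREATE TABLESPACE pinned_zone_rf5 WITH (
  replica_placement='{
    "num_replicas": 5,
    "placement_blocks": [
      { "cloud": "cloud1", "region": "r1", "zone": "z1", "min_num_replicas": 1 },
      { "cloud": "cloud1", "region": "r1", "zone": "z2", "min_num_replicas": 1 },
      { "cloud": "cloud1", "region": "r2", "zone": "z1", "min_num_replicas": 1 },
      { "cloud": "cloud1", "region": "r2", "zone": "z2", "min_num_replicas": 1 },
      { "cloud": "cloud1", "region": "r3", "zone": "z1", "min_num_replicas": 1 }
    ]
  }'
);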

When r1.z1 fails, there should be no data movement to other AZs, though the data is now under-replicated at 4 copies out of 5. In this case, the operator must manually intervene to bring up r1.z3, change the placement, and cause tablets to be restored to RF5. Prior to this manual restoration to 5 copies, there is no additional disk usage on the remaining AZs, though there will be additional CPU workload depending on the exact leader preference used.