Can't write data if a single AZ fails in a multi-AZ deployment

[Question posted by a user on YugabyteDB Community Slack]

I have the following running in a multi-DC test setup in our own DCs:

i2k2-cloud [CLOUD]
	i2k2-india [REGION]
		i2k2-noida-az [AZ]
			master-01
			tablet-1
			tablet-2
		i2k2-mumbai-az [AZ]
			master-02
			tablet-3
			tablet-4
		i2k2-banglore-az [AZ]
			master-03
			tablet-5
			tablet-6

In the above config I can fail any 1 master and any 1 tablet server anywhere in the cluster and it keeps working.
But any 2 tablet server failures anywhere in the cluster stop all writes, though reads still work.
What is needed in the above setup/config to survive a full AZ failure?
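The observed behavior follows from Raft quorum math. Each tablet is a Raft group of 3 replicas, and the group accepts writes only while a strict majority (2 of 3) is alive; without placement constraints, two failed tablet servers can take out two replicas of the same tablet. A minimal illustrative sketch (not YugabyteDB code, just the arithmetic):

```python
# Illustrative sketch of Raft quorum math for replication factor 3.
# A tablet's Raft group stays writable only while a strict majority
# of its replicas is alive.

def writes_survive(replication_factor: int, failed_replicas: int) -> bool:
    """Return True if a Raft group keeps a write quorum after failures."""
    alive = replication_factor - failed_replicas
    return alive > replication_factor // 2

# One failed replica: 2 of 3 alive, quorum holds, writes continue.
print(writes_survive(3, 1))  # True
# Two failed replicas of the same tablet: 1 of 3 alive, writes stop.
print(writes_survive(3, 2))  # False
```

Reads of stale data can still be served from the surviving replica, which matches the symptom of writes stopping while reads keep working.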

Looking at cluster config:

It seems you haven't run the `modify_placement_info` command in the yb-admin CLI.

After running:

./bin/yb-admin --master_addresses ip1:7100,ip2:7100,ip3:7100 \
  modify_placement_info i2k2.i2k2-ind.i2k2-nod-az,i2k2.i2k2-ind.i2k2-mum-az,i2k2.i2k2-ind.i2k2-blr-az 3

The cluster config changed to:

version: 1
replication_info {
  live_replicas {
    num_replicas: 3
    placement_blocks {
      cloud_info {
        placement_cloud: "i2k2"
        placement_region: "i2k2-ind"
        placement_zone: "i2k2-nod-az"
      }
      min_num_replicas: 1
    }
    placement_blocks {
      cloud_info {
        placement_cloud: "i2k2"
        placement_region: "i2k2-ind"
        placement_zone: "i2k2-mum-az"
      }
      min_num_replicas: 1
    }
    placement_blocks {
      cloud_info {
        placement_cloud: "i2k2"
        placement_region: "i2k2-ind"
        placement_zone: "i2k2-blr-az"
      }
      min_num_replicas: 1
    }
  }
}
cluster_uuid: "107f2d2c-5174-4385-bb2e-9aa9309aa3d7"

And now the cluster keeps accepting both writes and reads through a full AZ failure.
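The reason this works: with `min_num_replicas: 1` per AZ, each tablet's 3 replicas are spread one per AZ, so a full AZ outage removes exactly one replica from every Raft group and each group keeps its 2-of-3 majority. A small sketch of that placement-aware check (illustrative only, using the zone names from this config):

```python
# Illustrative sketch: one replica of every tablet is pinned to each AZ,
# so losing a whole AZ costs each Raft group exactly one replica.

replicas_per_az = {"i2k2-nod-az": 1, "i2k2-mum-az": 1, "i2k2-blr-az": 1}

def writable_after_az_loss(placement: dict, failed_az: str, rf: int = 3) -> bool:
    """Return True if a tablet keeps a write quorum when one AZ fails."""
    alive = sum(n for az, n in placement.items() if az != failed_az)
    return alive > rf // 2

# Every single-AZ failure leaves 2 of 3 replicas alive.
print(all(writable_after_az_loss(replicas_per_az, az)
          for az in replicas_per_az))  # True
```

Before `modify_placement_info` was run, nothing forced this spread, so two replicas of the same tablet could sit in one AZ and a single AZ failure could break quorum.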
