Multi-Zone and Multi-Cluster YB deployment in Kubernetes

Goal

The goal is to enable customizable multi-zone and multi-region deployment in Kubernetes. It should be possible to:

  • Specify the deployment zones
  • Configure the number of tservers to bring up in each of the selected zones, along with the required replication factor.
  • Change the deployment configuration on a running cluster, for example: changing zones, the number of pods, or the resources used by each pod.
  • Perform rolling upgrades on the YugaByte DB cluster, for example: software upgrades and changing the database properties (gflags).

Supported Scenarios

It is assumed that the underlying Kubernetes cluster is set up in one of the following configurations.

One k8s cluster, spanning multiple zones or regions

In this case, since there is exactly one k8s cluster, there is only one kubeconfig. Labels will be used to control placement of pods and replicas of the data across the various fault domains (zones or regions) when deploying YugaByte DB.

This deployment paradigm is expected to be commonly used for multi-zone deployments on managed k8s offerings from public clouds, such as GKE.

Multiple k8s clusters, one per zone or region

In this case, each fault domain (zone or region) has a separate k8s cluster, each with its own kubeconfig. The YugaByte DB cluster is broken up into three subsets, and each subset is set up by a helm chart invocation that uses the corresponding kubeconfig.

In this scenario, it is assumed that the clusters have connectivity amongst themselves, that is, the pods in one cluster can resolve the FQDNs of the pods in the other cluster(s).

This deployment paradigm is expected to be commonly used across regions in public clouds, and for multi-DC deployments in the case of private data centers.

Design

The current Kubernetes design, which uses statefulsets, will be extended to allow YugaByte DB clusters to be deployed across different zones. The deployment of the YugaByte DB clusters is done through helm charts. Using multiple helm deployments, one for each zone, gives the user complete control over the number of tserver pods in each zone. The number of master pods in each zone can also be controlled by the user. Each deployment is independent of the others, except that every deployment must be given the correct master addresses so that the whole YugaByte DB universe functions properly.

The helm deployments, as well as the computation of the number of masters and their addresses, can be automated with an external script. That script can also handle reconfiguration of the YugaByte DB universe by making the necessary changes to the helm chart for each deployment, and by creating or deleting helm deployments when zones or regions need to be added or removed.
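
As a rough illustration of what such a script might compute, the sketch below derives the universe-wide master address list from a per-zone layout. The zone names, namespaces, the StatefulSet name "yb-master", the headless service name "yb-masters", and the master RPC port 7100 are all assumptions made for the example, not values fixed by this design.

```python
from dataclasses import dataclass

MASTER_RPC_PORT = 7100  # assumption: the master RPC port used by the pods


@dataclass(frozen=True)
class ZoneDeployment:
    zone: str       # fault domain this helm deployment targets
    namespace: str  # namespace the zone's helm release is installed into
    masters: int    # number of master pods in this zone
    tservers: int   # number of tserver pods in this zone


def master_addresses(deployments, cluster_domain="cluster.local"):
    """Return the comma-separated master addresses for the whole universe."""
    addrs = []
    for d in deployments:
        for i in range(d.masters):
            # StatefulSet pods are resolvable as
            # <pod>.<headless-service>.<namespace>.svc.<cluster-domain>.
            # "yb-master" and "yb-masters" are hypothetical names.
            addrs.append(
                f"yb-master-{i}.yb-masters.{d.namespace}"
                f".svc.{cluster_domain}:{MASTER_RPC_PORT}"
            )
    return ",".join(addrs)


# Hypothetical three-zone layout with one master per zone.
deployments = [
    ZoneDeployment("us-central1-a", "yb-us-central1-a", masters=1, tservers=2),
    ZoneDeployment("us-central1-b", "yb-us-central1-b", masters=1, tservers=1),
    ZoneDeployment("us-central1-c", "yb-us-central1-c", masters=1, tservers=1),
]

print(master_addresses(deployments))
```

Every helm deployment would receive the same resulting string, which is what ties the otherwise independent per-zone deployments into one universe.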

The multi-region/multi-cluster aspect of the deployments is handled by using the kubeconfig of the required region (k8s cluster) when deploying the helm chart for a zone in that region (cluster).
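
One way the per-zone invocation could be assembled is sketched below. It assumes Helm 3 command syntax (`helm install NAME CHART`) and its `--kubeconfig` flag. The chart reference `yugabytedb/yugabyte`, the value names `masterAddresses`, `replicas.master`, and `replicas.tserver`, and all paths and release names are assumptions for illustration. The command is only built and printed, not executed.

```python
import shlex


def helm_install_command(release, namespace, kubeconfig, values,
                         chart="yugabytedb/yugabyte"):
    """Build (but do not run) a helm install command for one zone."""
    cmd = [
        "helm", "install", release, chart,
        "--namespace", namespace,
        # Selects the k8s cluster (region) this zone belongs to.
        "--kubeconfig", kubeconfig,
    ]
    for key, value in values.items():
        # helm --set treats a bare comma as a separator between
        # assignments, so commas inside a value must be escaped.
        escaped = str(value).replace(",", r"\,")
        cmd += ["--set", f"{key}={escaped}"]
    return cmd


# Hypothetical zone in a hypothetical cluster.
cmd = helm_install_command(
    release="yb-us-east1-b",
    namespace="yb-us-east1-b",
    kubeconfig="/path/to/us-east1.kubeconfig",
    values={
        "masterAddresses": "master-a.example:7100,master-b.example:7100",
        "replicas.master": 1,
        "replicas.tserver": 2,
    },
)
print(shlex.join(cmd))
```

Keeping the command as an argument list, rather than a single string, means it could later be handed to `subprocess.run` without any shell quoting concerns.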

The helm deployments will select the requested zone using the NodeAffinity parameter of the helm chart, by specifying the label failure-domain.beta.kubernetes.io/zone and setting its value to the desired zone of deployment.

The user needs to create a storage class for each zone for the universe to deploy properly, because the helm charts use PVCs to create the PVs for the pods. In Kubernetes, the PV is created first and the pod is then brought up in the same zone as that PV. If the storage class is not constrained to a particular zone, it can bring up the PV in any zone at random, and a deployment with node affinity to the requested zone would then fail.
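
A zone-constrained storage class could look roughly like the manifests generated below. This is a sketch under assumptions: the class names, the zone list, and the GCE persistent disk provisioner with its `type` parameter are illustrative choices, and `allowedTopologies` is used here as one way to pin a storage class to a zone.

```python
import json

ZONE_LABEL = "failure-domain.beta.kubernetes.io/zone"


def zonal_storage_class(zone, provisioner, parameters=None):
    """Return a StorageClass manifest restricted to a single zone."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": f"yb-{zone}"},  # hypothetical naming scheme
        "provisioner": provisioner,
        "parameters": parameters or {},
        # Restrict provisioning so that PVs are only created in this zone.
        "allowedTopologies": [
            {"matchLabelExpressions": [{"key": ZONE_LABEL, "values": [zone]}]}
        ],
    }


# Hypothetical zones; one storage class per zone, wrapped in a List object.
zones = ["us-central1-a", "us-central1-b", "us-central1-c"]
manifest = {
    "apiVersion": "v1",
    "kind": "List",
    "items": [
        zonal_storage_class(z, "kubernetes.io/gce-pd", {"type": "pd-ssd"})
        for z in zones
    ],
}
print(json.dumps(manifest, indent=2))
```

Each zone's helm deployment would then reference the storage class for its own zone, so the PV and the node affinity of the pod agree on the zone.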
