Master + tserver placement when geo-partitioning?

When I do geo-partitioning, where do the masters go, all together in one region or spread out? What if I have more regions than my master count / replication factor?

How does master placement tie in to data residency? I.e., do the masters hold data like primary keys that might be important to care about for legal purposes?

I assume each region needs a number of tservers >= my replication factor?

Everything is possible but you need to think about availability and latency.

Having all masters in one region is not a good idea as one region failure can stop the database. So, better 3 regions. Now about latency, better to have them close to the users because the catalog is read from the master at connection time and maybe later.

So, it can be a good idea to choose 3 regions that are not far, like having 10 millisecond round trip latency. And near the most important users.

If you have more regions, you will have regions with no master. No problem, you don’t need more than the replication factor for masters.

The master doesn’t hold data. Only metadata about the cluster, and the postgresql catalog. However, some metadata may hold part of data. For example, in the PostgreSQL catalog, there are statistics used by the query planner which includes popular values for columns. Probably not exposing something sensitive as it is just a value. In a similar way, the tablet bounds for range sharded indexes will have the values at the bounds. But, again, probably ok as they are isolated values, not full rows.

I assume each region needs a number of tservers >= my replication factor?

No, if you have regions >= replication factor, the distribution of leaders and follower will be across regions to be resilient to region failure.

But your question brings another possibility, if that’s what you want. You can have one cluster in each region, and use xCluster replication xCluster replication | YugabyteDB Docs
But that would be complex to manage to be sure to connect at the right place. xCluster is asynchronous replication

For my use case I want to have most data in the US, but EU-specific data in Europe, with people in each location having low latency access to their own data.

It sounds like I’d want masters in, for example, NYC/Virgina/Frankfurt to keep catalog reads quick on both sides of the Atlantic. But US <-> Europe isn’t sub-10ms latency. How will that affect my cluster’s performance?

I’m not sure I follow. If I have RF=3 and have geo-partitioned EU data to stay in EU, then doesn’t the EU need 3 tservers to hold the 3 copies of data that can’t be replicated to other locations?

Yes you are right, for this you need more than 3 servers in EU. But you can, if you want to stay available on region failure, have 3 regions in EU and define the placement of EU data with partitioning and tablespaces

About masters, only the leader matters. Followers are there just to become leader when the leader is down. One follower in EU will not help anything for EU users. Except if one master is down in the US and the new leader is the EU one. Then metadata lookups will be faster from EU than from US.

The new connections, and DDL, will always read from the master leader. However this is not a problem, just something that requires to be more careful about using connection pools and prepared statements, and limiting DDL. The latency to the master (ie to the master leader) should not impact the scalability of DML.

Note that this may change in the future if there is a need to:
[docdb] Support to read from master follower · Issue #2199 · yugabyte/yugabyte-db · GitHub

Ah, so the only time the latency to the master leader comes into play is on initial connection, and when making prepared statements? And after that, queries are low-latency if they only touch data stored on nearby tservers? Does that require a topology-aware DB driver?

Yes, exactly. And even preparing statements should not suffer as they keep the important information in cache, but this can be invalidated by DDL. In short, if you experience performance issues when you are far from the master, open a git issue, we have something to fix :wink:

The read path and write path docs seem to indicate something different: the tserver talks to the master leader to look up which tablet to route the read/write to. This is indicated to be based on the query parameters, so a prepared statement wouldn’t dodge the latency.

Am I missing some nuance here?

Was me missing some nuances. The t-server caches the tablet location information but the first access from a t-server will need to get it from master (first from node, not for each connection)
Note that when I’m saying the latency to master should not impact prepared statements read writes, that’s the theory, when all is cached, no DDL, generic plans used, no tablet movement or split… Better test it with your cloud regions and your application.

Cool, I’ll run some tests.

Thanks for answering my questions!

1 Like