Timeouts on 16 node cluster due to bad schema design

sham_yuga · January 22, 2022, 1:14am

16 nodes in a region

Application in central region is configured to talk to master(leader) in central region

Application is using YCL driver(gocql), so the table is currently hash based partitioned

Given the requirement with huge traffic of SELECT query(below) and less traffic of INSERT & UPDATE queries:

SELECT userID, col3, col4, col5, col6, col7
                  FROM table1
              WHERE domainID = :domain_id AND userID IN :user_id_set

Database currently has schema(below) with hash based partition:

CREATE TABLE keySpaceName.table1 (         							
domainID VARCHAR, 
userID VARCHAR, 
col3 VARCHAR, 
col4 VARCHAR, 
col5 VARCHAR, 
col6 VARCHAR, 
col7 VARCHAR, 
           PRIMARY KEY((userID, domainID), col3, col4));

Note: One domain can have multiple users. Thousands of users in a domain.

For some historical reasons, our schema was badly partitioned using PRIMARY KEY((userID, domainID), col3, col4)), which lead to SELECT queries getting huge timeouts with 16 node cluster(hosting hash based partitioned table1). Need to regularly restart application.

So, refactoring PRIMARY KEY((userID, domainID), col3, col4)) to PRIMARY KEY((domainID), userID) WITH CLUSTERING ORDER BY (userID ASC); . Table gets created with hash partitioning using key domainID. Clustering key userID would be ordered ascending.

Partition key domainID would enhance performance of search domainID = :domain_id

and

clustering key userID would enhance performance of search userID IN :user_id_set

Please let me know, if this refactoring make sense, to enhance WHERE clause performance in SELECT query
How do INSERT query(to add a new row) & UPDATE query(to update an existing row) perform on this refactoring?

dorian_yugabyte · January 22, 2022, 8:31am

Hi @sham_yuga

This is a single regino/az deployment, correct?

How big (CPU,memory,SSD) is each of these nodes?

Can you be more specific what you mean here? Do you mean you connect to only 1 node?

How big is the :user_id_set on each query? How many values?

Can there be domains that have a huge number of users? So much that they can outgrow a single node? You can think of this in how many rows, how many GBs, and how much select/inserts/second?

The INSERT would work the same. What’s important is my questions above to see the effects of everything happening.

sham_yuga · January 22, 2022, 9:47am

@dorian_yugabyte

2 Regions, 3 AZs, 24 nodes.
16 nodes in Central region, 8 nodes in east region
Application can talk to any of the 16 nodes in central region, to submit SELECT/INSERT/UPDATE queries
Users cannot outgrow a single node.

dorian_yugabyte · January 22, 2022, 4:32pm

Please answer all my questions.

I was talking about “domain”. How many users for a domain and what would be the biggest domain?

sham_yuga · January 22, 2022, 5:09pm

@dorian_yugabyte
As mentioned here,
Table partitioning into tablets is done by Yugabyte. Assigning each tablet to a node is not in our control. If there are n tablets(say medium size each) assigned to a specific node(nodeX). A tablet out of those n tablets can outgrow a node, because nodeX has multiple tablets assigned.

In any case, there is very good chance of domain to outgrow a node, in our case

sham_yuga · January 24, 2022, 1:51am

@dorian_yugabyte
Did I answer all the questions?

For your question, what is the biggest domain?

Knowing the size of biggest domain, does it help us ensure that node of that capacity (GB) will own that one tablet? Is this in our control? Because there can be more than one tablet owned by one node. Correct me.

dorian_yugabyte · January 24, 2022, 8:28am

@sham_yuga please, answer ALL my questions, just go 1 by 1. If you don’t know the answer, answer with a ballpark estimate or say you don’t know.

This answer doesn’t help. Let me explain. You can answer I really don't know how big they can get, maybe hundreds GBs.

But if you answer They can grow to 50GB and My nodes have 8GB memory, then we can keep a domain to 1 node and you can grow nodes up to 128GB+ memory and it will work.

sham_yuga · January 24, 2022, 3:45pm

@dorian_yugabyte

Here are the answers for your questions:

This is a single region/AZ deployment, correct?

No.
2 Regions, 3 AZs,
24 nodes in total
16 nodes in Central region, 8 nodes in east region
How big (CPU,memory,SSD) is each of these nodes?

16 cores (soon to be 32 cores)
Ram 64GB
Disk 1TB
Can you be more specific what you mean here? Do you mean you connect to only 1 node? My bad.

I gave wrong info on this, in previous comments. Out of 16 nodes in central region, Currently GoCQL driver is configured with hosts info for 16 tablet servers(from central region). Below is the driver code that take those 16 hosts info

        func NewCluster(hosts ...string) *ClusterConfig {
               cfg := &ClusterConfig{
                     Hosts:                  hosts,

How big is the :user_id_set on each query? How many values?

On average, we see set size as 10 user id’s per select query
Can there be domains that have a huge number of users? So much that they can outgrow a single node? You can think of this in how many rows, how many GBs…

~3000 domains, this number is not expected to grow , maybe 10 max per year
750,000 users per domain(that has huge number of users)
3 items at 1 store = 1KB
3,000 items at 1 store = 1MB
3,000,000 items at 1 store = 1GB
No, each node will have 1TB disk and it’s not expected to outgrow 1 node
How many SELECT queries per second? INSERT queries per sec? UPDATE queries per sec?

10k SELECT ops per second
~2-5k UPDATEs every hour or so
INSERTs are rare

dorian_yugabyte · January 25, 2022, 10:52am

@sham_yuga I don’t understand how “domain,user” are connected to “items in a store” ?
I “store” a “domain” ? And “user in domain” is “item in store”?

Have you tried making single-row queries? So domain=x and user=y, and making 10 queries in parallel (ex: 10 goroutines)?

Make sure to also configure with cluster.PoolConfig.HostSelectionPolicy = gocql.TokenAwareHostPolicy(gocql.DCAwareRoundRobinPolicy("dc1")) where “dc1” is the name of --placement-cloud of your central region.

And make sure you’re using prepared queries.

Can you try these changes?

sham_yuga · January 25, 2022, 11:11am

@dorian_yugabyte

3 users in 1 domain = 1KB
3,000 users in 1 domain = 1MB
3,000,000 users in 1 domain = 1GB

Before we try single row query,
How to see partition key, given this scenario?

dorian_yugabyte · January 25, 2022, 12:11pm

Since you expect at most 750K users in a domain, then it means less than 1GB.
This means it won’t be a problem for memory size, the whole domain would stay in memory.
The remaining question would be the peak QPS for a given domain? Since 10K QPS is assuming for the whole cluster, correct?

Both should work. I suggest trying my suggestion first before changing the schema since it’s easier.

sham_yuga · January 26, 2022, 2:20pm

@dorian_yugabyte
Sure, will try your suggestion and let know

Topic		Replies	Views
Would it be timeout for range query on the second cluster key when data increase very large General	8	890	March 19, 2020
Timeout executing DDL General	3	1161	December 15, 2021
Create geo partition table error on multi-region setup General	3	21	October 25, 2024
Function partition_hash not working as expected General	23	1126	January 20, 2023
Timeout compacting table General	4	910	January 6, 2022

Timeouts on 16 node cluster due to bad schema design

Related topics