Poor performance with a huge number of tablets

We have a cluster consisting of 40 nodes (3 master nodes, 40 tserver nodes). Each machine has 64 CPUs, 256 GB RAM, and 6 NVMe disks (6.4 TB each).

1 database with 2,000 non-colocated tables (50 GB to 30 TB each). Some tables are partitioned into 100+ partitions. The number of tablets on each server is 10,000+. The YugabyteDB config is the default.

When executing "select one row by index"-style queries, we've seen abnormally high latency for the majority of queries (~13-15 seconds and sometimes more). We also found high CPU usage (15-20% on each node) without any workload. There are no such problems when performing operations on an empty cluster.

Could you give us some advice or best practices for this particular case? We tried modifying the config according to the "best practices" in the YugabyteDB docs, but it did not help.

Hi @Meteo

You're most likely hitting tablet replica overhead.

The first step is to lower the number of tablets. Either you're over-sharding (too many tablets per table) or over-partitioning (too many tables, each with too many tablets).
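For example, if many of the 2,000 tables are small, colocation can put them all on a single shared tablet instead of giving each table its own set. A minimal sketch, assuming a recent YSQL version (the names are placeholders; older releases used "colocated = true"):

CREATE DATABASE appdb WITH COLOCATION = true;

-- \c appdb  (connect to the new database first)

-- Tables created in appdb share one colocation tablet by default:
CREATE TABLE small_lookup (id int PRIMARY KEY, name text);

-- Genuinely large tables can opt out and keep their own tablets:
CREATE TABLE big_events (
    id bigint,
    payload text,
    PRIMARY KEY (id)
) WITH (COLOCATION = false);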

  1. Have you seen Best practices for YSQL applications | YugabyteDB Docs?
  2. What's the exact number of tablets, just for reference? (See the query sketch after this list.)
  3. What's the hardware of the yb-master servers?
  4. What's the reason for partitioning here? Is it for scaling? Because sharding already handles scaling.
    Or is this time-series data where you just "drop partition" on old data?
  5. Can you provide an explain analyze of your slow query, along with the table schema & indexes?
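For question 2, something like this should give exact tablet counts from YSQL, assuming your build has the yb_table_properties() function ("public.my_table" is a placeholder; counts are per table, before replication):

-- Tablets for a single table:
SELECT num_tablets FROM yb_table_properties('public.my_table'::regclass);

-- Rough ranking of user tables by tablet count:
SELECT c.relname, p.num_tablets
FROM pg_class c, yb_table_properties(c.oid) p
WHERE c.relkind = 'r'
  AND c.relnamespace = 'public'::regnamespace
ORDER BY p.num_tablets DESC
LIMIT 20;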

Yes, we did; nothing helped.

The number of tablets on each server is 10,000+.

We have a cluster consisting of 40 nodes (3 master nodes, 40 tserver nodes). Each machine has 64 CPUs, 256 GB RAM, and 6 NVMe disks (6.4 TB each).

Because the tables are huge, and partitioning should improve performance.

There are no seq scans in the queries, so I guess there is no problem in the queries themselves.

I'm pretty sure that if we lower the number of tablets, we will decrease the overhead. Pick some small tables OR some over-partitioned/over-sharded tables, and for each one tell me:

  1. Table & indexes schema
  2. Size of table in GB
  3. How reads/writes happen in this table (example: random writes, needs to use full cluster resources, etc).
  4. How many tablets & partitions does it have?

Based on the use case, I can then tell whether we can lower the number of tablets.

So the yb-masters are on the same servers as the yb-tservers, correct? How did you deploy/start the db? Using yugabyted?

Please provide this information so I can help you.

Dear @dorian_yugabyte

Here is the DDL of one table as an example:

CREATE TABLE public.stock (
    s_w_id int4 NOT NULL,
    s_i_id int4 NOT NULL,
    s_quantity numeric(4) NULL,
    s_ytd numeric(8, 2) NULL,
    s_order_cnt int4 NULL,
    s_remote_cnt int4 NULL,
    s_data varchar(50) NULL,
    s_dist_01 bpchar(24) NULL,
    s_dist_02 bpchar(24) NULL,
    s_dist_03 bpchar(24) NULL,
    s_dist_04 bpchar(24) NULL,
    s_dist_05 bpchar(24) NULL,
    s_dist_06 bpchar(24) NULL,
    s_dist_07 bpchar(24) NULL,
    s_dist_08 bpchar(24) NULL,
    s_dist_09 bpchar(24) NULL,
    s_dist_10 bpchar(24) NULL,
    CONSTRAINT stock_pkey PRIMARY KEY (s_w_id, s_i_id)
);

30,000 GB

50% select queries, 50% update/insert queries

~1,500 tablets and no partitions, since with partitions it was even slower.

Yes, you are right. We built YugabyteDB from source manually, so no yugabyted.

explain analyze on a 3 TB table with a small number of tablets:

Index Scan using stock_pkey on stock s  (cost=0.00..4.12 rows=1 width=1160) (actual time=2.059..2.064 rows=1 loops=1)
  Index Cond: ((s_w_id = 17012) AND (s_i_id = 55))
Planning Time: 0.121 ms
Execution Time: 2.178 ms
Peak Memory Usage: 24 kB
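(The statement itself wasn't included; judging from the Index Cond, the plan presumably came from a point lookup of this form:)

SELECT * FROM public.stock WHERE s_w_id = 17012 AND s_i_id = 55;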

When the total number of tablets grows, query performance falls. So unfortunately we cannot decrease the number of tablets, since the number of tablets is proportional to the number of tables.

EDIT: we have 43 nodes in total; 3 of them are separate master nodes

@Meteo, can you share the output of explain (analyze, dist, debug) for the point lookup that takes a long time? The one you shared above is from a table with a small number of tablets.
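That is, run something like the sketch below against one of the slow lookups (the WHERE values here are just the ones from your earlier plan; substitute a slow one):

EXPLAIN (ANALYZE, DIST, DEBUG)
SELECT * FROM public.stock WHERE s_w_id = 17012 AND s_i_id = 55;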

Using the default tablet splitting values, you'd have 24 tablets per yb-tserver for this table, i.e. 24*40 = 960 in total. And that number would stay until you had 100 GB tablets; see Tablet splitting | YugabyteDB Docs.

You used “SPLIT INTO 1500” when creating this table, correct?
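That is, DDL with an explicit split clause, along these lines (a sketch, not your actual statement):

CREATE TABLE public.stock (
    s_w_id int4 NOT NULL,
    s_i_id int4 NOT NULL,
    -- ... remaining columns as in the DDL above ...
    CONSTRAINT stock_pkey PRIMARY KEY (s_w_id, s_i_id)
) SPLIT INTO 1500 TABLETS;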

You can also run explain on some other tables. Or we can lower the defaults and override to more tablets only for the tables that we know will grow big quickly.

@Meteo

For reference, can you answer the following:

  1. The version of YugabyteDB you're using (exact commit if it's a custom build)
  2. What changes did you make to the db since you custom-built it, & why?
  3. Can you paste any custom config you've set on yb-tserver & yb-master?
  4. What is the cluster RF (replication factor)?
  5. When you say 1,500 tablets, you mean BEFORE RF, correct?
  6. Can you explain why 2,000 tables? Is this natural because it's just a big app? Or is it because of the partitioning? If partitioning, then please explain some scenarios so we can lower those too.
    Or do many of the 2,000 tables have the same schema? Like creating 1 table per "type-of-thing", to scale them?
  7. Regarding your previous thread, Support of 100.000+ connections, how many connections does each node have? Are you using Connection Manager in YSQL | YugabyteDB Docs?
  8. When pasting an explain/analyze, please include the query too for clarity.