Low TPS when using sysbench to benchmark YugabyteDB

When I use sysbench to benchmark YugabyteDB, the TPS is very low.
There are 10 tables, each with 1 million records.
It is a three-node cluster; each node has an E5-2630 v4 CPU and 128 GB of memory.
For oltp_read_write, the TPS is only in the single digits.
For oltp_read_only, the TPS is also only in the single digits.
For oltp_write_only, the TPS can reach 5000+.
How can I improve the TPS for oltp_read_write, or is this a bug in YugabyteDB?
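
For reference, the runs look roughly like this (a sketch; host, credentials, and thread count are placeholders, not my exact values):

```bash
# Hypothetical sysbench invocation against YSQL (PostgreSQL-compatible, port 5433).
sysbench oltp_read_write \
  --db-driver=pgsql \
  --pgsql-host=127.0.0.1 --pgsql-port=5433 \
  --pgsql-user=yugabyte --pgsql-db=sysbench \
  --tables=10 --table-size=1000000 \
  --threads=64 --time=120 \
  run
```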

It looks like it might be a configuration issue.

Configuration Optimization:

  • Ensure your YugabyteDB configuration is optimized for your hardware. Check parameters like max_connections, shared_buffers, effective_cache_size, and work_mem.
  • Adjust ysql_num_shards_per_tserver to match your CPU core count (see the sketch below).
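
A minimal sketch of how those might be set on each tserver; the flag values are illustrative, not tuned recommendations, and I'm assuming the PostgreSQL parameters are passed through --ysql_pg_conf_csv:

```bash
# Illustrative yb-tserver flags; match the shard count to your physical cores.
yb-tserver \
  --tserver_master_addrs=node1:7100,node2:7100,node3:7100 \
  --fs_data_dirs=/mnt/d0 \
  --ysql_num_shards_per_tserver=10 \
  --ysql_pg_conf_csv="max_connections=300,work_mem=8MB"
```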

@franzli347 - Check the Deployment Checklist for hardware requirements: Deployment checklist for YugabyteDB clusters | YugabyteDB Docs

And make sure that ulimits are configured.
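
For example (illustrative checks; the deployment checklist has the authoritative values):

```bash
# Verify the limits for the user running the yb-tserver/yb-master processes.
ulimit -n   # max open files: should be a large value such as 1048576
ulimit -u   # max user processes: should be well above the OS default
```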


I have little experience with YugabyteDB and a lot of experience with MySQL + sysbench.
Your results are confusing, so it would be great to figure them out.
I assume you used the default value for --range-size (by not specifying it on the command line), and the default appears to be 100:
https://github.com/akopytov/sysbench/blob/master/src/lua/oltp_common.lua#L36-L37

On a small server I have at home, with a database that fits in the InnoDB buffer pool and sysbench run with one thread (--threads=1; a thread == a database connection), I get the following for InnoDB, although I use my fork of sysbench, which is slightly different from upstream.

  • ~8000 TPS for oltp_read_only
  • ~7400 TPS for oltp_write_only
  • ~7400 TPS for oltp_read_write

How is the TPS for oltp_read_only with --range-selects=false? It is true by default. This will help determine whether the perf problem comes from range queries or point queries.
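Something like this (a sketch; connection options are placeholders):

```bash
# Point selects only: the per-transaction range SELECTs are disabled.
sysbench oltp_read_only \
  --db-driver=pgsql --pgsql-host=127.0.0.1 --pgsql-port=5433 \
  --pgsql-user=yugabyte --pgsql-db=sysbench \
  --tables=10 --table-size=1000000 \
  --range-selects=false \
  --threads=1 --time=60 \
  run
```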
If the problem turns out to be the range queries, you can figure out which one by reducing some of these options to 0:
https://github.com/akopytov/sysbench/blob/master/src/lua/oltp_common.lua#L42-L49

Those options control how many times each of these queries is executed per transaction (the default is 1):
https://github.com/akopytov/sysbench/blob/master/src/lua/oltp_common.lua#L255-L266
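
For example, to test only the simple range query, zero out the other three (again a sketch with placeholder connection options):

```bash
# Isolate the simple range SELECT; sum/order/distinct ranges are disabled.
sysbench oltp_read_only \
  --db-driver=pgsql --pgsql-host=127.0.0.1 --pgsql-port=5433 \
  --pgsql-user=yugabyte --pgsql-db=sysbench \
  --tables=10 --table-size=1000000 \
  --sum-ranges=0 --order-ranges=0 --distinct-ranges=0 \
  --threads=1 --time=60 \
  run
```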

And finally, you can also modify the length of the range query via --range-size:
https://github.com/akopytov/sysbench/blob/master/src/lua/oltp_common.lua#L36-L37
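
For instance (same placeholder connection options):

```bash
# Same read-only run but with 10-row ranges instead of the default 100.
sysbench oltp_read_only \
  --db-driver=pgsql --pgsql-host=127.0.0.1 --pgsql-port=5433 \
  --pgsql-user=yugabyte --pgsql-db=sysbench \
  --tables=10 --table-size=1000000 \
  --range-size=10 \
  --threads=1 --time=60 \
  run
```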

From a source-code perspective, oltp_read_only and oltp_write_only run mutually exclusive sets of queries, and oltp_read_write combines both workloads.
I assume this means that your perf problem comes from oltp_read_only, as it is a subset of oltp_read_write.

The main loops for the workloads are:
https://github.com/akopytov/sysbench/blob/master/src/lua/oltp_write_only.lua#L40-L42
https://github.com/akopytov/sysbench/blob/master/src/lua/oltp_read_only.lua#L45-L52
https://github.com/akopytov/sysbench/blob/master/src/lua/oltp_read_write.lua#L49-L60


@franzli347, tables are created with hash sharding on the first column of the keys by default. This is not ideal for range queries because the index can only be used if there is equality for the first column. I recommend setting yb_use_hash_splitting_by_default to off before creating the tables and indexes. This will result in indexes being ASC by default on all columns, allowing them to serve range queries.
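
A minimal sketch of the effect (host, database, and table definition are illustrative; the sbtest columns follow sysbench's schema):

```bash
# With the parameter off, the primary key defaults to ASC (range sharding)
# rather than HASH on the first column.
ysqlsh -h 127.0.0.1 -d sysbench -c "
  SET yb_use_hash_splitting_by_default = off;
  CREATE TABLE sbtest1 (
    id  serial PRIMARY KEY,
    k   integer   NOT NULL DEFAULT 0,
    c   char(120) NOT NULL DEFAULT '',
    pad char(60)  NOT NULL DEFAULT ''
  )"

# Secondary index; ASC is spelled out so it does not depend on the session setting.
ysqlsh -h 127.0.0.1 -d sysbench -c "CREATE INDEX k_1 ON sbtest1 (k ASC)"
```

Since sysbench's own prepare step creates the tables, in practice the parameter has to be visible to those connections, for example by setting it cluster-wide through the tserver's ysql_pg_conf_csv flag.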

FYI, here are the modifications made to sysbench to run efficiently on YugabyteDB.

Mainly:

  • Use serial columns with a higher cache to keep the sequence from becoming a hotspot.
  • Use range sharding rather than the default hash sharding.
  • Pre-split the tables instead of letting them split while being loaded (see the sketch after this list).
  • Connect to all hosts, as all hosts are active; each node is a PostgreSQL endpoint.
  • Create indexes before the load: they are LSM-trees and do not suffer from B-tree write amplification, and creating them up front avoids re-reading the table.
  • Use single-threaded DDL for the preparation phase to avoid concurrent DDL.
  • Use multi-value inserts for better batching of write operations.
  • Use prepared statements to reduce catalog reads, which can be remote.
  • Implement more workloads, such as oltp_multi_insert and idle connections.
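
A sketch of what the first three bullets translate to in DDL; the split points, cache size, and table definition are illustrative, not the fork's exact values:

```bash
# Cached sequence: each connection reserves 1000 ids at a time, so the
# sequence row does not become a single-row write hotspot.
ysqlsh -h 127.0.0.1 -d sysbench -c "CREATE SEQUENCE sbtest1_id_seq CACHE 1000"

# Range-sharded (ASC) primary key, pre-split into 4 tablets before loading 1M rows.
ysqlsh -h 127.0.0.1 -d sysbench -c "
  CREATE TABLE sbtest1 (
    id  integer   NOT NULL DEFAULT nextval('sbtest1_id_seq'),
    k   integer   NOT NULL DEFAULT 0,
    c   char(120) NOT NULL DEFAULT '',
    pad char(60)  NOT NULL DEFAULT '',
    PRIMARY KEY (id ASC)
  ) SPLIT AT VALUES ((250000), (500000), (750000))"
```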

Thank you for your answer. I was indeed able to solve the problem by setting yb_use_hash_splitting_by_default.
