How to Optimize YugabyteDB for Better Performance

Hi guys! :smiling_face_with_three_hearts:

I am new to YugabyteDB and I am trying to use it for a project that needs to be really fast and reliable. I am having trouble getting the best performance out of it.

Can someone help me learn how to make YugabyteDB work as fast as possible? I'd like to know about:

  • Settings: Which settings should I change to make it faster? Are there any really important ones?
  • Table Design: How should I set up my tables and indexes so that they work well?
  • Queries: What’s the best way to write fast queries? What should I avoid?
  • Hardware: What kind of computers should I use for YugabyteDB? Is there a best setup?
  • Checking for Problems: How can I tell if YugabyteDB is slow? What should I look for?

I also checked this thread: https://forum.yugabyte.com/t/which-one-has-better-write-performancesalesforce-developer-between-yugabytedb-cluster-and-single-node/2024 but I did not find a solution there. Any advice would be really helpful for me and for other people who are new to YugabyteDB.

Thanks in advance! :blush:

Respected community member :saluting_face:

Hi Eden,

I’ll try to answer quickly, but to give the best help it is important to know:

  • is it a new application or one migrated from PostgreSQL or other?
  • is it the kind of traditional SQL application with 100+ tables, or more like microservices with few tables and limited access patterns?
  • by “best performance”, do you mean higher throughput (more transactions per second) or lower response time?

Taking your questions one by one:

  • Settings
    Being distributed, the database is complex, and there is no single configuration that fits all. The best way to start is with yugabyted and --enable_pg_parity_tech_preview=true, which sets multiple parameters like:
    yb_enable_base_scans_cost_model=true and yb_enable_optimizer_statistics=true to get cost-based optimization on ANALYZEd tables
    yb_bnl_batch_size=1024 to optimize nested loop joins
    yb_use_hash_splitting_by_default=false so that no hash function is applied to keys and indexes can be used for range queries
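These can also be set per session from ysqlsh to experiment; a minimal sketch using the values listed above:

```sql
-- Session-level equivalents of the PG-parity defaults listed above:
SET yb_enable_base_scans_cost_model = on;
SET yb_enable_optimizer_statistics = on;
SET yb_bnl_batch_size = 1024;
SET yb_use_hash_splitting_by_default = off;

SHOW yb_bnl_batch_size;  -- verify the current value
```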

  • Table Design

If you have a lot of small tables that will not grow, colocate them.
The others will start with one tablet (thanks to range sharding) and will be split automatically as they grow.
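A minimal sketch of both options (the database and table names are made up):

```sql
-- Colocation: small tables share a single tablet, avoiding cross-node hops.
CREATE DATABASE app WITH COLOCATION = true;
-- \c app   (connect to the new database before creating tables in it)

-- Inside a colocated database, tables are colocated by default:
CREATE TABLE countries (code text PRIMARY KEY, name text);

-- A large, growing table opts out of colocation and uses range sharding
-- (ASC primary key): it starts as one tablet and splits automatically.
CREATE TABLE events (
  id bigint,
  created_at timestamptz,
  PRIMARY KEY (id ASC)
) WITH (COLOCATION = false);
```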

  • Queries

The fastest access is by primary key. When accessing through a secondary index, having all the referenced columns in the index helps when reading many rows: you will see an Index Only Scan in the execution plan. With the cost-based optimizer and analyzed tables (there is no auto-analyze yet), the join order is usually a good one. If it isn’t, you can change the query by using a CTE, or use pg_hint_plan hints.
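A sketch with made-up tables, showing a covering index and a join-order hint:

```sql
CREATE TABLE customers (id bigint PRIMARY KEY, name text);
CREATE TABLE invoices (
  id bigint PRIMARY KEY,
  customer_id bigint,
  status text,
  total numeric
);

-- A covering index holds all referenced columns, so reading many rows
-- for one customer can show an Index Only Scan in the plan:
CREATE INDEX invoices_by_customer ON invoices (customer_id ASC)
  INCLUDE (status, total);

ANALYZE customers;  -- no auto-analyze yet, so run ANALYZE yourself
ANALYZE invoices;

EXPLAIN SELECT status, total FROM invoices WHERE customer_id = 42;

-- If the optimizer picks a bad join order, a pg_hint_plan hint can force it:
/*+ Leading((i c)) */
SELECT i.total, c.name
FROM invoices i JOIN customers c ON c.id = i.customer_id;
```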

  • Checking for Problems
    If you have identified slow queries, explain (analyze, dist, debug, verbose) will show all the information needed to understand them. Don’t hesitate to share the output here.
    If you haven’t, pg_stat_statements is installed and can be used to find slow queries like in PostgreSQL (except that the view is per node).
    There are also many system metrics available. How did you deploy it: on-premises (Linux, Docker, Kubernetes) or in the cloud (our managed service)?
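A minimal sketch, reusing the invoices table from the Queries section (run it on each node, since the view is per node; the column names below are from recent versions, older PG11-based builds name them total_time and mean_time):

```sql
-- Full distributed execution details for a known slow query:
EXPLAIN (ANALYZE, DIST, DEBUG, VERBOSE)
SELECT status, total FROM invoices WHERE customer_id = 42;

-- Top 10 statements by mean execution time on this node:
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
```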

Also, we have many resources like docs and videos. If you still have a problem, we can do a quick Zoom call if you want, but it is better to look at those points first.

The best way is to “just” know, in detail, how everything works underneath, in all layers: your app + RDBMS + our flavor of distributed RDBMS + Linux + networking.

There are several ways to learn that.

I already gave them the solution. Can you be more specific?