YSQL high latency

I’m evaluating YugabyteDB on a very tiny database (a few MB) with RF=3. One of the servers is ~25ms away from the others, and I observe that the first query is always slower than repeating the same query. What’s the cause of this?

For example, when I run “select count(0) from projects” it takes 75ms on the first attempt and 25ms on further attempts. It looks like the first query takes 3 round-trips to the “distant” server, but why is that the case?
Is there a way to warm up YugabyteDB so that this doesn’t happen with live queries?

More generally, do you have any documentation that helps with understanding the number of round-trips required per query?
Some of my queries are timing out after 4s. I expect that they involve many round-trips, but I don’t really understand why. They are along the lines of “select * from projects where visibility=1 order by timestamp”. Yes, according to EXPLAIN they use a full table scan, but there are only 26 rows in projects altogether.

FWIW, the warm-up problem is not caused only by network latency. I moved all masters/tservers close to each other, but the first query still takes 200ms while subsequent ones take only 9ms. This applies separately to each table: for example, warming up “projects” does not warm up “users”.

The initial query execution (especially the first “preparation” of the query plan for a statement on a TServer) needs to do some extra work to load the relevant information from the system catalog (e.g., to determine the table definitions, resolve non-fully qualified names using the search path order, etc.).

To understand why, say, your query on a 26-row table takes 200ms on the first execution vs. 9ms on subsequent executions, would you mind sharing the output of:

EXPLAIN (analyze,dist,debug) select ...

across those multiple executions? To read more about the dist and debug options to EXPLAIN see this section in the documentation.

Also, have you made any changes to GUC settings/gflags that govern the prefetch limit for scans? Can you confirm the output of:

SHOW ysql_prefetch_limit

Which version of YugabyteDB are you on?

Thanks for the quick reply!

The db version is: PostgreSQL 11.2-YB-2024.2.2.0-b0 on aarch64-unknown-linux-gnu, compiled by clang version 17.0.6 (https://github.com/yugabyte/llvm-project 9b881774e40024e901fc6f3d313607b071c08631), 64-bit

As for ysql_prefetch_limit, ysqlsh reports the following error:

ERROR:  unrecognized configuration parameter "ysql_prefetch_limit"

AFAICT I haven’t touched any GUC settings.

Please see the EXPLAIN output from the two executions below. This time it was 100ms vs 3ms, but if I run it from a follower it is more pronounced (560ms vs 7ms).

                                                QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=102.50..102.51 rows=1 width=8) (actual time=2.116..2.116 rows=1 loops=1)
   ->  Seq Scan on projects  (cost=0.00..100.00 rows=1000 width=0) (actual time=2.100..2.102 rows=1 loops=1)
         Storage Table Read Requests: 1
         Storage Table Read Execution Time: 1.687 ms
         Storage Table Rows Scanned: 28
         Metric rocksdb_block_cache_hit: 5.000
         Metric rocksdb_block_cache_index_hit: 1.000
         Metric rocksdb_block_cache_data_hit: 3.000
         Metric rocksdb_block_cache_bytes_read: 89621.000
         Metric rocksdb_number_db_seek: 1.000
         Metric rocksdb_number_db_next: 28.000
         Metric rocksdb_number_db_seek_found: 1.000
         Metric rocksdb_number_db_next_found: 27.000
         Metric rocksdb_iter_bytes_read: 89737.000
         Metric rocksdb_block_cache_multi_touch_hit: 5.000
         Metric rocksdb_block_cache_multi_touch_bytes_read: 89621.000
         Metric docdb_keys_found: 28.000
         Metric ql_read_latency: sum: 106.000, count: 1.000
         Partial Aggregate: true
 Planning Time: 59.024 ms
 Execution Time: 2.340 ms
 Storage Read Requests: 1
 Storage Read Execution Time: 1.687 ms
 Storage Rows Scanned: 28
 Storage Write Requests: 0
 Catalog Read Requests: 121
 Catalog Read Execution Time: 62.990 ms
 Catalog Write Requests: 0
 Storage Flush Requests: 0
 Metric rocksdb_block_cache_hit: 5
 Metric rocksdb_block_cache_index_hit: 1
 Metric rocksdb_block_cache_data_hit: 3
 Metric rocksdb_block_cache_bytes_read: 89621
 Metric rocksdb_number_db_seek: 1
 Metric rocksdb_number_db_next: 28
 Metric rocksdb_number_db_seek_found: 1
 Metric rocksdb_number_db_next_found: 27
 Metric rocksdb_iter_bytes_read: 89737
 Metric rocksdb_block_cache_multi_touch_hit: 5
 Metric rocksdb_block_cache_multi_touch_bytes_read: 89621
 Metric docdb_keys_found: 28
 Metric ql_read_latency: sum: 106, count: 1
 Storage Execution Time: 64.677 ms
 Peak Memory Usage: 104 kB
(44 rows)

Time: 92.630 ms
                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=102.50..102.51 rows=1 width=8) (actual time=2.092..2.092 rows=1 loops=1)
   ->  Seq Scan on projects  (cost=0.00..100.00 rows=1000 width=0) (actual time=2.079..2.082 rows=1 loops=1)
         Storage Table Read Requests: 1
         Storage Table Read Execution Time: 1.987 ms
         Storage Table Rows Scanned: 28
         Metric rocksdb_block_cache_hit: 5.000
         Metric rocksdb_block_cache_index_hit: 1.000
         Metric rocksdb_block_cache_data_hit: 3.000
         Metric rocksdb_block_cache_bytes_read: 89621.000
         Metric rocksdb_number_db_seek: 1.000
         Metric rocksdb_number_db_next: 28.000
         Metric rocksdb_number_db_seek_found: 1.000
         Metric rocksdb_number_db_next_found: 27.000
         Metric rocksdb_iter_bytes_read: 89737.000
         Metric rocksdb_block_cache_multi_touch_hit: 5.000
         Metric rocksdb_block_cache_multi_touch_bytes_read: 89621.000
         Metric docdb_keys_found: 28.000
         Metric ql_read_latency: sum: 211.000, count: 1.000
         Partial Aggregate: true
 Planning Time: 0.109 ms
 Execution Time: 2.153 ms
 Storage Read Requests: 1
 Storage Read Execution Time: 1.987 ms
 Storage Rows Scanned: 28
 Storage Write Requests: 0
 Catalog Read Requests: 0
 Catalog Write Requests: 0
 Storage Flush Requests: 0
 Metric rocksdb_block_cache_hit: 5
 Metric rocksdb_block_cache_index_hit: 1
 Metric rocksdb_block_cache_data_hit: 3
 Metric rocksdb_block_cache_bytes_read: 89621
 Metric rocksdb_number_db_seek: 1
 Metric rocksdb_number_db_next: 28
 Metric rocksdb_number_db_seek_found: 1
 Metric rocksdb_number_db_next_found: 27
 Metric rocksdb_iter_bytes_read: 89737
 Metric rocksdb_block_cache_multi_touch_hit: 5
 Metric rocksdb_block_cache_multi_touch_bytes_read: 89621
 Metric docdb_keys_found: 28
 Metric ql_read_latency: sum: 211, count: 1
 Storage Execution Time: 1.987 ms
 Peak Memory Usage: 64 kB
(43 rows)

Time: 2.862 ms

@gabe

See the planning times: your first query shows Planning Time: 59.024 ms, while the second shows Planning Time: 0.109 ms.

Execution time is about the same in both cases, ~2ms.

What is the hardware on each node?

One of the servers is ~25ms away from the others, and I observe that the first query is always slower than repeating the same query.

Could you explain the purpose of having this server ~25ms away? It doesn’t really make sense unless there is some requirement that you haven’t specified.

In multi-region scenarios you want to use set_preferred_zones (see yb-admin, the command line tool for advanced YugabyteDB administration, in the YugabyteDB Docs).

There are several different factors at play here.
In this case we have to understand why the planning time is so high.
Are you using prepared statements on the client? (See the sketch below.)
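
As a rough sketch (plain ysqlsh syntax; the statement name is made up), preparing the statement once per connection lets later executions reuse the parsed and planned statement instead of paying the planning cost every time:

PREPARE count_projects AS SELECT count(0) FROM projects;  -- parsed once per connection
EXECUTE count_projects;  -- runs the prepared statement
EXECUTE count_projects;  -- reuses the cached plan instead of re-planning

Most client drivers can do the equivalent automatically via server-side prepared statements.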

The more complex the setup, the harder it will be to know what’s going on.

The architecture section of the docs is a good starting point.

“Server leader/follower” only applies to yb-master. On yb-tservers, leaders and followers are per tablet, unless it’s a colocated database and all your tables and indexes are in a single tablet.

See the Key concepts page in the YugabyteDB Docs.

That option has been deprecated in this commit. It’s a yb-tserver flag and not a PostgreSQL option (so you can see it at http://yb-tserver-ip:9000/varz).

In its place we’ve added yb_fetch_row_limit, which does show up as a PostgreSQL option:

yugabyte=# show yb_fetch_row_limit;
 yb_fetch_row_limit 
--------------------
 1024
(1 row)
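
Since it’s a regular GUC, you can also adjust it per session if you want to experiment with the fetch size (the value below is just an arbitrary example):

SET yb_fetch_row_limit = 2048;  -- max rows fetched per read request to the storage layer
SHOW yb_fetch_row_limit;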

@gabe

The other thing that I noted in your EXPLAIN output is the high number of catalog lookups:

Catalog Read Requests: 121
Catalog Read Execution Time: 62.990 ms

This significantly impacts the planning time and the overall runtime of the query. Every YSQL connection needs catalog information, which lives on the master leader; this is why you see a bigger difference when connecting from a “follower” fault domain.
In your version of YugabyteDB, there is a helpful metric documented here: https://docs.yugabyte.com/stable/launch-and-manage/monitor-and-alert/metrics/cache-storage/#catalog-cache-misses

This should help identify additional tables that may need to be preloaded into the cache on each tserver to help mitigate the impact of the catalog fetches.
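
In the meantime, a simple client-side workaround (just a sketch using the table names from this thread, not a built-in YugabyteDB feature) is to touch each hot table once right after a connection is opened, so its catalog entries are cached before live queries arrive:

-- run once per new connection, e.g. from a connection pool's on-connect hook
SELECT 1 FROM projects LIMIT 1;
SELECT 1 FROM users LIMIT 1;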

Hope this helps.
–Alan