Yugabytedb select so slow

who can answer me, thanks.

I find yugabytedb select is so slow, when I select from testtable, it takes more 20 seconds to read 1244 records from table which has 3081000 records.

why is it so slow?

like bellow:

=# select count(id) from testtable;
count

3081000
(1 row)

=# select count(id) from testtable where id < 5;
count

1244
(1 row)

=# select * from testtable where id < 5;
firstname | lastname | logdatetime | status | id
--------------±------------±---------------------------±-------±—
MyFirstName3 | MyLastName3 | 2021-07-06 17:44:31.977793 | 4 | 4
MyFirstName0 | MyLastName0 | 2021-07-06 17:34:35.337026 | 1 | 1
MyFirstName0 | MyLastName0 | 2021-07-06 17:34:36.660008 | 1 | 1
MyFirstName2 | MyLastName2 | 2021-07-06 17:33:24.856881 | 3 | 3
MyFirstName0 | MyLastName0 | 2021-07-06 17:32:06.661087 | 1 | 1

Hi,
The first query is offloaded to the DocDB storage so this is fast. The second query has to retreive the rows to filter them and count them. To confirm this, please can you:
explain (analyze, summary false) select count(id) from testtable;
explain (analyze, summary false) select count(id) from testtable where id < 5;
to see the number of rows retreived by SeqScan
Thanks,
Franck

hi, it is timeout

=# explain (analyze, summary false) select count(id) from testtable;
ERROR: Timed out: [Timed out (yb/rpc/outbound_call.cc:512): Read RPC (request call id 411) to 172.20.0.3:9100 timed out after 59.982s, Timed out (yb/rpc/outbound_call.cc:512): Read RPC (request call id 412) to 172.20.0.3:9100 timed out after 59.982s]
=# explain (analyze, summary false) select count(id) from testtable where id < 5;
ERROR: Query error: Restart read required at: { read: { physical: 1625738844624642 } local_limit: { physical: 1625738844624642 } global_limit: in_txn_limit: serial_no: 0 }
=# select count(id) from testtable where id < 5;
count

9736
(1 row)

=#

Ok, timeout because it takes too long. Anyway, here the explanation:
https://dev.to/yugabyte/explain-analyze-on-count-pushdown-2978
If you want fast query for where id < 5; you need an index on id.
Example:

franck=> explain analyze select count(id) from testtable where id<5;

                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=105.00..105.01 rows=1 width=8) (actual time=15.867..15.867 rows=1 loops=1)
   ->  Seq Scan on testtable  (cost=0.00..102.50 rows=1000 width=4) (actual time=7.562..15.860 rows=4 loops=1)
         Filter: (id < 5)
         Rows Removed by Filter: 1996

franck=> create index i on testtable(id asc);
CREATE INDEX

franck=> explain analyze select count(id) from testtable where id<5;

                                                       QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=5.15..5.16 rows=1 width=8) (actual time=1.162..1.162 rows=1 loops=1)
   ->  Index Only Scan using i on testtable  (cost=0.00..5.12 rows=10 width=4) (actual time=1.152..1.155 rows=4 loops=1)
         Index Cond: (id < 5)
         Heap Fetches: 0

That’s an Index Only Scan and even of the COUNT is not offloaded only the required rows are fetched.

Hi, I think 150357.808 ms is also so slow, how do you think? can I do anything to improve it.

=# explain analyze select count(id) from testtable where id<5;
QUERY PLAN

Aggregate (cost=105.00…105.01 rows=1 width=8) (actual time=150347.654…150347.654 rows=1 loops=1)
→ Seq Scan on testtable (cost=0.00…102.50 rows=1000 width=4) (actual time=40.498…150328.431 rows=12388 loops=1)
Filter: (id < 5)
Rows Removed by Filter: 14738612
Planning Time: 72.972 ms
Execution Time: 150357.808 ms
(6 rows)

=# \d testtable;
Table “public.testtable”
Column | Type | Collation | Nullable | Default
-------------±----------------------------±----------±---------±--------
firstname | character varying(30) | | |
lastname | character varying(30) | | |
logdatetime | timestamp without time zone | | |
status | integer | | |
id | integer | | |
Indexes:
“testtable_id_idx” lsm (id HASH)

=#

Hi I found it is so fast when I use the index asc which you recommand. Thanks.

=# explain analyze select count(id) from testtable where id<5;
QUERY PLAN



Aggregate (cost=5.15…5.16 rows=1 width=8) (actual time=62.721…62.721 rows=1 loops=1)
→ Index Only Scan using i on testtable (cost=0.00…5.12 rows=10 width=4) (actual time=33.108…62.549 rows=2348 loo
ps=1)
Index Cond: (id < 5)
Heap Fetches: 0
Planning Time: 77.860 ms
Execution Time: 64.936 ms
(6 rows)

=# select count(id) from testtable;
count

2991000
(1 row)

=# \d testtable;
Table “public.testtable”
Column | Type | Collation | Nullable | Default
-------------±----------------------------±----------±---------±--------
firstname | character varying(30) | | |
lastname | character varying(30) | | |
logdatetime | timestamp without time zone | | |
status | integer | | |
id | integer | | |
Indexes:
“i” lsm (id ASC)

=#

1 Like

Great. Now the response time is proportional to the number of rows to count and not the total size of the table :+1: