Poor performance of aggregate queries

I am trying to compute aggregates by joining a couple of small tables of about 100k records each. I have the necessary indexes on the columns I am querying, but the execution time of the query seems very odd. Please find the table definitions and query execution plans below; I have also tried a hash join query hint, but it did not help.

Can you please help here?

staging=# \d users;
                                                Table "public.users"
      Column       |             Type             | Collation | Nullable |                 Default                 
-------------------+------------------------------+-----------+----------+------------------------------------------
 id                | uuid                         |           | not null |
 email             | character varying(255)       |           | not null |
 phone             | character varying(255)       |           |          |
 password          | character varying(255)       |           | not null |
 registered_from   | character varying(255)       |           | not null |
 status            | enum_users_status            |           |          | 'inactive'::enum_users_status
 is_email_verified | enum_users_is_email_verified |           |          | 'inactive'::enum_users_is_email_verified
 is_phone_verified | enum_users_is_phone_verified |           |          | 'inactive'::enum_users_is_phone_verified
 is_deleted        | timestamp with time zone     |           |          |
 created_at        | timestamp with time zone     |           |          | now()
 updated_at        | timestamp with time zone     |           |          | now()
 status_changed_at | timestamp with time zone     |           |          |
 status_changed_by | uuid                         |           |          |
Indexes:
    "users_pkey" PRIMARY KEY, lsm (id HASH)
    "users_email" UNIQUE, lsm (email HASH)
    "users_phone" UNIQUE, lsm (phone HASH)
    "ix_users_email" lsm (email HASH) WHERE status = 'active'::enum_users_status AND is_deleted IS NULL
    "user_status" lsm (status HASH, id ASC)
staging=# \d profiles;
                                                        Table "public.profiles"
    Column     |           Type           | Collation | Nullable |                               Default                               
---------------+--------------------------+-----------+----------+----------------------------------------------------------------------
 user_id       | uuid                     |           | not null |
 first_name    | character varying(255)   |           | not null |
 middle_name   | character varying(255)   |           |          |
 last_name     | character varying(255)   |           | not null |
 date_of_birth | timestamp with time zone |           |          |
 gender        | enum_profiles_gender     |           |          |
 address_info  | json                     |           |          | '{"street":"","city":"","state":"","country":"","zipcode":""}'::json
 country_code  | character varying(255)   |           | not null |
 created_at    | timestamp with time zone |           |          | now()
 updated_at    | timestamp with time zone |           |          | now()
 picture       | integer                  |           |          |
Indexes:
    "profiles_pkey" PRIMARY KEY, lsm (user_id HASH)


staging=# select count(1) from users;
 count
-------
 99552
(1 row)
staging=# select count(1) from profiles;
 count
-------
 99552
(1 row)
staging=# /*+HashJoin("profile","user")*/
staging-# EXPLAIN ANALYZE SELECT count(1) AS "count" FROM "users" AS "user" INNER JOIN "profiles" AS "profile" on "profile"."user_id" = "user"."id" where "user"."status" = 'active';
                                                                     QUERY PLAN                                                                      
-----------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=30.50..30.51 rows=1 width=8) (actual time=78266.507..78266.507 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..30.25 rows=100 width=0) (actual time=4.454..78204.609 rows=99535 loops=1)
         ->  Index Only Scan using user_status on users "user"  (cost=0.00..15.25 rows=100 width=16) (actual time=3.390..154.580 rows=99535 loops=1)
               Index Cond: (status = 'active'::enum_users_status)
               Heap Fetches: 0
         ->  Index Scan using profiles_pkey on profiles profile  (cost=0.00..0.15 rows=1 width=16) (actual time=0.772..0.772 rows=1 loops=99535)
               Index Cond: (user_id = "user".id)
 Planning Time: 0.217 ms
 Execution Time: 78266.585 ms
 Peak Memory Usage: 18014398509468096 kB
(10 rows)
staging=# DROP INDEX user_status;
DROP INDEX
staging=# EXPLAIN ANALYZE SELECT count(1) AS "count" FROM "users" AS "user" INNER JOIN "profiles" AS "profile" on "profile"."user_id" = "user"."id" where "user"."status" = 'active';
                                                                   QUERY PLAN                                                                    
-------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=218.89..218.90 rows=1 width=8) (actual time=76854.824..76854.824 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..216.39 rows=1000 width=0) (actual time=6.308..76794.314 rows=99535 loops=1)
         ->  Seq Scan on users "user"  (cost=0.00..102.50 rows=1000 width=16) (actual time=5.088..142.942 rows=99535 loops=1)
               Filter: (status = 'active'::enum_users_status)
               Rows Removed by Filter: 17
         ->  Index Scan using profiles_pkey on profiles profile  (cost=0.00..0.11 rows=1 width=16) (actual time=0.758..0.758 rows=1 loops=99535)
               Index Cond: (user_id = "user".id)
 Planning Time: 22.782 ms
 Execution Time: 76855.960 ms
 Peak Memory Usage: 18014398509468840 kB
(10 rows)

Hi! Which version of YB are you running? Is this a multi-node cluster?

Maybe try turning off nested loops? This will force a HASH join.

SET enable_nestloop = off;

Or, you can try batched nested loops (a newer feature):

SET yb_bnl_batch_size=1024;

Reference: Batched Nested Loop to reduce read requests to the distributed storage
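
For example, something like this in your session, then compare the plan with the one you posted (a sketch reusing your query from above):

SET enable_nestloop = off;  -- steer the planner away from nested loops
EXPLAIN ANALYZE SELECT count(1) AS "count" FROM "users" AS "user" INNER JOIN "profiles" AS "profile" ON "profile"."user_id" = "user"."id" WHERE "user"."status" = 'active';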

Hi,

/*+HashJoin("profile","user")*/

This doesn’t work. The separator is a space, not a comma.
This should force a Hash Join with user as the hash table:

/*+ HashJoin("profile" "user") Leading( ("profile" "user") ) */
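
Applied to your original query, that would look like this (same aliases as in your post; the hint comment goes at the start of the statement):

/*+ HashJoin("profile" "user") Leading( ("profile" "user") ) */
EXPLAIN ANALYZE SELECT count(1) AS "count" FROM "users" AS "user" INNER JOIN "profiles" AS "profile" ON "profile"."user_id" = "user"."id" WHERE "user"."status" = 'active';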

But as @Jim_Knicely mentioned, Batched Nested Loop can probably make the Nested Loop acceptable.

Do we need to set the batch size only on the leader node or on every master node? Does this need a cluster restart?

SET yb_bnl_batch_size=1024;

SET enable_nestloop = off;

Hi! Did you try @FranckPachot’s suggestion with the hints?

/*+ HashJoin("profile" "user") Leading( ("profile" "user") ) */

For SET yb_bnl_batch_size=1024; and SET enable_nestloop = off;, those are session-level settings, so they live until you log out of the session. They only need to be set on the node to which you are connected (which I refer to as the initiator node :) )

You can also set these parameters at the user, database, and cluster level. But you may not want to do that, since it would affect all queries at a less granular level than the session setting.
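
For example (a sketch; this is the standard PostgreSQL parameter scoping that YSQL inherits, and the role name app_user is just illustrative):

ALTER ROLE app_user SET yb_bnl_batch_size = 1024;     -- default for one user
ALTER DATABASE staging SET yb_bnl_batch_size = 1024;  -- default for one database

For a cluster-wide default, my understanding is that you would pass the parameter to each yb-tserver via the --ysql_pg_conf_csv flag, which is picked up when the tserver starts.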

What version of YB are you running? The yb_bnl_batch_size param is available in 2.18 and up.

By the way, you don’t want to use SET yb_bnl_batch_size=1024; and SET enable_nestloop = off; together! The yb_bnl_batch_size=1024 enables “batch” nested loops, while enable_nestloop = off would disable nested loops!
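
So, to test BNL specifically, leave nested loops enabled and set only the batch size (a sketch):

SET enable_nestloop = on;      -- the default; shown only for clarity
SET yb_bnl_batch_size = 1024;  -- values > 1 turn nested loops into batched nested loops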

I am using YugabyteDB 2.15.1.0 with a three-node cluster, all three nodes acting as masters. Each node is a CX.Large.

I am seeing this nested loop issue in other places as well, so I just want to know whether we can set the batch size at the cluster level rather than at the session level.

The hardware is too small to be used for benchmarking OLAP queries. I’d suggest using better hardware, closer to the Deployment checklist for YugabyteDB clusters | YugabyteDB Docs.