Poor performance for aggregation queries on sample data

Hi,

I am completely new to Yugabyte and have just downloaded the latest docker image for a test.

Following the quick start guide for docker I ran the aggregation test query:

yb_demo=# SELECT source, ROUND(SUM(orders.total)) AS total_sales
          FROM users, orders WHERE users.id=orders.user_id
          GROUP BY source
          ORDER BY total_sales DESC;

On my test system (a laptop with 16G running Ubuntu 20.04) the query took more than 12 seconds. I use the same test data on a postgresql instance on the same system, and the query took only 0.1s.

How is this huge performance difference to be explained? Surely something is wrong with my setup but I have just followed the quick start guide.

Hi @Yuan_Hong

I am cc’ing @mihnea, @neha here who can answer your question better.

@Yuan_Hong
Can you paste your table/indexes schema ?

Also, can you try with a colocated database ?

@dorian_yugabyte

I was following the quick start guide:

And all schema / data come exactly from that guide. And the aggregation query is the one from section 6 of the guide.

P.S. I just tried the demo with with colocated = true and the query took slightly longer and returned the result in 13.5-15.5s.

Hi @Yuan_Hong

I just tried the same with my setup (16gb laptop, ubuntu 18.04, i7,ssd). The aggregate query finished in 3-4 seconds. Are you testing from ysqlsh with \timing ? If not, can you try that and report back ?

yb_colocated=# SELECT source, ROUND(SUM(orders.total)) AS total_sales
yb_colocated-#           FROM users, orders WHERE users.id=orders.user_id
yb_colocated-#           GROUP BY source
yb_colocated-#           ORDER BY total_sales DESC;
  source   | total_sales 
-----------+-------------
 Facebook  |      333454
 Google    |      325184
 Organic   |      319637
 Twitter   |      319449
 Affiliate |      297605
(5 rows)

Time: 3980.298 ms (00:03.980)

Hi @dorian_yugabyte

My latest test run with \timing turned on:

yb_demo=# SELECT source, ROUND(SUM(orders.total)) AS total_sales
yb_demo-#           FROM users, orders WHERE users.id=orders.user_id
yb_demo-#           GROUP BY source
yb_demo-#           ORDER BY total_sales DESC;
SELECT source, ROUND(SUM(orders.total)) AS total_sales
          FROM users, orders WHERE users.id=orders.user_id
          GROUP BY source
          ORDER BY total_sales DESC;
  source   | total_sales 
-----------+-------------
 Facebook  |      333454
 Google    |      325184
 Organic   |      319637
 Twitter   |      319449
 Affiliate |      297605
(5 rows)

Time: 10270.887 ms (00:10.271)

A few observations:

  • I am running yugabyte from docker image. Maybe this could make a difference?

  • the same query when run from pgAdmin4 console took consistently longer than from ysql:

Anyway, even 4 seconds for such a simple query is beyond my expectation. As previously noted, it took 100ms on postgresql. And I just did a test on cockroachdb with 3 nodes and it took 150ms. All tests are performed under docker on the same laptop which has AMD Ryzen 5 3500U and 16GB memory with 512G SSD.

Hi,

I am attaching the result of EXPLAIN ANALYZE on my laptop run from pgadmin4 and hope it helps to find the cause of the performance problem:

Sort  (cost=229.03..229.53 rows=200 width=40) (actual time=12372.031..12372.032 rows=5 loops=1)
  Sort Key: (round(sum(orders.total))) DESC
  Sort Method: quicksort  Memory: 25kB
  ->  HashAggregate  (cost=218.89..221.39 rows=200 width=40) (actual time=12372.013..12372.017 rows=5 loops=1)
        Group Key: users.source
        ->  Nested Loop  (cost=0.00..213.89 rows=1000 width=40) (actual time=17.378..12270.460 rows=18760 loops=1)
              ->  Foreign Scan on orders  (cost=0.00..100.00 rows=1000 width=16) (actual time=16.708..76.350 rows=18760 loops=1)
              ->  Index Scan using users_pkey on users  (cost=0.00..0.11 rows=1 width=40) (actual time=0.588..0.588 rows=1 loops=18760)
                    Index Cond: (id = orders.user_id)
Planning Time: 6.541 ms
Execution Time: 12664.493 ms

Hi @Yuan_Hong,

Thanks for reporting this. We have some inefficiencies in the way we handle large table joins - for example, [YSQL] Improve the performance of NestedLoop joins with Index Scans using batched execution · Issue #4903 · yugabyte/yugabyte-db · GitHub

Once the above issue is fixed, you should see the performance improve significantly.