Statistics of an Index

Karthikeyan · May 28, 2023, 12:58pm

Can we update statistics of the an index manually? Or is there way to figure out recent when the statistics was updated?

I have a table in yugabyte fairly smaller in size 250k rows, but when i run my query its not picking the index instead it goes for seq scan, i have re-created same table, index copied data to another table and the same query on the new table picks up the proper index and goes for index scan.

Any help here would be much appreaciated

dorian_yugabyte · May 29, 2023, 7:14am

Hi @Karthikeyan

Please explain your table layout + indexes, and an “explain analyze” of your query?

Karthikeyan · May 29, 2023, 8:44am

@dorian_yugabyte Thanks for your response. Below is my schema structure

CREATE TABLE public.posts (
	id uuid NOT NULL,
	title varchar(255) NOT NULL,
	is_private bool NOT NULL,
	member_id uuid NOT NULL,
	association_id uuid NOT NULL,
	association_type public.enum_posts_association_type NOT NULL,
	is_active public.enum_posts_is_active NULL DEFAULT 'inactive'::enum_posts_is_active,
	is_deleted timestamptz NULL,
	created_at timestamptz NULL DEFAULT now(),
	updated_at timestamptz NULL DEFAULT now(),
	description text NULL,
	comment_count int8 NULL DEFAULT 0,
	reaction_count int8 NULL DEFAULT 0,
	external_links jsonb NULL DEFAULT '{}'::jsonb,
	CONSTRAINT posts_pkey PRIMARY KEY (id)
);
CREATE INDEX ix_posts_associationid ON public.posts USING lsm (association_id HASH) WHERE ((is_active = 'active'::enum_posts_is_active) AND (is_deleted IS NULL));

**explain** **analyse**
**select** * **FROM** "posts" **AS** "post"
**WHERE** "post"."is_deleted" **IS** **null**
**and** "post".is_active = 'active'
**AND** "post"."association_type" = 'organization'
**AND** "post"."association_id" = '1a3ae463-03c3-414a-b540-a3d480b0c167'

Seq Scan on posts post  (cost=0.00..107.50 rows=1000 width=677) (actual time=21.506..2781.863 rows=5 loops=1)
  Filter: ((is_deleted IS NULL) AND (is_active = 'active'::enum_posts_is_active) AND (association_type = 'organization'::enum_posts_association_type) AND (association_id = '1a3ae463-03c3-414a-b540-a3d480b0c167'::uuid))
  Rows Removed by Filter: 269944
Planning Time: 0.050 ms
Execution Time: 2781.912 ms
Peak Memory Usage: 72 kB

I just copied the same data to another table with same schema structure and index using below statements

CREATE TABLE public.posts_tmp (
	id uuid NOT NULL,
	title varchar(255) NOT NULL,
	is_private bool NOT NULL,
	member_id uuid NOT NULL,
	association_id uuid NOT NULL,
	association_type public.enum_posts_association_type NOT NULL,
	is_active public.enum_posts_is_active NULL DEFAULT 'inactive'::enum_posts_is_active,
	is_deleted timestamptz NULL,
	created_at timestamptz NULL DEFAULT now(),
	updated_at timestamptz NULL DEFAULT now(),
	description text NULL,
	comment_count int8 NULL DEFAULT 0,
	reaction_count int8 NULL DEFAULT 0,
	external_links jsonb NULL DEFAULT '{}'::jsonb,
	CONSTRAINT posts_pkey_tmp PRIMARY KEY (id)
);
CREATE INDEX ix_posts_associationid_tmp ON public.posts_tmp USING lsm (association_id HASH) WHERE ((is_active = 'active'::enum_posts_is_active) AND (is_deleted IS NULL));

INSERT INTO posts_tmp
SELECT * FROM posts

Now with the new table it generated the better execution plan.

**explain** **analyse**
**select** * **FROM** posts_tmp "post"
**WHERE** "post"."is_deleted" **IS** **null**
**and** "post".is_active = 'active'
**AND** "post"."association_type" = 'organization'
**AND** "post"."association_id" = '1a3ae463-03c3-414a-b540-a3d480b0c167'

Index Scan using ix_posts_associationid_tmp on posts_tmp post  (cost=0.00..5.04 rows=10 width=677) (actual time=1.463..1.484 rows=5 loops=1)
  Index Cond: (association_id = '1a3ae463-03c3-414a-b540-a3d480b0c167'::uuid)
  Filter: (association_type = 'organization'::enum_posts_association_type)
Planning Time: 0.073 ms
Execution Time: 1.532 ms
Peak Memory Usage: 8 kB

dorian_yugabyte · May 29, 2023, 8:50am

Can you try analyzing the table ANALYZE statement [YSQL] | YugabyteDB Docs ?

Karthikeyan · May 29, 2023, 9:10am

Not much difference even after analysing the table.

FranckPachot · May 29, 2023, 9:52am

@Karthikeyan The best to understand a query planner decision is to show the cost of both plans and compare:

explain analyze
/*+ NoIndexScan(post) */
select * FROM "posts" AS "post"
WHERE "post"."is_deleted" IS null
and "post".is_active = 'active'
AND "post"."association_type" = 'organization'
AND "post"."association_id" = '1a3ae463-03c3-414a-b540-a3d480b0c167';

explain analyze
/*+ IndexScan(post) */
select * FROM "posts" AS "post"
WHERE "post"."is_deleted" IS null
and "post".is_active = 'active'
AND "post"."association_type" = 'organization'
AND "post"."association_id" = '1a3ae463-03c3-414a-b540-a3d480b0c167';

Note that using statistics by the planner is still in beta and you can enable with:

set yb_enable_optimizer_statistics=on

However, I don’t see how the same table with same data and same index and same query get a different plan. Are you sure the index is valid? you can see it with: \d posts

Karthikeyan · May 30, 2023, 4:27am

There seems to be difference in the cost on using IndexScan, but it not doing any indexScan, instead it goes for seqScan. I have tried enabling yb_enable_optimizer_statistics and analysed the table posts but not much help, since it shows rows updated as 0. I can solve this problem for now by copying over the data to a temp table and swap it but worried about the prod if it re-occurs again.

explain analyze
/*+ NoIndexScan(post) */
select * FROM “posts” AS “post”
WHERE “post”.“is_deleted” IS null
and “post”.is_active = ‘active’
AND “post”.“association_type” = ‘organization’
AND “post”.“association_id” = ‘1a3ae463-03c3-414a-b540-a3d480b0c167’;

QUERY PLAN |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Seq Scan on posts post (cost=0.00…4743.06 rows=2 width=677) (actual time=21.334…2815.430 rows=5 loops=1) |
Filter: ((is_deleted IS NULL) AND (is_active = ‘active’::enum_posts_is_active) AND (association_type = ‘organization’::enum_posts_association_type) AND (association_id = ‘1a3ae463-03c3-414a-b540-a3d480b0c167’::uuid))|
Rows Removed by Filter: 269946 |
Planning Time: 0.091 ms |
Execution Time: 2815.532 ms |
Peak Memory Usage: 147392 kB |

explain analyze
/*+ IndexScan(post) */
select * FROM “posts” AS “post”
WHERE “post”.“is_deleted” IS null
and “post”.is_active = ‘active’
AND “post”.“association_type” = ‘organization’
AND “post”.“association_id” = ‘1a3ae463-03c3-414a-b540-a3d480b0c167’;

QUERY PLAN |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Seq Scan on posts post (cost=10000000000.00…10000004743.06 rows=2 width=677) (actual time=22.723…2829.749 rows=5 loops=1) |
Filter: ((is_deleted IS NULL) AND (is_active = ‘active’::enum_posts_is_active) AND (association_type = ‘organization’::enum_posts_association_type) AND (association_id = ‘1a3ae463-03c3-414a-b540-a3d480b0c167’::uuid))|
Rows Removed by Filter: 269946 |
Planning Time: 0.094 ms |
Execution Time: 2829.824 ms |
Peak Memory Usage: 147656 kB |

Frits_Hoogland · May 30, 2023, 9:42am

Hi, can you please put the SQL and SQL output in a code block? A code block can be started and stopped with 3 backticks.

Something that we don’t understand is going on when a renamed version of the table with identical data and an identical index can use the index. If these are identical except for the name, they should perform the same plan.

Can you please show the description (/d) of both the tables?

Also, can you show the explain analyze output of the execution of the second table that does use the index?

FranckPachot · May 30, 2023, 12:39pm

This shows that the index is not usable. Maybe not valid. This can happen if the create index fails. You can list invalid indexes with:

 select pg_get_indexdef(indexrelid, 0, true) from pg_index where not indisvalid;

Karthikeyan · June 6, 2023, 5:41am

@FranckPachot Thanks for your response, yes i could see that index was not valid. Any idea why schema definition still shows that index is part of the table but on the other side the query that you had provided shows index is invalid?

FranckPachot · June 6, 2023, 12:13pm

which schema definition?
a \d in psql shows it. I’m not sure about DBeaver

Topic		Replies	Views
Query performance on table with indices General	10	1343	March 18, 2022
Does YugaByte SQL support partial indexes (indexing a subset of rows)? General	2	881	June 12, 2019
Index creation error General	2	286	February 8, 2024
Database schema in YugabyteDB for storing statistics collected periodically General	50	1915	November 21, 2022
Secondary Index and Select General	7	1575	April 13, 2018

Statistics of an Index

Related topics