Performance of IN clause in select query

banandh · October 29, 2024, 5:42am

This is the index definition of person table.

CREATE** INDEX person_search_key_idx ON public.person USING lsm (sk HASH) INCLUDE (pk, entity_group, entity) WHERE (entity = ‘search_key’**::**text)

On this table executing below query with having list of sk in IN clause.

select pk,sk from person where entity = ‘search_key’ and sk in ( ‘1’,‘2’,‘3’) and entity_group**=**‘person’;

Will the above query be performant. Its distributed db and sk is the hash for sharding. Isit better to go with IN clause with 50 keys or executing single select for each sk separately?

What will be the impact if the SK is equally distributed in different nodes vs all the sk are relatively in single node.

dorian_yugabyte · October 29, 2024, 5:46am

Hi @banandh

Please paste the output of \d+ table_name.

Also please paste the output of

EXPLAIN (ANALYZE, DIST) select ...

And what version are you using?

FranckPachot · October 29, 2024, 7:46am

Hi @banandh, Yes, having an IN clause rather than multiple queries is better because we do an index skip scan, reading the three ranges in one scan.

However, your index will scan the three ranges and then filter out for entity_group='person' (Storage Filter in explain) so it will not be performant if there are many other values of entity_group

The following index will be better as it will get the smaller ranges that include only the desired entity group:

create index person_search_key_idx on person ( sk hash, entity_group)
 include (pk, entity)
 where entity ='search_key'
;

With explain you should see sk in ( '1','2','3') and entity_group='person' as Index Cond so you scan only the rows that you need.

Topic		Replies	Views
Poor Performance of Aggregate queries General	7	505	September 4, 2023
Timeouts on 16 node cluster due to bad schema design Design Discussions	11	1007	January 26, 2022
Secondary Index and Select General	7	1570	April 13, 2018
Query performance on table with indices General	10	1338	March 18, 2022
Which is faster/best? SELECT * or SELECT column1, colum2, column3, etc General	5	816	March 24, 2020

Performance of IN clause in select query

Related topics