This is the index definition of person table.
CREATE** INDEX person_search_key_idx ON public.person USING lsm (sk HASH) INCLUDE (pk, entity_group, entity) WHERE (entity = ‘search_key’**::**text)
On this table executing below query with having list of sk in IN clause.
select pk,sk from person where entity = ‘search_key’ and sk in ( ‘1’,‘2’,‘3’) and entity_group**=**‘person’;
Will the above query be performant. Its distributed db and sk is the hash for sharding. Isit better to go with IN clause with 50 keys or executing single select for each sk separately?
What will be the impact if the SK is equally distributed in different nodes vs all the sk are relatively in single node.
Hi @banandh
Please paste the output of \d+ table_name
.
Also please paste the output of
EXPLAIN (ANALYZE, DIST) select ...
And what version are you using?
Hi @banandh, Yes, having an IN clause rather than multiple queries is better because we do an index skip scan, reading the three ranges in one scan.
However, your index will scan the three ranges and then filter out for entity_group='person'
(Storage Filter
in explain) so it will not be performant if there are many other values of entity_group
The following index will be better as it will get the smaller ranges that include only the desired entity group:
create index person_search_key_idx on person ( sk hash, entity_group)
include (pk, entity)
where entity ='search_key'
;
With explain
you should see sk in ( '1','2','3') and entity_group='person'
as Index Cond
so you scan only the rows that you need.