We have a 6-node cluster running YugabyteDB version 2.4.4:
4 nodes in the central region
2 nodes in the east region
So there are 4 tablet servers in the central region.
We have a single large YSQL table in YugabyteDB:
CREATE TABLE sample (
supplier_id INT,
item_id INT,
supplier_name TEXT,
item_name TEXT,
PRIMARY KEY((supplier_id, item_id) HASH)
);
My understanding is that the hash range (the yb_hash_code() values) owned by each tablet server is determined by the number of tablet servers.
How can I find the hash range for each tablet server in the central region?
For the YSQL table shown above, the goal is to fetch all rows from each tablet in parallel, using yb_hash_code() >= low AND yb_hash_code() < high.
What does the SELECT query look like to fetch all rows from each tablet server?
@sham_yuga -
So in a case where you have 2 nodes in one central region, 2 nodes in another, and 2 nodes in the east region, you have 3 complete copies of your data. If your tablet leaders are pinned to the central region, then there is no need to worry about location.
The simplest way to parallelize this is to do the following calculation:
65535 / number_of_parallel_threads
Here, 65535 is the largest possible hash value (hash codes run from 0 to 65535). So if we used 16 threads, each range would span 4095 values (using integer division). So your queries would be:
1st thread:
SELECT * FROM sample WHERE yb_hash_code(supplier_id, item_id) >= 0 AND yb_hash_code(supplier_id, item_id) < 4095;
2nd thread:
SELECT * FROM sample WHERE yb_hash_code(supplier_id, item_id) >= 4095 AND yb_hash_code(supplier_id, item_id) < 8190;
Continue in the same way, and make the last thread's upper bound inclusive (<= 65535) so that no hash value is skipped.
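A minimal Python sketch of the range calculation described above. It follows the 65535 / number_of_parallel_threads step size from this reply and extends the last range so that hash value 65535 is covered. The helper names hash_ranges and build_query are hypothetical, and the table and column names are taken from the CREATE TABLE in the question.

```python
MAX_HASH = 65535  # largest value yb_hash_code() can return (0xFFFF)


def hash_ranges(num_threads):
    """Split 0..MAX_HASH into num_threads half-open [low, high) ranges."""
    if not 1 <= num_threads <= MAX_HASH:
        raise ValueError("num_threads must be between 1 and 65535")
    step = MAX_HASH // num_threads  # integer division, e.g. 4095 for 16 threads
    ranges = []
    for i in range(num_threads):
        low = i * step
        # The last range runs to MAX_HASH + 1 (exclusive) so 65535 is included.
        high = (i + 1) * step if i < num_threads - 1 else MAX_HASH + 1
        ranges.append((low, high))
    return ranges


def build_query(low, high, table="sample", key_cols=("supplier_id", "item_id")):
    """Build the SELECT for one hash range (names assumed from the question)."""
    cols = ", ".join(key_cols)
    return (
        f"SELECT * FROM {table} "
        f"WHERE yb_hash_code({cols}) >= {low} "
        f"AND yb_hash_code({cols}) < {high};"
    )


if __name__ == "__main__":
    for low, high in hash_ranges(16):
        print(build_query(low, high))
```

Each thread would then run one of the generated statements on its own connection.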
Hope this helps.
Alan
@Alan_Caldera
In my setup, we have 4 tablet servers in the central region, and the leaders are pinned to the central region.
My understanding is:
Tablet 1 has hash range [0,16384)
Tablet 2 has hash range [16384, 32768)
Tablet 3 has hash range [32768, 49152)
Tablet 4 has hash range [49152, 65535]
If my understanding above is correct, then I would launch 4 threads, each running one SELECT query, because each query would then run against only one tablet, based on the four hash ranges. Is that right?
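One possible way to launch one query per hash range, sketched in Python with a thread pool. The function fetch_in_parallel and its run_range callback are hypothetical names; in practice run_range would open its own database connection and execute the range query for (low, high), which is not shown here.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_in_parallel(ranges, run_range):
    """Call run_range(low, high) for every range, one worker thread per range.

    run_range is supplied by the caller and is assumed to return a list of
    rows. Results are concatenated in the order of the input ranges.
    """
    if not ranges:
        return []
    with ThreadPoolExecutor(max_workers=len(ranges)) as pool:
        # pool.map preserves input order; consume it before the pool closes.
        per_range = list(pool.map(lambda r: run_range(r[0], r[1]), ranges))
    return [row for rows in per_range for row in rows]


# The four ranges from the post, written as half-open intervals so that
# the last one (upper bound 65536, exclusive) still includes 65535.
FOUR_RANGES = [(0, 16384), (16384, 32768), (32768, 49152), (49152, 65536)]
```

Using one worker per range keeps the sketch simple; a real client might cap the pool size instead.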
Similarly, if there are 16 tablets in the central region, then, as per this documentation:
Tablet 1 has range [0x0000, 0x1000),
Tablet 2 has range [0x1000, 0x2000),
:
:
Tablet 16 has range [0xF000, 0xFFFF]
Correct me if I'm wrong.
@sham_yuga - That’s correct.
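The arithmetic behind the ranges listed in the post can be sketched in Python as follows. This assumes the 16-bit hash space is split evenly across the tablets, as in the post; the boundaries a real cluster reports may differ. The names tablet_ranges and format_range are hypothetical.

```python
HASH_SPACE = 0x10000  # 65536 possible hash values, 0x0000 through 0xFFFF


def tablet_ranges(num_tablets):
    """Evenly split the hash space into half-open (low, high) ranges."""
    if not 1 <= num_tablets <= HASH_SPACE:
        raise ValueError("num_tablets must be between 1 and 65536")
    width = HASH_SPACE // num_tablets
    ranges = []
    for i in range(num_tablets):
        low = i * width
        # The last tablet takes whatever remains up to the end of the space.
        high = (i + 1) * width if i < num_tablets - 1 else HASH_SPACE
        ranges.append((low, high))
    return ranges


def format_range(low, high):
    """Render a range in hex; the final range is shown closed at 0xFFFF."""
    if high == HASH_SPACE:
        return f"[0x{low:04X}, 0x{HASH_SPACE - 1:04X}]"
    return f"[0x{low:04X}, 0x{high:04X})"


if __name__ == "__main__":
    for n, (low, high) in enumerate(tablet_ranges(16), start=1):
        print(f"Tablet {n} has range {format_range(low, high)}")
```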
Alan
@Alan_Caldera
OK. I think the concept of hash range distribution across tablets is the same, irrespective of whether YCQL or YSQL is used on YugabyteDB.
Why do the hash ranges in this querylink look different from the ranges below?
In the querylink, I was expecting only four rows in the system.partitions table, because there are four tablet servers in the central region, as shown below:
Tablet 1 hash range [0,16384)
Tablet 2 hash range [16384, 32768)
Tablet 3 hash range [32768, 49152)
Tablet 4 hash range [49152, 65535]
@sham_yuga –
You have 24 tablets because you have 6 nodes total. It looks like your system is set up to create 4 tablets per tablet server. You would only see that kind of display if you had 6 nodes and you specified WITH TABLETS = 4 in the CREATE TABLE syntax. See this link in our documentation for a longer explanation: SpecifyTablesTableCreateTime
Alan
@Alan_Caldera
I think there are 48 tablets, not 24, because there are 48 rows in the system.partitions table.
Correct me if I'm wrong.
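The two tablet counts in this exchange can be compared with a small Python sketch. It assumes that every row in system.partitions belongs to this one table and that tablets are created uniformly per tablet server; neither assumption is confirmed in the thread. The function names are hypothetical.

```python
def total_tablets(num_tservers, tablets_per_tserver):
    """Total tablets for a table when each tablet server gets the same number."""
    return num_tservers * tablets_per_tserver


def tablets_per_tserver(total, num_tservers):
    """Per-server tablet count implied by a total, if it divides evenly."""
    if num_tservers <= 0 or total % num_tservers != 0:
        raise ValueError("total is not an even multiple of num_tservers")
    return total // num_tservers


if __name__ == "__main__":
    print(total_tablets(6, 4))         # 6 nodes at 4 tablets each
    print(tablets_per_tserver(48, 6))  # 48 rows spread over 6 nodes
```

Under these assumptions, 6 nodes at 4 tablets each give 24 tablets, while 48 rows would correspond to 8 tablets per tablet server.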