Put some data into the same tablet in a specified manner

ZhenNan2016 · March 29, 2023, 4:10am

Hi
Excuse me.
If I use the hash sharding method and specify the id as the primary key, as shown in the following example:
CREATE TABLE tracking (id int PRIMARY KEY)

What method can I use to place records with ids such as 10, 11, 12, 13 in the same tablet?
Can I use SPLIT INTO? For example:
CREATE TABLE tracking (id int PRIMARY KEY) SPLIT INTO 1 TABLETS;

Thanks.

vaibhav_yb · March 29, 2023, 4:31am

Hi @ZhenNan2016, you can use range sharding with explicit split points to place records in the same tablet.

For example:

CREATE TABLE tracking (id INT, PRIMARY KEY (id ASC)) SPLIT AT VALUES ((100));

The above splits the table at the given value, so all ids below 100, including the ones you listed, are stored in the same tablet.
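
As a rough sketch of how that plays out (the id 150 is only an illustrative value, not something from the question): with a single split point at 100, the table should end up with two tablets, one for ids below 100 and one for ids of 100 and above.

```sql
-- Range-sharded table with one split point, giving two tablets:
-- one for id < 100 and one for id >= 100.
CREATE TABLE tracking (id INT, PRIMARY KEY (id ASC)) SPLIT AT VALUES ((100));

-- All four ids are below the split point, so they share the first tablet.
INSERT INTO tracking (id) VALUES (10), (11), (12), (13);

-- Illustrative value above the split point; it goes to the second tablet.
INSERT INTO tracking (id) VALUES (150);
```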

vaibhav_yb · March 29, 2023, 4:31am

Reference:

ZhenNan2016 · March 29, 2023, 5:14am

Hi @vaibhav_yb

Yes, it’s a good idea.
However, I hope that hash sharding can also place specified data in the same tablet.
Do you have any good suggestions?
Thanks.

dorian_yugabyte · March 29, 2023, 5:46am

@ZhenNan2016

The only way to guarantee this is to use a table like the one below and make sure the rows have the same partition_key column value:

CREATE TABLE sample(partition_key int, id int, PRIMARY KEY (partition_key, id));

dorian_yugabyte · March 29, 2023, 6:07am

@ZhenNan2016

Why do you need both rows to be in the same tablet?

ZhenNan2016 · March 29, 2023, 6:12am

Hi @dorian_yugabyte
Excuse me.
I can’t follow you, sorry.
Are k1 and k2 two column fields?
Is the partition_key a combination of k1 and k2 ?
Is this id field a normal column?
Is the following table creation statement correct?
CREATE TABLE (k1 int, k2 int, id int, PRIMARY KEY (k1, k2));

ZhenNan2016 · March 29, 2023, 6:14am

@dorian_yugabyte
Because I need to test some operations on two records in the same tablet, including deadlock testing.

dorian_yugabyte · March 29, 2023, 6:24am

@ZhenNan2016

I edited the query above, please check again.

Only the partition_key column is used for partitioning.
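
To make the roles of the two columns explicit, the same table can be written with the sharding keywords spelled out. This is a sketch that assumes the YSQL default, where the first primary key column is hash-sharded and the remaining ones are range-ordered; the inserted rows are illustrative.

```sql
-- partition_key is hashed to choose the tablet;
-- id only orders rows within that tablet.
CREATE TABLE sample (
  partition_key INT,
  id INT,
  PRIMARY KEY (partition_key HASH, id ASC)
);

-- Same partition_key value, so the same hash code and the same tablet.
INSERT INTO sample (partition_key, id)
VALUES (1, 10), (1, 11), (1, 12), (1, 13);
```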

ZhenNan2016 · March 29, 2023, 6:38am

I got it now.
Thanks.

ZhenNan2016 · March 30, 2023, 8:26am

Hi @dorian_yugabyte
I’m sorry to trouble you.
Data is written to the table according to the hash sharding algorithm.
Is there a tool to quickly calculate the hash value of the partition_key?
I want to see how the hash value of the partition_key is calculated for a record and which tablet the record is written to.
Thank you very much.

dorian_yugabyte · March 30, 2023, 12:20pm

See the yb_hash_code() function [YSQL] in the YugabyteDB Docs.
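
A minimal sketch of how the function might be used, assuming the sample table from earlier in this thread. yb_hash_code returns a value in the 0 to 65535 hash space, and each tablet of a hash-sharded table owns a contiguous slice of that space, so rows with equal hash codes belong to the same tablet.

```sql
-- Hash code for a single value; the cast keeps the type the same
-- as the column type, which matters for the result.
SELECT yb_hash_code(1::int);

-- Hash code of the partition column for each stored row.
-- Rows that share a partition_key show the same hash_code.
SELECT partition_key, id, yb_hash_code(partition_key) AS hash_code
FROM sample
ORDER BY partition_key, id;
```

Comparing the hash code with the hash ranges of the table's tablets then tells you which tablet a row belongs to.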

dorian_yugabyte · March 30, 2023, 12:20pm

You can also use table partitioning, which is inherited from PostgreSQL. See Table partitioning in the YugabyteDB Docs.
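
A rough sketch of what that could look like with range partitioning. The table and partition names are hypothetical, and the boundaries are only illustrative. Keep in mind that each partition is a table of its own with its own tablets, so this groups rows by partition rather than pinning them to one tablet.

```sql
-- Parent table, partitioned by ranges of id.
CREATE TABLE tracking_parts (
  id INT,
  PRIMARY KEY (id)
) PARTITION BY RANGE (id);

-- Hypothetical partitions: ids 0..99 and ids 100..199.
CREATE TABLE tracking_parts_low PARTITION OF tracking_parts
  FOR VALUES FROM (0) TO (100);
CREATE TABLE tracking_parts_high PARTITION OF tracking_parts
  FOR VALUES FROM (100) TO (200);

-- All four ids fall into tracking_parts_low.
INSERT INTO tracking_parts (id) VALUES (10), (11), (12), (13);
```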

ZhenNan2016 · March 30, 2023, 12:46pm

OK, I got it now.
Thank you very much.

johnsmi4ri · November 28, 2023, 12:06pm

Hi there!

You can get close to this with a custom function, although not by replacing the hash function that YugabyteDB uses for tablet sharding. That function is built in and spreads rows evenly across tablets, so what you can control is the value being hashed rather than the hash itself.

One approach is PostgreSQL-style hash partitioning on an expression. You create a function that takes the id value and maps it to a fixed result, regardless of the numerical value of id. For instance, a function that always returns 0 sends every row to the same partition.

Here’s an example of how you might implement this in SQL:

SQL
CREATE FUNCTION custom_hash(id INT) RETURNS INT
LANGUAGE sql IMMUTABLE
AS $$ SELECT 0 $$; -- Always return 0 so every row maps to the same partition

-- A partition key that contains an expression cannot be combined
-- with a PRIMARY KEY constraint, so id is only declared NOT NULL.
CREATE TABLE tracking (
  id INT NOT NULL
  -- Other columns...
)
PARTITION BY HASH (custom_hash(id));

-- A single partition that receives every row.
CREATE TABLE tracking_p0 PARTITION OF tracking
  FOR VALUES WITH (MODULUS 1, REMAINDER 0);

This code defines a function custom_hash that always returns 0 and partitions the table tracking on its result, so every row goes to the same partition. Note that a partition is itself a table that is sharded into tablets, so this controls which partition a row goes to, not which tablet inside that partition.

Alternatively, you could use the SPLIT INTO 1 TABLETS clause to create a single-tablet table, which places all records in the same tablet. However, this might not suit long-term scalability, because a single tablet cannot spread the data across nodes as the volume grows.

In summary, the most appropriate method depends on your requirements. For a test that needs specific rows side by side, a single-tablet table or a shared partition key value is the simplest option. If you expect to scale the data across multiple tablets later, a single-tablet table might not be the best choice.

ZhenNan2016 · November 28, 2023, 12:18pm

Hi @johnsmi4ri
I got it now.
Thank you very much.
