Putting specified rows into the same tablet

Hi
Excuse me.
If I use hash sharding and make id the primary key, as shown in the following example:
CREATE TABLE tracking (id int PRIMARY KEY);

What method can I use to place records with ids such as 10, 11, 12, and 13 in the same tablet?
Can I use SPLIT INTO? For example:
CREATE TABLE tracking (id int PRIMARY KEY) SPLIT INTO 1 TABLETS;

Thanks.

Hi @ZhenNan2016, you can use range sharding and choose the split points so that the records you want are placed in the same tablet.

For example:

CREATE TABLE tracking (id INT, PRIMARY KEY (id ASC)) SPLIT AT VALUES ((100));

This splits the table at id = 100 into two tablets, one for id < 100 and one for id >= 100, so ids 10, 11, 12, and 13 are all stored in the first tablet.
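To make the placement rule concrete, here is a small Python sketch that models which tablet a key falls into for a range-sharded table. It assumes that each split point starts a new tablet, so a single split at 100 gives one tablet for id < 100 and one for id >= 100. The helper range_tablet_for is hypothetical: it only simulates the rule and does not query YugabyteDB.

```python
from bisect import bisect_right
from typing import Sequence


def range_tablet_for(key: int, split_points: Sequence[int]) -> int:
    """Return the index of the tablet that would hold `key`.

    Hypothetical model of a range-sharded table: tablet 0 holds keys below
    the first split point, and each split point starts the next tablet.
    `split_points` must be sorted in ascending order.
    """
    return bisect_right(split_points, key)


if __name__ == "__main__":
    splits = [100]  # models a single split point at id = 100: two tablets
    for key in (10, 11, 12, 13, 100, 250):
        print(f"id {key} -> tablet {range_tablet_for(key, splits)}")
```

Running it reports tablet 0 for ids 10 through 13 and tablet 1 for 100 and 250, which is why choosing split points around the ids you care about keeps them together.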


Reference:


Hi @vaibhav_yb

Yes, it’s a good idea.
However, I would like hash sharding to also support placing specified rows in the same tablet.
Do you have any suggestions?
Thanks.

@ZhenNan2016

The only way to guarantee this with hash sharding is to use a table like the one below and give the rows the same partition_key value:

CREATE TABLE sample(partition_key int, id int, PRIMARY KEY (partition_key, id));
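This works because only the hash column decides the tablet. The Python sketch below simulates that under two stated assumptions: the hash space runs from 0 to 65535 and is divided into equal ranges across tablets, and a CRC32-based stand-in replaces YugabyteDB's real hash function, which it does not reproduce. The function names are hypothetical.

```python
import zlib
from bisect import bisect_right
from collections import defaultdict

HASH_SPACE = 65536  # assumption: 16-bit hash space, values 0..65535


def stand_in_hash(partition_key: int) -> int:
    # Stand-in only: CRC32 truncated to 16 bits. This is NOT the hash that
    # YugabyteDB uses; in YSQL, yb_hash_code() returns the real value.
    return zlib.crc32(str(partition_key).encode()) & 0xFFFF


def tablet_for_row(partition_key: int, row_id: int, num_tablets: int) -> int:
    # Only partition_key (the hash column) is hashed. row_id (the range
    # column) orders rows that share a partition_key but never changes
    # which tablet they go to, so it is deliberately unused here.
    starts = [i * HASH_SPACE // num_tablets for i in range(num_tablets)]
    return bisect_right(starts, stand_in_hash(partition_key)) - 1


if __name__ == "__main__":
    rows = [(1, 10), (1, 11), (1, 12), (1, 13), (2, 10), (3, 10)]
    by_tablet = defaultdict(list)
    for pk, row_id in rows:
        by_tablet[tablet_for_row(pk, row_id, num_tablets=3)].append((pk, row_id))
    for tablet, members in sorted(by_tablet.items()):
        print(f"tablet {tablet}: {members}")
```

Whichever tablet the stand-in hash picks, the four rows with partition_key = 1 always end up together; rows with other partition_key values may or may not share that tablet.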

@ZhenNan2016

Why do both rows need to be in the same tablet?

Hi @dorian_yugabyte
Excuse me, I can’t follow you, sorry.
Are k1 and k2 two separate columns?
Is partition_key a combination of k1 and k2?
Is id a regular (non-key) column?
Is the following the correct CREATE TABLE statement?
CREATE TABLE sample (k1 int, k2 int, id int, PRIMARY KEY (k1, k2));

@dorian_yugabyte
Because I need to test some operations, including deadlock scenarios, on two records in the same tablet.

@ZhenNan2016

I edited the query above; please check again.

Only the partition_key column is used for partitioning: it is hashed to choose the tablet, while id only orders the rows that share a partition_key.


I got it now.
Thanks.

Hi @dorian_yugabyte
I’m sorry to trouble you.
Data is written to the table according to the hash-sharding algorithm.
Is there a tool to quickly calculate the hash value of partition_key?
I want to see how the hash value of a record's partition_key is calculated and which tablet the record is written to.
Thank you very much.

See yb_hash_code(): yb_hash_code() function [YSQL] | YugabyteDB Docs
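Once yb_hash_code() has given you a hash value (for example with SELECT yb_hash_code(partition_key) FROM sample;), you still need the tablet boundaries to know where the row lives. The Python sketch below assumes the 0 to 65535 hash space is divided into equal ranges, as when a table is created with SPLIT INTO n TABLETS. The function names are hypothetical, and the computed boundaries should be compared with the tablet ranges your cluster actually reports.

```python
from bisect import bisect_right
from typing import List

HASH_SPACE = 65536  # assumption: yb_hash_code() returns values in 0..65535


def tablet_start_keys(num_tablets: int) -> List[int]:
    # Assumption: the hash space is divided into num_tablets nearly equal
    # ranges; each entry is the first hash value owned by one tablet.
    return [i * HASH_SPACE // num_tablets for i in range(num_tablets)]


def tablet_for_hash(hash_code: int, num_tablets: int) -> int:
    # Returns the index of the tablet whose range contains hash_code.
    if not 0 <= hash_code < HASH_SPACE:
        raise ValueError("hash_code must be between 0 and 65535")
    return bisect_right(tablet_start_keys(num_tablets), hash_code) - 1


if __name__ == "__main__":
    print(tablet_start_keys(3))  # [0, 21845, 43690]
    for code in (0, 21844, 21845, 43690, 65535):
        print(f"hash {code} -> tablet {tablet_for_hash(code, 3)}")
```

Feeding it the yb_hash_code() values of two rows tells you whether they would share a tablet under this equal-split assumption.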


You can also use table partitioning that is inherited from PostgreSQL: Table partitioning | YugabyteDB Docs


OK, I got it now.
Thank you very much.

Hi there!

You can approximate this by partitioning the table on a function you define instead of relying only on the default hash. By default, hash sharding distributes data evenly across all tablets, so grouping multiple records together requires a partitioning expression that maps them to the same place.

One approach is to create a function that maps each id to a value of your choosing and use it as the partition key. For instance, a function that always returns 0 sends rows with any id value to the same partition.

Here’s an example of how you might implement this in SQL, using PostgreSQL-style hash partitioning:

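CREATE FUNCTION custom_hash(id INT) RETURNS INT
LANGUAGE sql IMMUTABLE  -- partition key expressions must use IMMUTABLE functions
AS $$ SELECT 0 $$;      -- Always return 0 so every row maps to the same partition

CREATE TABLE tracking (
  id INT NOT NULL
  -- Other columns...
  -- No PRIMARY KEY: PostgreSQL rejects one when the partition key is an expression.
)
PARTITION BY HASH (custom_hash(id));

-- A hash-partitioned table needs partitions before it can hold rows.
CREATE TABLE tracking_p0 PARTITION OF tracking FOR VALUES WITH (MODULUS 2, REMAINDER 0);
CREATE TABLE tracking_p1 PARTITION OF tracking FOR VALUES WITH (MODULUS 2, REMAINDER 1);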

This code defines a function custom_hash that always returns 0 and hash-partitions tracking on custom_hash(id), so every row is routed to the same partition whatever its id. Note that in YugabyteDB each partition is itself a table that is split into tablets, so this groups rows by partition, not necessarily by tablet.

Alternatively, you could use SPLIT INTO 1 TABLETS to create a single-tablet table, which places all records in the same tablet. However, this approach might not suit long-term scalability, because it limits the table's ability to distribute data across multiple tablets as the data volume grows.

In summary, the most appropriate method for placing records with specific ids together depends on your requirements. If all records must always land in the same partition regardless of id, partitioning on a custom expression is one option. If you anticipate needing to scale the data across multiple tablets in the future, SPLIT INTO 1 TABLETS might not be the best choice.


Hi @johnsmi4ri
I got it now.
Thank you very much.