
Zhen Nan –

So to “partially” pack a row, you can set the GFlag ysql_packed_row_size_limit to the maximum size of the “packed” row. Once the limit has been reached, all columns after that remain in the “unpacked” format.
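For illustration, a hedged sketch of how that flag might be passed when starting the tserver (the fs_data_dirs path below is just a placeholder for your environment):

```shell
# Sketch only: the fs_data_dirs path is a placeholder.
# With ysql_packed_row_size_limit set, columns are packed in declaration
# order until the packed portion reaches this many bytes; any remaining
# columns stay in the unpacked (one key/value pair per column) format.
yb-tserver \
  --fs_data_dirs=/mnt/d0 \
  --ysql_packed_row_size_limit=1024
```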

–Alan

Hi @Alan_Caldera
Setting the GFlag ysql_packed_row_size_limit to the maximum size of the “packed” row may not be able to accomplish the following:
There are four columns A, B, C, and D. I want to pack only A, B, and C, and store column D separately.
e.g.
(hash1, [6,7], 8), system_column_id, T1 → [NULL]
(hash1, [6,7], 8), v1_column_id, T1 → 9
(hash1, [6,7], 8), v2_column_id, T1 → 10

packed row: <(hash1, [6,7], 8), packed { [NULL],9}>
and non-packed: (hash1, [6,7], 8), v2_column_id, T1 → 10
Thanks a lot.

Hi @ZhenNan2016 in tomorrow’s Community Open Hours (in 12 hours from now) we will explain a few questions picked from the forum.
There are some from you, so if you want to join in the chat it would be great :slight_smile:

Thank you.
But for real-time online communication, my English is still weak, so I may not be able to keep up.

1 Like

Hi @FranckPachot
Under my current conditions I can only watch the video, not hear the sound.
Is there a replay video I can watch after the live stream?

Sure, it’s on YouTube: https://www.youtube.com/watch?v=kjlBXqiHeT0&ab_channel=Yugabyte

1 Like

Hi @dorian_yugabyte @Alan_Caldera
Excuse me, one more question I need to check with you: when ysql_enable_packed_row is turned off, do all the columns of a row of data have the same key? Does the key determine whether they are all stored on the same node in the cluster, or is it possible to store them across nodes? For example, if there are four columns A, B, C, and D, do all of them exist on one node, or could the four columns be stored on different nodes?
Thanks a lot.

They don’t have the same key. There’s a key for each column, which includes the column_id. You can see the full key that ends up in rocksdb here: DocDB data model | YugabyteDB Docs

The sharding columns determine on which tablet the whole row will reside.

No. All columns of a row will reside inside 1 tablet.
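To make the difference concrete, here is a rough conceptual model in Python. This is not the real DocDB encoding (actual keys are binary-encoded SubDocKeys, and values are serialized); it only illustrates that unpacked storage keeps one RocksDB entry per column while packed storage keeps one entry per row, with both living in the same tablet.

```python
# Conceptual model only: real DocDB keys are binary SubDocKeys.
# Unpacked: one entry per column, keyed by (doc_key, column_id, hybrid_time).
# Packed:   one entry per row,    keyed by (doc_key, hybrid_time).

def unpacked_entries(doc_key, row, ht):
    """One key/value pair per non-key column."""
    return {(doc_key, col_id, ht): value for col_id, value in row.items()}

def packed_entry(doc_key, row, ht):
    """A single key/value pair holding the whole row."""
    return {(doc_key, ht): dict(row)}

row = {11: "Phone X550", 12: 40}            # column_id -> value
unpacked = unpacked_entries((0xA5DB, 30), row, "T1")
packed = packed_entry((0xA5DB, 30), row, "T1")

print(len(unpacked))  # 2 entries -> roughly 2 seeks to read the full row
print(len(packed))    # 1 entry  -> 1 seek
```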

Hi @dorian_yugabyte
Thanks for your reply.
Previously Alan_Caldera mentioned: we built the packed row feature as an optimization to reduce the number of seeks and puts when reading and writing to the database.
Well, all columns of a row will reside inside one tablet even if packed rows are turned off via the GFlag (ysql_enable_packed_row). So is it really just an increase in seeks between multiple blocks when reading the data?

Yes, what we call “seek” is the rocksdb seek, finding a key in the LSM Tree. It’s still within one tablet.

1 Like

Okay, I got it now.
Thanks a lot.

Hi @dorian_yugabyte @FranckPachot
I have a question for you, which is described below:
The sql operation I am performing is as follows:
bigmath=# CREATE TABLE test3(id int PRIMARY KEY,descr text, age int);
CREATE TABLE
bigmath=# INSERT INTO test3 VALUES (30, 'Phone X550', 40), (31, 'Tablet Z220', 80);
INSERT 0 2
bigmath=# select * from test3;
 id |    descr    | age
----+-------------+-----
 30 | Phone X550  |  40
 31 | Tablet Z220 |  80
(2 rows)

From the above execution result, it is successful, and the tserver’s log messages show that the “packed row” write succeeded:
T 36733a2f8a5e40fba97a8ce180ad06a7 P 7330ce09bc8546c39ebb29dce08435e1: Wrote 2 key/value pairs to kRegular RocksDB:
T 36733a2f8a5e40fba97a8ce180ad06a7 P 7330ce09bc8546c39ebb29dce08435e1 [R]: 1. PutCF: SubDocKey(DocKey(0xa5db, [30], ), [HT{ days: 19935 time: 18:04:46.717841 }]) => { 11: "Phone X550" 12: 40 }
T 36733a2f8a5e40fba97a8ce180ad06a7 P 7330ce09bc8546c39ebb29dce08435e1 [R]: 2. PutCF: SubDocKey(DocKey(0xee14, [31], ), [HT{ days: 19935 time: 18:04:46.717841 w: 1 }]) => { 11: "Tablet Z220" 12: 80 }

But the sst_dump command didn’t parse the sst file properly:
./sst_dump --output_format=decoded_regulardb --command=scan --file=/home/bigmath/bigmath-data/node-1/disk-1/bm-data/dbserver/data/rocksdb/table-000034a2000030008000000000004305/tablet-36733a2f8a5e40fba97a8ce180ad06a7
from to
Process /home/bigmath/bigmath-data/node-1/disk-1/bm-data/dbserver/data/rocksdb/table-000034a2000030008000000000004305/tablet-36733a2f8a5e40fba97a8ce180ad06a7/000010.sst
Sst file format: block-based
SubDocKey(DocKey(0xa5db, [30], ), [HT{ physical: 1722420286717841 }]) → Not found (kv_debug.cc:125): No packing information available
SubDocKey(DocKey(0xee14, [31], ), [HT{ physical: 1722420286717841 w: 1 }]) → Not found (kv_debug.cc:125): No packing information available

May I ask, is this the wrong way to dump? Is there any other correct way?
Thanks a lot.

It doesn’t know how to read the packed rows. You need to set the metadata using --formatter_tablet_metadata=/home/bigmath/bigmath-data/node-1/disk-1/bm-data/dbserver/tablet-meta/36733a2f8a5e40fba97a8ce180ad06a7
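Putting the two flags together, the full invocation would look something like this (same paths as in your example; a sketch, not verified against your build):

```shell
# Point sst_dump at the tablet metadata so it can decode packed rows.
./sst_dump --output_format=decoded_regulardb --command=scan \
  --file=/home/bigmath/bigmath-data/node-1/disk-1/bm-data/dbserver/data/rocksdb/table-000034a2000030008000000000004305/tablet-36733a2f8a5e40fba97a8ce180ad06a7 \
  --formatter_tablet_metadata=/home/bigmath/bigmath-data/node-1/disk-1/bm-data/dbserver/tablet-meta/36733a2f8a5e40fba97a8ce180ad06a7
```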

See What's the equivalent of pageinspect in YugabyteDB? - DEV Community as an example.

1 Like

It’s great. Thank you very much.