Will a data table and its index table split simultaneously

ZhenNan2016 · May 23, 2024, 9:15am

Hi.
Excuse me, when creating an index on one column of a data table, does each node of the cluster fill in the index only from the data table's local tablet data? Or is the index filled in across nodes?
When the data table keeps growing and has to be split into multiple tablets, will the index table split along with it?
Do the shards of the split index correspond to the shards (tablets) of the data table?

dorian_yugabyte · May 23, 2024, 9:33am

Hi @ZhenNan2016

The table & index locations for a given row can be on different nodes of the cluster (except colocated db where they’re together).

No. Tables and indexes are each split into “tablets”, and each tablet splits on its own.

No. Think of an index as a separate table that gets inserts/deletes from the main table. The splitting logic is separate.
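
To make the "separate table" idea concrete, here is a toy Python sketch (not YugabyteDB code). It assumes hash sharding, three hypothetical nodes, and a made-up `email` indexed column; the `md5`-based `hash_slot` only stands in for the real sharding function. The table row is placed by its primary key and the index entry by the indexed value, so the two placements are independent of each other.

```python
import hashlib

# Hypothetical cluster: tablet i of the table (or index) has its leader
# on the node listed at position i. Real placement is decided by the cluster.
NODES = ["node-a", "node-b", "node-c"]
TABLE_TABLET_NODES = ["node-a", "node-b", "node-c"]
INDEX_TABLET_NODES = ["node-a", "node-b", "node-c"]


def hash_slot(key: str, num_tablets: int) -> int:
    """Map a key to a tablet number with a stable hash (a stand-in, not the real one)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:2], "big") % num_tablets


def placement(row: dict) -> tuple:
    """Return (table_node, index_node) for one row.

    The table row is sharded on the primary key ("id"); the index entry
    is sharded on the indexed column ("email").
    """
    table_node = TABLE_TABLET_NODES[hash_slot(str(row["id"]), len(TABLE_TABLET_NODES))]
    index_node = INDEX_TABLET_NODES[hash_slot(row["email"], len(INDEX_TABLET_NODES))]
    return table_node, index_node


rows = [{"id": i, "email": f"user{i}@example.com"} for i in range(1, 101)]
placements = [placement(r) for r in rows]

# Count rows whose index entry lands on a different node than the row itself.
split = sum(1 for table_node, index_node in placements if table_node != index_node)
print(f"{split} of {len(rows)} rows have their index entry on a different node")
```

Because the two hash inputs differ, nothing ties a row's node to its index entry's node, and splitting one set of tablets would not change the other.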

ZhenNan2016 · May 23, 2024, 9:44am

Hi @dorian_yugabyte
I have a requirement: I want each node to build index data only for the main-table data stored on that node, without crossing nodes. That way, a search can be confined to a single node, avoiding the performance overhead of cross-node access. For example, if the main table's tablet on node A holds rows 1-100, then the index data on node A should also cover only those rows 1-100.
Is there any good way to support this requirement?

ZhenNan2016 · May 23, 2024, 9:54am

@dorian_yugabyte
From the following online-index-backfill.md document, it appears that each node of the cluster only fills in the index for the data table's local tablet data?

dorian_yugabyte · May 23, 2024, 10:09am

This is not possible; see the GitHub issue “Co-partitioned tables” (Issue #79, yugabyte/yugabyte-db).

As a workaround, you could create multiple colocated dbs, but that adds overhead: you can’t share connections, can’t join/transact data across dbs, have to handle DDL for each db, etc.

Yes, the backfill reads local tablet data. But for each row in that tablet, the corresponding index tablet may be on another server.

FranckPachot · June 29, 2024, 8:39pm

The concept of a distributed SQL database, as opposed to sharding on top of databases, is that the distribution is transparent to the application. This means that secondary indexes are global and distributed on their own key to serve a variety of use cases. If you are experiencing poor response time due to fetching rows from the index and tables from different nodes, there are several ways to address this:

YugabyteDB batches the fetch of index entries and table rows, so cross-node latency occurs only once every thousand rows.
Adding more columns to the index can eliminate the need to access the table at all (Index Only Scan).
Setting a preferred zone for tablet leaders ensures that tables and indexes are located close to where you are connected.
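
As a rough illustration of the batching point, the arithmetic can be sketched in Python. The batch size of 1000 comes from the "every thousand rows" figure above; the 1 ms round-trip time and the row count are made-up numbers, and the model ignores everything except cross-node round trips.

```python
import math


def cross_node_round_trips(rows: int, batch_size: int = 1000) -> int:
    """Number of cross-node fetches needed when rows are fetched in batches."""
    return math.ceil(rows / batch_size)


def added_latency_ms(rows: int, rtt_ms: float, batch_size: int = 1000) -> float:
    """Latency added by cross-node round trips alone (simplified model)."""
    return cross_node_round_trips(rows, batch_size) * rtt_ms


rows = 50_000   # hypothetical number of index entries to follow
rtt_ms = 1.0    # hypothetical cross-node round-trip time

print(added_latency_ms(rows, rtt_ms))                # batched: 50.0 ms
print(added_latency_ms(rows, rtt_ms, batch_size=1))  # one row per trip: 50000.0 ms
```

Under these assumptions the batched fetch pays 50 round trips instead of 50,000, which is why the cross-node hop is usually not the dominant cost.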
