Hi.
Excuse me: when an index is created on one column of a table, does each node of the cluster fill in the index only for the table tablets stored locally on that node? Or does filling the index happen across nodes?
As the table keeps growing and has to be split into more tablets, will the index table be split along with it?
Do the shards of the split index correspond to the shards of the table's tablets?
Hi @ZhenNan2016
A table row and its index entry can be on different nodes of the cluster (except in a colocated database, where they are together).
No. The tables/indexes are split into “tablets”. The tablets split on their own.
No. Think of an index as a separate table that gets inserts/deletes from the main table. The splitting logic is separate.
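To make the "separate table" model concrete, here is a toy Python sketch. It is not YugabyteDB code: the hash function (`zlib.crc32` modulo a tablet count), the tablet counts, and the names `tablet_for`, `place_row`, `TABLE_TABLETS` and `INDEX_TABLETS` are hypothetical. The table is partitioned on its primary key and the index on the indexed column, so a row and its index entry are placed independently, and each side can change its tablet count without affecting the other.

```python
import zlib

# Hypothetical tablet counts; each relation has its own tablets,
# and they split independently of each other.
TABLE_TABLETS = 4
INDEX_TABLETS = 3


def tablet_for(key: str, num_tablets: int) -> int:
    """Toy hash partitioning: map a key to a tablet id (not YugabyteDB's hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_tablets


def place_row(pk: int, email: str) -> tuple[int, int]:
    """Return (table tablet, index tablet) for a row and its index entry.

    The table is partitioned on the primary key and the index on the
    indexed column, so the two placements come from different keys.
    """
    table_tablet = tablet_for(str(pk), TABLE_TABLETS)
    index_tablet = tablet_for(email, INDEX_TABLETS)
    return table_tablet, index_tablet


if __name__ == "__main__":
    rows = [(1, "a@example.com"), (2, "b@example.com"), (3, "c@example.com")]
    for pk, email in rows:
        t, i = place_row(pk, email)
        print(f"row pk={pk}: table tablet {t}, index tablet {i}")
```

Nothing ties the table tablet id to the index tablet id, which is why there is no one-to-one correspondence between table shards and index shards.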
Hi @dorian_yugabyte
I now have a requirement to build the index data for the main table's data on the same node, without crossing nodes. That way, a lookup can stay on one node, avoiding the performance overhead of cross-node access. For example, if the main table's tablet on node A holds rows 1-100, then the index data on node A should cover exactly those rows 1-100.
Is there any good way to support this requirement?
@dorian_yugabyte
From the online-index-backfill.md document, it appears that each node of the cluster only fills in the index for its local tablets of the table. Is that right?
This is not possible; see the GitHub issue "Co-partitioned tables" (yugabyte/yugabyte-db #79).
As an example, you can create multiple colocated databases, but that has overheads: you can't share connections, you can't join or run transactions across databases, you have to handle DDL for each database, etc.
Yes, local tablet data. For each row in that tablet, the corresponding index tablet may be on another server.
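A rough Python sketch of that backfill pattern, under stated assumptions: the server map `INDEX_TABLET_SERVER`, the toy hash in `index_tablet_for`, and the function `backfill_tablet` are all hypothetical and do not reflect YugabyteDB internals. Each table tablet reads only its own local rows, but the index entries it produces are grouped by destination index tablet, and any destination hosted elsewhere means a cross-node write.

```python
import zlib
from collections import defaultdict

# Hypothetical placement: which server hosts the leader of each index tablet.
INDEX_TABLET_SERVER = {0: "node-A", 1: "node-B", 2: "node-C"}


def index_tablet_for(value: str) -> int:
    """Toy hash partitioning of the index on the indexed column."""
    return zlib.crc32(value.encode("utf-8")) % len(INDEX_TABLET_SERVER)


def backfill_tablet(local_server: str, local_rows: list[tuple[int, str]]):
    """Build index entries for one table tablet's local rows.

    Reads only local data, but groups the resulting (value, pk) entries
    by destination index tablet. Returns the batches and the sorted list
    of destination tablets that live on a different server.
    """
    batches: dict[int, list[tuple[str, int]]] = defaultdict(list)
    for pk, value in local_rows:
        batches[index_tablet_for(value)].append((value, pk))
    remote = sorted(t for t in batches if INDEX_TABLET_SERVER[t] != local_server)
    return dict(batches), remote


if __name__ == "__main__":
    rows_on_node_a = [(pk, f"user{pk}@example.com") for pk in range(1, 11)]
    batches, remote = backfill_tablet("node-A", rows_on_node_a)
    for tablet, entries in sorted(batches.items()):
        print(tablet, INDEX_TABLET_SERVER[tablet], len(entries), "entries")
    print("index tablets written remotely:", remote)
```

The read side is local; only the write side fans out, which is the distinction between "local tablet data" and where the index entries end up.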
The concept of a distributed SQL database, as opposed to sharding on top of databases, is that the distribution is transparent to the application. This means that secondary indexes are global and distributed on their own key to serve a variety of use cases. If you are experiencing poor response time because index entries and table rows are fetched from different nodes, there are several ways to address this:
- YugabyteDB batches the fetch of index entries and table rows, so cross-node latency occurs only once every thousand rows.
- Adding more columns to the index can eliminate the need to access the table at all (Index Only Scan).
- Setting a preferred zone for tablet leaders ensures that tables and indexes are located close to where you are connected.
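A back-of-the-envelope Python model of the first two points. It assumes, hypothetically, that each cross-node round trip fetches up to `batch_size` entries (defaulting to 1000 to mirror "every thousand rows"); the function `round_trips` is illustrative, not a YugabyteDB API. A regular index scan pays one batch of round trips for index entries and another for table rows, while an index-only scan skips the table fetch. Leader placement is not modeled: it shortens each round trip rather than reducing how many there are.

```python
import math


def round_trips(rows: int, batch_size: int = 1000, index_only: bool = False) -> int:
    """Rough count of cross-node round trips for an index scan returning `rows`.

    Assumes each round trip fetches up to `batch_size` entries. A regular
    index scan fetches index entries and then the matching table rows; an
    index-only scan skips the table fetch because the index covers every
    requested column.
    """
    if rows <= 0:
        return 0
    batches = math.ceil(rows / batch_size)
    return batches if index_only else 2 * batches


if __name__ == "__main__":
    for n in (1, 1000, 5000):
        print(n, round_trips(n), round_trips(n, index_only=True))
```

Under this model, 5000 rows cost about 10 round trips with a regular index scan and about 5 with an index-only scan.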