1 Tserver does not have any load

Hi
I am doing a POC on YSQL with 3 masters and 5 TServers. The cloud, region, and zone information is set as shown below:

--placement_cloud=test
--placement_region=test
--placement_zone=test1
--enable_automatic_tablet_splitting=true
--ysql_num_shards_per_tserver=1

Each TServer is in a different placement_zone (test1, test2 … test5). While loading some test data into a table, we observed that one TServer has no read/write load and hosts no tablet leaders or followers.

Our table has a composite primary key (column1 timestamp with time zone, column2 varchar, column3 varchar), and it is partitioned on column1.

It would be a great help if you could explain why one TServer doesn't have any load or tablets.

Hi,

Please provide the details below so that we can move forward in the right direction:

  • Do you create a hash or range index? Share the exact create index statement.
  • What’s your replication factor?
  • Share the logs from the suspicious tserver.

Hi Denis

  1. No index created yet. Only primary key. (column1 HASH, column2 ASC, column3 ASC)
  2. Replication Factor 3.
  3. I restarted that tserver; after that, the logs look like this:

E0124 15:10:20.042120 3742427 tablet_metadata.cc:768] T 2d02f1fcde54450d9c2bf607c59b82d8 P 7e5008537af04c6ca948874750951e50: Failed to destroy regular DB at: /data1/ybtdata1/yb-data/tserver/data/rocksdb/table-00004001000030008000000000004033/tablet-2d02f1fcde54450d9c2bf607c59b82d8: IO error (yb/rocksdb/util/env_posix.cc:315): /data1/ybtdata1/yb-data/tserver/data/rocksdb/table-00004001000030008000000000004033/tablet-2d02f1fcde54450d9c2bf607c59b82d8/LOCK: No such file or directory

W0124 15:12:19.216068 3743161 transaction_manager.cc:71] No local transaction status tablet found
W0124 15:12:20.350165 3743163 transaction_manager.cc:71] No local transaction status tablet found
W0124 15:12:21.350219 3743377 transaction_manager.cc:71] No local transaction status tablet found
W0124 15:12:22.480863 3743374 transaction_manager.cc:71] No local transaction status tablet found
W0124 15:12:23.480964 3743378 transaction_manager.cc:71] No local transaction status tablet found
W0124 15:12:24.600047 3743373 transaction_manager.cc:71] No local transaction status tablet found

@subh14 you can check from the master UI (http://master:7000/tables) to be sure that you have 5 tablets, with a leader in each shard.

Maybe your test case inserts only a few distinct column1 values, and their hash codes never fall within the hash range of one of the tablets.
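To make the point about hash ranges concrete, here is a rough sketch. It assumes the 16-bit hash space (0x0000 to 0xFFFF) is split evenly across 5 tablets, and it uses zlib.crc32 purely as a stand-in; it is not the hash function YugabyteDB uses, and the function names are made up for illustration.

```python
import zlib

HASH_SPACE = 0x10000   # 16-bit hash space: 0x0000 .. 0xFFFF
NUM_TABLETS = 5        # assumption: one tablet per tserver, as in this thread


def tablet_for_hash(hash_code: int, num_tablets: int = NUM_TABLETS) -> int:
    """Return the index of the tablet whose hash range contains hash_code,
    assuming the hash space is split into equal contiguous ranges."""
    return hash_code * num_tablets // HASH_SPACE


def tablet_for_key(key: str, num_tablets: int = NUM_TABLETS) -> int:
    """Map a hash-key value to a tablet index.

    zlib.crc32 is only a placeholder hash for this sketch.
    """
    hash_code = zlib.crc32(key.encode("utf-8")) & 0xFFFF
    return tablet_for_hash(hash_code, num_tablets)


def tablets_hit(keys, num_tablets: int = NUM_TABLETS) -> set:
    """Return the set of tablet indexes that receive at least one key."""
    return {tablet_for_key(k, num_tablets) for k in keys}


if __name__ == "__main__":
    few_keys = ["2023-01-24 15:00:00+00", "2023-01-24 16:00:00+00"]
    many_keys = [f"value-{i}" for i in range(10000)]
    print("tablets hit by 2 distinct keys:    ", sorted(tablets_hit(few_keys)))
    print("tablets hit by 10000 distinct keys:", sorted(tablets_hit(many_keys)))
```

With only two distinct hash-key values, at most two of the five hash ranges can receive rows, so the remaining tablets see no writes no matter how many rows are inserted.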

Unrelated, but it is probably not a good idea to use hash sharding on a timestamp, because there is a good chance that you will run range queries on it.
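A small sketch of why range queries and hash sharding on a timestamp do not mix well. Everything here is an assumption for illustration: 5 tablets, crc32 as a placeholder hash (not YugabyteDB's), and a hypothetical range-sharded layout with split points at equal one-day slices.

```python
import zlib
from datetime import datetime, timedelta, timezone

NUM_TABLETS = 5
HASH_SPACE = 0x10000  # 16-bit hash space


def hash_tablet(ts: datetime) -> int:
    """Tablet index under hash sharding (crc32 is a placeholder hash)."""
    code = zlib.crc32(ts.isoformat().encode("utf-8")) & 0xFFFF
    return code * NUM_TABLETS // HASH_SPACE


def range_tablet(ts: datetime, start: datetime, end: datetime) -> int:
    """Tablet index under range sharding, assuming [start, end) is split
    into NUM_TABLETS equal time slices (hypothetical split points)."""
    fraction = (ts - start) / (end - start)
    return min(int(fraction * NUM_TABLETS), NUM_TABLETS - 1)


start = datetime(2023, 1, 1, tzinfo=timezone.utc)
end = start + timedelta(days=5)

# All timestamps (one per second) that a one-hour range query would cover.
one_hour = [start + timedelta(seconds=s) for s in range(3600)]

hash_hit = {hash_tablet(t) for t in one_hour}
range_hit = {range_tablet(t, start, end) for t in one_hour}

if __name__ == "__main__":
    print("tablets touched with hash sharding: ", sorted(hash_hit))
    print("tablets touched with range sharding:", sorted(range_hit))
```

Under range sharding the one-hour window lands in a single tablet, while under hash sharding neighbouring timestamps are scattered, so the same query typically has to visit several tablets.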

Hi Frank
I stopped that TServer, removed everything from its data directory, and then started the server again. The TServer rejoined the cluster and data is now being loaded onto it; however, I noticed a few errors in the logs, shown below. Some inserts also seem to be missing: when we try to insert 10000 rows, we find only 8000 in the table.


I0125 00:02:31.659812 5356 poller.cc:66] Poll stopped: Service unavailable (yb/rpc/scheduler.cc:80): Scheduler is shutting down (system error 108)

I0125 09:43:23.202272 3770960 client_master_rpc.cc:77] 0x0000059cf90e8978 -> GetTableSchemaRpc(table_identifier: table_id: "00004001000030008000000000004015", num_attempts: 1): Failed, got resp error: Not found (yb/master/catalog_manager.cc:5422): Table with identifier 00004001000030008000000000004015 not found: OBJECT_NOT_FOUND (master error 3)
W0125 09:43:23.203146 3770960 client-internal.cc:1466] GetTableSchemaRpc(table_identifier: table_id: "00004001000030008000000000004015", num_attempts: 1) failed: Not found (yb/master/catalog_manager.cc:5422): Table with identifier 00004001000030008000000000004015 not found: OBJECT_NOT_FOUND (master error 3)
I0125 09:43:26.503696 3756447 log.cc:1433] T 376539761ff044bfb899904b6c026ce1 P 4e65a702b3dd4b3d97621f53b4d16adf: Running Log GC on /data1/ybtdata1/yb-data/tserver/wals/table-00004001000030008000000000004033/tablet-376539761ff044bfb899904b6c026ce1: retaining ops >= 22042286, log segment size = 67108864


Thanks for the suggestion about hash sharding; I have corrected it. The hash is now on a varchar column. However, the table is still partitioned on a column of type timestamp with time zone.
Will it work fine now?

Hey, thanks for sharing additional details!

Now hash is happening on a varchar column. However that table is partitioned based on column with data type timestamp with time zone. Will it work fine now?

Each partition of the partitioned table will have a primary key similar to that of its top-level table. Thus, it will be sharded using the hash sharding technique.

I would advise you to watch this technical deep dive into range vs. hash sharding, because range sharding might still work better for your use case. If you pick range sharding, you need to avoid potential hotspots, and the video explains various techniques for that.

I have stopped that Tserver and removed everything from data directory and then started the server. Tserver joined the cluster and data is getting loaded in that server, however i noticed few errors in logs. like below. And inserts are coming less, if we try to insert 10000 rows, we found only 8000 in that table.

Judging by the logs, I would say that you might have encountered some problems during cluster startup or configuration that led to the error. Try to kill all the nodes, recreate the directories that you use as their base dir, and start the cluster from scratch. Please also share the commands that you use to start the cluster.

Please don’t just stop a node, delete its data, and start it again. Instead, follow the procedure for adding and removing nodes from a live cluster, described in "Change cluster configuration" in the YugabyteDB Docs.

Hi Dorian
Thanks for your input. I will definitely follow as instructed in the documentation before adding or removing any node.