Each TServer is in a different placement_zone (test1, test2, …, test5). While loading some test data into a table, we observed that one TServer has no read/write load and holds no tablet leaders or followers.
Our table has a composite primary key (column1 timestamp with time zone, column2 varchar, column3 varchar), and it is partitioned on column1.
It would be a great help if you could explain why one TServer doesn't have any load/tablets.
No index has been created yet, only the primary key: (column1 HASH, column2 ASC, column3 ASC).
Replication Factor 3.
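For clarity, the table definition is roughly as follows (a sketch in YSQL syntax; the table and column names are placeholders for the real ones, and the exact syntax may need adjusting for your version):

-- Sketch of the schema described above (placeholder names).
CREATE TABLE events (
    column1 timestamp with time zone NOT NULL,
    column2 varchar NOT NULL,
    column3 varchar NOT NULL,
    PRIMARY KEY (column1 HASH, column2 ASC, column3 ASC)
) PARTITION BY RANGE (column1);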
I have restarted that TServer; after that, the logs look like the following.
E0124 15:10:20.042120 3742427 tablet_metadata.cc:768] T 2d02f1fcde54450d9c2bf607c59b82d8 P 7e5008537af04c6ca948874750951e50: Failed to destroy regular DB at: /data1/ybtdata1/yb-data/tserver/data/rocksdb/table-00004001000030008000000000004033/tablet-2d02f1fcde54450d9c2bf607c59b82d8: IO error (yb/rocksdb/util/env_posix.cc:315): /data1/ybtdata1/yb-data/tserver/data/rocksdb/table-00004001000030008000000000004033/tablet-2d02f1fcde54450d9c2bf607c59b82d8/LOCK: No such file or directory
W0124 15:12:19.216068 3743161 transaction_manager.cc:71] No local transaction status tablet found
W0124 15:12:20.350165 3743163 transaction_manager.cc:71] No local transaction status tablet found
W0124 15:12:21.350219 3743377 transaction_manager.cc:71] No local transaction status tablet found
W0124 15:12:22.480863 3743374 transaction_manager.cc:71] No local transaction status tablet found
W0124 15:12:23.480964 3743378 transaction_manager.cc:71] No local transaction status tablet found
W0124 15:12:24.600047 3743373 transaction_manager.cc:71] No local transaction status tablet found
Hi Frank
I have stopped that TServer, removed everything from its data directory, and then started the server. The TServer joined the cluster and data is getting loaded on that server; however, I noticed a few errors in the logs, shown below. Also, inserts are coming up short: if we try to insert 10,000 rows, we find only 8,000 in the table.
I0125 00:02:31.659812 5356 poller.cc:66] Poll stopped: Service unavailable (yb/rpc/scheduler.cc:80): Scheduler is shutting down (system error 108)
I0125 09:43:23.202272 3770960 client_master_rpc.cc:77] 0x0000059cf90e8978 -> GetTableSchemaRpc(table_identifier: table_id: "00004001000030008000000000004015", num_attempts: 1): Failed, got resp error: Not found (yb/master/catalog_manager.cc:5422): Table with identifier 00004001000030008000000000004015 not found: OBJECT_NOT_FOUND (master error 3)
W0125 09:43:23.203146 3770960 client-internal.cc:1466] GetTableSchemaRpc(table_identifier: table_id: "00004001000030008000000000004015", num_attempts: 1) failed: Not found (yb/master/catalog_manager.cc:5422): Table with identifier 00004001000030008000000000004015 not found: OBJECT_NOT_FOUND (master error 3)
I0125 09:43:26.503696 3756447 log.cc:1433] T 376539761ff044bfb899904b6c026ce1 P 4e65a702b3dd4b3d97621f53b4d16adf: Running Log GC on /data1/ybtdata1/yb-data/tserver/wals/table-00004001000030008000000000004033/tablet-376539761ff044bfb899904b6c026ce1: retaining ops >= 22042286, log segment size = 67108864
Thanks for the suggestion about hash sharding; I rectified it. Hashing now happens on a varchar column. However, the table is partitioned on a column of type timestamp with time zone. Will it work fine now?
Each partition of the partitioned table will have a primary key similar to that of its top-level table. Thus, it will be sharded using the hash sharding technique.
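For example, here is a sketch (placeholder names, YSQL syntax, with the key you described: hash on the varchar column) of how each partition inherits the key and therefore the hash sharding:

-- Parent table: hash on column2, range on the remaining key columns.
-- The partition column (column1) must be part of the primary key.
CREATE TABLE events (
    column1 timestamp with time zone NOT NULL,
    column2 varchar NOT NULL,
    column3 varchar NOT NULL,
    PRIMARY KEY (column2 HASH, column1 ASC, column3 ASC)
) PARTITION BY RANGE (column1);

-- The partition inherits the primary key definition from events, so its
-- tablets are also split by hash of column2.
CREATE TABLE events_2024_01 PARTITION OF events
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');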
I would advise you to watch this technical deep dive into range vs. hash sharding, because range sharding might still work better for your use case. But if you pick range sharding, then you need to avoid potential hotspots, and the video explains various techniques for that.
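To sketch the hotspot point (placeholder names again, and only an illustration of the idea, not a recommendation for your exact schema): with range sharding, leading the key with the timestamp would funnel every "current" insert into the same tablet, so you would lead with a higher-cardinality column instead.

-- Range-sharded alternative (sketch): all key columns use ASC/DESC, no HASH.
-- Leading with column1 (the timestamp) would concentrate new writes on one
-- tablet; leading with column2 spreads the write load across tablets.
CREATE TABLE events_range (
    column1 timestamp with time zone NOT NULL,
    column2 varchar NOT NULL,
    column3 varchar NOT NULL,
    PRIMARY KEY (column2 ASC, column1 ASC, column3 ASC)
);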
Judging by the logs, I would say that you might have encountered some problems during cluster startup/configuration that led to the error. Try killing all the nodes, recreating the directories that you use as their base dirs, and starting the cluster from scratch. Please also share the commands that you use to start the cluster.