HeartBeat timeout happening frequently though network seems OK

dipanjanghos · April 15, 2024, 1:44pm

We are facing frequent heartbeat failures in the cluster.
We have two separate clusters, each with 3 masters and 12 tservers.
The network team did a pcap analysis once and found no issues. An iperf test also shows 24GiBps within the DB segment.
The error is below:

W0415 09:57:43.058581 542877 transaction.cc:1913] 00000000-0000-0000-0000-000000000000: Send heartbeat failed: Timed out (yb/rpc/rpc.cc:220): UpdateTransaction: tablet_id: "b5614d77e5804bb5a4bbc230598d37f9" state { transaction_id: "\312^\004\350\034WK\365\240T\370\263s\365.\n" status: CREATED host_node_uuid: "16d3fdeb443340de995ff8720001788b" } propagated_hybrid_time: 7017165037487149058, retrier: { task_id: -1 state: kRunning deadline: 357423.825s } passed its deadline 357423.825s (passed: 5.080s): Illegal state (yb/consensus/consensus.cc:161): Not the leader (tablet server error 15), txn state: kRunning

Please let us know which areas we should check further.

Thanks,
Dipanjan

dorian_yugabyte · April 16, 2024, 7:26am

Hi @dipanjanghos

What version are you using?

Is there xCluster replication between them? Or just 2 separate clusters?

Can you upload a larger log sample, zipped, to a file-sharing site and post the link here?
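When preparing that sample, it may help to first see how often the failure occurs and which error follows the timeout. Below is a rough sketch, not a YugabyteDB tool: it assumes the standard glog line prefix seen in the error above (severity, MMDD, time, thread id, file:line) and that the failure tail looks like `(passed: 5.080s): Illegal state (...): Not the leader`. The helper name `summarize_heartbeat_failures` is hypothetical.

```python
import re
import sys
from collections import Counter

# Standard glog prefix, e.g. "W0415 09:57:43.058581 542877 transaction.cc:1913] <message>":
# severity letter, MMDD, wall-clock time, thread id, source file:line.
GLOG_RE = re.compile(
    r"^(?P<sev>[IWEF])(?P<mmdd>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<tid>\d+) (?P<src>[^\]]+)\] (?P<msg>.*)$"
)

# Assumed shape of the tail of a heartbeat failure, as in the error above:
# "(passed: 5.080s): Illegal state (yb/consensus/consensus.cc:161): Not the leader (..."
CAUSE_RE = re.compile(
    r"passed: (?P<elapsed>[\d.]+)s\): (?P<status>[^(]+?) \([^)]*\): (?P<detail>[^(,]+)"
)


def summarize_heartbeat_failures(lines):
    """Hypothetical helper: count 'Send heartbeat failed' lines per minute and per cause."""
    per_minute = Counter()
    causes = Counter()
    for line in lines:
        m = GLOG_RE.match(line.rstrip("\n"))
        if not m or "Send heartbeat failed" not in m.group("msg"):
            continue
        # Bucket by MMDD plus HH:MM so bursts of failures stand out.
        per_minute[f"{m.group('mmdd')} {m.group('time')[:5]}"] += 1
        c = CAUSE_RE.search(m.group("msg"))
        cause = f"{c.group('status')}: {c.group('detail').strip()}" if c else "unparsed"
        causes[cause] += 1
    return per_minute, causes


if __name__ == "__main__":
    # Pass one or more tserver log files on the command line.
    for path in sys.argv[1:]:
        with open(path, errors="replace") as f:
            per_minute, causes = summarize_heartbeat_failures(f)
        print(path)
        for cause, n in causes.most_common():
            print(f"  {n:6d}  {cause}")
        for minute, n in sorted(per_minute.items()):
            print(f"  {minute}  {n}")
```

Run against each tserver's log files, this gives a per-cause count and a per-minute timeline that can be lined up across nodes. If most entries end in "Not the leader", comparing those minutes with tablet leader changes may be more useful than further bandwidth tests, though that is only a suggestion for where to look.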
