Heartbeat timeouts happening frequently though network seems OK

We are seeing frequent heartbeat failures in the cluster.
We have two different clusters, each with 3 masters and 12 tservers.
The network team did a pcap analysis once and found no issues. An iperf test also shows 24GiBps within the DB segment.
Please find the error below:

W0415 09:57:43.058581 542877 transaction.cc:1913] 00000000-0000-0000-0000-000000000000: Send heartbeat failed: Timed out (yb/rpc/rpc.cc:220): UpdateTransaction: tablet_id: "b5614d77e5804bb5a4bbc230598d37f9" state { transaction_id: "\312^\004\350\034WK\365\240T\370\263s\365.\n" status: CREATED host_node_uuid: "16d3fdeb443340de995ff8720001788b" } propagated_hybrid_time: 7017165037487149058, retrier: { task_id: -1 state: kRunning deadline: 357423.825s } passed its deadline 357423.825s (passed: 5.080s): Illegal state (yb/consensus/consensus.cc:161): Not the leader (tablet server error 15), txn state: kRunning

Please let us know the area(s) we can check further.

Thanks,
Dipanjan

Hi @dipanjanghos

What version are you using?

Is there xCluster replication between them, or are they just 2 separate clusters?

Can you upload a bigger log sample, zipped, to a file-sharing site and post the link here?
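
While collecting the logs, it may also help to check whether the timeouts concentrate on a few tablets or on particular minutes. Below is a minimal Python sketch for that, assuming the warning lines have the same shape as the one posted above. The function name summarize_heartbeat_failures and the regular expression are illustrative only and are not part of any YugabyteDB tooling.

```python
import re
from collections import Counter

# Matches glog-style warning lines shaped like the one posted in this thread.
# Assumption: tablet_id is wrapped in straight or curly double quotes.
LINE_RE = re.compile(
    r'^W(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2})\.\d+ .*?'
    r'Send heartbeat failed: .*?'
    r'tablet_id: ["\u201c](?P<tablet>[0-9a-f]+)["\u201d]'
)

def summarize_heartbeat_failures(lines):
    """Count 'Send heartbeat failed' warnings per tablet and per minute."""
    per_tablet = Counter()
    per_minute = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # not a heartbeat-failure warning, skip it
        per_tablet[m.group("tablet")] += 1
        # Bucket by MMDD plus HH:MM so bursts stand out.
        per_minute[m.group("date") + " " + m.group("time")[:5]] += 1
    return per_tablet, per_minute
```

Pass it any iterable of log lines, for example an open file handle for a tserver log, and print per_tablet.most_common(10) to see which tablets fail most often. If a handful of tablets dominate, that would point toward leader changes on those tablets rather than a general network problem.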