Hi,
I was running YugabyteDB 2.14.3.1 on Kubernetes using the YugabyteDB Helm chart in a testing environment.
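For reference, the cluster was installed roughly like this (a minimal sketch; the release name, namespace, chart version, and replica counts below are illustrative, not my exact values):

```sh
# Add the YugabyteDB chart repository and install the chart.
# Release name, namespace, and replica counts are examples only.
helm repo add yugabytedb https://charts.yugabyte.com
helm repo update
helm install yb-test yugabytedb/yugabyte \
  --version 2.14.3 \
  --namespace yb-test --create-namespace \
  --set replicas.master=3,replicas.tserver=3
```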
By accident, multiple Kubernetes nodes were shut down simultaneously, which resulted in a forceful shutdown of all YugabyteDB pods.
The StatefulSet for the masters recovered quickly.
The tablet server pods took a long time to recover (roughly 11 hours), even though not much data was stored in YugabyteDB (I can't say exactly, but it should have been less than 1 GB).
The tablet server pods shut down and restarted many times because of CRC mismatches in the WAL.
I was wondering whether there is a way to recover more quickly in such a scenario.
For example, would it be possible for the tablet server not to shut down on a CRC mismatch in the WAL?
Could it immediately initiate a bootstrap of the affected tablet instead?
I also looked at this guide: Recover YB-TServer from crash loop | YugabyteDB Docs
but I am not sure whether it helps in this scenario, because all tablet servers were constantly restarting, not just one.
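To illustrate what I mean: every tablet server pod was cycling, which was easy to see from the restart counts and the previous container's logs. Something along these lines (the namespace matches the illustrative install above; the `app=yb-tserver` label and the `yb-tserver` container name are, as far as I can tell, the chart defaults):

```sh
# Restart counts of all tablet server pods kept climbing, not just one.
kubectl get pods -n yb-test -l app=yb-tserver

# The previous container's log of a crashed pod shows the FATAL CRC mismatch.
kubectl logs -n yb-test yb-tserver-0 -c yb-tserver --previous | grep -i corruption
```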
In the tablet server logs I saw that a CRC mismatch was detected in the WAL, which triggered the shutdown and restart of the tablet servers:
F0217 07:22:31.294332 119 consensus_queue.cc:694] T 76e3528156c3461ab36650a2b59d2f4f P 7b31bf647fbf40a88a0897948ae224d1 [LEADER]: Error reading the log while preparing peer request: Corruption (yb/consensus/log_util.cc:808): Failed to read ops 16..16: Failed to read LogEntry for index 16 from log segment 7 offset 168: Invalid checksum in log entry head header: found=3071432632, computed=356876958. Destination peer: 8ce7b8036f2c4fd698a101daaf45229d
After a restart of the tablet server, the affected tablet could be bootstrapped successfully:
I0217 07:23:38.081457 63 log_util.cc:615] Ignoring partially flushed segment in write ahead log /mnt/disk1/yb-data/tserver/wals/table-6e839e00f08242eab15ea679d0b88a29/tablet-76e3528156c3461ab36650a2b59d2f4f/wal-000000005 because there are no log entries following this one. The server probably crashed in the middle of writing an entry to the write-ahead log or downloaded an active log via remote bootstrap. Error detail: Corruption (yb/consensus/log_util.cc:808): Invalid checksum in log entry head header: found=2790768357, computed=933096183: Failed trying to read batch #3 at offset 320 for log segment /mnt/disk1/yb-data/tserver/wals/table-6e839e00f08242eab15ea679d0b88a29/tablet-76e3528156c3461ab36650a2b59d2f4f/wal-000000005: Prior batch offsets: 168 292 320; Last log entries read: [REPLICATE (8.12)] [REPLICATE (9.13)] [REPLICATE (10.14)]
I0217 07:25:47.156235 53 ts_tablet_manager.cc:1399] T 76e3528156c3461ab36650a2b59d2f4f P 7b31bf647fbf40a88a0897948ae224d1: Bootstrapping tablet
I0217 07:25:47.169693 53 version_set.cc:2852] T 76e3528156c3461ab36650a2b59d2f4f P 7b31bf647fbf40a88a0897948ae224d1 [R]: Recovered from manifest file:/mnt/disk1/yb-data/tserver/data/rocksdb/table-6e839e00f08242eab15ea679d0b88a29/tablet-76e3528156c3461ab36650a2b59d2f4f/MANIFEST-000036 succeeded,manifest_file_number is 36, next_file_number is 38, last_sequence is 1125899906842624, log_number is 0,prev_log_number is 0,max_column_family is 0, flushed_values is 0x0000000007dbe540 -> { op_id: 3.7 hybrid_time: { physical: 1675070444910877 logical: 1 } history_cutoff: <invalid> hybrid_time_filter: <invalid> max_value_level_ttl_expiration_time: <invalid> primary_schema_version: <NULL> cotable_schema_versions: [] }
I0217 07:25:47.196981 53 log_util.cc:692] Scanning /mnt/disk1/yb-data/tserver/wals/table-6e839e00f08242eab15ea679d0b88a29/tablet-76e3528156c3461ab36650a2b59d2f4f/wal-000000007 for valid entry headers following offset 256...
I0217 07:25:47.197877 53 log_util.cc:615] Ignoring partially flushed segment in write ahead log /mnt/disk1/yb-data/tserver/wals/table-6e839e00f08242eab15ea679d0b88a29/tablet-76e3528156c3461ab36650a2b59d2f4f/wal-000000006 because there are no log entries following this one. The server probably crashed in the middle of writing an entry to the write-ahead log or downloaded an active log via remote bootstrap. Error detail: Corruption (yb/consensus/log_util.cc:808): Invalid checksum in log entry head header: found=2356042956, computed=1918334030: Failed trying to read batch #3 at offset 256 for log segment /mnt/disk1/yb-data/tserver/wals/table-6e839e00f08242eab15ea679d0b88a29/tablet-76e3528156c3461ab36650a2b59d2f4f/wal-000000006: Prior batch offsets: 168 228 256; Last log entries read: [REPLICATE (11.15)]
I0217 07:25:47.197899 53 log.cc:616] T 76e3528156c3461ab36650a2b59d2f4f P 7b31bf647fbf40a88a0897948ae224d1: Opened existing logs. Last segment is /mnt/disk1/yb-data/tserver/wals/table-6e839e00f08242eab15ea679d0b88a29/tablet-76e3528156c3461ab36650a2b59d2f4f/wal-000000008
I0217 07:25:47.197909 53 tablet_bootstrap.cc:383] T 76e3528156c3461ab36650a2b59d2f4f P 7b31bf647fbf40a88a0897948ae224d1: Bootstrap optimizer (skip_flushed_entries): op_id_replay_lowest=3.7 (regular_op_id=3.7, intents_op_id=-1.-1, has_intents_db=0)
W0217 07:25:47.197918 53 tablet_bootstrap.cc:1166] T 76e3528156c3461ab36650a2b59d2f4f P 7b31bf647fbf40a88a0897948ae224d1: Bootstrap optimizer (skip_flushed_entries): Could not read the first entry's metadata of log segment /mnt/disk1/yb-data/tserver/wals/table-6e839e00f08242eab15ea679d0b88a29/tablet-76e3528156c3461ab36650a2b59d2f4f/wal-000000008. Simply continuing to earlier segments to determine the first segment to start the replay at. The error was: Not found (yb/consensus/log_util.cc:673): No entries found
I0217 07:25:47.197939 53 tablet_bootstrap.cc:1209] T 76e3528156c3461ab36650a2b59d2f4f P 7b31bf647fbf40a88a0897948ae224d1: Segment /mnt/disk1/yb-data/tserver/wals/table-6e839e00f08242eab15ea679d0b88a29/tablet-76e3528156c3461ab36650a2b59d2f4f/wal-000000007 cannot be used as the first segment to start replay with according to our OpId and retention criteria. Continuing to earlier segments.op_id=12.16, op_id_replay_lowest=3.7, first_op_time=8320133443.780s, min_seconds_to_retain_logs=660.000s, replay_from_this_or_earlier_time_was_initialized=0, *replay_from_this_or_earlier_time=8320132783.780s, is_first_op_id_low_enough=0, is_first_op_time_early_enough=0
I0217 07:25:47.198016 53 tablet_bootstrap.cc:1209] T 76e3528156c3461ab36650a2b59d2f4f P 7b31bf647fbf40a88a0897948ae224d1: Segment /mnt/disk1/yb-data/tserver/wals/table-6e839e00f08242eab15ea679d0b88a29/tablet-76e3528156c3461ab36650a2b59d2f4f/wal-000000006 cannot be used as the first segment to start replay with according to our OpId and retention criteria. Continuing to earlier segments.op_id=11.15, op_id_replay_lowest=3.7, first_op_time=8320133440.716s, min_seconds_to_retain_logs=660.000s, replay_from_this_or_earlier_time_was_initialized=1, *replay_from_this_or_earlier_time=8320132783.780s, is_first_op_id_low_enough=0, is_first_op_time_early_enough=0
I0217 07:25:47.198062 53 tablet_bootstrap.cc:1209] T 76e3528156c3461ab36650a2b59d2f4f P 7b31bf647fbf40a88a0897948ae224d1: Segment /mnt/disk1/yb-data/tserver/wals/table-6e839e00f08242eab15ea679d0b88a29/tablet-76e3528156c3461ab36650a2b59d2f4f/wal-000000005 cannot be used as the first segment to start replay with according to our OpId and retention criteria. However, this is already the earliest segment so we have to start replay here. We should probably investigate how we got into this situation. op_id=8.12, op_id_replay_lowest=3.7, first_op_time=8320133437.768s, min_seconds_to_retain_logs=660.000s, replay_from_this_or_earlier_time_was_initialized=1, *replay_from_this_or_earlier_time=8320132783.780s, is_first_op_id_low_enough=0, is_first_op_time_early_enough=0
I0217 07:25:47.198083 53 tablet_bootstrap.cc:1222] T 76e3528156c3461ab36650a2b59d2f4f P 7b31bf647fbf40a88a0897948ae224d1: Bootstrap optimizer (skip_flushed_entries): will replay all segments starting from the very first one.
I0217 07:25:47.200260 53 tablet_bootstrap_if.cc:96] T 76e3528156c3461ab36650a2b59d2f4f P 7b31bf647fbf40a88a0897948ae224d1: Bootstrap replayed 1/4 log segments. Read operations: 3, overwritten operations: 0. Pending: 1 replicates. Last read committed op id: 10.14, last entry metadata: { entry_time: 8320133437.768s offset: 168 active_segment_sequence_number: 5 }, last read entry op id: 10.14
I0217 07:25:47.200313 53 tablet_bootstrap_if.cc:96] T 76e3528156c3461ab36650a2b59d2f4f P 7b31bf647fbf40a88a0897948ae224d1: Bootstrap replayed 2/4 log segments. Read operations: 4, overwritten operations: 0. Pending: 1 replicates. Last read committed op id: 11.15, last entry metadata: { entry_time: 8320133440.716s offset: 168 active_segment_sequence_number: 6 }, last read entry op id: 11.15
I0217 07:25:47.200338 53 tablet_bootstrap_if.cc:96] T 76e3528156c3461ab36650a2b59d2f4f P 7b31bf647fbf40a88a0897948ae224d1: Bootstrap replayed 3/4 log segments. Read operations: 5, overwritten operations: 0. Pending: 1 replicates. Last read committed op id: 12.16, last entry metadata: { entry_time: 8320133443.780s offset: 168 active_segment_sequence_number: 7 }, last read entry op id: 12.16
I0217 07:25:47.200347 53 tablet_bootstrap_if.cc:96] T 76e3528156c3461ab36650a2b59d2f4f P 7b31bf647fbf40a88a0897948ae224d1: Bootstrap replayed 4/4 log segments. Read operations: 5, overwritten operations: 0. Pending: 1 replicates. Last read committed op id: 0.0, no entries in last segment
I0217 07:25:47.200354 53 tablet_bootstrap.cc:1377] T 76e3528156c3461ab36650a2b59d2f4f P 7b31bf647fbf40a88a0897948ae224d1: Dumping replay state to log at the end of PlaySegments
I0217 07:25:47.200366 53 tablet_bootstrap.cc:1074] T 76e3528156c3461ab36650a2b59d2f4f P 7b31bf647fbf40a88a0897948ae224d1: ReplayState: Previous OpId: 12.16, Committed OpId: 12.16, Pending Replicates: 0, Flushed Regular: 3.7, Flushed Intents: -1.-1
I0217 07:25:47.200382 53 tablet_bootstrap.cc:1074] T 76e3528156c3461ab36650a2b59d2f4f P 7b31bf647fbf40a88a0897948ae224d1: Log entries applied to RocksDB: 5
I0217 07:25:47.200394 53 tablet_bootstrap.cc:1388] T 76e3528156c3461ab36650a2b59d2f4f P 7b31bf647fbf40a88a0897948ae224d1: Number of orphaned replicates: 0, last id: 12.16, committed id: 12.
I0217 07:25:47.211081 53 tablet_bootstrap_if.cc:96] T 76e3528156c3461ab36650a2b59d2f4f P 7b31bf647fbf40a88a0897948ae224d1: Bootstrap complete.