yb-tserver unexpectedly stopped with exit code 1 and no messages in the logs

I’m running YugabyteDB in Kubernetes on a bare-metal, on-premises installation and have run into a problem where yb-tserver suddenly crashes: the process exits with code 1 and there is no ERROR or FATAL message in the logs.

The cluster runs on three dedicated nodes (AMD Ryzen 9 7950X: 16 cores × 4.5 GHz, 128 GB DDR5, 2 × 2 TB NVMe M.2 SSD). Three yb-masters and three yb-tservers are deployed across them: one yb-master and one yb-tserver per node. The yb-master pods are fine, but the yb-tserver pods have a high restart count with Reason: Error - exit code: 1.
There are no new messages in the ERROR or FATAL logs (FATAL logs do not exist at all) before or after a restart, so I do not know where to start digging.

Last messages in the container log:

2024-02-26 09:47:59,247 [INFO] k8s_parent.py: core_pattern is: |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h %e
Traceback (most recent call last):
  File "/home/yugabyte/tools/k8s_parent.py", line 236, in <module>
    files_copied = copy_cores(cores_dir)
  File "/home/yugabyte/tools/k8s_parent.py", line 124, in copy_cores
    dir_path = get_core_dump_dir()
  File "/home/yugabyte/tools/k8s_parent.py", line 65, in get_core_dump_dir
    raise ValueError("core_pattern starts with |, can't do anything useful")
ValueError: core_pattern starts with |, can't do anything useful

By the way, there are a lot of warning messages before the stop:

W0226 09:44:33.717324   135 long_operation_tracker.cc:125] Read running for 1.001s in thread 614:
    @     0x7fe94d4b6b39  __lll_lock_wait
    @     0x7fe94d4b16e2  __GI___pthread_mutex_lock
    @     0x5643bccc254a  rocksdb::(anonymous namespace)::ShardedLRUCache::Lookup()
    @     0x5643bcc95ea6  rocksdb::BlockBasedTable::GetDataBlockFromCache()
    @     0x5643bcc95317  rocksdb::BlockBasedTable::RetrieveBlock()
    @     0x5643bcc9802f  rocksdb::BlockBasedTable::NewDataBlockIterator()
    @     0x5643bccbeeda  rocksdb::(anonymous namespace)::TwoLevelIterator::InitDataBlock()
    @     0x5643bccbe6ef  rocksdb::(anonymous namespace)::TwoLevelIterator::Seek()
    @     0x5643bccacef8  rocksdb::MergingIterator::Seek()
    @     0x5643bcc04b6e  rocksdb::DBIter::Seek()
    @     0x5643bc166fba  yb::docdb::BoundedRocksDbIterator::Seek()
    @     0x5643bc22ef1c  yb::docdb::(anonymous namespace)::SeekPossiblyUsingNext()
    @     0x5643bc22f66f  yb::docdb::PerformRocksDBSeek()
    @     0x5643bc222103  yb::docdb::IntentAwareIterator::Seek()
    @     0x5643bc1da0d1  yb::docdb::DocRowwiseIterator::Seek()
    @     0x5643bc1e151b  yb::docdb::DocRowwiseIteratorBase::SeekTuple()

W0226 09:46:34.203969   135 long_operation_tracker.cc:125] Read running for 1.000s in thread 855:
    @     0x7fe94d79de62  __GI___pread
    @     0x5643bd500261  yb::PosixRandomAccessFile::Read()
    @     0x5643bd4ff1ba  yb::RandomAccessFile::ReadAndValidate()
    @     0x5643bcccd4f8  rocksdb::RandomAccessFileReader::ReadAndValidate()
    @     0x5643bcca075f  rocksdb::ReadBlockContents()
    @     0x5643bcc962db  rocksdb::block_based_table::ReadBlockFromFile()
    @     0x5643bcc95409  rocksdb::BlockBasedTable::RetrieveBlock()
    @     0x5643bcc9802f  rocksdb::BlockBasedTable::NewDataBlockIterator()
    @     0x5643bccbeeda  rocksdb::(anonymous namespace)::TwoLevelIterator::InitDataBlock()
    @     0x5643bccbe6ef  rocksdb::(anonymous namespace)::TwoLevelIterator::Seek()
    @     0x5643bccacef8  rocksdb::MergingIterator::Seek()
    @     0x5643bcc04b6e  rocksdb::DBIter::Seek()
    @     0x5643bc166fba  yb::docdb::BoundedRocksDbIterator::Seek()
    @     0x5643bc22ef1c  yb::docdb::(anonymous namespace)::SeekPossiblyUsingNext()
    @     0x5643bc22f66f  yb::docdb::PerformRocksDBSeek()
    @     0x5643bc222103  yb::docdb::IntentAwareIterator::Seek()

W0226 09:46:59.360953   135 long_operation_tracker.cc:125] Read running for 1.000s in thread 817:
    @     0x5643bcc7bb53  rocksdb::DecodeRestartEntry()
    @     0x5643bcc7b445  rocksdb::BlockIter::Seek()
    @     0x5643bccab3cb  rocksdb::IteratorWrapper::Seek()
    @     0x5643bccaadab  rocksdb::MultiLevelIterator::Seek()
    @     0x5643bccbe6e3  rocksdb::(anonymous namespace)::TwoLevelIterator::Seek()
    @     0x5643bccacef8  rocksdb::MergingIterator::Seek()
    @     0x5643bcc04b6e  rocksdb::DBIter::Seek()
    @     0x5643bc166fba  yb::docdb::BoundedRocksDbIterator::Seek()
    @     0x5643bc22ef1c  yb::docdb::(anonymous namespace)::SeekPossiblyUsingNext()
    @     0x5643bc22f66f  yb::docdb::PerformRocksDBSeek()
    @     0x5643bc222103  yb::docdb::IntentAwareIterator::Seek()
    @     0x5643bc1da0d1  yb::docdb::DocRowwiseIterator::Seek()
    @     0x5643bc1e151b  yb::docdb::DocRowwiseIteratorBase::SeekTuple()
    @     0x5643bc24e6cd  yb::docdb::(anonymous namespace)::FilteringIterator::FetchTuple()
    @     0x5643bc24aacd  yb::docdb::PgsqlReadOperation::Execute()
    @     0x5643bcecb0f5  yb::tablet::Tablet::HandlePgsqlReadRequest()

W0226 09:47:00.436651   135 long_operation_tracker.cc:125] Read running for 1.224s in thread 806:
Thread did not respond: maybe it is blocking signals
W0226 09:47:00.711908   869 long_operation_tracker.cc:155] Read took a long time: 1.487s

yb-tserver uses sig-storage-local-static-provisioner as its StorageClass, so all data is stored on the same server as the pod.

YugabyteDB was installed with the YugabyteDB Helm chart; the current container image is yugabytedb/yugabyte:2.20.1.3-b3 (for both master and tserver).

I thought it could be related to OOM kills caused by resource limits, but processes killed by the OOM killer exit with code 137 (128 + SIGKILL), not 1. There are also some open questions about memory consumption, but that is a subject for a separate topic.

Please advise how to debug this situation with the unexpected “exit code: 1” from yb-tserver.

Hi @revoluterra

Is there a way to get a core-dump?

Is this about not being able to generate a core dump?

What is the disk configuration (raid?)?

Can you do a quick benchmark or check on the disk(s) to see if they’re ok?

Hi, @dorian_yugabyte
Thanks for your reply! Sure, the last message in the container log was related to the core dump. I have fixed core_pattern so that a core dump is produced on crash. What is the best way to share the core dump with you?
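For context, my understanding of why the wrapper bailed out: core_pattern is a kernel-wide setting (not namespaced per container), and when it starts with “|” cores are piped to a host-side helper (systemd-coredump here) instead of being written to a path the container can see. A rough sketch of that check plus the host-side fix I applied (paraphrased, not the actual k8s_parent.py code; writing the setting requires root on the host):

# Paraphrased check, modelled on the ValueError above, plus the fix I applied.
CORE_PATTERN_FILE = "/proc/sys/kernel/core_pattern"

with open(CORE_PATTERN_FILE) as f:
    core_pattern = f.read().strip()

if core_pattern.startswith("|"):
    # Cores go to a host-side helper, so there is no file path to copy cores from.
    # Point the kernel at a plain file path instead (equivalent to
    # `sysctl -w kernel.core_pattern=/mnt/disk1/core.%e.%p.%t` on the host).
    with open(CORE_PATTERN_FILE, "w") as f:
        f.write("/mnt/disk1/core.%e.%p.%t\n")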

As I said earlier, my yb-tservers use the local static provisioner, so all data is stored on the same host where the yb-tserver instance is running (like hostPath). The underlying storage is an ext4 filesystem on an mdadm (Linux software RAID) RAID0 array built from two identical NVMe M.2 drives (Samsung SSD 980 Pro). Each server has the same storage configuration. All drives are new; no SMART errors are logged.

  read: IOPS=347k, BW=1356MiB/s (1422MB/s)(768MiB/566msec)
  write: IOPS=116k, BW=453MiB/s (475MB/s)(256MiB/566msec)

Well, I think the drives are fine, but could the problem be related to the mdadm RAID0?
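A quick way to check the array state on the hosts is /proc/mdstat, the status file for Linux software RAID; a trivial sketch (run on the host, not in the pod):

# Print md array status and flag obviously degraded arrays.
# "(F)" marks a failed member; an underscore in the "[UU]" block marks a missing one.
with open("/proc/mdstat") as f:
    mdstat = f.read()

print(mdstat)
if "(F)" in mdstat or "[_" in mdstat or "_]" in mdstat:
    print("WARNING: at least one md array looks degraded")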

You can use these steps https://support.yugabyte.com/hc/en-us/articles/4403183233805-Generate-a-backtrace-from-a-core-file-with-GDB

Maybe. You could try with plain drives (no RAID) to rule this out. Or you could benchmark the drives from within the containers with another tool?

First of all, it seems that no core dump is created on crash (maybe the process is terminated gracefully rather than crashing?). I have configured core_pattern, but I see that no core files are created. Here is part of the log:

2024-03-04 09:02:39,207 [INFO] k8s_parent.py: core_pattern is: /mnt/disk1/core.%e.%p.%t 
2024-03-04 09:02:39,207 [INFO] k8s_parent.py: Copying latest 5 core files to '/mnt/disk0/cores' using glob '/mnt/disk1/*core*' 
2024-03-04 09:02:39,207 [INFO] k8s_parent.py: Copied 0 core files to '/mnt/disk0/cores'

But the last time the process terminated I got exit code 247, not 1. It looks like this is related to insufficient memory. I have attached the log from the container; maybe it will be helpful.
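If I read the exit code right, 247 is consistent with the yb-tserver child being killed by SIGKILL (e.g. by the OOM killer): in Python’s subprocess API a SIGKILLed child has returncode -9, and if a wrapper like k8s_parent.py exits with that value, the container runtime reports (-9) & 0xFF = 247. A minimal sketch of that mapping (my assumption about the wrapper’s behaviour, not taken from its source):

import signal
import subprocess
import sys

# Stand-in for the OOM killer: start a child process and SIGKILL it.
child = subprocess.Popen(["sleep", "300"])
child.send_signal(signal.SIGKILL)
child.wait()

print("child returncode:", child.returncode)  # -9 for a SIGKILLed child
# If the wrapper propagates that value as its own exit status,
# the kubelet observes (-9) & 0xFF == 247 rather than 137.
sys.exit(child.returncode)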

Another reason to suspect memory, and some kind of OOM handling that gracefully terminates yb-tserver, as the root cause of my problem is that the yb-tserver pod unexpectedly eats a lot of host memory. See the graph below.

Each host running YugabyteDB has 128 GB of memory (125 GB available for pod workloads). yb-tserver is started with --memory_limit_hard_bytes=34359738368 (32 GiB). No resource limits are set for the YugabyteDB containers. But sometimes the yb-tserver pod consumes far more than 32 GB; the memory usage graph for yb-tserver-2 (the instance whose logs are provided) is attached below.

working_set memory includes the file cache, and it seems that it is not being flushed.
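As far as I understand, the working_set metric is derived from the cgroup counters as usage minus the inactive file cache, which is why page cache shows up in it. A small sketch of that calculation (assuming cgroup v1 paths as mounted inside the container):

# Reproduce the working_set calculation (usage - total_inactive_file, floored at 0)
# from inside the container. Assumes the cgroup v1 memory controller.
def read_int(path):
    with open(path) as f:
        return int(f.read().strip())

usage = read_int("/sys/fs/cgroup/memory/memory.usage_in_bytes")

inactive_file = 0
with open("/sys/fs/cgroup/memory/memory.stat") as f:
    for line in f:
        key, value = line.split()
        if key == "total_inactive_file":
            inactive_file = int(value)

working_set = max(usage - inactive_file, 0)
print(f"usage={usage} inactive_file={inactive_file} working_set={working_set}")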

So, here is my hypothesis:
If no core dump is created, then the application terminates gracefully (but with a non-zero exit code). Assuming this is due to a “memory leak”, I need to figure out why yb-tserver is suddenly consuming so much memory.
The first thing I don’t understand is why automatic flushing of the file cache doesn’t work (if I run drop_caches, roughly as sketched below, the working-set memory drops to RSS and then starts to grow again).
The second thing I don’t understand is why memory_limit_hard_bytes doesn’t seem to be enforced.
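For completeness, the drop_caches experiment mentioned above is the equivalent of this on the host (needs root; it drops clean page cache, dentries and inodes system-wide, so I only use it for diagnosis, not as a fix):

import os

# Flush dirty pages first, then ask the kernel to drop clean page cache,
# dentries and inodes (same effect as `echo 3 > /proc/sys/vm/drop_caches`).
os.sync()
with open("/proc/sys/vm/drop_caches", "w") as f:
    f.write("3\n")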

Regarding your request to benchmark the disks from the pod or to disable RAID0:
Disabling RAID0 is a rather difficult operation for me, but I can run a disk benchmark from the pod. Earlier I ran a test with the fio tool. What tool would you recommend using from inside the pod? Or can fio also be used for this test?

Is that error happening on 1 server or all of them?

Are you still getting the original error?

W0226 09:44:33.717324 135 long_operation_tracker.cc:125] Read running for 1.001s in thread 614:

How are you running yb-master? I’m assuming on a separate container, correct? And with what configuration?

What kind of workload are you running on the cluster?

No, that error appears on all three servers, mostly on the host with the yb-tserver-0 pod. There are also messages like:

W0305 09:57:18.652333 149 long_operation_tracker.cc:125] Log callback running for 1.006s in thread 28:
W0305 09:51:35.597337 161 long_operation_tracker.cc:125] UpdateReplica running for 1.018s in thread 118434:
W0305 09:51:29.031973 161 long_operation_tracker.cc:125] TSHeartbeat running for 2.040s in thread 4258:

yb-master runs in separate pods, as the Helm chart deploys them. But the three yb-master pods run on the same hosts as the yb-tservers, so each of my hosts runs one yb-master and one yb-tserver.

It’s an OLTP workload. I have switched one of my online marketplaces to YugabyteDB. It is mostly SELECT statements.