A tserver node went down while processing a write-heavy workload

I have a 3-node YugaByte DB cluster set up on AWS with the following characteristics:

  • 3 c4.xlarge nodes
  • A 5 TB EBS volume attached to each node
  • Replication factor of 3

While doing some write-intensive performance testing, I noticed that one of the tserver nodes went down. The log files show the following:

F0313 23:58:30.174280 19958 ref_cnt_buffer.cc:30] Check failed: data_ != nullptr 
Fatal failure details written to /home/ec2-user/yugabyte-db/data/disk0/yb-data/tserver/logs/yb-tserver.FATAL.details.2019-03-13T23_58_30.pid19726.txt
F20190313 23:58:30 ../../src/yb/util/ref_cnt_buffer.cc:30] Check failed: data_ != nullptr 
    @     0x7fb8085c7db3  yb::LogFatalHandlerSink::send(int, char const*, char const*, int, tm const*, char const*, unsigned long) (src/yb/util/logging.cc:474)
    @     0x7fb805cf1c05 
    @     0x7fb805cef439 
    @     0x7fb805cf22ae 
    @     0x7fb80862089b  yb::RefCntBuffer::RefCntBuffer(unsigned long) (src/yb/util/ref_cnt_buffer.cc:30)
    @     0x7fb80a601f33  yb::rpc::serialization::SerializeHeader(google::protobuf::MessageLite const&, unsigned long, yb::RefCntBuffer*, unsigned long, unsigned long*) (src/yb/rpc/serialization.cc:116)
    @     0x7fb80a5cd566  yb::rpc::OutboundCall::SetRequestParam(google::protobuf::Message const&) (src/yb/rpc/outbound_call.cc:244)
    @     0x7fb80a5d697b  yb::rpc::Proxy::AsyncRequest(yb::rpc::RemoteMethod const*, google::protobuf::Message const&, google::protobuf::Message*, yb::rpc::RpcController*, std::function<void ()>) (src/yb/rpc/proxy.cc:125)
    @     0x7fb80cb66b60  yb::consensus::ConsensusServiceProxy::UpdateConsensusAsync(yb::consensus::ConsensusRequestPB const&, yb::consensus::ConsensusResponsePB*, yb::rpc::RpcController*, std::function<void ()>) (src/yb/consensus/consensus.proxy.cc:32)
    @     0x7fb80ef07ae6  yb::consensus::RpcPeerProxy::UpdateAsync(yb::consensus::ConsensusRequestPB const*, yb::consensus::RequestTriggerMode, yb::consensus::ConsensusResponsePB*, yb::rpc::RpcController*, std::function<void ()> const&) (src/yb/consensus/consensus_peers.cc:491)
    @     0x7fb80ef0b6b4  yb::consensus::Peer::SendNextRequest(yb::consensus::RequestTriggerMode) (src/yb/consensus/consensus_peers.cc:300)
    @     0x7fb808639763  yb::ThreadPool::DispatchThread(bool) (src/yb/util/threadpool.cc:608)
    @     0x7fb808635405  std::function<void ()>::operator()() const (gcc/5.5.0_4/include/c++/5.5.0/functional:2267)
    @     0x7fb808635405  yb::Thread::SuperviseThread(void*) (src/yb/util/thread.cc:603)
    @     0x7fb804652693  start_thread (/tmp/glibc-20181130-26094-cs1x60/glibc-2.23/nptl/pthread_create.c:333)
    @     0x7fb803d8f41c  (unknown) (sysdeps/unix/sysv/linux/x86_64/clone.S:109)
    @ 0xffffffffffffffff 

*** Check failure stack trace: ***
    @     0x7fb8085c6b3b  DumpStackTraceAndExit (src/yb/util/logging.cc:166)
    @     0x7fb805cef8dc 
    @     0x7fb805cf17ec 
    @     0x7fb805cef439 
    @     0x7fb805cf22ae 
    @     0x7fb80862089b  yb::RefCntBuffer::RefCntBuffer(unsigned long) (src/yb/util/ref_cnt_buffer.cc:30)
    @     0x7fb80a601f33  yb::rpc::serialization::SerializeHeader(google::protobuf::MessageLite const&, unsigned long, yb::RefCntBuffer*, unsigned long, unsigned long*) (src/yb/rpc/serialization.cc:116)
    @     0x7fb80a5cd566  yb::rpc::OutboundCall::SetRequestParam(google::protobuf::Message const&) (src/yb/rpc/outbound_call.cc:244)
    @     0x7fb80a5d697b  yb::rpc::Proxy::AsyncRequest(yb::rpc::RemoteMethod const*, google::protobuf::Message const&, google::protobuf::Message*, yb::rpc::RpcController*, std::function<void ()>) (src/yb/rpc/proxy.cc:125)
    @     0x7fb80cb66b60  yb::consensus::ConsensusServiceProxy::UpdateConsensusAsync(yb::consensus::ConsensusRequestPB const&, yb::consensus::ConsensusResponsePB*, yb::rpc::RpcController*, std::function<void ()>) (src/yb/consensus/consensus.proxy.cc:32)
    @     0x7fb80ef07ae6  yb::consensus::RpcPeerProxy::UpdateAsync(yb::consensus::ConsensusRequestPB const*, yb::consensus::RequestTriggerMode, yb::consensus::ConsensusResponsePB*, yb::rpc::RpcController*, std::function<void ()> const&) (src/yb/consensus/consensus_peers.cc:491)
    @     0x7fb80ef0b6b4  yb::consensus::Peer::SendNextRequest(yb::consensus::RequestTriggerMode) (src/yb/consensus/consensus_peers.cc:300)
    @     0x7fb808639763  yb::ThreadPool::DispatchThread(bool) (src/yb/util/threadpool.cc:608)
    @     0x7fb808635405  std::function<void ()>::operator()() const (gcc/5.5.0_4/include/c++/5.5.0/functional:2267)
    @     0x7fb808635405  yb::Thread::SuperviseThread(void*) (src/yb/util/thread.cc:603)
    @     0x7fb804652693  start_thread (/tmp/glibc-20181130-26094-cs1x60/glibc-2.23/nptl/pthread_create.c:333)
    @     0x7fb803d8f41c  (unknown) (sysdeps/unix/sysv/linux/x86_64/clone.S:109)
    @ 0xffffffffffffffff 

They refer to a file called yb-tserver.FATAL.details.2019-03-13T23_58_30.pid19726.txt, the contents of which are:

F20190313 23:58:30 ../../src/yb/util/ref_cnt_buffer.cc:30] Check failed: data_ != nullptr
    @     0x7fb8085c7db3  yb::LogFatalHandlerSink::send(int, char const*, char const*, int, tm const*, char const*, unsigned long) (src/yb/util/logging.cc:474)
    @     0x7fb805cf1c05
    @     0x7fb805cef439
    @     0x7fb805cf22ae
    @     0x7fb80862089b  yb::RefCntBuffer::RefCntBuffer(unsigned long) (src/yb/util/ref_cnt_buffer.cc:30)
    @     0x7fb80a601f33  yb::rpc::serialization::SerializeHeader(google::protobuf::MessageLite const&, unsigned long, yb::RefCntBuffer*, unsigned long, unsigned long*) (src/yb/rpc/serialization.cc:116)
    @     0x7fb80a5cd566  yb::rpc::OutboundCall::SetRequestParam(google::protobuf::Message const&) (src/yb/rpc/outbound_call.cc:244)
    @     0x7fb80a5d697b  yb::rpc::Proxy::AsyncRequest(yb::rpc::RemoteMethod const*, google::protobuf::Message const&, google::protobuf::Message*, yb::rpc::RpcController*, std::function<void ()>) (src/yb/rpc/pro
    @     0x7fb80cb66b60  yb::consensus::ConsensusServiceProxy::UpdateConsensusAsync(yb::consensus::ConsensusRequestPB const&, yb::consensus::ConsensusResponsePB*, yb::rpc::RpcController*, std::function<void ()>
    @     0x7fb80ef07ae6  yb::consensus::RpcPeerProxy::UpdateAsync(yb::consensus::ConsensusRequestPB const*, yb::consensus::RequestTriggerMode, yb::consensus::ConsensusResponsePB*, yb::rpc::RpcController*, std::
    @     0x7fb80ef0b6b4  yb::consensus::Peer::SendNextRequest(yb::consensus::RequestTriggerMode) (src/yb/consensus/consensus_peers.cc:300)
    @     0x7fb808639763  yb::ThreadPool::DispatchThread(bool) (src/yb/util/threadpool.cc:608)
    @     0x7fb808635405  std::function<void ()>::operator()() const (gcc/5.5.0_4/include/c++/5.5.0/functional:2267)
    @     0x7fb808635405  yb::Thread::SuperviseThread(void*) (src/yb/util/thread.cc:603)
    @     0x7fb804652693  start_thread (/tmp/glibc-20181130-26094-cs1x60/glibc-2.23/nptl/pthread_create.c:333)
    @     0x7fb803d8f41c  (unknown) (sysdeps/unix/sysv/linux/x86_64/clone.S:109)
    @ 0xffffffffffffffff 

I’m not sure yet whether there was any data loss (I hope not!).

Hi @Jigar_Patel

I haven’t looked into this error in detail yet, but I wanted to check one thing: the Terraform template uses the root disk “/” to store data. Can you post the output of the following command on the failing tserver?

$ df -h

If you look at this code snippet:
https://github.com/YugaByte/terraform-aws-yugabyte/blob/master/scripts/install_software.sh#L67
you’ll see that the --fs_data_dirs flag is set to ${YB_HOME}/data/disk0 and ${YB_HOME}/data/disk1:

--fs_data_dirs=${YB_HOME}/data/disk0,${YB_HOME}/data/disk1

Could you please set this flag to the actual mounted drives, or symlink these directories to the mounted drives?
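
For example, a minimal sketch of the symlink approach, run with the yb-tserver stopped (the /mnt/d0 and /mnt/d1 mount points are placeholders for wherever your EBS volumes are actually mounted):

$ export YB_HOME=/home/ec2-user/yugabyte-db
$ sudo mkdir -p /mnt/d0/ybdata /mnt/d1/ybdata
$ sudo chown -R ec2-user:ec2-user /mnt/d0/ybdata /mnt/d1/ybdata
$ mv ${YB_HOME}/data/disk0 /mnt/d0/ybdata/disk0   # preserve any existing data
$ mv ${YB_HOME}/data/disk1 /mnt/d1/ybdata/disk1
$ ln -s /mnt/d0/ybdata/disk0 ${YB_HOME}/data/disk0
$ ln -s /mnt/d1/ybdata/disk1 ${YB_HOME}/data/disk1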

Afterwards, you can run the following to check that each data directory points to the correct mounted volume:

$ df -h ${YB_HOME}/data/disk0
$ df -h ${YB_HOME}/data/disk1

@Jigar_Patel:

a) In the script, did you change root_volume_size from 50 (GB) to a larger number? Or was that a separately mounted drive?

b) From the stack, it could be an OOM (you can check /var/log/messages). For a very write-heavy workload, the CPU/RAM/disk configuration and maybe a few knobs need to be looked at. In particular, 4 vCPUs / 7.5 GB RAM / 5 TB of storage might be too tight for compactions/flushing to keep pace, both from a CPU perspective and from a gp2 volume throughput perspective.
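
For example, something along these lines would show whether the OOM killer fired (a sketch; exact log locations can vary by distro):

$ sudo grep -iE 'out of memory|oom-killer|killed process' /var/log/messages
$ dmesg -T | grep -iE 'oom|killed process'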

Do you have a feel for whether CPU or disk IO was the bottleneck while your workload was running?

Happy to chat over Slack when you are available as well.

@karthik @kannan

df -h /home/ec2-user/yugabyte-db/data/disk0 and df -h /home/ec2-user/yugabyte-db/data/disk1 both return the following:

Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda1      4.9T   91G  4.8T   2% /

So it looks like fs_data_dirs is pointing to the right place. The Terraform module attaches a 50 GB EBS volume by default. Once the initial setup was done, I shut down the cluster, resized the EBS volumes to 5000 GB, and then started the cluster again. I’ve been keeping an eye on the disk space, and at no point was it exhausted.
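
(Roughly, the resize was along these lines; the volume ID and device names below are placeholders, and the exact commands may have differed slightly:)

$ aws ec2 modify-volume --volume-id vol-0123456789abcdef0 --size 5000
$ sudo growpart /dev/xvda 1     # grow the partition to fill the resized volume
$ sudo resize2fs /dev/xvda1     # grow an ext4 filesystem (use xfs_growfs for XFS)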

Unfortunately I can’t check /var/log/messages anymore, since I destroyed and re-created the stack. With my write-heavy workload on the cluster, I’ve consistently noticed that the CPU becomes a bottleneck as soon as I hit the cluster with a large volume of writes.
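
(For example, a per-core view while the load is running shows this clearly; a minimal sketch using mpstat from the sysstat package, assuming a yum-based AMI:)

$ sudo yum install -y sysstat   # assumption: Amazon Linux or another yum-based AMI
$ mpstat -P ALL 5               # per-core utilization, refreshed every 5 seconds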

Do I need to adjust my write workload to keep CPU usage at around 75% on each node, so that one core (25%) is left free for maintenance tasks?


Oh, also, there is barely any IO wait. With a 5 TB gp2 volume I get 15,000 IOPS, and monitoring those volumes from the AWS console indicates that IO should not be the problem.
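
(A quick way to double-check IO wait and device utilization from the node itself, rather than the AWS console, is iostat from the same sysstat package; watch the %util and await columns:)

$ iostat -xm 5    # extended per-device stats every 5 seconds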

@Jigar_Patel:

It could be that for your workload you need to run on an 8-CPU (or larger) machine.

It depends on whether the load phase is a one-time migration of old data, in which case you could stay on the 4-core machines but load more gradually, or whether it is going to be your typical, steady workload. If this is the steady pattern, then you might need to be on machines with more CPU.

There are two considerations worth thinking about:

  1. Keep some spare CPU for compactions/flushes.

  2. Keep some spare capacity to handle node failures. If 1 node in a 10-node cluster fails, the impact on the remaining 9 nodes isn’t too much (each picks up roughly 1/9, or about 11%, more load). But if 1 node in a 3- or 4-node cluster fails, the remaining nodes each have to take on roughly 33-50% more load.

regards,
Kannan

Thanks @kannan.
Our typical workload will be write-intensive in bursts.

However, my current workload is write-intensive and steady/sustained, because I am trying to load as much data as possible, as quickly as possible, to better understand how YB works. For example, I am trying to test the read performance of specific queries once a lot of data has been inserted.

Thanks for your input!

Sounds good. Keep us posted; one of us would be happy to join a conference call while the system is running if you have any performance concerns with the setup and would like us to look further.
