Tserver docker CPU load in idle

Each tserver docker node has 25% cpu load in idle.

Hi @victor,

Interesting. Wondering if there is some background task going on. Could you please answer a few questions about your setup, would help us identify what is going on:

  • How many vCPUs on each yb-tserver pod?
  • How many tables/tablets do you have in the system? You can find this from the master ui: http://yb-master-0:7000 (tablet-servers page will give you how many tablets on each yb-tserver).
  • Did you just finish loading data into any of the tables?
  • Could you also look at any of the tserver logs to see if anything is reported there?

I use docker-compose to setup cluster in dev environment (win10 + docker) - docker has 8 cpus for all services. 6 empty tables (created on db updater start). Yes. Logs in attach for tserver1.

Logs:

W0209 18:34:10.465245 19 reactor.cc:380] TabletServer_R007: Client call yb.consensus.ConsensusService.RequestConsensusVote has no timeout set for connection id: { remote: 172.18.0.7:9100 idx: 6 protocol: 0x00007f927e0a8f00 -> tcp }
W0209 18:34:12.184116 13 reactor.cc:380] TabletServer_R001: Client call yb.consensus.ConsensusService.RequestConsensusVote has no timeout set for connection id: { remote: 172.18.0.10:9100 idx: 7 protocol: 0x00007f927e0a8f00 -> tcp }
W0209 18:34:12.184180 15 reactor.cc:380] TabletServer_R003: Client call yb.consensus.ConsensusService.RequestConsensusVote has no timeout set for connection id: { remote: 172.18.0.7:9100 idx: 2 protocol: 0x00007f927e0a8f00 -> tcp }
W0209 18:34:12.198673 17 reactor.cc:380] TabletServer_R005: Client call yb.consensus.ConsensusService.RequestConsensusVote has no timeout set for connection id: { remote: 172.18.0.7:9100 idx: 4 protocol: 0x00007f927e0a8f00 -> tcp }
W0209 18:34:12.198679 14 reactor.cc:380] TabletServer_R002: Client call yb.consensus.ConsensusService.RequestConsensusVote has no timeout set for connection id: { remote: 172.18.0.10:9100 idx: 0 protocol: 0x00007f927e0a8f00 -> tcp }
W0209 18:34:12.244817 141 replica_state.cc:733] T e5645f251c394ca282b38125e1f10386 P 36f9a55694454f28941011b16cffc067 [term 3 LEADER]: Can’t advance the committed index across term boundaries until operations from the current term are replicated. Last committed operation was: { term: 1 index: 1 }, New majority replicated is: term: 1 index: 1, Current term is: 3
W0209 18:34:13.625566 16 reactor.cc:380] TabletServer_R004: Client call yb.consensus.ConsensusService.RequestConsensusVote has no timeout set for connection id: { remote: 172.18.0.10:9100 idx: 2 protocol: 0x00007f927e0a8f00 -> tcp }
W0209 18:34:13.625627 14 reactor.cc:380] TabletServer_R002: Client call yb.consensus.ConsensusService.RequestConsensusVote has no timeout set for connection id: { remote: 172.18.0.7:9100 idx: 1 protocol: 0x00007f927e0a8f00 -> tcp }
W0209 18:34:23.969054 18 reactor.cc:380] TabletServer_R006: Client call yb.consensus.ConsensusService.RequestConsensusVote has no timeout set for connection id: { remote: 172.18.0.10:9100 idx: 4 protocol: 0x00007f927e0a8f00 -> tcp }
W0209 18:34:23.969092 13 reactor.cc:380] TabletServer_R001: Client call yb.consensus.ConsensusService.RequestConsensusVote has no timeout set for connection id: { remote: 172.18.0.7:9100 idx: 0 protocol: 0x00007f927e0a8f00 -> tcp }
W0209 18:34:23.983862 14 reactor.cc:380] TabletServer_R002: Client call yb.consensus.ConsensusService.RequestConsensusVote has no timeout set for connection id: { remote: 172.18.0.7:9100 idx: 1 protocol: 0x00007f927e0a8f00 -> tcp }
W0209 18:34:23.983912 13 reactor.cc:380] TabletServer_R001: Client call yb.consensus.ConsensusService.RequestConsensusVote has no timeout set for connection id: { remote: 172.18.0.10:9100 idx: 7 protocol: 0x00007f927e0a8f00 -> tcp }
W0209 18:34:24.004087 141 replica_state.cc:733] T 435f874f25924e2e9055c968cac30dba P 36f9a55694454f28941011b16cffc067 [term 2 LEADER]: Can’t advance the committed index across term boundaries until operations from the current term are replicated. Last committed operation was: { term: 1 index: 1 }, New majority replicated is: term: 1 index: 1, Current term is: 2
W0209 18:34:25.972113 14 reactor.cc:380] TabletServer_R002: Client call yb.consensus.ConsensusService.RequestConsensusVote has no timeout set for connection id: { remote: 172.18.0.7:9100 idx: 1 protocol: 0x00007f927e0a8f00 -> tcp }
W0209 18:34:25.972182 15 reactor.cc:380] TabletServer_R003: Client call yb.consensus.ConsensusService.RequestConsensusVote has no timeout set for connection id: { remote: 172.18.0.10:9100 idx: 1 protocol: 0x00007f927e0a8f00 -> tcp }
W0209 18:34:25.986752 15 reactor.cc:380] TabletServer_R003: Client call yb.consensus.ConsensusService.RequestConsensusVote has no timeout set for connection id: { remote: 172.18.0.7:9100 idx: 2 protocol: 0x00007f927e0a8f00 -> tcp }
W0209 18:34:25.986797 19 reactor.cc:380] TabletServer_R007: Client call yb.consensus.ConsensusService.RequestConsensusVote has no timeout set for connection id: { remote: 172.18.0.10:9100 idx: 5 protocol: 0x00007f927e0a8f00 -> tcp }
W0209 18:34:26.008317 140 replica_state.cc:733] T 7dc71901204a459585ce9828ba10c0c3 P 36f9a55694454f28941011b16cffc067 [term 2 LEADER]: Can’t advance the committed index across term boundaries until operations from the current term are replicated. Last committed operation was: { term: 1 index: 1 }, New majority replicated is: term: 1 index: 1, Current term is: 2
W0209 18:34:30.978746 13 reactor.cc:380] TabletServer_R001: Client call yb.consensus.ConsensusService.RequestConsensusVote has no timeout set for connection id: { remote: 172.18.0.10:9100 idx: 7 protocol: 0x00007f927e0a8f00 -> tcp }
W0209 18:34:30.978780 12 reactor.cc:380] TabletServer_R000: Client call yb.consensus.ConsensusService.RequestConsensusVote has no timeout set for connection id: { remote: 172.18.0.7:9100 idx: 7 protocol: 0x00007f927e0a8f00 -> tcp }
W0209 18:34:30.992595 17 reactor.cc:380] TabletServer_R005: Client call yb.consensus.ConsensusService.RequestConsensusVote has no timeout set for connection id: { remote: 172.18.0.10:9100 idx: 3 protocol: 0x00007f927e0a8f00 -> tcp }
W0209 18:34:30.992645 14 reactor.cc:380] TabletServer_R002: Client call yb.consensus.ConsensusService.RequestConsensusVote has no timeout set for connection id: { remote: 172.18.0.7:9100 idx: 1 protocol: 0x00007f927e0a8f00 -> tcp }
W0209 18:34:31.010702 140 replica_state.cc:733] T 00f717bfad7847e7838bc59769cb0601 P 36f9a55694454f28941011b16cffc067 [term 2 LEADER]: Can’t advance the committed index across term boundaries until operations from the current term are replicated. Last committed operation was: { term: 1 index: 1 }, New majority replicated is: term: 1 index: 1, Current term is: 2
W0209 18:34:31.980079 12 reactor.cc:380] TabletServer_R000: Client call yb.consensus.ConsensusService.RequestConsensusVote has no timeout set for connection id: { remote: 172.18.0.10:9100 idx: 6 protocol: 0x00007f927e0a8f00 -> tcp }
W0209 18:34:31.980187 19 reactor.cc:380] TabletServer_R007: Client call yb.consensus.ConsensusService.RequestConsensusVote has no timeout set for connection id: { remote: 172.18.0.7:9100 idx: 6 protocol: 0x00007f927e0a8f00 -> tcp }
W0209 18:34:31.994076 14 reactor.cc:380] TabletServer_R002: Client call yb.consensus.ConsensusService.RequestConsensusVote has no timeout set for connection id: { remote: 172.18.0.10:9100 idx: 0 protocol: 0x00007f927e0a8f00 -> tcp }
W0209 18:34:31.994104 14 reactor.cc:380] TabletServer_R002: Client call yb.consensus.ConsensusService.RequestConsensusVote has no timeout set for connection id: { remote: 172.18.0.7:9100 idx: 1 protocol: 0x00007f927e0a8f00 -> tcp }
W0209 18:34:32.012917 140 replica_state.cc:733] T 08584550feec4f70b729ebf72d0aca09 P 36f9a55694454f28941011b16cffc067 [term 2 LEADER]: Can’t advance the committed index across term boundaries until operations from the current term are replicated. Last committed operation was: { term: 1 index: 1 }, New majority replicated is: term: 1 index: 1, Current term is: 2

Something seems off here. cc @bogdan @sanketh - could you folks please take a look?

Hello Victor,

  1. We have updated our reference docker-compose file at https://docs.yugabyte.com/latest/deploy/docker/docker-compose/#3-test-the-apis recently. Please make sure your compose config looks similar to the reference.

  2. 25% usage on idle for 6 user tables does not match normal behavior. Were you running a benchmark or other workload prior to this? Could you roughly describe what the workload was doing at a high level? Or did you just create the 6 tables?

  3. Could you collect a performance profile from the tserver that would help us understand the idle usage? Here are the commands you would need to run in the yb-tserver container to collect the profile.

$ yum install -y perl graphviz

$ /home/yugabyte/bin/pprof --text http://localhost:9000/pprof/profile --seconds 30 > /mnt/tserver/tserver-profile.txt

$ /home/yugabyte/bin/pprof --ps http://localhost:9000/pprof/profile --seconds 30 > /mnt/tserver/tserver-profile.ps

Please then copy the two files /mnt/tserver/tserver-profile.txt /mnt/tserver/tserver-profile.ps out of the container and attach them to this thread.

Thanks for helping us debug this issue!

Hi.

  1. See attach.
  2. After start – db updater just created empty tables, no api activities, no inserts/selects, etc. I have api1x3 instances connected to each of 6 tservers and api2x3 instances connected to each of 6 tservers – so, 6 clients connected to each tserver.
  3. See attach.

Thanks.

Regards,

Victor

(Attachment docker-compose.yml is missing)

(Attachment tserver-profile.ps is missing)

(Attachment tserver-profile.txt is missing)

@victor you attachments are missing. Can you try again ?

Hi.

Use this link to download reports: Perf_report

P.S. I can attach only images to this thread.

The same issue on Ubuntu 18.04 VM:

@victor

Can you please provide a working docker-compose file ?
Example: it misses version on top, misses services, has undefined network, etc.

I tried with yb-docker-ctl multiple replicas and nodes and it worked fine:


https://docs.yugabyte.com/latest/admin/yb-docker-ctl/

Can you send the full docker-compose.yaml and try the guide above ?