Each tserver docker node has ~25% CPU load at idle.
Hi @victor,
Interesting. Wondering if there is some background task going on. Could you please answer a few questions about your setup? It would help us identify what is going on:
- How many vCPUs does each yb-tserver pod have?
- How many tables/tablets do you have in the system? You can find this from the master UI at http://yb-master-0:7000 (the tablet-servers page will show how many tablets are on each yb-tserver).
- Did you just finish loading data into any of the tables?
- Could you also look at any of the tserver logs to see if anything is reported there?
I use docker-compose to set up the cluster in a dev environment (Win10 + Docker); Docker has 8 CPUs for all services. 6 empty tables (created on db updater start). Yes. Logs attached for tserver1.
Logs:
…
W0209 18:34:10.465245 19 reactor.cc:380] TabletServer_R007: Client call yb.consensus.ConsensusService.RequestConsensusVote has no timeout set for connection id: { remote: 172.18.0.7:9100 idx: 6 protocol: 0x00007f927e0a8f00 → tcp }
W0209 18:34:12.184116 13 reactor.cc:380] TabletServer_R001: Client call yb.consensus.ConsensusService.RequestConsensusVote has no timeout set for connection id: { remote: 172.18.0.10:9100 idx: 7 protocol: 0x00007f927e0a8f00 → tcp }
W0209 18:34:12.184180 15 reactor.cc:380] TabletServer_R003: Client call yb.consensus.ConsensusService.RequestConsensusVote has no timeout set for connection id: { remote: 172.18.0.7:9100 idx: 2 protocol: 0x00007f927e0a8f00 → tcp }
W0209 18:34:12.198673 17 reactor.cc:380] TabletServer_R005: Client call yb.consensus.ConsensusService.RequestConsensusVote has no timeout set for connection id: { remote: 172.18.0.7:9100 idx: 4 protocol: 0x00007f927e0a8f00 → tcp }
W0209 18:34:12.198679 14 reactor.cc:380] TabletServer_R002: Client call yb.consensus.ConsensusService.RequestConsensusVote has no timeout set for connection id: { remote: 172.18.0.10:9100 idx: 0 protocol: 0x00007f927e0a8f00 → tcp }
W0209 18:34:12.244817 141 replica_state.cc:733] T e5645f251c394ca282b38125e1f10386 P 36f9a55694454f28941011b16cffc067 [term 3 LEADER]: Can't advance the committed index across term boundaries until operations from the current term are replicated. Last committed operation was: { term: 1 index: 1 }, New majority replicated is: term: 1 index: 1, Current term is: 3
W0209 18:34:13.625566 16 reactor.cc:380] TabletServer_R004: Client call yb.consensus.ConsensusService.RequestConsensusVote has no timeout set for connection id: { remote: 172.18.0.10:9100 idx: 2 protocol: 0x00007f927e0a8f00 → tcp }
W0209 18:34:13.625627 14 reactor.cc:380] TabletServer_R002: Client call yb.consensus.ConsensusService.RequestConsensusVote has no timeout set for connection id: { remote: 172.18.0.7:9100 idx: 1 protocol: 0x00007f927e0a8f00 → tcp }
W0209 18:34:23.969054 18 reactor.cc:380] TabletServer_R006: Client call yb.consensus.ConsensusService.RequestConsensusVote has no timeout set for connection id: { remote: 172.18.0.10:9100 idx: 4 protocol: 0x00007f927e0a8f00 → tcp }
W0209 18:34:23.969092 13 reactor.cc:380] TabletServer_R001: Client call yb.consensus.ConsensusService.RequestConsensusVote has no timeout set for connection id: { remote: 172.18.0.7:9100 idx: 0 protocol: 0x00007f927e0a8f00 → tcp }
W0209 18:34:23.983862 14 reactor.cc:380] TabletServer_R002: Client call yb.consensus.ConsensusService.RequestConsensusVote has no timeout set for connection id: { remote: 172.18.0.7:9100 idx: 1 protocol: 0x00007f927e0a8f00 → tcp }
W0209 18:34:23.983912 13 reactor.cc:380] TabletServer_R001: Client call yb.consensus.ConsensusService.RequestConsensusVote has no timeout set for connection id: { remote: 172.18.0.10:9100 idx: 7 protocol: 0x00007f927e0a8f00 → tcp }
W0209 18:34:24.004087 141 replica_state.cc:733] T 435f874f25924e2e9055c968cac30dba P 36f9a55694454f28941011b16cffc067 [term 2 LEADER]: Can't advance the committed index across term boundaries until operations from the current term are replicated. Last committed operation was: { term: 1 index: 1 }, New majority replicated is: term: 1 index: 1, Current term is: 2
W0209 18:34:25.972113 14 reactor.cc:380] TabletServer_R002: Client call yb.consensus.ConsensusService.RequestConsensusVote has no timeout set for connection id: { remote: 172.18.0.7:9100 idx: 1 protocol: 0x00007f927e0a8f00 → tcp }
W0209 18:34:25.972182 15 reactor.cc:380] TabletServer_R003: Client call yb.consensus.ConsensusService.RequestConsensusVote has no timeout set for connection id: { remote: 172.18.0.10:9100 idx: 1 protocol: 0x00007f927e0a8f00 → tcp }
W0209 18:34:25.986752 15 reactor.cc:380] TabletServer_R003: Client call yb.consensus.ConsensusService.RequestConsensusVote has no timeout set for connection id: { remote: 172.18.0.7:9100 idx: 2 protocol: 0x00007f927e0a8f00 → tcp }
W0209 18:34:25.986797 19 reactor.cc:380] TabletServer_R007: Client call yb.consensus.ConsensusService.RequestConsensusVote has no timeout set for connection id: { remote: 172.18.0.10:9100 idx: 5 protocol: 0x00007f927e0a8f00 → tcp }
W0209 18:34:26.008317 140 replica_state.cc:733] T 7dc71901204a459585ce9828ba10c0c3 P 36f9a55694454f28941011b16cffc067 [term 2 LEADER]: Can't advance the committed index across term boundaries until operations from the current term are replicated. Last committed operation was: { term: 1 index: 1 }, New majority replicated is: term: 1 index: 1, Current term is: 2
W0209 18:34:30.978746 13 reactor.cc:380] TabletServer_R001: Client call yb.consensus.ConsensusService.RequestConsensusVote has no timeout set for connection id: { remote: 172.18.0.10:9100 idx: 7 protocol: 0x00007f927e0a8f00 → tcp }
W0209 18:34:30.978780 12 reactor.cc:380] TabletServer_R000: Client call yb.consensus.ConsensusService.RequestConsensusVote has no timeout set for connection id: { remote: 172.18.0.7:9100 idx: 7 protocol: 0x00007f927e0a8f00 → tcp }
W0209 18:34:30.992595 17 reactor.cc:380] TabletServer_R005: Client call yb.consensus.ConsensusService.RequestConsensusVote has no timeout set for connection id: { remote: 172.18.0.10:9100 idx: 3 protocol: 0x00007f927e0a8f00 → tcp }
W0209 18:34:30.992645 14 reactor.cc:380] TabletServer_R002: Client call yb.consensus.ConsensusService.RequestConsensusVote has no timeout set for connection id: { remote: 172.18.0.7:9100 idx: 1 protocol: 0x00007f927e0a8f00 → tcp }
W0209 18:34:31.010702 140 replica_state.cc:733] T 00f717bfad7847e7838bc59769cb0601 P 36f9a55694454f28941011b16cffc067 [term 2 LEADER]: Can't advance the committed index across term boundaries until operations from the current term are replicated. Last committed operation was: { term: 1 index: 1 }, New majority replicated is: term: 1 index: 1, Current term is: 2
W0209 18:34:31.980079 12 reactor.cc:380] TabletServer_R000: Client call yb.consensus.ConsensusService.RequestConsensusVote has no timeout set for connection id: { remote: 172.18.0.10:9100 idx: 6 protocol: 0x00007f927e0a8f00 → tcp }
W0209 18:34:31.980187 19 reactor.cc:380] TabletServer_R007: Client call yb.consensus.ConsensusService.RequestConsensusVote has no timeout set for connection id: { remote: 172.18.0.7:9100 idx: 6 protocol: 0x00007f927e0a8f00 → tcp }
W0209 18:34:31.994076 14 reactor.cc:380] TabletServer_R002: Client call yb.consensus.ConsensusService.RequestConsensusVote has no timeout set for connection id: { remote: 172.18.0.10:9100 idx: 0 protocol: 0x00007f927e0a8f00 → tcp }
W0209 18:34:31.994104 14 reactor.cc:380] TabletServer_R002: Client call yb.consensus.ConsensusService.RequestConsensusVote has no timeout set for connection id: { remote: 172.18.0.7:9100 idx: 1 protocol: 0x00007f927e0a8f00 → tcp }
W0209 18:34:32.012917 140 replica_state.cc:733] T 08584550feec4f70b729ebf72d0aca09 P 36f9a55694454f28941011b16cffc067 [term 2 LEADER]: Can't advance the committed index across term boundaries until operations from the current term are replicated. Last committed operation was: { term: 1 index: 1 }, New majority replicated is: term: 1 index: 1, Current term is: 2
Hello Victor,
- We recently updated our reference docker-compose file at Deploy | YugabyteDB Docs. Please make sure your compose config looks similar to the reference.
- 25% usage at idle with 6 user tables does not match normal behavior. Were you running a benchmark or other workload prior to this? Could you roughly describe, at a high level, what the workload was doing? Or did you just create the 6 tables?
- Could you collect a performance profile from the tserver? That would help us understand the idle usage. Here are the commands you would need to run in the yb-tserver container to collect the profile:
$ yum install -y perl graphviz
$ /home/yugabyte/bin/pprof --text http://localhost:9000/pprof/profile --seconds 30 > /mnt/tserver/tserver-profile.txt
$ /home/yugabyte/bin/pprof --ps http://localhost:9000/pprof/profile --seconds 30 > /mnt/tserver/tserver-profile.ps
Please then copy the two files /mnt/tserver/tserver-profile.txt and /mnt/tserver/tserver-profile.ps out of the container and attach them to this thread.
Thanks for helping us debug this issue!
Hi.
- See attached.
- After start, the db updater just created empty tables: no API activity, no inserts/selects, etc. I have 3 api1 instances and 3 api2 instances connected to each of the 6 tservers, so 6 clients connected to each tserver.
- See attached.
Thanks.
Regards,
Victor
(Attachment docker-compose.yml is missing)
(Attachment tserver-profile.ps is missing)
(Attachment tserver-profile.txt is missing)
@victor your attachments are missing. Can you try again?
Can you please provide a working docker-compose file? For example, the one you sent is missing version at the top, is missing services, has an undefined network, etc.
I tried yb-docker-ctl with multiple replicas and nodes and it worked fine. Can you send the full docker-compose.yaml and try the guide above?
Hi.
After start, CPU load = 5-6%.
1 table: CPU = 8%.
2 tables: CPU = 12%.
4 tables: CPU = 18%.
6 tables: CPU = 25%.
P.S. Tables are empty, just created.
I'm using Ubuntu 18.04 (desktop) on a VirtualBox VM with 8GB RAM and 1 vCPU (Windows host).
I have 20 tables created with create table t1(id bigint primary key, a text, b text, c text, d text); resulting in 240 tablets/tserver.
I'm getting about 10% CPU for each container. (Note: running docker stats itself adds about 30% to my VM's CPU.) In total on the Ubuntu VM I'm seeing 10% (not 10% * 3 containers).
After adding 300 more tablets, resulting in 540 tablets/tserver, CPU went to 35%+.
Looks like this is similar to [docdb] Scale to 1k+ tables · Issue #1317 · yugabyte/yugabyte-db · GitHub (per-tablet Raft group overhead, which we are working on). Because this is a single-core, virtualized ubuntu-desktop (also running Firefox etc.), it is likely seeing more overhead than normal.
Solution:
Since you are in a development environment, a way to work around this issue is to set --yb_num_shards_per_tserver
to 1 in your configuration.
For production, you should follow the Deploy | YugabyteDB Docs guide.
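For example, in a docker-compose dev setup the flag would be added to the tserver command line. This is only a sketch: the data directory and master address below are placeholders, and the other flags should come from your existing compose config.

```shell
# Hypothetical yb-tserver invocation for a dev cluster; keep your own
# existing flags and only add the last one.
/home/yugabyte/bin/yb-tserver \
  --fs_data_dirs=/mnt/tserver \
  --tserver_master_addrs=yb-master-0:7100 \
  --yb_num_shards_per_tserver=1   # one tablet per table per tserver
```

With 1 shard per tserver, 6 tables produce far fewer Raft groups, so the idle heartbeat overhead should drop accordingly.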
Can you tell me more about your dev setup?
How many tables, databases, indexes, etc.?
Hi.
- Ubuntu 18.04 on Windows Hyper-V and Amazon EC2.
- 1 db, 8 empty tables.
Looks like the tservers spend their time on replication between tablets.
The user was working with the YCQL API.
He had 6 tservers on 1 node, 3 replicas, 3 masters, and created tables resulting in a large number of tablets.
The overhead came from the many tablets, from running 3 masters + 6 tservers on the same node, and from virtualization.
Fixed by using with tablets
when creating tables in dev environments.
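As a concrete sketch of that workaround in YCQL (the keyspace and table names here are made up for illustration):

```shell
# Create a table with a single tablet instead of the default
# (yb_num_shards_per_tserver * number of tservers) tablets.
ycqlsh -e "CREATE TABLE ks.t1 (id bigint PRIMARY KEY, a text) WITH tablets = 1;"
```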