Why does the unit test fail to pass?

Please ignore the previous content!

  • Now, I have tried the latest release v0.9.8, there are only 3 tests failed.
99% tests passed, 3 tests failed out of 274

Total Test time (real) = 821.94 sec

The following tests FAILED:
       174 - rpc_rpc_stub-test (Failed)
       217 - integration-tests_master_config-itest (Failed)
       226 - integration-tests_linked_list-test (Failed)
Errors while running CTest

174: Test command: /home/donghq/jinlespace/yugabyte-db/build/release-gcc-dynamic-community/tests-rpc/rpc_stub-test --gtest_output=xml:/home/donghq/jinlespace/yugabyte-db/build/release-gcc-dynamic-community/yb-test-logs/tests-rpc__rpc_stub-test/RpcStubTest_IPv6.xml --gtest_filter=RpcStubTest.IPv6
174: Test exit status: 1
174: Log path: /home/donghq/jinlespace/yugabyte-db/build/release-gcc-dynamic-community/yb-test-logs/tests-rpc__rpc_stub-test/RpcStubTest_IPv6.log
174: Relevant log lines:
174: I0322 13:26:31.500322 22395 mem_tracker.cc:158] MemTracker: hard memory limit is 8.000000 GB
174: I0322 13:26:31.500339 22395 mem_tracker.cc:160] MemTracker: soft memory limit is 6.800000 GB
174: /home/donghq/jinlespace/yugabyte-db/src/yb/rpc/rpc_stub-test.cc:709: Failure
174: Value of: server_address.is_unspecified()
174: Actual: true
174: Expected: false
174: Test failure stack trace:
174: /home/donghq/jinlespace/yugabyte-db/src/yb/rpc/rpc_stub-test.cc:709: @ 0x41b4c2 yb::rpc::RpcStubTest_IPv6_Test::TestBody()
174: Updating /home/donghq/jinlespace/yugabyte-db/build/release-gcc-dynamic-community/yb-test-logs/tests-rpc__rpc_stub-test/RpcStubTest_IPv6.xml with a link to test log

217: Test command: /home/donghq/jinlespace/yugabyte-db/build/release-gcc-dynamic-community/tests-integration-tests/master_config-itest --gtest_output=xml:/home/donghq/jinlespace/yugabyte-db/build/release-gcc-dynamic-community/yb-test-logs/tests-integration-tests__master_config-itest/MasterChangeConfigTest_TestNewLeaderWithPendingConfigLoadsSysCatalog.xml --gtest_filter=MasterChangeConfigTest.TestNewLeaderWithPendingConfigLoadsSysCatalog
217: Test exit status: 1
217: Log path: /home/donghq/jinlespace/yugabyte-db/build/release-gcc-dynamic-community/yb-test-logs/tests-integration-tests__master_config-itest/MasterChangeConfigTest_TestNewLeaderWithPendingConfigLoadsSysCatalog.log
217: Relevant log lines:
217: W0322 13:36:58.572082 13480 external_mini_cluster.cc:1007] Failed to get leader master after 100 attempts, returning the first master.
217: I0322 13:36:58.572091 13480 master_config-itest.cc:347] New leader
217: /home/donghq/jinlespace/yugabyte-db/src/yb/integration-tests/master_config-itest.cc:348: Failure
217: Expected: new_master->bound_rpc_addr().port()
217: Which is: 18567
217: To be equal to: new_leader->bound_rpc_addr().port()
217: Updating /home/donghq/jinlespace/yugabyte-db/build/release-gcc-dynamic-community/yb-test-logs/tests-integration-tests__master_config-itest/MasterChangeConfigTest_TestNewLeaderWithPendingConfigLoadsSysCatalog.xml with a link to test log

226: Test command: /home/donghq/jinlespace/yugabyte-db/build/release-gcc-dynamic-community/tests-integration-tests/linked_list-test --gtest_output=xml:/home/donghq/jinlespace/yugabyte-db/build/release-gcc-dynamic-community/yb-test-logs/tests-integration-tests__linked_list-test/LinkedListTest_TestLoadWhileOneServerDownAndVerify.xml --gtest_filter=LinkedListTest.TestLoadWhileOneServerDownAndVerify
226: Test exit status: 1
226: Log path: /home/donghq/jinlespace/yugabyte-db/build/release-gcc-dynamic-community/yb-test-logs/tests-integration-tests__linked_list-test/LinkedListTest_TestLoadWhileOneServerDownAndVerify.log
226: Relevant log lines:
226: @ 0xffffffffffffffff
226: *** Check failure stack trace: ***
226: @ 0x7f1e7f3f4e8b DumpStackTraceAndExit (yb/util/logging.cc:156)
226: @ 0x7f1e7e5bb10c
226: tests-integration-tests/linked_list-test failed to produce an XML output file at /home/donghq/jinlespace/yugabyte-db/build/release-gcc-dynamic-community/yb-test-logs/tests-integration-tests__linked_list-test/LinkedListTest_TestLoadWhileOneServerDownAndVerify.xml
226: Generating an XML output file using parse_test_failure.py: /home/donghq/jinlespace/yugabyte-db/build/release-gcc-dynamic-community/yb-test-logs/tests-integration-tests__linked_list-test/LinkedListTest_TestLoadWhileOneServerDownAndVerify.xml
226: Updating /home/donghq/jinlespace/yugabyte-db/build/release-gcc-dynamic-community/yb-test-logs/tests-integration-tests__linked_list-test/LinkedListTest_TestLoadWhileOneServerDownAndVerify.xml with a link to test log


Could you check other files in this directory, and upload relevant files/error snippets from those files if any.

Adding Mikhail (@mbautin) in case he has any ideas off-hand.

Hi @lumedar, thank you for reporting this issue! A lot of our tests, including the ones you mentioned above, are really integration tests that are running an entire distributed YugaByte DB mini-cluster using multiple processes locally, and therefore they can be quite sensitive to the test environment. Can you share the specs (available memory / # of CPUs) of the host that you are running the tests on? Also could you please tar/gz and share the yb-test-logs directory so we can take a closer look at the test failures that you are observing?

Also could you please post the output of ulimit -a in the shell that you’re running the tests from? Specifically, setting both the limit on the number of open files and the number of processes/threads to something really high, such as 100000, might help.

I would also recommend that you try out the master branch or the more recent v0.9.8.0 tag.

We will be sharing more details about our testing methodology soon.


I have uploaded the relevant logs, and here is the `ulimit -a`:

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 512453
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1000000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 512453
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Thanks. I have uploaded the relevant files.

The last two test cases can sometimes pass.

Nice :slight_smile: . Where there any other changes in your environment that you think might have helped? Even with older release, would have expected most of the tests to pass.



Posting relevant snippet from the reported log:


Have we seen this in the test runs? Could this be related to increasing the default number of missed heartbeats when a leader election is triggered?

[m-2] I0322 13:36:58.571132 15390 raft_consensus.cc:436] Election start skipped as do_not_start_election_test_only flag is set to true.
I0322 13:36:58.572062 13480 external_mini_cluster.cc:1005] GetLeaderMasterIndex@100 hit error: Timed out (yb/rpc/rpc.cc:166): GetLeaderMasterRpc(addrs:,,,, num_attempts: 99) passed its deadline 15599955.996s (now: 15599956.081s)
W0322 13:36:58.572082 13480 external_mini_cluster.cc:1007] Failed to get leader master after 100 attempts, returning the first master.
I0322 13:36:58.572091 13480 master_config-itest.cc:347] New leader
/home/donghq/jinlespace/yugabyte-db/src/yb/integration-tests/master_config-itest.cc:348: Failure
Expected: new_master->bound_rpc_addr().port()
Which is: 18567
To be equal to: new_leader->bound_rpc_addr().port()
Which is: 27222

We also see this particular error in our internal CI environment. Opened a GitHub issue: https://github.com/YugaByte/yugabyte-db/issues/120

  1. No removing the “-Werror” flags
  2. Using YugaByte/linuxbrew instead of devtoolset-6
  3. Using the latest release branch
    The above three changes may help to pass more tests.

Now I try to compile the release v0.9.8 and run its unit test in a docker container(base image:centos7), here is the result:

99% tests passed, 1 tests failed out of 274

Total Test time (real) = 608.35 sec

The following tests FAILED:
        135 - util_inetaddress-test (Failed)
Errors while running CTest

Most of the time, util_inetaddress-test failed with:

tcmalloc: large alloc 1811955712 bytes == 0x188c000 @  0x7f0958f46092 0x7f095679b699
*** Aborted at 1521790543 (unix time) try "date -d @1521790543" if you are using GNU date ***
PC: @     0x7f0955d22278 __memcpy_avx_unaligned
*** SIGSEGV (@0x0) received by PID 12368 (TID 0x7f095a1f4840) from PID 0; stack trace: ***

the log file:https://github.com/lumedar/YugaByteLog/blob/master/InetAddressTest_TestErrors.log
In a few cases, it can pass.
So, can you give me more help? Thanks!

in function “int SubstitutedSize(StringPiece format,const SubstituteArg* const* args_array);” of substitute.cc
I added this test code:

LOG(INFO) << "args: " << args_array[index]->data() <<" len: "<<args_array[index]->size();


I0323 10:21:08.141396  2653 substitute.cc:63] MYTEST_BEGIN
I0323 10:21:08.141398  2653 substitute.cc:64] args: Connection timed out len: 7017560849296405558
I0323 10:21:08.141402  2653 substitute.cc:65] MYTEST_END


Haven’t seen this failure. Filed an issue: https://github.com/YugaByte/yugabyte-db/issues/121 – will take a look soon. Thank you!

Thanks for reporting this @lumedar. We do see this in our CI runs at the rate of 1 in 20 times or so. This should be independent of the increase of missed heartbeat gflag change @kannan - we do stop elections using do_not_start_election_test_only in this unit test. Will triage this with higher priority…