Cannot start master node

Hi, I have 6 VMs on premises in two data centers. My goal is to set up one cluster with 3 masters and 6 tservers.
I started by setting up the masters, but one node doesn't start.

I installed it following these instructions: Manual deployment of YugabyteDB clusters | YugabyteDB Docs

The node configs look as follows:
alpvm

--master_addresses=alpvm:7100,azhvm:7100,bzhvm:7100
--rpc_bind_addresses=alpvm:7100
--fs_data_dirs=/yugabyte/yuga_data
--placement_cloud=onpremise
--placement_region=ch
--placement_zone=lp

azhvm

--master_addresses=alpvm:7100,azhvm:7100,bzhvm:7100
--rpc_bind_addresses=azhvm:7100
--fs_data_dirs=/yugabyte/yuga_data
--placement_cloud=onpremise
--placement_region=ch
--placement_zone=zh

bzhvm

--master_addresses=alpvm:7100,azhvm:7100,bzhvm:7100
--rpc_bind_addresses=bzhvm:7100
--fs_data_dirs=/yugabyte/yuga_data
--placement_cloud=onpremise
--placement_region=ch
--placement_zone=zh

The problem is with azhvm. The other masters keep trying to reach it, so that part looks like correct behavior.

I'm starting the yb-master process with the following command:
/home/yugabyte/yugabyte-2024.2.0.0/bin/yb-master --flagfile /home/yugabyte/yugabyte-2024.2.0.0/bin/master.conf >& /yugabyte/yuga_data/yb-master.out &

I use the same command to start the master on the other nodes.
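As a quick sanity check on each node, I also verify that the process stays up and that the master web UI (port 7000 by default) responds, roughly like this (using the node's own hostname):

ps -ef | grep [y]b-master
curl -s http://alpvm:7000/ | head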

Logs from yb-master.out:

tcmalloc/tcmalloc.cc:1577] GetOwnership(ptr) != tcmalloc::MallocExtension::Ownership::kNotOwned @ 0x56498f0afc79 0x56498f07f868 0x7fc5dc6c2287 0x7fc5dc6b8d08 0x7fc5dc6b96b5 0x7fc5dcfc6a20
*** Aborted at 1733915623 (unix time) try "date -d @1733915623" if you are using GNU date ***
PC: @                0x0 (unknown)
*** SIGABRT (@0x46b90019ad1e) received by PID 1682718 (TID 0x7fc5dcfcf640) from PID 1682718; stack trace: ***
tcmalloc/tcmalloc.cc:1577] GetOwnership(ptr) != tcmalloc::MallocExtension::Ownership::kNotOwned @ 0x56498f0afc79 0x56498f07f868 0x7fc5dc6c2287 0x7fc5dc6b8227 0x7fc5ddef6a31 0x7fffffffe000 0x56498e873dc2 0x56498e08be7a 0x56498d6ed31d 0x56498d8359e8 0x56498e8cff95 0x56498e8caa53 0x7fc5dde89c02
    @     0x7fc5dde8b94c __pthread_kill_implementation
    @     0x7fc5dde3e646 __GI_raise
    @     0x7fc5dde287f3 __GI_abort
    @     0x56498f0afd58 tcmalloc::tcmalloc_internal::Crash()
    @     0x56498f07f868 malloc_usable_size
    @     0x7fc5dc6c2287 (unknown)
    @     0x7fc5dc6b8d08 _nss_resolve_gethostbyname3_r
    @     0x7fc5dc6b96b5 _nss_resolve_gethostbyname2_r
    @     0x7fc5ddf1e729 __new_gethostbyname2_r
    @     0x7fffffffe000 (unknown)

Last lines from LOG.INFO:

I1211 12:13:43.840667 1682757 client-internal.cc:2716] Reinitialize master addresses from file: /home/yugabyte/yugabyte-2024.2.0.0/bin/master.conf
W1211 12:13:43.840728 1682757 client-internal.cc:2725] Couldn't find flag tserver_master_addrs in flagfile /home/yugabyte/yugabyte-2024.2.0.0/bin/master.conf
I1211 12:13:43.840745 1682757 client-internal.cc:2745] New master addresses: [alpvm:7100,azhvm:7100,bzhvm:7100]
I1211 12:13:43.841457 1682758 client-internal.cc:2716] Reinitialize master addresses from file: /home/yugabyte/yugabyte-2024.2.0.0/bin/master.conf
W1211 12:13:43.841527 1682758 catalog_manager.cc:1645] Failed to get current config: Illegal state (yb/master/catalog_manager.cc:11539): Node b2d59e43de9f4cb4b45ddd397e24508d peer not initialized.
I1211 12:13:43.841545 1682758 client-internal.cc:2745] New master addresses: [alpvm:7100,azhvm:7100,bzhvm:7100]

WARNING FILE:

Log file created at: 2024/12/11 12:13:43
Current UTC time: 2024/12/11 11:13:43
Running on machine: azhvm
Application fingerprint: version 2024.2.0.0 build 145 revision 0585c935ab8ad2e61086448a45bc918c9019e6fd build_type RELEASE built at 04 Dec 2024 10:01:22 UTC
Running duration (h:mm:ss): 0:00:00
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
W1211 12:13:43.838276 1682749 catalog_manager.cc:1645] Failed to get current config: Illegal state (yb/master/catalog_manager.cc:11539): Node b2d59e43de9f4cb4b45ddd397e24508d peer not initialized.
W1211 12:13:43.840728 1682757 client-internal.cc:2725] Couldn't find flag tserver_master_addrs in flagfile /home/yugabyte/yugabyte-2024.2.0.0/bin/master.conf
W1211 12:13:43.841527 1682758 catalog_manager.cc:1645] Failed to get current config: Illegal state (yb/master/catalog_manager.cc:11539): Node b2d59e43de9f4cb4b45ddd397e24508d peer not initialized.

I haven't started any tservers so far, because I wanted to get the masters stable first.

Can you try adding --replication_factor=3 to each of the config files?
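For example, the azhvm flag file would then look something like this (the other two nodes analogous):

--master_addresses=alpvm:7100,azhvm:7100,bzhvm:7100
--rpc_bind_addresses=azhvm:7100
--fs_data_dirs=/yugabyte/yuga_data
--placement_cloud=onpremise
--placement_region=ch
--placement_zone=zh
--replication_factor=3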

Unfortunately, I added this flag and got the same result. azhvm still doesn't start.

However, I did another experiment: I removed azhvm from the configuration of alpvm and bzhvm. After this change the cluster started correctly. The problem is that I want to have 3 masters.

Are the required ports open on the ornery azhvm node?

I guess so.
I can make requests from azhvm to alpvm and bzhvm on ports 7000 and 7100.
I see in the logs that 7100 is used for communication between the masters, so it looks fine.
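For example, I checked it with something along these lines, run from azhvm (assuming nc is installed):

nc -zv alpvm 7100
nc -zv bzhvm 7100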

Did you mean to have the same placement for nodes azhvm and bzhvm?

--placement_cloud=onpremise
--placement_region=ch
--placement_zone=zh

Yes. I wanted to have redundancy in ZH and attach 2 tservers to each master, but maybe that approach isn't allowed.
What is your advice on how many yb-masters I should spawn?
What is the best practice for YugabyteDB?
One per placement zone?
One per tserver?

Please read the Replication section of Deployment checklist for YugabyteDB clusters | YugabyteDB Docs

Thanks, so I should have 3 yb-masters, because I'm aiming for replication factor 3.

I don’t have an issue running your commands, at least not on a single server…

[root@yugabytedb ~]# cat master1.conf
--master_addresses=127.0.0.1:7100,127.0.0.2:7100,127.0.0.3:7100
--rpc_bind_addresses=127.0.0.1:7100
--fs_data_dirs=/root/yugabyte/yuga_data1
--placement_cloud=onpremise
--placement_region=ch
--placement_zone=zh

[root@yugabytedb ~]# cat master2.conf
--master_addresses=127.0.0.1:7100,127.0.0.2:7100,127.0.0.3:7100
--rpc_bind_addresses=127.0.0.2:7100
--fs_data_dirs=/root/yugabyte/yuga_data2
--placement_cloud=onpremise
--placement_region=ch
--placement_zone=zh

[root@yugabytedb ~]# cat master3.conf
--master_addresses=127.0.0.1:7100,127.0.0.2:7100,127.0.0.3:7100
--rpc_bind_addresses=127.0.0.3:7100
--fs_data_dirs=/root/yugabyte/yuga_data3
--placement_cloud=onpremise
--placement_region=ch
--placement_zone=zh

[root@yugabytedb ~]# yb-master --flagfile master1.conf &
[1] 3015278

[root@yugabytedb ~]# yb-master --flagfile master2.conf &
[2] 3015316

[root@yugabytedb ~]# yb-master --flagfile master3.conf &
[3] 3015353

[root@yugabytedb ~]# ps -ef | grep master
root     3015278 2966702  8 14:13 pts/0    00:00:02 yb-master --flagfile master1.conf
root     3015316 2966702 10 14:13 pts/0    00:00:02 yb-master --flagfile master2.conf
root     3015353 2966702 11 14:13 pts/0    00:00:01 yb-master --flagfile master3.conf
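
If you want to double-check that the three masters formed a Raft group, something like this (pointing yb-admin at the same addresses) should list one leader and two followers:

yb-admin --master_addresses=127.0.0.1:7100,127.0.0.2:7100,127.0.0.3:7100 list_all_masters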

I ran all the commands on the other servers, and it works as expected. It seems there was something wrong with that VM.