Here is an interesting situation with a 3-node cluster set up via yugabyted.
3 Nodes, 3 T-Servers, 3 T-Masters
All 3 are working correctly.
2 Nodes, 2 T-Servers, 2 T-Masters
We took down node3 for a day. The UI sees that node3 is down, and everything keeps working. Great.
3 Nodes, 3 T-Servers, 2 T-Masters
We start node3 again with the same CLI command. We see the T-Server is online, and the UI reports all 3 nodes as active on the main page. Our tables are synced and all active. Great …
But the 3rd master did not come online and shows up as red in the nodes listing.
We check port :7000 on node3: nothing. We go to :7000 on a live node and check.
xxx.xxx.xxx.xxx:7100 UNKNOWN_ROLE ERROR: Timed out (yb/rpc/outbound_call.cc:647): Unable to get registration information for peer ([xxx.xxx.xxx.xxx:7100]) id (8287ba4d41ad4d2c96729b1f1c3dcb6a): GetMasterRegistration RPC (request call id 1265511) to xxx.xxx.xxx.xxx:7100 timed out after 1.500s
Yep … she is down Captain.
3 Nodes, 3 T-Servers, 2 T-Masters
We waited a night, assuming there might be some timeout / retry mechanism every 1500s. We checked the next day. Still down.
3 Nodes, 3 T-Servers, 2 T-Masters
We take down node 3 again, and start it up before the UI detects node3 is down. Same as before, T-server is online, T-master is down.
3 Nodes, 3 T-Servers, 3 T-Masters
We take down node 3 again, and wait until the UI shows that we are down to 2 nodes. Start node 3 again … drumroll … both the T-server and the T-master are showing in the node listing.
It's a bit odd. We also checked the firewall, and port 7100 was still open between the nodes, as before.
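For anyone who wants to script the check we did by hand, here is a rough Python sketch. It only tests whether something accepts TCP connections on the ports mentioned in this post (:7000 for the master UI, :7100 for the master RPC port from the error message). The function names are made up for illustration, and an open port only tells you a process is listening, not that the master has rejoined the other two.

```python
import socket


def port_open(host: str, port: int, timeout: float = 1.5) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


def check_master_ports(hosts, ports=(7000, 7100), timeout=1.5):
    """Map each (host, port) pair to True (reachable) or False (not reachable)."""
    return {(h, p): port_open(h, p, timeout) for h in hosts for p in ports}
```

Calling `check_master_ports([...])` with the three node addresses returns a dictionary keyed by `(host, port)`. In our case we would expect the entries for node3 to be `False` while its master process was not running, even though the firewall allowed the traffic.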
An improvement to the interface:
Please also add the T-Master status to the main UI. Right now it is very easy to be down to 2 T-Masters without realizing that your system is in a critical state (lose one more and your system goes down), unless you check the nodes status page.
So if you do any maintenance assuming you have 3 T-Masters up and restart a live T-Master server/VPS while only 2 of 3 are running, you have just taken down your entire DB. Oops …
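To make the risk concrete, here is a small sketch of the arithmetic, assuming the masters need a strict majority of the configured set to stay available (the usual Raft-style rule). The function name is hypothetical and this is not anything yugabyted exposes.

```python
def masters_that_can_still_fail(total_masters: int, alive_masters: int) -> int:
    """How many more masters may go down before the majority is lost.

    A negative result means the majority is already gone.
    Assumes a strict-majority rule over the configured masters.
    """
    majority = total_masters // 2 + 1
    return alive_masters - majority
```

With 3 configured masters and all 3 alive this returns 1. With only 2 alive it returns 0, which is exactly the state we were in without noticing: any single restart takes the masters below a majority.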
Also, if possible, expose more of the :7000 / :9000 UIs directly via the yugabyted interface. It's a bit clumsy to have 3 different interfaces going on.