I deployed Yugabyte 2.9.0 to five on-prem servers (96 cores, 377 GB mem, 8 SSDs):
3 masters and 6 t-servers, using 6 SSDs for fs_data_dirs, 1 SSD for fs_wal_dirs, and 1 SSD for log_dir.
I connect using ysqlsh, and when I attempt to create a new schema or table, the client times out. I have tried raising the timeout to 3 minutes, but it still times out.
Here are the commands for the yb-master servers:
./bin/yb-master \
--master_addresses node1:7100, node2:7100, node3:7100 \
--fs_data_dirs "/data01,/data02,/data03,/data04,/data05,/data06" \
--fs_wal_dirs /data07 \
--log_dir /data08 \
--rpc_bind_addresses node1:7100 \
--server_broadcast_addresses node1:7100 \
--durable_wal_write=true
./bin/yb-master \
--master_addresses node1:7100, node2:7100, node3:7100 \
--fs_data_dirs "/data01,/data02,/data03,/data04,/data05,/data06" \
--fs_wal_dirs /data07 \
--log_dir /data08 \
--rpc_bind_addresses node2:7100 \
--server_broadcast_addresses node2:7100 \
--durable_wal_write=true
./bin/yb-master \
--master_addresses node1:7100, node2:7100, node3:7100 \
--fs_data_dirs "/data01,/data02,/data03,/data04,/data05,/data06" \
--fs_wal_dirs /data07 \
--log_dir /data08 \
--rpc_bind_addresses node3:7100 \
--server_broadcast_addresses node3:7100 \
--durable_wal_write=true
All master servers start up and from the Admin UI I can see all three masters (1 leader and 2 followers).
And the t-servers (port 9100 was in use so I opted to use 9200):
./bin/yb-tserver \
--tserver_master_addrs node1:7100, node2:7100,node3:7100 \
--fs_data_dirs "/data01,/data02,/data03,/data04,/data05,/data06" \
--fs_wal_dirs /data07 \
--log_dir /data08 \
--rpc_bind_addresses node1:9200 \
--server_broadcast_addresses node1:9200 \
--durable_wal_write=true
./bin/yb-tserver \
--tserver_master_addrs node1:7100, node2:7100,node3:7100 \
--fs_data_dirs "/data01,/data02,/data03,/data04,/data05,/data06" \
--fs_wal_dirs /data07 \
--log_dir /data08 \
--rpc_bind_addresses node2:9200 \
--server_broadcast_addresses node2:9200 \
--durable_wal_write=true
./bin/yb-tserver \
--tserver_master_addrs node1:7100, node2:7100,node3:7100 \
--fs_data_dirs "/data01,/data02,/data03,/data04,/data05,/data06" \
--fs_wal_dirs /data07 \
--log_dir /data08 \
--rpc_bind_addresses node3:9200 \
--server_broadcast_addresses node3:9200 \
--durable_wal_write=true
./bin/yb-tserver \
--tserver_master_addrs node1:7100, node2:7100,node3:7100 \
--fs_data_dirs "/data01,/data02,/data03,/data04,/data05,/data06" \
--fs_wal_dirs /data07 \
--log_dir /data08 \
--rpc_bind_addresses node4:9200 \
--server_broadcast_addresses node4:9200 \
--durable_wal_write=true
./bin/yb-tserver \
--tserver_master_addrs node1:7100, node2:7100,node3:7100 \
--fs_data_dirs "/data01,/data02,/data03,/data04,/data05,/data06" \
--fs_wal_dirs /data07 \
--log_dir /data08 \
--rpc_bind_addresses node5:9200 \
--server_broadcast_addresses node5:9200 \
--durable_wal_write=true
As I add each t-server, I can see in the master logs that it registers each new t-server. However, once I get to the 3rd t-server, the master logs start complaining about “Delete Tablet RPC failed for tablet e0849349****: Network error: Connect timeout, passed: 15s” and “Create Tablet RPC failed for tablet ****: Network error: Connect timeout, passed: 15s”, as well as “Hinted Leader Start Election RPC failed” with a similar message.
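Since these are all connect timeouts, one thing worth checking is whether every node can actually open a TCP connection to the others on the RPC ports used above (7100 for the masters, 9200 for the t-servers). Below is a minimal sketch of such a check, not a definitive diagnostic. It assumes bash (for /dev/tcp) and the coreutils timeout command are available; check_port and check_ports are hypothetical helper names, not part of any Yugabyte tooling.

```shell
#!/usr/bin/env bash
# Sketch: probe TCP reachability of the RPC ports from the commands above.
# Assumes bash (for /dev/tcp) and coreutils `timeout`; run it from each node.

# check_port HOST PORT
# Returns 0 if a TCP connection to HOST:PORT opens within 3 seconds.
check_port() {
  local host="$1" port="$2"
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# check_ports PORT HOST...
# Prints one OK/FAIL line per host for the given port.
check_ports() {
  local port="$1"; shift
  local host
  for host in "$@"; do
    if check_port "$host" "$port"; then
      echo "OK   ${host}:${port}"
    else
      echo "FAIL ${host}:${port}"
    fi
  done
}
```

After sourcing the functions, running `check_ports 7100 node1 node2 node3` and `check_ports 9200 node1 node2 node3 node4 node5` on each server would show which node pairs cannot reach each other. A FAIL only says the connection could not be opened within 3 seconds; it does not say why.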
Any ideas as to what could be my issue? These are CentOS servers.