Can only connect to 1 box in a 3 box cluster

Exocomp · May 27, 2019, 3:00am

YugaByte Version 1.2.6
Server: Ubuntu Server 16.04
Client: Develop applications | YugabyteDB Docs

I have a 3 box cluster which I created with the steps from the tutorial here:

https://docs.yugabyte.com/latest/deploy/manual-deployment/start-tservers/

The problem is that I can only connect (YCQL endpoint) to one specific box. Is there any extra configuration I need to do to be able to connect to all 3 boxes?

Here is the behavior I’m seeing:

* With all 3 boxes up and running (both master and tserver):
** 192.168.0.101
** 192.168.0.102
** 192.168.0.103
* I can connect to all 3 when logged into a box locally.
* However, I can only connect to 192.168.0.103 externally.
* If I stop the services on 192.168.0.103:
** Then I can connect just fine to 192.168.0.102 externally.

There is no network issue and the box configuration is the same.

Are there any known issues or anything specific that needs to be done to be able to connect to all 3 boxes?

Note: There is no error that I can see. The client tries to connect but just spins and hangs, and no response is returned.
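One way to back up the "no network issue" claim is to test raw TCP reachability of the YCQL port from the client machine. The following is a minimal sketch in Python, not something from the original post; the helper name `can_reach` and the 3 second timeout are assumptions chosen for illustration. A successful TCP connect only shows that the port is open, not that the CQL handshake completes, which matches the symptom described here (the client connects but then hangs).

```python
import socket


def can_reach(host: str, port: int = 9042, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    `can_reach` is a hypothetical helper; 9042 is the YCQL port used in this thread.
    """
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_nodes(hosts, port: int = 9042, timeout: float = 3.0) -> dict:
    """Map each host to its reachability result."""
    return {host: can_reach(host, port, timeout) for host in hosts}
```

Calling `check_nodes(["192.168.0.101", "192.168.0.102", "192.168.0.103"])` from the client machine would return a dictionary of host to True/False. If all three come back True while the driver still hangs, the problem is more likely above the TCP layer.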

kannan · May 27, 2019, 3:26am

hi @Exocomp:

That’s odd. Can you share the output of the full command line for the yb-tserver process (including any gflags or contents of a gflag file you are passing to it)?

For example, from one of our test clusters:

[yugabyte@yb-15-yugabyte-adoption-3-n1 ~]$ ps auxww | grep yb-tserver
yugabyte  3710  0.0  0.0 112680   672 pts/0    D+   03:24   0:00 grep --color=auto yb-tserver
yugabyte 22883 18.4 55.1 13780488 4129524 ?    Sl   Apr26 8250:29 /home/yugabyte/tserver/bin/yb-tserver --flagfile /home/yugabyte/tserver/conf/server.conf

where the gflags file is something like:

[yugabyte@yb-15-yugabyte-adoption-3-n1 ~]$ cat /home/yugabyte/tserver/conf/server.conf
--tserver_master_addrs=10.150.0.45:7100,10.150.0.46:7100,10.150.0.50:7100
--webserver_port=9000
--placement_cloud=gcp
--placement_region=us-west1
--max_log_size=256
--placement_zone=us-west1-b
--placement_uuid=4d9834cc-6d6e-4dc4-89ef-6a8590f59f43
--rpc_bind_addresses=10.150.0.46:9100
--cql_proxy_bind_address=10.150.0.46:9042
--fs_data_dirs=/mnt/d0,/mnt/d1
--webserver_interface=10.150.0.46
--redis_proxy_bind_address=10.150.0.46:6379

regards,
Kannan

kannan · May 27, 2019, 3:28am

Additionally, while connected to any one node via cqlsh, could you also share the contents of the system.local and system.peers tables?

For example, something like this:

[yugabyte@yb-15-yugabyte-adoption-3-n1 ~]$ ~/tserver/bin/cqlsh 10.150.0.45
Connected to local cluster at 10.150.0.45:9042.
[cqlsh 5.0.1 | Cassandra 3.9-SNAPSHOT | CQL spec 3.4.2 | Native protocol v4]
Use HELP for help.
cqlsh> select * from system.local;

 key   | bootstrapped | broadcast_address | cluster_name  | cql_version | data_center | gossip_generation | host_id | listen_address | native_protocol_version | partitioner                                 | rack       | release_version | rpc_address | schema_version                       | thrift_version | tokens | truncated_at
-------+--------------+-------------------+---------------+-------------+-------------+-------------------+---------+----------------+-------------------------+---------------------------------------------+------------+-----------------+-------------+--------------------------------------+----------------+--------+--------------
 local |    COMPLETED |       10.150.0.45 | local cluster |       3.4.2 |    us-west1 |                 0 |    null |    10.150.0.45 |                       4 | org.apache.cassandra.dht.Murmur3Partitioner | us-west1-a |    3.9-SNAPSHOT | 10.150.0.45 | 00000000-0000-0000-0000-000000000000 |         20.1.0 |  {'0'} |         null

(1 rows)
cqlsh> select * from system.peers;

 peer        | data_center | host_id                              | preferred_ip | rack       | release_version | rpc_address | schema_version                       | tokens
-------------+-------------+--------------------------------------+--------------+------------+-----------------+-------------+--------------------------------------+-------------------------
 10.150.0.46 |    us-west1 | b12f820e-f972-d2b7-bb46-7361fb8eaf1c |  10.150.0.46 | us-west1-b |            null | 10.150.0.46 | 00000000-0000-0000-0000-000000000000 |                   {'0'}
 10.150.0.50 |    us-west1 | dfb11d7f-7eb9-52a6-6d4a-90eb4ccd368b |  10.150.0.50 | us-west1-c |            null | 10.150.0.50 | 00000000-0000-0000-0000-000000000000 | {'6148820866244280320'}

(2 rows)

Exocomp · May 27, 2019, 4:22am

@kannan

Thanks for the help. Here is what you requested:

administrator@box-100:~$ ps auxww | grep yb-tserver
adminis+ 179593  2.0  4.7 1268864 95792 ?       Ssl  23:05   0:13 /opt/yugabyte/yugabyte-1.2.6.0/bin/yb-tserver --flagfile /etc/yugabyte/tserver.conf --log_dir /mnt/log/yugabyte

--tserver_master_addrs=192.168.0.101:7100,192.168.0.102:7100,192.168.0.103:7100
--rpc_bind_addresses=192.168.0.101
--cql_proxy_bind_address=192.168.0.101:9042
--redis_proxy_bind_address=192.168.0.101:6379
--fs_data_dirs=/mnt/data/yugabyte

administrator@box-100:~$ /opt/yugabyte/yugabyte-1.2.6.0/bin/cqlsh 192.168.0.101
Connected to local cluster at 192.168.0.101:9042.
[cqlsh 5.0.1 | Cassandra 3.9-SNAPSHOT | CQL spec 3.4.2 | Native protocol v4]
Use HELP for help.
cqlsh> select * from system.local;

 key   | bootstrapped | broadcast_address | cluster_name  | cql_version | data_center | gossip_generation | host_id | listen_address | native_protocol_version | partitioner                                 | rack  | release_version | rpc_address | schema_version                       | thrift_version | tokens | truncated_at
-------+--------------+-------------------+---------------+-------------+-------------+-------------------+---------+----------------+-------------------------+---------------------------------------------+-------+-----------------+-------------+--------------------------------------+----------------+--------+--------------
 local |    COMPLETED |         192.168.0.101 | local cluster |       3.4.2 | datacenter1 |                 0 |    null |      192.168.0.101 |                       4 | org.apache.cassandra.dht.Murmur3Partitioner | rack1 |    3.9-SNAPSHOT |   192.168.0.101 | 00000000-0000-0000-0000-000000000000 |         20.1.0 |  {'0'} |         null

(1 rows)

NOTE: the other boxes are configured as above but with their respective IP addresses.
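To make "configured as above but with their respective IP addresses" concrete, here is a small Python sketch that renders the same five flags for each node. It is an illustration only: the function name `tserver_flags` is hypothetical, and the flag names, ports, and data directory are simply copied from the flag file shown above.

```python
# Master addresses as listed in --tserver_master_addrs above.
MASTERS = ["192.168.0.101", "192.168.0.102", "192.168.0.103"]


def tserver_flags(node_ip, masters=MASTERS, data_dir="/mnt/data/yugabyte"):
    """Render the tserver flag file contents for one node.

    Only the node's own IP changes between boxes; the master list and
    data directory are shared. `tserver_flags` is a hypothetical helper.
    """
    master_addrs = ",".join(f"{ip}:7100" for ip in masters)
    return "\n".join([
        f"--tserver_master_addrs={master_addrs}",
        f"--rpc_bind_addresses={node_ip}",
        f"--cql_proxy_bind_address={node_ip}:9042",
        f"--redis_proxy_bind_address={node_ip}:6379",
        f"--fs_data_dirs={data_dir}",
    ])
```

Comparing `tserver_flags("192.168.0.102")` against the actual file on that box is a quick way to confirm that the three configurations differ only in the bind addresses.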

I just noticed that using cqlsh from the boxes I can connect to any other box. However, using the C# client (Develop applications | YugabyteDB Docs) I’m seeing the issue I described. Here is the piece of code from the C# client:

var db = Cluster.Builder()
	.AddContactPoint("192.168.0.102")
	.Build();
db.Connect("mycluster");

It hangs on .Connect.

However, as I mentioned if I stop the master and tserver process on 192.168.0.103 then I can connect just fine to 192.168.0.102. So it’s not a network issue.

kannan · May 27, 2019, 5:22am

hi @Exocomp

Can you also share the output of this query when connected to 192.168.0.101

select * from system.peers;

And perhaps the same two queries from another node, say:

/opt/yugabyte/yugabyte-1.2.6.0/bin/cqlsh 192.168.0.102

select * from system.local;
select * from system.peers;

Exocomp · May 27, 2019, 1:14pm

Hi @kannan,

Here is the output of those commands:

administrator@box100:~$ /opt/yugabyte/yugabyte-1.2.6.0/bin/cqlsh 192.168.0.101
Connected to local cluster at 192.168.0.101:9042.
[cqlsh 5.0.1 | Cassandra 3.9-SNAPSHOT | CQL spec 3.4.2 | Native protocol v4]
Use HELP for help.
cqlsh> select * from system.peers;

 peer      | data_center | host_id                              | preferred_ip | rack  | release_version | rpc_address | schema_version                       | tokens
-----------+-------------+--------------------------------------+--------------+-------+-----------------+-------------+--------------------------------------+-------------------------
 192.168.0.103 | datacenter1 | 0e7a6228-ec9f-4ac3-a292-6825afc3bca0 |    192.168.0.103 | rack1 |            null |   192.168.0.103 | 00000000-0000-0000-0000-000000000000 |                   {'0'}
 192.168.0.102 | datacenter1 | 29ae68ef-f58e-4c14-ac5a-26084739ae13 |    192.168.0.102 | rack1 |            null |   192.168.0.102 | 00000000-0000-0000-0000-000000000000 | {'6148820866244280320'}

(2 rows)

administrator@box101:~$ /opt/yugabyte/yugabyte-1.2.6.0/bin/cqlsh 192.168.0.102
Connected to local cluster at 192.168.0.102:9042.
[cqlsh 5.0.1 | Cassandra 3.9-SNAPSHOT | CQL spec 3.4.2 | Native protocol v4]
Use HELP for help.
cqlsh> select * from system.local;

 key   | bootstrapped | broadcast_address | cluster_name  | cql_version | data_center | gossip_generation | host_id | listen_address | native_protocol_version | partitioner                                 | rack  | release_version | rpc_address | schema_version                       | thrift_version | tokens | truncated_at
-------+--------------+-------------------+---------------+-------------+-------------+-------------------+---------+----------------+-------------------------+---------------------------------------------+-------+-----------------+-------------+--------------------------------------+----------------+--------+--------------
 local |    COMPLETED |         192.168.0.102 | local cluster |       3.4.2 | datacenter1 |                 0 |    null |      192.168.0.102 |                       4 | org.apache.cassandra.dht.Murmur3Partitioner | rack1 |    3.9-SNAPSHOT |   192.168.0.102 | 00000000-0000-0000-0000-000000000000 |         20.1.0 |  {'0'} |         null

(1 rows)
cqlsh> select * from system.peers;

 peer      | data_center | host_id                              | preferred_ip | rack  | release_version | rpc_address | schema_version                       | tokens
-----------+-------------+--------------------------------------+--------------+-------+-----------------+-------------+--------------------------------------+--------------------------
 192.168.0.103 | datacenter1 | 0e7a6228-ec9f-4ac3-a292-6825afc3bca0 |    192.168.0.103 | rack1 |            null |   192.168.0.103 | 00000000-0000-0000-0000-000000000000 |                    {'0'}
 192.168.0.101 | datacenter1 | 4c256ad5-0eef-4523-a34b-ca1bcd3facbb |    192.168.0.101 | rack1 |            null |   192.168.0.101 | 00000000-0000-0000-0000-000000000000 | {'-6149102341220990976'}

(2 rows)

NOTE: Using cqlsh, I can connect from any box to any other box. The issue occurs only with the C# client (Develop applications | YugabyteDB Docs), where I can only connect to 192.168.0.103 when all 3 boxes are running. When I stop master and tserver on 192.168.0.103, I can then connect to 192.168.0.102 from the client.

Exocomp · May 27, 2019, 2:19pm

@kannan

Is there a way to increase the logging level of Yugabyte? The INFO, WARNING, and ERROR logs don’t produce anything when I connect to a box with the issue.

Also, when I can’t connect to the boxes I mentioned, the client CPU spikes as if it is stuck in a loop or doing heavy operations internally (from a client perspective I just see it stuck in .Connect). So it seems like it does connect but doesn’t like what it is receiving from Yugabyte.

kannan · May 31, 2019, 3:10pm

hi @Exocomp

Could this be an issue similar to [CSHARP-480] - DataStax (our YCQL C# driver is a fork of the Apache Cassandra driver)?

Could you try enabling tracing as recommended here, and see if we can learn anything from the logs:

https://datastax-oss.atlassian.net/browse/CSHARP-480?focusedCommentId=32400&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-32400

regards,
Kannan

kannan · May 31, 2019, 6:49pm

hi @Exocomp

I think we have a handle on this … and it is something specific to the YugaByte and C# driver combo. Will keep you posted as soon as we have the fix. Hoping in the next few days.

regards,
Kannan

kannan · June 1, 2019, 12:44am

Tracking the issue here: system.local always returns 0 token · Issue #1467 · yugabyte/yugabyte-db · GitHub
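The issue title lines up with the cqlsh output posted earlier in the thread: system.local reports tokens {'0'} on both 192.168.0.101 and 192.168.0.102, while system.peers also lists 192.168.0.103 with token '0'. The Python sketch below is a hedged illustration of that observation, not the actual driver logic; the helper names are hypothetical and the data is copied from the outputs above. It flags tokens claimed by more than one node from a given node's point of view.

```python
from collections import defaultdict


def conflicting_tokens(local_ip, local_tokens, peers):
    """Return tokens claimed by more than one node, with the claiming IPs.

    local_tokens: tokens from system.local on the node we connected to.
    peers: mapping of peer IP -> tokens from system.peers on that node.
    """
    owners = defaultdict(set)
    for tok in local_tokens:
        owners[tok].add(local_ip)
    for ip, tokens in peers.items():
        for tok in tokens:
            owners[tok].add(ip)
    return {tok: sorted(ips) for tok, ips in owners.items() if len(ips) > 1}


# Values copied from the cqlsh sessions against .101 and .102 above.
VIEW_FROM_101 = ("192.168.0.101", {"0"}, {
    "192.168.0.103": {"0"},
    "192.168.0.102": {"6148820866244280320"},
})
VIEW_FROM_102 = ("192.168.0.102", {"0"}, {
    "192.168.0.103": {"0"},
    "192.168.0.101": {"-6149102341220990976"},
})

print(conflicting_tokens(*VIEW_FROM_101))
# {'0': ['192.168.0.101', '192.168.0.103']}
print(conflicting_tokens(*VIEW_FROM_102))
# {'0': ['192.168.0.102', '192.168.0.103']}
```

In both posted views, the contact node and 192.168.0.103 claim the same token. The view from 192.168.0.103 was not posted, so whether it is free of such a collision (which would be consistent with it being the only node that works) is an inference, not something the thread confirms.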

kannan · June 1, 2019, 8:32pm

@Exocomp

We have done both a YugaByte C# driver side fix (GitHub - yugabyte/cassandra-csharp-driver: YugaByte C# Driver for YugaByte DB's Cassandra-compatible YCQL API) and a server-side fix (system.local always returns 0 token · Issue #1467 · yugabyte/yugabyte-db · GitHub).

Either fix (using the new driver or waiting for the release with the above server-side fix) should help avoid the problem.

Can you please give this a try with the 3.7.1 version of the YugaByte C# Driver (NuGet Gallery | YugaByteCassandraCSharpDriver 3.16.3)?

regards,
Kannan

Exocomp · June 2, 2019, 12:11am

@kannan

I tried 3.7.1, that resolved the issue.

Looking over the commit that fixed it, it looks like it bypasses the token map generation, so it sounds like the driver was previously going down a code path it should not have. Merge pull request #6 from spolitov/token_map · yugabyte/cassandra-csharp-driver@6fc2772 · GitHub

Thanks for the quick fix and glad I could contribute.
