YugaByte Version 1.2.6
Server: Ubuntu Server 16.04
Client: Develop applications | YugabyteDB Docs
I have a 3 box cluster which I created with the steps from the tutorial here:
https://docs.yugabyte.com/latest/deploy/manual-deployment/start-tservers/
The problem is that I can only connect (YCQL endpoint) to one specific box. Is there any extra configuration I need to do to be able to connect to all 3 boxes?
Here is the behavior I’m seeing:
With all 3 boxes up and running (both master and tserver).
** 192.168.0.101
** 192.168.0.102
** 192.168.0.103
I can connect to all 3 when logged into the boxes locally
However, I can only connect to 192.168.0.103 externally
If I stop the services on 192.168.0.103
** Then I connect just fine to 192.168.0.102 externally
There is no network issue and the box configuration is the same.
Are there any known issues or anything specific that needs to be done to be able to connect to all 3 boxes?
Note: I don’t see any error. The client tries to connect but just spins and hangs; no response is returned.
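To back up the "no network issue" observation from the client machine, a plain TCP check against the YCQL port on each box can help separate a blocked port from a driver-level hang. This is a minimal sketch, not part of the original report: the helper names `can_connect` and `check_nodes` are hypothetical, and it assumes the default YCQL port 9042 that appears in the flags later in this thread.

```python
import socket


def can_connect(host: str, port: int = 9042, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


def check_nodes(hosts, port: int = 9042, timeout: float = 3.0) -> dict:
    """Map each host to whether its port accepted a TCP connection."""
    return {host: can_connect(host, port, timeout) for host in hosts}
```

Calling `check_nodes(["192.168.0.101", "192.168.0.102", "192.168.0.103"])` from the client machine gives a per-box True/False map. If every box accepts the TCP connection while the client still hangs, the problem is more likely above the network layer.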
hi @Exocomp :
That’s odd. Can you share the output of the full command line for the yb-tserver process (including any gflags or contents of a gflag file you are passing to it)?
For example, from one of our test clusters:
[yugabyte@yb-15-yugabyte-adoption-3-n1 ~]$ ps auxww | grep yb-tserver
yugabyte 3710 0.0 0.0 112680 672 pts/0 D+ 03:24 0:00 grep --color=auto yb-tserver
yugabyte 22883 18.4 55.1 13780488 4129524 ? Sl Apr26 8250:29 /home/yugabyte/tserver/bin/yb-tserver --flagfile /home/yugabyte/tserver/conf/server.conf
where the gflags file is something like:
[yugabyte@yb-15-yugabyte-adoption-3-n1 ~]$ cat /home/yugabyte/tserver/conf/server.conf
--tserver_master_addrs=10.150.0.45:7100,10.150.0.46:7100,10.150.0.50:7100
--webserver_port=9000
--placement_cloud=gcp
--placement_region=us-west1
--max_log_size=256
--placement_zone=us-west1-b
--placement_uuid=4d9834cc-6d6e-4dc4-89ef-6a8590f59f43
--rpc_bind_addresses=10.150.0.46:9100
--cql_proxy_bind_address=10.150.0.46:9042
--fs_data_dirs=/mnt/d0,/mnt/d1
--webserver_interface=10.150.0.46
--redis_proxy_bind_address=10.150.0.46:6379
regards,
Kannan
Additionally, while connected to any one node via cqlsh, could you also share the contents of the system.local and system.peers tables?
For example, something like this:
[yugabyte@yb-15-yugabyte-adoption-3-n1 ~]$ ~/tserver/bin/cqlsh 10.150.0.45
Connected to local cluster at 10.150.0.45:9042.
[cqlsh 5.0.1 | Cassandra 3.9-SNAPSHOT | CQL spec 3.4.2 | Native protocol v4]
Use HELP for help.
cqlsh> select * from system.local;
key | bootstrapped | broadcast_address | cluster_name | cql_version | data_center | gossip_generation | host_id | listen_address | native_protocol_version | partitioner | rack | release_version | rpc_address | schema_version | thrift_version | tokens | truncated_at
-------+--------------+-------------------+---------------+-------------+-------------+-------------------+---------+----------------+-------------------------+---------------------------------------------+------------+-----------------+-------------+--------------------------------------+----------------+--------+--------------
local | COMPLETED | 10.150.0.45 | local cluster | 3.4.2 | us-west1 | 0 | null | 10.150.0.45 | 4 | org.apache.cassandra.dht.Murmur3Partitioner | us-west1-a | 3.9-SNAPSHOT | 10.150.0.45 | 00000000-0000-0000-0000-000000000000 | 20.1.0 | {'0'} | null
(1 rows)
cqlsh> select * from system.peers;
peer | data_center | host_id | preferred_ip | rack | release_version | rpc_address | schema_version | tokens
-------------+-------------+--------------------------------------+--------------+------------+-----------------+-------------+--------------------------------------+-------------------------
10.150.0.46 | us-west1 | b12f820e-f972-d2b7-bb46-7361fb8eaf1c | 10.150.0.46 | us-west1-b | null | 10.150.0.46 | 00000000-0000-0000-0000-000000000000 | {'0'}
10.150.0.50 | us-west1 | dfb11d7f-7eb9-52a6-6d4a-90eb4ccd368b | 10.150.0.50 | us-west1-c | null | 10.150.0.50 | 00000000-0000-0000-0000-000000000000 | {'6148820866244280320'}
(2 rows)
@kannan
Thanks for the help. Here is what you requested:
administrator@box-100:~$ ps auxww | grep yb-tserver
adminis+ 179593 2.0 4.7 1268864 95792 ? Ssl 23:05 0:13 /opt/yugabyte/yugabyte-1.2.6.0/bin/yb-tserver --flagfile /etc/yugabyte/tserver.conf --log_dir /mnt/log/yugabyte
--tserver_master_addrs=192.168.0.101:7100,192.168.0.102:7100,192.168.0.103:7100
--rpc_bind_addresses=192.168.0.101
--cql_proxy_bind_address=192.168.0.101:9042
--redis_proxy_bind_address=192.168.0.101:6379
--fs_data_dirs=/mnt/data/yugabyte
administrator@box-100:~$ /opt/yugabyte/yugabyte-1.2.6.0/bin/cqlsh 192.168.0.101
Connected to local cluster at 192.168.0.101:9042.
[cqlsh 5.0.1 | Cassandra 3.9-SNAPSHOT | CQL spec 3.4.2 | Native protocol v4]
Use HELP for help.
cqlsh> select * from system.local;
key | bootstrapped | broadcast_address | cluster_name | cql_version | data_center | gossip_generation | host_id | listen_address | native_protocol_version | partitioner | rack | release_version | rpc_address | schema_version | thrift_version | tokens | truncated_at
-------+--------------+-------------------+---------------+-------------+-------------+-------------------+---------+----------------+-------------------------+---------------------------------------------+-------+-----------------+-------------+--------------------------------------+----------------+--------+--------------
local | COMPLETED | 192.168.0.101 | local cluster | 3.4.2 | datacenter1 | 0 | null | 192.168.0.101 | 4 | org.apache.cassandra.dht.Murmur3Partitioner | rack1 | 3.9-SNAPSHOT | 192.168.0.101 | 00000000-0000-0000-0000-000000000000 | 20.1.0 | {'0'} | null
(1 rows)
NOTE: the other boxes are configured as above but with their respective IP addresses.
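Since each box is supposed to carry the same flags with only its own IP swapped in, a quick consistency check of the bind flags can rule out a copy-paste slip. The following is a rough sketch under stated assumptions: `parse_flagfile`, `bind_hosts`, and `BIND_FLAGS` are hypothetical helper names, each bind flag is assumed to hold a single IPv4 address, and the flag text is copied from the output above.

```python
def parse_flagfile(text: str) -> dict:
    """Parse '--name=value' lines into a dict, skipping blank lines and '#' lines."""
    flags = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.lstrip("-").partition("=")
        flags[name] = value
    return flags


# Flags whose host part should be this node's own address.
BIND_FLAGS = ("rpc_bind_addresses", "cql_proxy_bind_address", "redis_proxy_bind_address")


def bind_hosts(flags: dict) -> dict:
    """Return the host part (before any ':port') of each bind flag that is present."""
    return {name: flags[name].split(":")[0] for name in BIND_FLAGS if name in flags}


# Flag text copied from the tserver.conf shown above for 192.168.0.101.
conf = """\
--tserver_master_addrs=192.168.0.101:7100,192.168.0.102:7100,192.168.0.103:7100
--rpc_bind_addresses=192.168.0.101
--cql_proxy_bind_address=192.168.0.101:9042
--redis_proxy_bind_address=192.168.0.101:6379
--fs_data_dirs=/mnt/data/yugabyte
"""

hosts = bind_hosts(parse_flagfile(conf))
# A consistent node binds every endpoint to the same address.
print(hosts, "consistent" if len(set(hosts.values())) == 1 else "MISMATCH")
```

Running the same check against the flag file on each box should report one address per box, matching that box's own IP.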
I just noticed that, using cqlsh from any of the boxes, I can connect to any other box. However, with the C# client (Develop applications | YugabyteDB Docs) I’m seeing the issue I described. Here is the relevant piece of code from the C# client:
var db = Cluster.Builder()
.AddContactPoint("192.168.0.102")
.Build();
db.Connect("mycluster");
It hangs on .Connect.
However, as I mentioned if I stop the master and tserver process on 192.168.0.103 then I can connect just fine to 192.168.0.102. So it’s not a network issue.
hi @Exocomp
Can you also share the output of this query when connected to 192.168.0.101
select * from system.peers;
And perhaps the same two queries from another node, say:
/opt/yugabyte/yugabyte-1.2.6.0/bin/cqlsh 192.168.0.102
select * from system.local;
select * from system.peers;
Hi @kannan ,
Here is the output of those commands:
administrator@box100:~$ /opt/yugabyte/yugabyte-1.2.6.0/bin/cqlsh 192.168.0.101
Connected to local cluster at 192.168.0.101:9042.
[cqlsh 5.0.1 | Cassandra 3.9-SNAPSHOT | CQL spec 3.4.2 | Native protocol v4]
Use HELP for help.
cqlsh> select * from system.peers;
peer | data_center | host_id | preferred_ip | rack | release_version | rpc_address | schema_version | tokens
-----------+-------------+--------------------------------------+--------------+-------+-----------------+-------------+--------------------------------------+-------------------------
192.168.0.103 | datacenter1 | 0e7a6228-ec9f-4ac3-a292-6825afc3bca0 | 192.168.0.103 | rack1 | null | 192.168.0.103 | 00000000-0000-0000-0000-000000000000 | {'0'}
192.168.0.102 | datacenter1 | 29ae68ef-f58e-4c14-ac5a-26084739ae13 | 192.168.0.102 | rack1 | null | 192.168.0.102 | 00000000-0000-0000-0000-000000000000 | {'6148820866244280320'}
(2 rows)
administrator@box101:~$ /opt/yugabyte/yugabyte-1.2.6.0/bin/cqlsh 192.168.0.102
Connected to local cluster at 192.168.0.102:9042.
[cqlsh 5.0.1 | Cassandra 3.9-SNAPSHOT | CQL spec 3.4.2 | Native protocol v4]
Use HELP for help.
cqlsh> select * from system.local;
key | bootstrapped | broadcast_address | cluster_name | cql_version | data_center | gossip_generation | host_id | listen_address | native_protocol_version | partitioner | rack | release_version | rpc_address | schema_version | thrift_version | tokens | truncated_at
-------+--------------+-------------------+---------------+-------------+-------------+-------------------+---------+----------------+-------------------------+---------------------------------------------+-------+-----------------+-------------+--------------------------------------+----------------+--------+--------------
local | COMPLETED | 192.168.0.102 | local cluster | 3.4.2 | datacenter1 | 0 | null | 192.168.0.102 | 4 | org.apache.cassandra.dht.Murmur3Partitioner | rack1 | 3.9-SNAPSHOT | 192.168.0.102 | 00000000-0000-0000-0000-000000000000 | 20.1.0 | {'0'} | null
(1 rows)
cqlsh> select * from system.peers;
peer | data_center | host_id | preferred_ip | rack | release_version | rpc_address | schema_version | tokens
-----------+-------------+--------------------------------------+--------------+-------+-----------------+-------------+--------------------------------------+--------------------------
192.168.0.103 | datacenter1 | 0e7a6228-ec9f-4ac3-a292-6825afc3bca0 | 192.168.0.103 | rack1 | null | 192.168.0.103 | 00000000-0000-0000-0000-000000000000 | {'0'}
192.168.0.101 | datacenter1 | 4c256ad5-0eef-4523-a34b-ca1bcd3facbb | 192.168.0.101 | rack1 | null | 192.168.0.101 | 00000000-0000-0000-0000-000000000000 | {'-6149102341220990976'}
(2 rows)
NOTE: Using cqlsh, I can connect from any box to any other box. The issue occurs only with the C# client (Develop applications | YugabyteDB Docs), where I can only connect to 192.168.0.103 when all 3 boxes are running. When I stop master and tserver on 192.168.0.103, I can then connect to 192.168.0.102 from the client.
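One detail that stands out in the output above is that system.local reports the token '0' on both 192.168.0.101 and 192.168.0.102, while system.peers on each of them also lists '0' for 192.168.0.103. The sketch below makes that comparison explicit. It is only an illustration: `find_token_conflicts` is a hypothetical helper, the values are copied by hand from the cqlsh output, and whether the overlap is related to the hang is an assumption rather than something established at this point.

```python
from collections import defaultdict


def find_token_conflicts(local_ip, local_tokens, peers):
    """Map each token claimed by more than one node to its sorted claimants.

    local_ip and local_tokens come from system.local (rpc_address, tokens);
    peers maps each system.peers rpc_address to its set of tokens.
    """
    owners = defaultdict(set)
    for token in local_tokens:
        owners[token].add(local_ip)
    for ip, tokens in peers.items():
        for token in tokens:
            owners[token].add(ip)
    return {token: sorted(ips) for token, ips in owners.items() if len(ips) > 1}


# The cluster as seen when connected to 192.168.0.101.
view_from_101 = find_token_conflicts(
    "192.168.0.101",
    {"0"},
    {"192.168.0.103": {"0"}, "192.168.0.102": {"6148820866244280320"}},
)

# The cluster as seen when connected to 192.168.0.102.
view_from_102 = find_token_conflicts(
    "192.168.0.102",
    {"0"},
    {"192.168.0.103": {"0"}, "192.168.0.101": {"-6149102341220990976"}},
)

print(view_from_101)  # {'0': ['192.168.0.101', '192.168.0.103']}
print(view_from_102)  # {'0': ['192.168.0.102', '192.168.0.103']}
```

In both views the contacted node and 192.168.0.103 claim the same token, which is the kind of input a client that builds a token map from these two tables may not expect.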
@kannan
Is there a way to increase the logging level of YugaByte? The INFO, WARNING, and ERROR logs don’t show anything when I connect to a box with the issue.
Also, when I can’t connect to the boxes I mentioned, the client CPU spikes as if it were stuck in a loop or doing heavy work internally (from the client’s perspective, I just see it stuck in .Connect). So it seems that it does connect but doesn’t like what it receives from YugaByte.
hi @Exocomp
Could this be an issue similar to [CSHARP-480] - DataStax (our YCQL C# driver is a fork of the Apache Cassandra driver)?
Could you try enabling tracing as recommended here, and see if we can learn anything from the logs:
https://datastax-oss.atlassian.net/browse/CSHARP-480?focusedCommentId=32400&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-32400
regards,
Kannan
hi @Exocomp
I think we have a handle on this … and it is something specific to the YugaByte and C# driver combo. Will keep you posted as soon as we have the fix, hopefully in the next few days.
regards,
Kannan
kannan, June 1, 2019, 8:32pm
@Exocomp
We have done both a YugaByte C# driver side fix (GitHub - yugabyte/cassandra-csharp-driver: YugaByte C# Driver for YugaByte DB's Cassandra-compatible YCQL API ) and a server-side fix (system.local always returns 0 token · Issue #1467 · yugabyte/yugabyte-db · GitHub ).
Either fix (using the new driver, or waiting for the release with the above server-side fix) should help avoid the problem.
Can you please give this a try with the 3.7.1 version of the YugaByte C# Driver (NuGet Gallery | YugaByteCassandraCSharpDriver 3.16.3 )?
regards,
Kannan
@kannan
I tried 3.7.1, that resolved the issue.
Looking over the commit that fixed it, it appears to bypass the token map generation, so it sounds like the driver was previously going down a code path it should not have. Merge pull request #6 from spolitov/token_map · yugabyte/cassandra-csharp-driver@6fc2772 · GitHub
Thanks for the quick fix and glad I could contribute.