How does the CQL client know the tablet leader for any query?

The YugabyteDB cluster has a 2-region, 3-AZ, 6-node architecture:

4 nodes in the central region,
2 nodes in the east region.

Node1 (Master, TServer) US-east
Node2 (Master, TServer) US-central-1
Node3 (TServer) US-central-1
Node4 (TServer) US-east
Node5 (Master, TServer) US-central-2 (Leader)
Node6 (TServer) US-central-2

The application is running in the central region.

The application is using the YCQL driver (yugabyte gocql client), which is currently configured to send queries to Node2 only.


As mentioned here:
In many cases, this forwarding will be purely local, because both CQL and Redis Cluster clients are capable of sending requests to the right server and avoiding an additional network hop.

  1. Is the above statement about the CQL client referring to the yugabyte gocql client? Its documentation mentions: “The driver can route queries to nodes that hold data replicas based on partition key (preferring local DC).”

    How does the driver know which tablet server to send the request to?

  2. If yes, does configuring the YCQL driver with connections to all 4 nodes (in the central region) make the client driver capable of knowing the correct tablet server to send the query to, improving the query response time?

  3. If yes, how does the YCQL driver know which is the right tablet server to send the query request (INSERT/UPDATE/SELECT) to?

Note that Leader here is, I’m guessing, for yb-master-leader (where there can be only 1 node).

For normal tables, the leader is picked per-tablet.

Please paste the code that you used to initialize the client, just to be sure.

The driver periodically queries this table:

ycqlsh:system> select * from system.partitions;

There it finds how tables are split and where the tablets are located.

When you send a query, you pass the partition keys in a way the driver understands; it hashes them and knows where to send the request.
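For example, a minimal sketch, assuming a session created with a token-aware host policy (shown further down in this thread) and a hypothetical table ks.users(id INT PRIMARY KEY, name TEXT):

// Bind the partition key (id) as a parameter instead of concatenating it into
// the CQL string, so the driver can hash it and pick a host holding a replica
// of that partition (using the tablet locations learned from system.partitions).
var name string
err := session.Query(`SELECT name FROM ks.users WHERE id = ?`, 42).Scan(&name)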

Yes, it queries:

ycqlsh:system> select * from system.peers;

to find the location of all servers. It's best to provide as many IPs as possible at connection time so that it can fall back at startup (if only node2 is given and it is down, the driver can't find out about the other nodes).

See my answer above. When in doubt, you can read the source code of the driver or read the docs.

@dorian_yugabyte
For your point: “Note that Leader here is, I’m guessing, for yb-master-leader (where there can be only 1 node).”

Yes, you are right.

@dorian_yugabyte
I read the documentation here: gocql package - github.com/yugabyte/gocql - Go Packages

To make the client driver capable of knowing the correct tablet server,
do I need to call gocql.DCAwareRoundRobinPolicy("dc1") or gocql.TokenAwareHostPolicy(gocql.DCAwareRoundRobinPolicy("dc1"))?

What does token signify in gocql.TokenAwareHostPolicy()?

@sham_yuga

Depends on the logic you want the client to follow. I'd suggest: “connect to the local DC with token awareness; if the node is down, connect to the remote DC with token awareness”. Agree?

Explained in the code https://github.com/gocql/gocql/blob/558dfae50b5d369de77dae132dbfa64968e3abd4/policies.go#L351
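For example, a sketch of that policy on the cluster config. The DC name "us-central" is an assumption (use the data center name your nodes actually report), and NonLocalReplicasFallback() is the upstream gocql option for trying remote-DC replicas, assuming your driver version includes it:

cluster.PoolConfig.HostSelectionPolicy = gocql.TokenAwareHostPolicy(
    gocql.DCAwareRoundRobinPolicy("us-central"), // prefer replicas in the local DC
    gocql.NonLocalReplicasFallback(),            // fall back to remote-DC replicas when the local ones are down
)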

@dorian_yugabyte
1)
When you say “token aware”, does it mean selection based on the partition key?

2)
Upon setting the host selection policy with gocql.TokenAwareHostPolicy(gocql.DCAwareRoundRobinPolicy("dc1")),
if the driver observes that the specific node (matching the partition key) is down, which other node will be picked?

1) Yes.

2) It should directly contact the node in another DC that has that partition.

@dorian_yugabyte

Before setting the host selection policy, the gocql driver creates a cluster config, as shown in this example:

cluster := gocql.NewCluster("192.168.1.1", "192.168.1.2", "192.168.1.3")

So, in my case we have 6 nodes (4 nodes in Central & 2 nodes in East).

Which node IPs need to be passed to gocql.NewCluster()?

If possible, the IPs of all nodes.

@dorian_yugabyte
For my 6-node setup (4 nodes in Central & 2 nodes in East),

what is the advantage of the 6-node configuration
gocql.NewCluster("node1-ip","node2-ip","node3-ip","node4-ip","node5-ip","node6-ip")
over the single-node configuration
gocql.NewCluster("node2-ip-master"),
before setting the host selection policy?

Nodes can go down. When you start (or restart) your client service, it will try to connect to 1 node; if that node is down, it won't be able to discover the other nodes. It will think the whole cluster is down.


@dorian_yugabyte
For your point: “Nodes can go down. When you start (or restart) your client service, it will try to connect to 1 node; if that node is down, it won't be able to discover the other nodes.”

For discovery purposes, does every node have information about all other nodes? Because in our setup, node3, node4 & node6 run only the YB-TServer, not the YB-Master.

As per the documentation, it is the YB-Master service that is the keeper of the system metadata, not the YB-TServer,

where system-metadata is:

select * from system.partitions;
select * from system.peers;

So, does it make sense to call the gocql API as
gocql.NewCluster("node1-ip","node2-ip","node5-ip")
instead of
gocql.NewCluster("node1-ip","node2-ip","node3-ip","node4-ip","node5-ip","node6-ip")
?

Yes, all nodes know about each other and all tablet locations.

Use: gocql.NewCluster("as many yb-tserver ips as possible")
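Putting the thread together, a minimal sketch (the node addresses, the DC name "us-central", and the keyspace "ks" are placeholders, not values taken from your setup):

package main

import (
    "log"

    "github.com/yugabyte/gocql"
)

func main() {
    // Pass as many yb-tserver addresses as possible so the driver can still
    // bootstrap and discover the rest of the cluster even if some contact
    // points are down at startup.
    cluster := gocql.NewCluster(
        "node1-ip", "node2-ip", "node3-ip",
        "node4-ip", "node5-ip", "node6-ip",
    )
    cluster.Keyspace = "ks"

    // Token-aware routing wrapped around a DC-aware policy: prefer replicas in
    // the local DC, and fall back to remote-DC replicas when the local ones are
    // down (assuming your driver version has the NonLocalReplicasFallback option).
    cluster.PoolConfig.HostSelectionPolicy = gocql.TokenAwareHostPolicy(
        gocql.DCAwareRoundRobinPolicy("us-central"),
        gocql.NonLocalReplicasFallback(),
    )

    session, err := cluster.CreateSession()
    if err != nil {
        log.Fatal(err)
    }
    defer session.Close()
}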
