Why INSERT query is slower than SELECT query?

sham_yuga · January 17, 2022, 1:36pm

We are using yugabyte database with YCQL driver(gocql)

We have below two queries:

selectQueryMessage= "Select col2, col3, col4, col5 from table1 " +
		"WHERE primarykeyCol = ?"

insertQueryMessage = "INSERT INTO table1 (primarykeyCol, col2, col3, col4, col5) VALUES (?, ?, ?, ?, ?) " +
		"IF NOT EXISTS RETURNS STATUS AS ROW"

============

In last 7 days, each SELECT query is taking 10 ms consistently where as each INSERT query takes 30-40 ms

Why INSERT query 3-4 times slower than SELECT?

yogi · January 18, 2022, 5:40am

If your RF > 1, insert operation goes through a distributed transaction cycle where by record is stored on majority of replicas (if RF=3, then 2; if RF=4, then 3, it RF=7, then 5). However in case of SELECT record is read from only one tablet (assuming you are retrieving 1 record).

sham_yuga · January 18, 2022, 6:30am

@yogi

yugaware shows Replication factor as 3

Yes select query retrieves one record

For INSERT query, What is distributed transaction cycle?

yogi · January 18, 2022, 7:36am

See this page it explains how inserts go through distributed transaction.

“Transactional IO path | YugabyteDB Docs” Transactional I/O path | YugabyteDB Docs

sham_yuga · January 19, 2022, 4:30am

@yogi
Can I say,
if RF=3 in our cluster environment is changed to RF=1, then INSERT query would execute with less latency

yogi · January 19, 2022, 5:42am

Yes, in theory it will.

But you compromise on the resiliency. With RF=3, you can sustain 1 node failure, i.e. YB can continue serve query without interruption. But with RF=1, any node failure will mean interruption.

Marko_Rajcevic · January 19, 2022, 8:24pm

Hi Sham,

What is your current cluster configuration? i.e. Single-AZ, Multi-AZ, Multi-Region

Depending on where the nodes of your cluster are located it may cause increased latencies for writes due to networking between nodes (as Yogi mentioned each write has to be acknowledged by 2/3 peers with RF=3). Another aspect to consider if you are in a multi-region setup is where your users are. For example, if you are querying from US-East but the tablet-leader is in US-West you will see increased latencies.

sham_yuga · January 20, 2022, 1:25am

@Marko_Rajcevic
We have 6 node cluster(2 regions 3 AZs)
4 nodes in central region US
2 nodes in eastern region US

Node1 (Master, TServer) US-east
Node2 (Master, TServer) US-central-1
Node3 (TServer) US-central-1
Node4 (TServer) US-east
Node5 (Master, TServer) US-central-2 (Leader)
Node6 (TServer) US-central-2

Application(installed in central region) uses YCL driver and talks to database instance of node 2

What is the difference between Master leader vs TServer vs Master+TServer+LEADER ?
Can a node just be a master but not tserver?
Does INSERT query transaction run on all 6 nodes and then respond to client(Application) unlike SELECT query(which run transaction only on node2)?

dorian_yugabyte · January 20, 2022, 9:03am

There is only 1 master-leader server.

While on tservers, the master is chosen on a per-tablet basis. So it will depend on which tablet-leader the row that you’re trying to select/insert belongs in.

Yes, but you still need the minimum number of tservrs/masters.

The YCQL driver will contact directly to the server which has the tablet-leader for that row (assuming single row insert/select or a single partition), and the server will also replicate to another tserver before returning “success” to the driver.

sham_yuga · January 20, 2022, 4:48pm

@dorian_yugabyte

As mentioned above,
application is running in central region.
and there are 4 nodes(of database cluster) in central region

Does the performance impact depend on the node(out of 4 nodes in central region) that application points to?
if yes, what would be the performance impact for INSERT query?
if no, does performance remain same, if application point to any of the 4 nodes in central region?

dorian_yugabyte · January 21, 2022, 9:47am

Yes, if the tablet-leader for the row you’re select/inserting is residing in the same node.

If the tablet-leader is not in the node you’re connecting to, it will involve 1 extra network hop. (the server you’re connecting will redirect the request to the tablet-leader, and the tablet-leader will replicate it to another server that has the tablet-peer, and return “ok” to the node you’re connecting, and that node returns “ok” to the client).

sham_yuga · January 21, 2022, 5:03pm

@dorian_yugabyte
If there are two tables involved using SELECT query, then how tablet leaders are picked? Because we want to split our single table to two tables that has referential integrity.

dorian_yugabyte · January 22, 2022, 8:40am

Think you send the query to node_x. node_x, assuming a complex query involving 2+ tablets, will identify the location of each tablet_leader on the cluster and query the data on them, like a coordinator.

So it will depend on which tablet_leader each row resides, and where each tablet_leader reside in the cluster.

sham_yuga · January 22, 2022, 9:13am

@dorian_yugabyte
So that means, from the performance perspective, it is not a good idea to have referential integrity constraints across tables

dorian_yugabyte · January 22, 2022, 4:44pm

It is slower on INSERT because the db will check the row exists on the other table (the same with DELETE).

But not on SELECT.

Topic		Replies	Views
Ysql high latency General	5	46	April 7, 2025
Yugabytedb select so slow Design Discussions	6	889	July 9, 2021
Slow insert performance on single-node setup? General	1	29	May 5, 2025
Query performance on table with indices General	10	1341	March 18, 2022
Which one has better write performance between YugabyteDB cluster and single node General	5	993	March 13, 2023

Why INSERT query is slower than SELECT query?

Related topics