Multiple concurrent upserts: Is YugabyteDB suitable for my use case?

ygc · June 16, 2026, 6:35pm

We use YugabyteDB (YSQL) as a sink for a high-throughput event stream. Each event describes exactly one entity.
YugabyteDB stores one summary row per entity and we continuously merge incoming events into it. The same entity can get many events in a short burst, and those events are processed concurrently by multiple workers, so we very often have several concurrent upserts hitting the same row.

Simplified schema:

CREATE TABLE device_state (
    device_id     TEXT PRIMARY KEY,
    first_seen_at TIMESTAMPTZ      NOT NULL,
    last_seen_at  TIMESTAMPTZ      NOT NULL,
    max_temp      DOUBLE PRECISION NOT NULL,
    updated_at    TIMESTAMPTZ      NOT NULL
);

Each incoming reading is an upsert: Insert if new, otherwise merge based on some logic.

INSERT INTO device_state (device_id, first_seen_at, last_seen_at, max_temp, updated_at)
VALUES ($1, $2, $3, $4, $5)
ON CONFLICT (device_id) DO UPDATE SET
    first_seen_at = LEAST(device_state.first_seen_at, EXCLUDED.first_seen_at),
    last_seen_at  = GREATEST(device_state.last_seen_at, EXCLUDED.last_seen_at),
    max_temp      = GREATEST(device_state.max_temp, EXCLUDED.max_temp),
    updated_at    = EXCLUDED.updated_at;

(The actual ON CONFLICT logic has a few more fields and is a bit more complicated.)

The problem: Under bursts, many of these run concurrently against the same device_id. We get a high rate of SQLSTATE 40001 (serialization failures), exhaust the internal retry limit (yb_max_query_layer_retries=60) and the upsert ultimately fails.

Setup:

YSQL, default isolation level, RF=3, YugabyteDB 2025.2.
Generally hundreds of events per second.
The rate of “problematic events” (that is, a burst of events targeting the same rows) is about 5-10 events per second, continuing at this rate for a few minutes.
Every worker can receive any device_id (that is, there’s no partitioning). I haven’t tried batching them in the application logic before the upsert because that feels more like a band-aid.
We’d prefer a database-level solution and are treating app-side per-key routing/coalescing as a fallback, partly out of concern about hot-key skew and future scaling.

Things I tried include:

Both READ COMMITTED and REPEATABLE READ isolation levels.
Advisory locks (this turned out to be the slowest of my experiments).
Calling UPDATE instead of INSERT... ON CONFLICT when I know for sure that the row for a specific entity already exists.

Questions:

Is a high-contention, single-row upsert workload like this a reasonable fit for YugabyteDB, or are we working against the design?
What’s the recommended way to avoid the upsert failures here?
Is there any server-side configuration that meaningfully helps with hot-key write contention?

Thank you very much.

dorian_yugabyte · June 17, 2026, 8:33am

Hi @ygc

Can you partition the stream by the key of the row that you end up upserting (device_id here), and then try batching in the app?

I think this will help: doing per-key routing and batching together reduces the work the database has to do.

A high-contention, single-row upsert workload is not a good fit for the replication semantics that we apply (synchronous replication). It gets worse with multi-region deployments, for example, because of the added latency.

Don’t do as many concurrent writes to the same keys. Either batch them in the app or in your queue layer, or change the model so you don’t have to do upserts (for example, incremental aggregations over time).
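A minimal sketch of the app-side batching idea, assuming the workers can buffer a small batch of readings before writing. It collapses the batch to one merged row per device_id using the same LEAST/GREATEST rules as the upsert in the question, so each device needs at most one upsert per batch. The `Reading` type and the `merge` and `coalesce_readings` functions are hypothetical names, and Python is only an assumption about the worker language.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Reading:
    """One incoming event (hypothetical shape mirroring device_state)."""
    device_id: str
    first_seen_at: datetime
    last_seen_at: datetime
    max_temp: float
    updated_at: datetime


def merge(a: Reading, b: Reading) -> Reading:
    """Merge b into a with the same rules as the ON CONFLICT clause."""
    return Reading(
        device_id=a.device_id,
        first_seen_at=min(a.first_seen_at, b.first_seen_at),  # LEAST
        last_seen_at=max(a.last_seen_at, b.last_seen_at),     # GREATEST
        max_temp=max(a.max_temp, b.max_temp),                 # GREATEST
        updated_at=b.updated_at,  # like EXCLUDED.updated_at: later arrival wins
    )


def coalesce_readings(batch: Iterable[Reading]) -> List[Reading]:
    """Return one merged Reading per device_id, in first-seen order."""
    merged: Dict[str, Reading] = {}
    for reading in batch:
        previous = merged.get(reading.device_id)
        merged[reading.device_id] = (
            reading if previous is None else merge(previous, reading)
        )
    return list(merged.values())
```

Each merged row can then be sent through the existing INSERT ... ON CONFLICT statement unchanged. This reduces contention within one worker only; rows for the same device held by different workers can still collide.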

Can you share the full ON CONFLICT logic?

Things that can help:

Enable read committed and confirm you’re using it (see “Read Committed isolation level” in the YugabyteDB docs).
Enable wait queues (see “Concurrency control” in the YugabyteDB docs).
Keep the transaction in AUTOCOMMIT mode and don’t use RETURNING.
Can you share the exact error message returned with the 40001 error code?

ygc · June 18, 2026, 12:46pm

Thank you very much for your response!

I can partition in Kafka, but then we’ll have “hot device IDs” which get hundreds of readings per minute, while others get close to nothing. This is why we avoid it.

I can also batch them in the app, but since we have 100+ workers (and perhaps more in the future), I’ll probably still receive 40001s when bad luck strikes.

The real ON CONFLICT logic is not very different from what I provided. The main addition in the real code is that we have some “enum” fields for which the logic looks as follows:

status = CASE
    WHEN (device_state.status = 'warning' AND EXCLUDED.status = 'critical')
      OR (device_state.status = 'ok' AND EXCLUDED.status IN ('warning', 'critical'))
    THEN EXCLUDED.status          -- new status is strictly higher → escalate
    ELSE device_state.status      -- otherwise keep the current (higher-or-equal) status
END
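For readers who want to apply the same escalation rule outside the database (for example, when pre-merging a batch), here is a small sketch. It assumes the only statuses are 'ok', 'warning' and 'critical', ordered as the CASE expression implies. The function names and the `SEVERITY` table are hypothetical.

```python
# Severity order assumed from the CASE expression: ok < warning < critical.
SEVERITY = {"ok": 0, "warning": 1, "critical": 2}


def merge_status_case(current: str, incoming: str) -> str:
    """Branch-by-branch mirror of the SQL CASE expression."""
    if (current == "warning" and incoming == "critical") or (
        current == "ok" and incoming in ("warning", "critical")
    ):
        return incoming  # new status is strictly higher: escalate
    return current       # otherwise keep the current status


def merge_status_rank(current: str, incoming: str) -> str:
    """Same result for the three known statuses, written as a max by rank."""
    return incoming if SEVERITY[incoming] > SEVERITY[current] else current
```

Because the rule is a maximum over a fixed order, the result does not depend on the order in which readings arrive, which is what makes it safe to fold several readings together before writing. Any status outside the three assumed values would need explicit handling.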

Read committed: we enabled it and confirmed it’s enabled.
Wait queues: also enabled.
AUTOCOMMIT: the transaction was a single INSERT... ON CONFLICT statement, so as far as I understand it ran in AUTOCOMMIT mode.
err.message: "could not serialize access due to concurrent update (yb_max_query_layer_retries set to 60 are exhausted)"

err.details: "conflict with concurrently committed data. Value write after transaction start: doc ht ({ physical: 1781778264621386 }) > read time ({ physical: 1781778264619006 }), key: SubDocKey(<redacted>): kConflict [read committed]"
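To get a feel for how close together the conflicting operations were, the two timestamps in err.details can be subtracted. The sketch below assumes the `physical` values are microseconds since the Unix epoch, which is consistent with the date of this post. The regular expression is tied to this one message and is not a stable format, and `conflict_gap_us` is a hypothetical helper name.

```python
import re

DETAILS = (
    "conflict with concurrently committed data. "
    "Value write after transaction start: "
    "doc ht ({ physical: 1781778264621386 }) > "
    "read time ({ physical: 1781778264619006 }), "
    "key: SubDocKey(<redacted>): kConflict [read committed]"
)

PATTERN = re.compile(
    r"doc ht \(\{ physical: (\d+) \}\) > read time \(\{ physical: (\d+) \}\)"
)


def conflict_gap_us(details: str) -> int:
    """Return write time minus read time, in microseconds (assumed unit)."""
    match = PATTERN.search(details)
    if match is None:
        raise ValueError("unrecognized conflict detail format")
    doc_ht, read_ht = (int(group) for group in match.groups())
    return doc_ht - read_ht


print(conflict_gap_us(DETAILS))  # 2380
```

Under that assumption, the competing write landed about 2.4 ms after this statement's read time.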

dorian_yugabyte · June 18, 2026, 8:49pm

Hundreds per minute is roughly 10 per second at most, which is well within what Kafka handles, and it’s better to handle it there than at the YugabyteDB level. Grouping by device in Kafka and batch-streaming into the next layer is a normal pattern. It should work fine for 10x this traffic too.

At 100x or 1000x, you can probably shard in Kafka so that a hot device is spread across, say, 10 partitions.
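One way to read the sharding suggestion is key salting: ordinary devices always map to a single partition, while known-hot devices are spread over a bounded number of salted keys. The sketch below computes the partition number itself instead of relying on any Kafka client's partitioner. `partition_for`, `stable_hash`, `hot_fanout` and the event-ID argument are all hypothetical.

```python
import hashlib
from typing import Dict


def stable_hash(text: str) -> int:
    """Deterministic hash (Python's built-in hash() is salted per process)."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def partition_for(
    device_id: str,
    event_id: str,
    num_partitions: int,
    hot_fanout: Dict[str, int],
) -> int:
    """Pick a partition; hot devices fan out over up to hot_fanout[device_id] keys."""
    fanout = hot_fanout.get(device_id, 1)       # 1 means no salting
    bucket = stable_hash(event_id) % fanout     # which salted key this event uses
    salted_key = f"{device_id}#{bucket}"
    return stable_hash(salted_key) % num_partitions
```

With a fan-out of 10, at most 10 consumers can write the same row concurrently instead of every worker, so contention is bounded and not eliminated. Two salted keys can also land on the same partition.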

I think you should do the batching in Kafka and then the de-duplication in the app.

Having 100+ workers might also be an anti-pattern, depending on what exactly you’re doing and which language it is. If it’s a language that supports multiple cores (Java, C#, Rust, etc.), it’s better to have fewer, fatter workers. They’ll be able to reuse connections better and will probably cause less contention too. That’s just a general observation, though.

AUTOCOMMIT needs to be enabled, and whether it is depends on your driver. Can you make sure?
