Cassandra to YSQL

marknefedov · January 8, 2026, 9:28pm

I know there is a YCQL interface, but I’m more interested in bringing all data together and potentially simplify querying, since currently it is split between Postgres and Cassandra.
What is guideline or best practices for data modeling in YSQL for write heavy workloads?
In Cassandra we can write to different partitions that will hit different servers is there something similar in YSQL or I will need to create separate tables?

Example Cassandra model:

CREATE TABLE IF NOT EXISTS chat_messages (
    id UUID, --- UUIDv7 time ordered,
    chat_id UUID,
    bucket_date DATE,
    shard_id smallint,
    author_id UUID,
    created TIMESTAMP,
    content TEXT,
    PRIMARY KEY ((chat_id, bucket_date, shard_id), id)
) WITH CLUSTERING ORDER BY (id DESC)
    AND compaction = {'class': 'UnifiedCompactionStrategy'};

Also can you point me to YCQL / Cassandra / Scylla comparison if you have one?

nasi_yb · January 9, 2026, 6:36am

Hi marknefedov,

Great question! this is a common pattern when moving from Cassandra to YSQL.

Short answer

You don’t need separate tables or manual shard management in YSQL. Data distribution is automatic, and the main lever you control is primary key design.

Write-heavy data modeling in YSQL

A few practical guidelines:

Primary key drives distribution
Use hash-distributed primary keys to spread writes evenly and avoid hotspots.
No manual sharding needed
YugabyteDB automatically splits tables into tablets and distributes them across nodes. You don’t decide which server a write hits.
UUIDs are a good fit
Time-ordered UUIDs (like UUIDv7) work well for high-concurrency writes when combined with hashing.
Partitioning is optional
PostgreSQL-style partitioning (e.g., by date) is mainly for data lifecycle management, not for scale.

Mapping your Cassandra model

Your Cassandra partition key:

(chat_id, bucket_date, shard_id)

maps cleanly to YSQL like this:

CREATE TABLE chat_messages (
  id UUID,
  chat_id UUID,
  bucket_date DATE,
  shard_id smallint,
  author_id UUID,
  created TIMESTAMP,
  content TEXT,
  PRIMARY KEY ((chat_id, bucket_date, shard_id) HASH, id DESC)
);

HASH provides even write distribution (similar to Cassandra partitions)
id DESC keeps recent messages clustered for fast reads

In many cases, shard_id isn’t strictly required, but keeping it is fine if you already rely on it for write fan-out.

YCQL / Cassandra / Scylla

Useful references:

YCQL vs Cassandra overview:
https://docs.yugabyte.com/stable/faq/comparisons/cassandra/
YCQL best practices:
https://docs.yugabyte.com/stable/develop/best-practices-develop/best-practices-ycql/

High level: YCQL feels like Cassandra, but YugabyteDB adds strong consistency and distributed transactions, and YSQL lets you consolidate workloads that would otherwise need both Postgres and Cassandra.

dorian_yugabyte · January 9, 2026, 7:06am

@marknefedov

I also don’t understand why shard_id is needed here when you already have (chat_id, bucket_date) on the partition key.

dorian_yugabyte · January 9, 2026, 8:25am

Why was bucket_date added here?
Meaning, is this a personal chat between 2 users?
Or like a discord chat channel that can grow into gigabytes?
What’s the expected max size of all rows of a chat over it’s lifetime (or, say, per-month)?

The difference is that partitions in cassandra are limited in size, like 100MB, while tablets in YugabyteDB can grow much larger, 10-100GB+.

marknefedov · January 9, 2026, 9:01am

Yes, these are chats with unbound count of users and additional shard_id is used to spread load during write spikes / increased activity. It also helps with “hot partitions” during reads compared to increasing time bucketing frequency, app requests data for all shards and combines it in memory. Also kind of easier to re-balance partitions in the background via history rewrites.

marknefedov · January 9, 2026, 9:56am

Main load is keyset pagination queries and maintaining all supporting tables, like bucket index for dormant chats.

dorian_yugabyte · January 9, 2026, 10:14am

Keyset pagination is fast though.

Compared to cassandra, you might need 1/10 - 1/100 fewer partitions since we handle them more efficiently.

Like, how many writes/second does a chat channel really have? 10/s? It’s not like it will get 1K+/s?

The same for reads. Do you really need that low latency or so much throughput to read channel history that you need to query multiple servers?

You might need to or not. Depends on your exact scale.

You can optimize further by using “twitter snowflake ids” instead of UUIDs everywhere (12b → 8b).

marknefedov · January 9, 2026, 7:01pm

Largest spike was up to 2100-ish per second , but that were more of a disaster situation and accumulation of bad technical decisions.
Reads are way more controlled and tame.

Topic		Replies	Views
How to verify, if database is Cassandra DB or Yugabyte DB? General	5	991	February 19, 2022
Facing strange behaviour while checking ingestion rate in ysql table General	11	734	September 6, 2023
Does yugabyte ensure foreign key constraint? General	6	876	November 14, 2021
YCQL Geo-Partitioning of Data General	4	688	December 14, 2022
Does Yugabyte Separate Cassandara and PostreSQL implementations? General	1	728	August 6, 2020

Cassandra to YSQL

Short answer

Write-heavy data modeling in YSQL

Mapping your Cassandra model

YCQL / Cassandra / Scylla

Related topics