I know there is a YCQL interface, but I’m more interested in bringing all data together and potentially simplify querying, since currently it is split between Postgres and Cassandra.
What is guideline or best practices for data modeling in YSQL for write heavy workloads?
In Cassandra we can write to different partitions that will hit different servers is there something similar in YSQL or I will need to create separate tables?
Example Cassandra model:
CREATE TABLE IF NOT EXISTS chat_messages (
id UUID, --- UUIDv7 time ordered,
chat_id UUID,
bucket_date DATE,
shard_id smallint,
author_id UUID,
created TIMESTAMP,
content TEXT,
PRIMARY KEY ((chat_id, bucket_date, shard_id), id)
) WITH CLUSTERING ORDER BY (id DESC)
AND compaction = {'class': 'UnifiedCompactionStrategy'};
Also can you point me to YCQL / Cassandra / Scylla comparison if you have one?
Great question! this is a common pattern when moving from Cassandra to YSQL.
Short answer
You don’t need separate tables or manual shard management in YSQL. Data distribution is automatic, and the main lever you control is primary key design.
Write-heavy data modeling in YSQL
A few practical guidelines:
Primary key drives distribution
Use hash-distributed primary keys to spread writes evenly and avoid hotspots.
No manual sharding needed
YugabyteDB automatically splits tables into tablets and distributes them across nodes. You don’t decide which server a write hits.
UUIDs are a good fit
Time-ordered UUIDs (like UUIDv7) work well for high-concurrency writes when combined with hashing.
Partitioning is optional
PostgreSQL-style partitioning (e.g., by date) is mainly for data lifecycle management, not for scale.
Mapping your Cassandra model
Your Cassandra partition key:
(chat_id, bucket_date, shard_id)
maps cleanly to YSQL like this:
CREATE TABLE chat_messages (
id UUID,
chat_id UUID,
bucket_date DATE,
shard_id smallint,
author_id UUID,
created TIMESTAMP,
content TEXT,
PRIMARY KEY ((chat_id, bucket_date, shard_id) HASH, id DESC)
);
HASH provides even write distribution (similar to Cassandra partitions)
id DESC keeps recent messages clustered for fast reads
In many cases, shard_id isn’t strictly required, but keeping it is fine if you already rely on it for write fan-out.
High level: YCQL feels like Cassandra, but YugabyteDB adds strong consistency and distributed transactions, and YSQL lets you consolidate workloads that would otherwise need both Postgres and Cassandra.
Why was bucket_date added here?
Meaning, is this a personal chat between 2 users?
Or like a discord chat channel that can grow into gigabytes?
What’s the expected max size of all rows of a chat over it’s lifetime (or, say, per-month)?
The difference is that partitions in cassandra are limited in size, like 100MB, while tablets in YugabyteDB can grow much larger, 10-100GB+.
Yes, these are chats with unbound count of users and additional shard_id is used to spread load during write spikes / increased activity. It also helps with “hot partitions” during reads compared to increasing time bucketing frequency, app requests data for all shards and combines it in memory. Also kind of easier to re-balance partitions in the background via history rewrites.
Largest spike was up to 2100-ish per second , but that were more of a disaster situation and accumulation of bad technical decisions.
Reads are way more controlled and tame.