How is data stored internally in YugaByte

sliu · March 25, 2019, 9:49pm

I have a question regarding the internal storage format of YugaByteDB, compared with Cassandra.
Take this table in KairosDB for example:

CREATE TABLE IF NOT EXISTS row_key_time_index (
    metric text,
    table_name text,
    row_time timestamp,
    value text,
    PRIMARY KEY ((metric), table_name, row_time)
);

In Cassandra, if I understand it correctly, I would expect the partition key metric to appear only once in an SSTable after compaction. Is this also true for YugaByteDB?

kannan · March 25, 2019, 10:32pm

Thanks for your question @sliu.

A high-level overview of YugaByte DB’s storage format is explained here: Persistence in YugabyteDB | YugabyteDB Docs

We have built this on top of RocksDB (a log-structured key-to-value storage engine), and extended RocksDB to efficiently support a document/row storage model. [See also: How We Built a High Performance Document Store on RocksDB? | Yugabyte]

With regard to your question: for rows that share the same prefix of the primary key (such as metric in your example), logically, that data is repeated on every row. However, YugaByte DB uses a two-level compression scheme:

The block cache format is “prefix compressed”, so rows with a common prefix don’t actually incur much space overhead in memory.
Additionally, on disk, the blocks in SSTable files are stored after Snappy compression.
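
To illustrate the idea of prefix compression, here is a minimal Python sketch. It is not YugaByte DB's or RocksDB's actual block format; the key layout (metric/table_name/row_time joined with "/") and the function names are assumptions made up for the example. Each key in a sorted run is stored as the number of bytes it shares with the previous key plus the remaining suffix, so a repeated partition key such as metric costs almost nothing after its first occurrence.

```python
def shared_prefix_len(a: bytes, b: bytes) -> int:
    """Return the number of leading bytes that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def prefix_compress(sorted_keys):
    """Encode each key as (shared_len, suffix) relative to the previous key."""
    encoded = []
    prev = b""
    for key in sorted_keys:
        shared = shared_prefix_len(prev, key)
        encoded.append((shared, key[shared:]))
        prev = key
    return encoded


def prefix_decompress(encoded):
    """Rebuild the full keys from (shared_len, suffix) pairs."""
    keys = []
    prev = b""
    for shared, suffix in encoded:
        key = prev[:shared] + suffix
        keys.append(key)
        prev = key
    return keys


# Hypothetical encoded keys: metric / table_name / row_time
keys = [
    b"cpu.load/table_a/1000",
    b"cpu.load/table_a/2000",
    b"cpu.load/table_b/1000",
]
encoded = prefix_compress(keys)
```

In this sketch only the first entry carries the full key; the later entries store just the bytes that differ from their predecessor.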

Regarding what you said about Apache Cassandra: "In Cassandra, if I understand it correctly. I would expect the partition key metric only appear once in an sstable after compaction."

You are probably correct, though I am not 100% sure. I think this design causes issues for Apache Cassandra when a single partition key has lots and lots of rows. Likely, the entire partition needs to be brought into memory in an all-or-nothing manner, making memory usage inefficient and also causing GC issues (because of the Java implementation).

With YugaByte DB, a single partition key can have lots and lots of rows that span several database blocks. Not all of them even need to be brought into memory if the query is interested only in a slice (e.g. a time range of 5pm to 10pm). Only the blocks with the matching time range need to be brought into memory.
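
As a rough illustration of why a slice query only touches a few blocks, here is a small Python sketch. It assumes a hypothetical index that records the first key (here, just an hour of the day) of each sorted block; the function name and the index layout are invented for the example and are not YugaByte DB's actual API. A binary search over the index picks out only the blocks that can overlap the requested range.

```python
from bisect import bisect_right


def blocks_for_range(block_first_keys, lo, hi):
    """Return indices of blocks that may hold keys in the range [lo, hi].

    block_first_keys is a sorted list with the first key of each block;
    block i covers keys from block_first_keys[i] up to (but not including)
    block_first_keys[i + 1].
    """
    # Last block whose first key is <= lo (block 0 if lo precedes all keys).
    start = max(bisect_right(block_first_keys, lo) - 1, 0)
    # Last block whose first key is <= hi (-1 if hi precedes all keys).
    end = bisect_right(block_first_keys, hi) - 1
    return list(range(start, end + 1))


# Hypothetical partition split into four blocks, each starting at this hour.
first_hours = [0, 6, 12, 18]

# Query for 5pm to 10pm (hours 17 to 22): only the last two blocks are needed.
evening_blocks = blocks_for_range(first_hours, 17, 22)
```

The blocks for hours 0 to 11 are never read in this example, which is the point of the slice behaviour described above.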
