YugaByte DB Community Forum

YCSB Benchmark Results for YugaByte and Apache Cassandra



UPDATE:
Please see the more recent YCSB post published in November 2017.

======

YugaByte is API-compatible with Apache Cassandra (CQL), and we are often asked how its performance compares to Cassandra's. We ran the YCSB benchmark against both databases to find out, and we want to share the results.

Setup

The machine setups for YugaByte and Cassandra were identical, and we picked the consistency levels so that both databases provide strong consistency. Here is the basic setup for each cluster:

  • 3-node cluster on Google Cloud Platform (GCP)
  • Each node is an n1-standard-16:
    • 16 vCPUs (Intel® Xeon® CPU @ 2.20GHz)
    • 60GB RAM
    • 2 x 375 GB direct-attached SSDs
  • Replication factor = 3
  • YugaByte: default configuration parameters were used.
  • Apache Cassandra: release 3.11.1 with default configuration parameters, except that MAX_HEAP_SIZE was increased to 30GB and HEAP_NEWSIZE to 1600MB to make use of the available physical memory.
  • For writes:
    • YugaByte uses the Raft distributed consensus protocol for strong consistency.
    • With Apache Cassandra, the quorum write setting was used.
  • For reads:
    • YugaByte reads default to strongly consistent reads.
    • With Apache Cassandra, the quorum read setting was used.
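
To see why quorum writes combined with quorum reads are usually treated as strongly consistent at replication factor 3, here is a minimal Python sketch of the standard overlap argument. The function names are ours, chosen for illustration; they are not part of either database, and the sketch ignores failure handling and conflict resolution.

```python
def quorum(replication_factor: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


def read_overlaps_write(rf: int, write_acks: int, read_acks: int) -> bool:
    """True when any set of read replicas must share at least one
    replica with any set of replicas that acknowledged a write."""
    return write_acks + read_acks > rf


RF = 3  # replication factor used in this benchmark
q = quorum(RF)

# Quorum write + quorum read at RF = 3: the two sets must intersect.
print(q, read_overlaps_write(RF, q, q))

# For contrast: one write ack and one read ack give no such guarantee.
print(read_overlaps_write(RF, 1, 1))
```

With three replicas a majority is two nodes, so a read from any two replicas must include at least one replica that acknowledged the latest quorum write. A Raft group of three replicas likewise needs two of them to agree before a write is committed.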

The YCSB client software was built from a fork of the open source repository. We enhanced YCSB’s Cassandra binding to use “prepared” statements (GitHub commit a87fa6eac48f4b97de6f2976d320ce07ddd1fa35). A prepared statement can be executed repeatedly with different bind values each time, so the statement compilation cost is incurred only once. It also lets the client send each request directly to the node that holds the data for the row in question. We have submitted this enhancement to YCSB for consideration; it benefits both YugaByte and Apache Cassandra.
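
The two effects of prepared statements described above can be illustrated with a toy model in Python. This is not the YCSB binding or any real driver: `ToyCluster`, its methods, the node names, and the CRC-based key placement are all hypothetical, and the model assumes a single owner per key and ignores replication.

```python
import zlib

NODES = ["node-1", "node-2", "node-3"]  # hypothetical 3-node cluster


class ToyCluster:
    """Toy stand-in for a CQL cluster that counts compilations and extra hops."""

    def __init__(self, nodes):
        self.nodes = nodes
        self.compilations = 0  # times a statement string was compiled
        self.extra_hops = 0    # requests sent to a node that does not own the key
        self._cursor = 0       # round-robin position for unrouted requests

    def owner(self, key):
        # Illustrative placement only; real clusters use their own partitioner.
        return self.nodes[zlib.crc32(key.encode()) % len(self.nodes)]

    def prepare(self, query):
        # Compile once; the returned handle is reused for every execution.
        self.compilations += 1
        return {"query": query}

    def execute_simple(self, query, key):
        # Unprepared path: compile on every call, pick nodes round-robin.
        self.compilations += 1
        contacted = self.nodes[self._cursor % len(self.nodes)]
        self._cursor += 1
        if contacted != self.owner(key):
            self.extra_hops += 1
        return contacted

    def execute_prepared(self, prepared, key):
        # Prepared path: no recompilation, and the bound key identifies the owner.
        return self.owner(key)


QUERY = "SELECT * FROM some_table WHERE id = ?"  # illustrative statement
keys = [f"user{i}" for i in range(1000)]

simple = ToyCluster(NODES)
for k in keys:
    simple.execute_simple(QUERY, k)

prepared = ToyCluster(NODES)
stmt = prepared.prepare(QUERY)
for k in keys:
    prepared.execute_prepared(stmt, k)

print("simple:  ", simple.compilations, "compilations,", simple.extra_hops, "extra hops")
print("prepared:", prepared.compilations, "compilations,", prepared.extra_hops, "extra hops")
```

In this model the unprepared client compiles the statement once per request and often contacts a node that does not own the key, while the prepared client compiles once and always contacts the owner.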

Results

We ran the benchmark on each system with 1 million, 5 million, and 10 million keys. The number of keys is a proxy for the density of data in the system.

We noticed that YugaByte outperformed Apache Cassandra by increasing margins as the number of keys (data density) increased. For example, in the 100% read workload, YugaByte showed about 42% higher throughput than Apache Cassandra with 1 million keys, and about 95% higher throughput with 10 million keys.

Here is a chart comparing the results of running YCSB with 10 million keys on YugaByte and Apache Cassandra.

Below are detailed tables of the relative performance for all the tests we ran.

YCSB on GCP cluster / Overall Throughput (ops/sec):

Key Count: 1,000,000

| Workload | Description | Apache Cassandra | YugaByte | Relative Performance |
|---|---|---|---|---|
| A | 50% Read / 50% Write | 74,710 | 90,520 | 1.21x |
| B | 95% Read / 5% Write | 70,607 | 95,698 | 1.36x |
| C | 100% Read | 72,873 | 103,484 | 1.42x |
| D | Read Latest Inserted | 80,465 | 96,005 | 1.19x |
| E | Short Range Scan | 8,377 | 8,855 | 1.06x |
| F | Read Modify Write | 43,847 | 61,257 | 1.40x |

Key Count: 5,000,000

| Workload | Description | Apache Cassandra | YugaByte | Relative Performance |
|---|---|---|---|---|
| A | 50% Read / 50% Write | 66,495 | 81,896 | 1.23x |
| B | 95% Read / 5% Write | 51,976 | 83,135 | 1.60x |
| C | 100% Read | 48,028 | 88,923 | 1.85x |
| D | Read Latest Inserted | 80,306 | 91,414 | 1.14x |
| E | Short Range Scan | 6,691 | 9,163 | 1.37x |
| F | Read Modify Write | 40,819 | 55,006 | 1.35x |

Key Count: 10,000,000

| Workload | Description | Apache Cassandra | YugaByte | Relative Performance |
|---|---|---|---|---|
| A | 50% Read / 50% Write | 53,615 | 76,549 | 1.43x |
| B | 95% Read / 5% Write | 45,912 | 81,751 | 1.78x |
| C | 100% Read | 45,461 | 88,607 | 1.95x |
| D | Read Latest Inserted | 69,159 | 88,603 | 1.28x |
| E | Short Range Scan | 5,562 | 9,162 | 1.65x |
| F | Read Modify Write | 32,570 | 52,403 | 1.61x |
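
For readers who want to check the arithmetic, the short Python sketch below recomputes the Relative Performance column for workload C (100% read) from the throughput figures above. It assumes the column is YugaByte throughput divided by Apache Cassandra throughput, rounded to two decimals; that assumption reproduces the published values. The function names are ours.

```python
# Workload C throughput (ops/sec) copied from the tables above,
# keyed by key count: (Apache Cassandra, YugaByte).
WORKLOAD_C = {
    1_000_000: (72_873, 103_484),
    5_000_000: (48_028, 88_923),
    10_000_000: (45_461, 88_607),
}


def relative_performance(cassandra_ops: int, yugabyte_ops: int) -> float:
    """YugaByte throughput as a multiple of Cassandra throughput."""
    return round(yugabyte_ops / cassandra_ops, 2)


def percent_gain(cassandra_ops: int, yugabyte_ops: int) -> int:
    """Throughput advantage as a whole-number percentage."""
    return round((yugabyte_ops / cassandra_ops - 1) * 100)


for key_count, (cassandra, yugabyte) in WORKLOAD_C.items():
    ratio = relative_performance(cassandra, yugabyte)
    gain = percent_gain(cassandra, yugabyte)
    print(f"{key_count:>10,} keys: {ratio:.2f}x ({gain}% higher throughput)")
```

The 1.42x and 1.95x ratios for 1 million and 10 million keys correspond to the 42% and 95% figures quoted earlier for the 100% read workload.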
