Comparison between your raft implementation and etcd

Have you made any comparison between your raft implementation and etcd?
There are many engineering optimizations within etcd raft, such as batching and pipelining, which can bring large gains in overall throughput. For example, given the pipelining tricks, the state replication machine could still perform well across multiple datacenters. Do such optimizations exist within Yugabyte’s raft? And have you performed any benchmarks on the raft component alone?
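To make the pipelining point concrete, here is a rough back-of-the-envelope sketch. It assumes a deliberately simplified model in which throughput is limited only by the round-trip time and by how many append requests the leader keeps in flight; bandwidth, disk and batching are ignored. The function name and the 100 ms round-trip figure are illustrative assumptions, not measurements from etcd or Yugabyte.

```python
def replication_throughput(rtt_ms: float, in_flight: int) -> float:
    """Entries acknowledged per second in a simplified model (an assumption):
    the leader keeps `in_flight` append requests outstanding and each one
    takes one round trip of `rtt_ms` milliseconds to be acknowledged."""
    if rtt_ms <= 0 or in_flight < 1:
        raise ValueError("rtt_ms must be positive and in_flight at least 1")
    return in_flight * 1000 / rtt_ms


# Hypothetical cross-datacenter round trip of 100 ms.
stop_and_wait = replication_throughput(100, 1)   # 10.0 entries/s
pipelined = replication_throughput(100, 64)      # 640.0 entries/s
print(stop_and_wait, pipelined)
```

In this model, sending the next entry without waiting for the previous acknowledgement scales throughput linearly with the pipeline depth, which is why pipelining matters most when the round trip is long.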

Hi @yingfeng,

No, we have not compared our RAFT implementation with that of etcd. We have implemented a number of batching and pipelining optimizations, and we are seeing excellent performance in our write-heavy tests (all our writes go through our RAFT layer, since we are strongly consistent). We have done extensive benchmarking of the end-to-end system, which is the query layer plus RAFT (the RAFT layer alone should perform even better).
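As a rough illustration of the batching idea in general (a minimal sketch, not Yugabyte's or etcd's actual code), the class below coalesces individual writes so that several of them share one replication round. `BatchingProposer`, `max_batch` and the `replicate` callback are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class BatchingProposer:
    """Buffers writes and hands them to `replicate` in groups.

    `replicate` stands in for one replication round (a hypothetical
    callback); a real Raft layer would append the batch to the log and
    wait for a majority of acknowledgements.
    """
    max_batch: int
    replicate: Callable[[List[str]], None]
    pending: List[str] = field(default_factory=list)
    rounds: int = 0

    def propose(self, write: str) -> None:
        # Queue the write; replicate once the batch is full.
        self.pending.append(write)
        if len(self.pending) >= self.max_batch:
            self.flush()

    def flush(self) -> None:
        # Send whatever is queued as a single replication round.
        if not self.pending:
            return
        batch, self.pending = self.pending, []
        self.replicate(batch)
        self.rounds += 1


sent: List[List[str]] = []
proposer = BatchingProposer(max_batch=4, replicate=sent.append)
for i in range(10):
    proposer.propose(f"key{i}")
proposer.flush()  # drain the partial batch
print(proposer.rounds, [len(b) for b in sent])  # 3 [4, 4, 2]
```

Ten writes cost three replication rounds here instead of ten. A production implementation would typically also flush on a timer so that a partial batch does not wait indefinitely.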

As examples, in pure write workloads we see the following numbers on a universe with 3 machines (16 cores each) and a replication factor of 3:

  • over 60K writes/sec for small key-value writes
  • over 300K writes/sec for batched small key-value writes

“For example, given the pipelining tricks, the state replication machine could still perform well across multiple datacenters.”

We have tested our RAFT implementation in multi-DC scenarios. Below is a screenshot of a universe running with a replication factor of 5, with RAFT members spread across us-west, us-east and asia (screenshot from our enterprise edition console running on AWS).

Hi @karthik,

The batching benchmark results look good. Were these results obtained from a single RAFT group or from multiple RAFT groups? (In Yugabyte’s terminology, a RAFT group seems to correspond to a tablet, since each tablet is a single RAFT group.)

We have a similar multi-raft based NoSQL store whose raft implementation is taken from etcd. With enough raft groups, on 3 nodes with 32 CPU cores each, we get 500K write operations per second, and the bottleneck is the network card (only 1 Gigabyte is available), so we don’t know the practical upper bound right now. As a result, I’m very interested in the performance of this RAFT implementation.