I have a SQL Server & Postgres DBA background, and our client is looking at using the latest version of YugabyteDB as a backend for lakeFS. I’d welcome expert thoughts on how feasible this is, please.
What I was wondering is whether the distributed nature of YugabyteDB might introduce write latencies that cause lakeFS to have a bit of a tantrum. I’m also not sure whether there is a way to buffer writes while still keeping every transaction 100% ACID.
I’ve done some reading on both lakeFS and YugabyteDB, but a lot of this may come down to how a YugabyteDB cluster behaves. To have a play, I currently have a Docker-based setup running a single node of lakeFS and a single node of YugabyteDB. In production we would look at installing YugabyteDB on Linux VMs (not in containers).
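For reference, the test rig is wired up roughly like this. Since YSQL is wire-compatible with Postgres, there is nothing YugabyteDB-specific on the lakeFS side; note the image names, ports, and `LAKEFS_DATABASE_*` variables below are from memory, so verify them against the lakeFS and YugabyteDB docs before relying on this:

```shell
# Sketch of a single-node test rig: lakeFS using YugabyteDB's YSQL
# (Postgres wire protocol, default port 5433) as its metadata store.
# Image names and env var names are assumptions -- check the docs.

docker network create yb-net

# YugabyteDB single node.
docker run -d --name yb --network yb-net -p 5433:5433 \
  yugabytedb/yugabyte:latest \
  bin/yugabyted start --background=false

# Point lakeFS at YSQL exactly as it would be pointed at vanilla Postgres.
docker run -d --name lakefs --network yb-net -p 8000:8000 \
  -e LAKEFS_DATABASE_TYPE=postgres \
  -e LAKEFS_DATABASE_POSTGRES_CONNECTION_STRING="postgres://yugabyte:yugabyte@yb:5433/yugabyte" \
  treeverse/lakefs:latest run
```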
The plan seems to be having two YugabyteDB nodes in one datacentre and two in another, with a fast link between them.
I’m also realistic that not every technical solution is optimal for every proposed use.
We would be counting on at least 100,000 writes per day to lakeFS and the YugabyteDB database.
YugabyteDB is a perfect fit for storing the metadata of a distributed filesystem.
Latency may come from the multi-datacentre deployment, but lakeFS shouldn’t have a tantrum over it; I’d consider that a bug.
100,000 writes per day averages out to just over one write per second, which is very low usage; it should be fine in all cases.
What is the exact reason behind this type of deployment? Do you need the ability to lose a datacentre and continue to function, or is one DC intended for disaster recovery only?
While I read up on your responses and look through the links provided: the design aim is to have the database synchronised across at least two (or more) datacentres with synchronous replication between them, such that lakeFS can write to the database in any datacentre and the data is immediately available in all of them. The idea is a single resilient database for lakeFS to use under all conditions.
The system needs to survive the loss of at least one datacentre, ideally two, so that we can fall back to running on a single datacentre and maintain BAU in that scenario without manual intervention (if possible).
It seems we would initially need a three-node cluster in each of three datacentres to survive the loss of one datacentre, and five datacentres each with a three-node cluster to survive the loss of two. Is that correct?
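To show my working: the rule I’m assuming is the standard Raft-style majority rule, where surviving f simultaneous datacentre losses needs 2f + 1 datacentres (with replicas spread across them), and likewise a replication factor of 2f + 1 to tolerate f lost replicas. Please correct me if YugabyteDB behaves differently:

```python
# Back-of-envelope quorum arithmetic for Raft-style replication.
# A sketch of the general majority rule, not YugabyteDB-specific tuning.

def dcs_needed(dc_failures: int) -> int:
    """Datacentres needed so a majority of them survives
    the given number of simultaneous datacentre losses."""
    return 2 * dc_failures + 1

def rf_needed(replica_failures: int) -> int:
    """Replication factor needed to keep a Raft majority
    after losing the given number of replicas."""
    return 2 * replica_failures + 1

print(dcs_needed(1))  # 3 datacentres to survive losing one
print(dcs_needed(2))  # 5 datacentres to survive losing two
print(rf_needed(2))   # RF = 5 if two replicas may vanish together
```

If that rule holds, the per-datacentre node count adds capacity within a DC rather than DC-loss tolerance; it is the spread of replicas across datacentres that determines survivability.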
What sort of latency penalty is there as you add datacentres, please? Is it possible to calculate it?