I have a SQL Server & Postgres DBA background, and our client is looking at using the latest version of YugabyteDB as a backend for lakeFS. I’d welcome expert thoughts on how feasible this is, please.
What I was wondering is whether the distributed nature of YugabyteDB might introduce write latencies that cause lakeFS to have a bit of a tantrum. I’m also not sure whether there is a way to buffer writes while still keeping every transaction 100% ACID.
I’ve done some reading on both lakeFS and YugabyteDB, but a lot of this may come down to how a YugabyteDB cluster behaves. To have a play, I currently have a Docker-based setup running a single node of lakeFS and a single node of YugabyteDB. In production we would look at installing YugabyteDB on Linux VMs (not in containers).
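For reference, the test rig is wired up roughly like this. Since YSQL is wire-compatible with Postgres, there is nothing YugabyteDB-specific on the lakeFS side; note the image names, ports, and `LAKEFS_DATABASE_*` variables below are from memory, so verify them against the lakeFS and YugabyteDB docs before relying on this:

```shell
# Sketch of a single-node test rig: lakeFS using YugabyteDB's YSQL
# (Postgres wire protocol, default port 5433) as its metadata store.
# Image names and env var names are assumptions -- check the docs.

docker network create yb-net

# YugabyteDB single node.
docker run -d --name yb --network yb-net -p 5433:5433 \
  yugabytedb/yugabyte:latest \
  bin/yugabyted start --background=false

# Point lakeFS at YSQL exactly as it would be pointed at vanilla Postgres.
docker run -d --name lakefs --network yb-net -p 8000:8000 \
  -e LAKEFS_DATABASE_TYPE=postgres \
  -e LAKEFS_DATABASE_POSTGRES_CONNECTION_STRING="postgres://yugabyte:yugabyte@yb:5433/yugabyte" \
  treeverse/lakefs:latest run
```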
The plan seems to be having two YugabyteDB nodes in one datacentre and two in another, with a fast link between them.
I’m also realistic that not every technical solution is optimal for every proposed use.
We would be counting on at least 100,000 writes per day to lakeFS and the YugabyteDB database.
YugabyteDB is a perfect fit for storing the metadata of a distributed filesystem.
Latency may come from the multi-datacentre deployment, but lakeFS shouldn’t have a tantrum over it; I’d consider that a bug.
100,000 writes per day averages out to just over one write per second, which is very low usage; it should be fine in all cases.
What is the exact reason behind this type of deployment? Do you need the ability to lose a datacentre and continue to function, or is one DC intended for disaster recovery only?
While I read up on your responses and look through the links provided: the design aim is to have the database synchronised across at least two (or more) datacentres with synchronous replication between them, such that lakeFS can write to the database in any datacentre and the data is immediately available in all of them. The idea is a single resilient database for lakeFS to use under all conditions.
The system needs to survive the loss of at least one datacentre, ideally two, so that we can fall back to running on a single datacentre and maintain BAU in that scenario without manual intervention (if possible).
It seems we would initially need a three-node cluster in each of three datacentres to survive the loss of one datacentre, and five datacentres each with a three-node cluster to survive the loss of two. Is that correct?
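To show my working: the rule I’m assuming is the standard Raft-style majority rule, where surviving f simultaneous datacentre losses needs 2f + 1 datacentres (with replicas spread across them), and likewise a replication factor of 2f + 1 to tolerate f lost replicas. Please correct me if YugabyteDB behaves differently:

```python
# Back-of-envelope quorum arithmetic for Raft-style replication.
# A sketch of the general majority rule, not YugabyteDB-specific tuning.

def dcs_needed(dc_failures: int) -> int:
    """Datacentres needed so a majority of them survives
    the given number of simultaneous datacentre losses."""
    return 2 * dc_failures + 1

def rf_needed(replica_failures: int) -> int:
    """Replication factor needed to keep a Raft majority
    after losing the given number of replicas."""
    return 2 * replica_failures + 1

print(dcs_needed(1))  # 3 datacentres to survive losing one
print(dcs_needed(2))  # 5 datacentres to survive losing two
print(rf_needed(2))   # RF = 5 if two replicas may vanish together
```

If that rule holds, the per-datacentre node count adds capacity within a DC rather than DC-loss tolerance; it is the spread of replicas across datacentres that determines survivability.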
What sort of latency penalty is there as you add datacentres, please? Is it possible to calculate it?