I am coming from a microservices world where each microservice has a dedicated instance of PostgreSQL server. For example, a “product” microservice has a dedicated PostgreSQL instance, an “account” microservice has a dedicated PostgreSQL instance, and so on.
We have about 40 different microservices built with this design.
This design was put in place to isolate each microservice’s data, and also to avoid a centralized but potentially slow PostgreSQL server, where one or more microservices running resource-intensive queries could impact the other microservices.
My question is about your recommendation if I decide to migrate from PostgreSQL to YugabyteDB.
Should I have one centralized installation of YugabyteDB with one database for each microservice?
Or is the architecture I followed with PostgreSQL possible/recommended with YugabyteDB?
We have 40 PostgreSQL instances with about 30 GB of data in total. Some instances hold less than 50 MB of data while others hold about 1 GB. The data is stored on SSDs.
Some instances run with 256 MB of RAM while others run with 2 GB. In total I would say about 20 GB of RAM is used.
CPU usage is usually low, but some instances spike to around 20% CPU for short periods, triggered by specific events.
Also note that some microservices cache heavily, so the load on PostgreSQL is lower than the actual demand. If we move to YugabyteDB, we plan to remove the caching logic from our microservices.
The services run on Kubernetes across 2 data centers per region, with 12 nodes in each data center. In each region, the primary data center hosts the primary PostgreSQL instances while the secondary data center hosts the standby instances.
Usually you architect based on current needs plus estimated growth (a realistic estimate, not Google scale). Even with 100x growth, a single cluster will be able to support your needs.
We’ll help with schema/query design so that each table/database is able to scale.
This also depends on exactly what you’re caching. If it’s single rows, then yes, it will work great. If it’s complex, CPU-intensive computations, then you may still need to cache (and you could even cache inside YugabyteDB).
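For the CPU-intensive case, here is a minimal sketch of the kind of read-through cache a service might keep in front of YSQL; the table, query, 60-second TTL and the cachetools dependency are illustrative assumptions, not anything YugabyteDB-specific:

```python
# Read-through cache sketch: cache only the expensive aggregate in the service,
# go straight to YSQL for single-row lookups. Names and TTL are assumptions.
import psycopg2
from cachetools import TTLCache, cached

conn = psycopg2.connect(host="yb-tservers", port=5433,  # 5433 = default YSQL port
                        user="yugabyte", dbname="product")
conn.autocommit = True

@cached(TTLCache(maxsize=1024, ttl=60))
def monthly_revenue(account_id: int):
    """CPU-intensive aggregate: recomputed only on a cache miss."""
    with conn.cursor() as cur:
        cur.execute(
            "SELECT date_trunc('month', created_at) AS month, sum(amount) "
            "FROM orders WHERE account_id = %s GROUP BY 1",
            (account_id,),
        )
        return cur.fetchall()

def product_by_id(product_id: int):
    """Single-row lookup: no cache needed, YSQL handles this directly."""
    with conn.cursor() as cur:
        cur.execute("SELECT * FROM products WHERE id = %s", (product_id,))
        return cur.fetchone()
```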
It would be better to have 3 regions when the replication factor is 3. If you lose any one region, the cluster will still be live. With only 2 regions it will not, because the surviving region can’t form a majority consensus. (see the link above)
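To spell out the arithmetic: with replication factor (RF) 3, a Raft write must be acknowledged by a majority of the 3 replicas,

$\text{majority} = \lfloor RF / 2 \rfloor + 1 = \lfloor 3 / 2 \rfloor + 1 = 2.$

With 3 regions (one replica per region), losing any region still leaves 2 of the 3 replicas, which is a majority. With 2 regions, one of them necessarily holds 2 of the 3 replicas, and losing that region leaves only 1, which is not.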
If I understand correctly:

- If a cached query is CPU intensive, the caching can be moved from the services into YugabyteDB, as long as it is cached using the YEDIS caching API,
- If a cached query is not CPU intensive and returns single rows, then YugabyteDB can handle it directly via its YSQL API,
- To benefit from the multi-region capability of YugabyteDB, I must have at least 3 regions (fault domains), as required by the Raft consensus algorithm,
- I should use a single installation of YugabyteDB for all microservices, with one database per microservice.
Regarding the last point, I would say that for our company it is very convenient for each microservice to have its own PostgreSQL instance. Developers owning a microservice can upgrade their PostgreSQL version without impacting other services. It seems that with YugabyteDB this won’t be possible, since we would be using a centralised installation.
That said, the rolling upgrade capability of YugabyteDB on Kubernetes seems reliable. But I am not sure we are comfortable (yet) with having to upgrade a centralised system shared by 40 microservices.
YEDIS is not a caching API – it is a fully-distributed, strongly-consistent, persistent key-value database that simply happens to speak the Redis wire protocol (and has limited support for the Redis command library). It is not under active development and hence we do not recommend it for new use cases (see YEDIS | YugabyteDB Docs).
And nothing stops you from going with a fully-decentralized database architecture where each microservice has its own YugabyteDB cluster. You simply have to account for the additional administration overhead, given that you are no longer running a single-node PostgreSQL instance per service but rather a distributed database with a minimum of 3 nodes.
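For comparison, if you do go with a single cluster, “one database per microservice” over the YSQL API could look roughly like the sketch below; the host, role names and password are placeholders, and port 5433 is the YSQL default:

```python
# Provision one YSQL database (with its own owner role) per microservice in a
# single YugabyteDB cluster. Host, credentials and service names are placeholders.
import psycopg2

SERVICES = ["product", "account"]  # one entry per microservice

admin = psycopg2.connect(host="yb-tserver-0", port=5433,
                         user="yugabyte", dbname="yugabyte")
admin.autocommit = True  # CREATE DATABASE cannot run inside a transaction block

with admin.cursor() as cur:
    for svc in SERVICES:
        # Identifiers come from the hard-coded list above, so f-strings are safe here.
        cur.execute(f"CREATE ROLE {svc}_svc LOGIN PASSWORD 'change-me'")
        cur.execute(f"CREATE DATABASE {svc} OWNER {svc}_svc")
admin.close()

# Each microservice then connects only to its own database:
product_conn = psycopg2.connect(host="yb-tserver-0", port=5433,
                                user="product_svc", password="change-me",
                                dbname="product")
```

Each service keeps its data, roles and migrations isolated in its own database while the cluster’s capacity (and its upgrade schedule) is shared, which is exactly the trade-off discussed above.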