Hi
When writing data, the leader writes to both the WAL and the memtable. Do followers also need to write to the WAL and the memtable, or only to the WAL?
According to the Yugabyte Docs AI, followers need to write to both the memtable and the WAL.
Hi @ZhenNan2016
They need to write to the memtable so they can come online faster on failover and can serve follower reads.
WAL needs to be replicated (and written on disk) for persistence.
Hi @dorian_yugabyte
Followers don’t serve requests the way leaders do; they are only used for disaster recovery. Isn’t this a bit of a waste of memory resources?
I read the following document: Raft | YugabyteDB Docs, which describes “Replication of the write operation”:
From this, it looks like there is no mention of followers writing to the memtable?
You wrote << they are just used for disaster recovery. Isn’t this a bit of a waste of memory resources? >>
The followers are like hot standbys, ready to take over as the “new leader” in a matter of seconds, and not just for disaster recovery but also for, say, a single-node outage.
Also, a follower can take over even sooner than seconds in case of controlled operations like rolling software upgrade of the nodes in a cluster or shard movement between nodes for load-balancing activities.
A follower’s state machine undergoes the same life-cycle for data (as its leader)… the data first comes into WAL+memtable… and then gets flushed to SSTables in the background, which then periodically gets compacted etc. Without those operations being done on an on-going basis the follower’s ability to take over and serve reads efficiently or reclaim space for deleted records etc. will be affected.
Imagine if followers only wrote to the Write-Ahead Log (WAL). It would keep growing and take a long time to recover when a follower needs to become a leader. To update the database state, the data first goes to a MemTable where it’s sorted by key before being flushed to SST files. This is not a waste, and has a huge advantage: it turns random writes into sequential writes, making disk IO more efficient.
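To illustrate the point about sorting, here is a minimal Python sketch of a memtable that accepts writes in arbitrary key order and flushes them in sorted order. The `MemTable` class and its methods are hypothetical names made up for this example; this is not YugabyteDB's or RocksDB's actual implementation.

```python
class MemTable:
    """Toy in-memory write buffer (hypothetical, for illustration only)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # Writes arrive in arbitrary (random) key order; a later write
        # to the same key replaces the earlier one.
        self._data[key] = value

    def flush(self):
        # On flush, entries are emitted sorted by key, so they can be
        # written to an SST file in one sequential pass.
        return sorted(self._data.items())


mem = MemTable()
for key in ["k3", "k1", "k2", "k1"]:
    mem.put(key, f"v-{key}")

sst_entries = mem.flush()
print(sst_entries)
```

The writes came in as k3, k1, k2, k1, but the flushed run is ordered k1, k2, k3 with a single entry per key.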
The same applies to traditional databases, where the WAL is applied to shared buffers in memory on a standby database. In YugabyteDB, Raft and the LSM tree make distribution and replication more efficient: followers only need to acknowledge the write once it’s in the WAL, and they apply it to the MemTable later, when they know the leader has done the same (this information is piggybacked on the next calls or heartbeat).
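A rough Python sketch of that follower flow, under simplifying assumptions: the `Follower` class, `append_entries`, and `leader_commit_index` are hypothetical names for this example (loosely modeled on generic Raft), not YugabyteDB's real API, and the WAL is just an in-memory list rather than a file on disk.

```python
class Follower:
    """Toy Raft follower (hypothetical, for illustration only)."""

    def __init__(self):
        self.wal = []            # list of (index, key, value), in index order
        self.memtable = {}
        self.applied_index = 0   # last WAL index applied to the memtable

    def append_entries(self, entries, leader_commit_index):
        # 1. Persist the new entries to the WAL (a real system would fsync).
        self.wal.extend(entries)
        # 2. Apply only what the leader reports as committed; this value
        #    arrives piggybacked on the next call or heartbeat.
        for index, key, value in self.wal:
            if self.applied_index < index <= leader_commit_index:
                self.memtable[key] = value
                self.applied_index = index
        # 3. Acknowledge with the last index present in the WAL.
        return self.wal[-1][0] if self.wal else 0


f = Follower()
ack1 = f.append_entries([(1, "a", "1")], leader_commit_index=0)
state_after_first = dict(f.memtable)   # acked, but not applied yet
ack2 = f.append_entries([(2, "b", "2")], leader_commit_index=1)
state_after_second = dict(f.memtable)  # entry 1 is now applied
f.append_entries([], leader_commit_index=2)  # heartbeat with no new entries
```

The acknowledgement depends only on the WAL write, while the memtable lags by one round trip: entry 1 is applied when the second call arrives, and entry 2 is applied on the empty heartbeat.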
Hi @kannan @FranckPachot
Thank you for your replies.
Followers are also running, and their memtables take up memory. Originally, there are N nodes in the cluster and each node serves 1/N of the data. Now each node holds, in addition to the data it leads, replicas of the same table or of other tables, so a node ends up holding almost all of the table data, even though the replicas do not serve writes.
With a very large data volume, the storage space consumption looks very large, which does not quite match the design concept of sharded storage.
There are some DB products where a replica on a node only writes the WAL and does not write the memtable. Maybe there is no unified standard here, and it mainly depends on each DB product’s own definition and planning.
Yes, that’s what “replication” means.
It is the concept of sharding. There is another concept, “shared data”, where you may save disk space depending on the scenario (the best case would be a single AZ).
On failover they will need to read the WAL and construct the memtable before being able to accept writes. They also can’t accept follower reads (or only permit older data to be read).
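As a small illustration of that failover cost, the Python sketch below compares a WAL-only replica, which has to replay its whole log, with one that kept applying entries as they arrived. The `replay` helper and the entry counts are made up for the example and say nothing about real recovery times.

```python
def replay(wal, memtable, applied_index):
    """Apply WAL entries newer than applied_index; return how many were replayed."""
    replayed = 0
    for index, key, value in wal:
        if index > applied_index:
            memtable[key] = value
            replayed += 1
    return replayed


# Hypothetical WAL with 10,000 entries spread over 100 keys.
wal = [(i, f"k{i % 100}", f"v{i}") for i in range(1, 10_001)]

# WAL-only replica: nothing applied, so failover replays everything.
cold_memtable = {}
cold_replayed = replay(wal, cold_memtable, applied_index=0)

# Replica that applied entries continuously and is only 10 entries behind.
warm_memtable = {}
replay(wal[:9_990], warm_memtable, applied_index=0)   # done before the failover
warm_replayed = replay(wal, warm_memtable, applied_index=9_990)

print(cold_replayed, warm_replayed)
```

Both replicas end up with the same memtable contents, but the WAL-only one does all of the work at failover time, before it can accept writes.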
There are different ways, each with its own tradeoffs. Currently in the industry, the tradeoffs that we’ve chosen are the most popular (for the reasons mentioned in previous replies: very fast failover, etc.).
Okay, thank you for your reply.
@ZhenNan2016 Which DB products store only the WAL? It must either be applied to the database files (through a standby instance, as in many databases, or by the storage servers, as in Aurora or Neon), or be used in combination with frequent database backups so that only the latest WAL needs to be applied. I don’t think any database stores the WAL without applying it, and it needs some buffer in memory to apply it efficiently: a MemTable or shared buffers.
The original bigtable (read old paper), hypertable & early hbase were like this (later changed to our model, example see “hbase hydrabase”).
I don’t know of any current ones. I know of segment replication in opensearch & quickwit (search index compaction is heavier).
Hi @dorian_yugabyte
Yes, you’re right.
Some current products are built on HBase and still retain this model.
@FranckPachot