Do followers need to write wal and memtable?

ZhenNan2016 · June 28, 2024, 2:00pm

Hi
When writing data, the leader writes to the WAL and the memtable. Do followers also need to write to both the WAL and the memtable, or only to the WAL?
According to the Yugabyte Docs AI, followers need to write to both the memtable and the WAL.

dorian_yugabyte · June 28, 2024, 2:10pm

Hi @ZhenNan2016

They need to write memtable so they can be online faster on failover and be able to do follower reads.

WAL needs to be replicated (and written on disk) for persistence.

ZhenNan2016 · June 28, 2024, 3:34pm

Hi @dorian_yugabyte
Followers don’t serve requests like the leader does; they are just used for disaster recovery. Isn’t this a bit of a waste of memory resources?
I read the following document: Raft | YugabyteDB Docs, which describes “Replication of the write operation”:

From this, it looks like there is no mention of followers writing to the memtable?

kannan · June 28, 2024, 3:56pm

You wrote << they are just used for disaster recovery. Isn’t this a bit of a waste of memory resources? >>

The followers are like hot standbys, ready to take over as the “new leader” in a matter of seconds, and not just for disaster recovery but also for, say, a single node outage.

Also, a follower can take over even sooner than seconds in case of controlled operations like rolling software upgrade of the nodes in a cluster or shard movement between nodes for load-balancing activities.

A follower’s state machine undergoes the same life-cycle for data (as its leader)… the data first comes into WAL+memtable… and then gets flushed to SSTables in the background, which then periodically gets compacted etc. Without those operations being done on an on-going basis the follower’s ability to take over and serve reads efficiently or reclaim space for deleted records etc. will be affected.
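As a rough illustration of the background work described above, here is a minimal Python sketch of compaction: several sorted SSTable-like lists are merged, newer values win, and deleted records (tombstones) are dropped so their space can be reclaimed. The `TOMBSTONE` marker and the `compact` function are hypothetical names for illustration only, not YugabyteDB or RocksDB APIs.

```python
# Hypothetical delete marker; real storage engines encode deletes differently.
TOMBSTONE = object()


def compact(sstables):
    """Merge SSTable-like lists, given oldest first, into one sorted list.

    Each input is a list of (key, value) pairs. A newer table overrides
    an older one for the same key, and tombstoned keys are dropped.
    """
    merged = {}
    for table in sstables:          # oldest to newest
        for key, value in table:
            merged[key] = value     # newer entry replaces older entry
    return [(k, merged[k]) for k in sorted(merged) if merged[k] is not TOMBSTONE]


older = [("a", 1), ("b", 2)]
newer = [("b", TOMBSTONE), ("c", 3)]   # "b" was deleted after the first flush
compacted = compact([older, newer])
print(compacted)
```

A replica that never ran this step would keep both the old value of `b` and its delete marker on disk, which is the space-reclamation problem mentioned above.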

FranckPachot · June 30, 2024, 8:55pm

Imagine if followers only wrote to the Write-Ahead Log (WAL). It would keep growing and take a long time to recover when a follower needs to become a leader. To update the database state, the data first goes to a MemTable where it’s sorted by key before being flushed to SST files. This is not a waste, and has a huge advantage: it turns random writes into sequential writes, making disk IO more efficient.
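To make the “random writes become sequential writes” point concrete, here is a minimal Python sketch of a memtable that accepts writes in arbitrary key order and flushes them in sorted order. The `MemTable` class is a toy written for this illustration, not the actual DocDB or RocksDB implementation.

```python
import bisect


class MemTable:
    """Toy in-memory sorted buffer (hypothetical, for illustration only)."""

    def __init__(self):
        self._keys = []     # kept in sorted order
        self._values = {}   # latest value per key

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)   # insert while keeping keys sorted
        self._values[key] = value

    def flush(self):
        """Return entries in key order, as one sequential write to an SST file."""
        return [(k, self._values[k]) for k in self._keys]


mt = MemTable()
# Writes arrive in random key order, including an overwrite of k1.
for key, value in [("k3", "c"), ("k1", "a"), ("k2", "b"), ("k1", "a2")]:
    mt.put(key, value)

sst = mt.flush()
print(sst)
```

The flush emits each key once, in order, so the disk sees one sequential write instead of four scattered ones.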

The same applies to traditional databases, where the WAL is applied to shared buffers in memory on a standby database. In YugabyteDB, Raft and LSM Tree make the distribution and replication more efficient: followers only need to acknowledge the write when it’s in the WAL, and then they apply it to the MemTable later when they know the leader has done the same (this info is piggybacked in the next calls or heartbeat)
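The acknowledge-then-apply sequence can be sketched as follows. This is a simplified Python model under stated assumptions: the `Follower` class, its `append_entries` method, and the `leader_commit_index` parameter are hypothetical names, and real Raft details such as terms, log matching, and durable fsync are left out.

```python
class Follower:
    """Toy Raft follower: acks once an entry is in the WAL, applies it later."""

    def __init__(self):
        self.wal = []            # list of (index, key, value), in log order
        self.memtable = {}       # applied state
        self.applied_index = 0   # highest log index applied to the memtable

    def append_entries(self, entries, leader_commit_index):
        # 1. Append new entries to the WAL (a real system would persist them).
        self.wal.extend(entries)
        # 2. Apply only what the leader reports as committed; this information
        #    arrives piggybacked on a later call or heartbeat.
        for index, key, value in self.wal:
            if self.applied_index < index <= leader_commit_index:
                self.memtable[key] = value
                self.applied_index = index
        # 3. Acknowledge with the last index held in the WAL.
        return self.wal[-1][0] if self.wal else 0


f = Follower()
ack = f.append_entries([(1, "k1", "a")], leader_commit_index=0)
state_after_ack = dict(f.memtable)       # entry is in the WAL, not yet applied
f.append_entries([], leader_commit_index=1)   # heartbeat carrying the commit index
state_after_heartbeat = dict(f.memtable)
print(ack, state_after_ack, state_after_heartbeat)
```

The first call acknowledges index 1 while the memtable is still empty; the entry becomes visible in the memtable only after the heartbeat reports it as committed.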

ZhenNan2016 · July 1, 2024, 2:28am

Hi @kannan @FranckPachot
Thank you for your replies.
Followers are also running, and their memtables take up memory. Originally, with N nodes in the cluster, each node is responsible for serving 1/N of the data. Now each node holds, in addition to the data it leads, replicas of that table’s or other tables’ data, so a node ends up holding almost all of the table data, even though the replicas do not serve writes.
The data volume is very large, so the storage consumption seems very high, which does not quite match the design concept of sharded storage.
In some DB products, the node hosting a replica only writes the WAL and does not write the memtable. Maybe there is no unified standard here, and it mainly depends on each DB product’s own definition and design.

dorian_yugabyte · July 1, 2024, 7:50am

Yes, that’s what “replication” means.

It is still the concept of sharding. There is another concept, “shared data”, where you may save disk space depending on the scenario (the best case would be a single AZ).

With WAL-only followers, on failover they will need to read the WAL and construct the memtable before being able to accept writes. They also can’t serve follower reads (or can only permit older data to be read).
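A small Python sketch of that failover cost, assuming a hypothetical `replay_wal` helper and a made-up log of 10,000 entries: a WAL-only replica has to walk the entire unapplied log before it can serve, whereas a replica that applied entries as they arrived has already done this work.

```python
def replay_wal(wal):
    """Rebuild the memtable from scratch; returns (memtable, entries_replayed)."""
    memtable = {}
    replayed = 0
    for _index, key, value in wal:
        memtable[key] = value   # later entries overwrite earlier ones
        replayed += 1
    return memtable, replayed


# Illustrative log only: 10,000 writes spread over 100 keys.
wal = [(i, f"k{i % 100}", i) for i in range(1, 10001)]

rebuilt, replayed = replay_wal(wal)
print(len(rebuilt), replayed)
```

The replay touches every log entry even though only 100 keys survive, so the delay before the new leader can accept writes grows with the length of the unapplied WAL.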

There are different ways, each with its own tradeoffs. Currently in the industry, the tradeoffs that we’ve chosen are the most popular (for the reasons mentioned in previous replies: very fast failover, etc.).

ZhenNan2016 · July 1, 2024, 10:10am

Okay, thank you for your reply.

FranckPachot · July 1, 2024, 12:52pm

@ZhenNan2016 Which DB products are storing only WAL? It must either be applied to the database files (through a standby instance like many databases, or by the storage servers like in Aurora or Neon), or used in combination with frequent database backups so that only the latest WAL needs to be applied. I don’t think any database stores the WAL without applying it, and it needs some buffer in memory to apply it efficiently: MemTable or Shared Buffers.

dorian_yugabyte · July 2, 2024, 7:07am

The original Bigtable (read the old paper), Hypertable & early HBase were like this (later changed to our model; for example, see “hbase hydrabase”).

I don’t know of any current ones. I know of segment replication in OpenSearch & Quickwit (search index compaction is heavier).

ZhenNan2016 · July 2, 2024, 8:55am

Hi @dorian_yugabyte
Yes, you’re right.
There are currently some products built on HBase that still retain this model.
@FranckPachot
