Is my logic correct in assuming the following (with all nodes having plenty of free space):
YugabyteDB with 3 nodes:
Single node failure: replication is redistributed among the two remaining nodes. Reads and writes remain available. If another node dies, the cluster enters read-only mode.
Dual node failure (both becoming unavailable at the same instant): the remaining node goes into read-only mode.
YugabyteDB with 4 nodes:
Single node failure: replication is redistributed among the three remaining nodes. If the replication succeeds, YugabyteDB enters a “3 nodes” state (see the 3 node example for how future failures are handled).
Dual node failure (both failing within 60 seconds): does YugabyteDB go into read-only mode?
YugabyteDB with 5 nodes:
Single node failure: replication is redistributed among the four remaining nodes. If the replication succeeds, YugabyteDB enters 4 node behavior (see the 4 node example for how future failures are handled).
Dual node failure (both failing within 60 seconds): replication starts among the 3 remaining nodes. If it succeeds, YugabyteDB enters 3 node behavior (see the 3 node example for how future failures are handled) and is able to handle another failure.
Triple node failure (all 3 failing within 60 seconds): does YugabyteDB go into read-only mode?
Is this understanding correct? If yes, please also update the documentation. I see that a lot of DBs like YugabyteDB and CRDB talk about node failures but do not really distinguish “instantaneous” failures from “slow” failures (where there is a chance to rebuild and settle into a different node setup). The effects of instant vs slow failures on clusters of different sizes (3, 4, 5, 6, 7 nodes) are mostly glossed over.
Assuming RF=3 in all scenarios.
Assuming free disk space in all scenarios.
If a node fails, the tablets whose leaders resided on the failed node won’t accept writes until new leaders have been elected on the other nodes. This happens after the timeout controlled by --leader_failure_max_missed_heartbeat_periods expires (3 seconds by default).
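As a rough illustration of where the 3 second figure comes from: the flag counts missed heartbeat periods, so the failure window is that count multiplied by the Raft heartbeat interval. The sketch below assumes a heartbeat interval of 500 ms (the --raft_heartbeat_interval_ms flag) and 6 missed periods. Neither value is stated in this thread, so treat both as assumptions and check the flag reference for your version.

```python
# Assumed defaults (not stated in this thread); verify against your version.
RAFT_HEARTBEAT_INTERVAL_MS = 500
LEADER_FAILURE_MAX_MISSED_HEARTBEAT_PERIODS = 6


def leader_failure_timeout_ms(
    missed_periods=LEADER_FAILURE_MAX_MISSED_HEARTBEAT_PERIODS,
    heartbeat_interval_ms=RAFT_HEARTBEAT_INTERVAL_MS,
):
    """Time a follower waits without heartbeats before treating the leader as failed."""
    return missed_periods * heartbeat_interval_ms


if __name__ == "__main__":
    # With the assumed values this prints: 3.0 seconds
    print(leader_failure_timeout_ms() / 1000, "seconds")
```

Writes to the affected tablets stall for about this long, plus the time the election itself takes.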
If we lose 2 peers of a tablet, that tablet becomes read-only, and reads work ONLY when the surviving peer was already the leader (it can’t elect a new leader because there’s no quorum) OR when using follower reads. If the lost peers come back online, they will start replicating again. If we lost them forever (15+ minutes), then we must do manual recovery of those tablets.
If we lose 3 peers of a tablet, that tablet is unavailable for reads and writes. No replicas exist. You must resurrect at least one of those nodes. Once one peer is resurrected, follow the case above where we lose 2 peers of a tablet.
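The rules above apply per tablet, not per cluster, which is why simultaneous failures affect a 4 or 5 node cluster only partially. Here is a minimal sketch of that reasoning, assuming RF=3 and a simple majority quorum. The function names and the idea of enumerating every possible 3-node placement are illustrative only; this is not how YugabyteDB actually places tablets.

```python
from itertools import combinations

RF = 3  # replication factor assumed throughout this thread


def tablet_state(live_peers, leader_alive, rf=RF):
    """Classify a single tablet by how many of its rf peers are still alive."""
    majority = rf // 2 + 1
    if live_peers >= majority:
        # Quorum intact: a new leader can be elected if the old one died.
        return "read-write"
    if live_peers >= 1:
        # No quorum, so no election and no writes. Reads are served only by
        # a surviving leader, or by a follower if follower reads are used.
        return "read-only (leader survived)" if leader_alive else "follower reads only"
    return "unavailable"


def placements_without_quorum(num_nodes, failed, rf=RF):
    """Count the rf-node placements that lose quorum when `failed` nodes die at once.

    Returns (placements_without_quorum, total_placements).
    """
    majority = rf // 2 + 1
    placements = list(combinations(range(num_nodes), rf))
    lost = sum(1 for p in placements if len(set(p) - set(failed)) < majority)
    return lost, len(placements)


if __name__ == "__main__":
    for nodes, failed in [(3, {0}), (4, {0, 1}), (5, {0, 1}), (5, {0, 1, 2})]:
        lost, total = placements_without_quorum(nodes, failed)
        print(f"{nodes} nodes, {len(failed)} failed at once: "
              f"{lost} of {total} possible placements lose quorum")
```

Under these assumptions, two simultaneous failures in a 4 node cluster leave 2 of the 4 possible placements without quorum, and in a 5 node cluster 3 of 10. Tablets in the other placements keep accepting writes, so the cluster as a whole is neither fully read-write nor fully read-only.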