Fault tolerance

Is my logic correct in assuming (with all nodes having plenty of free space):

YugabyteDB with 3 nodes:

  • Single-node failure: replication will redistribute among the two remaining nodes, and reads/writes remain available. If another node then dies, the cluster enters read-only mode.
  • Dual-node failure (both becoming unavailable at once): the remaining node goes into read-only mode.

YugabyteDB with 4 nodes:

  • Single-node failure: replication will redistribute among the three remaining nodes. If the re-replication succeeds, YugabyteDB effectively enters a “3-node” state (see the 3-node example for how future failures are handled).
  • Dual-node failure (both failing within 60 seconds)? Does YugabyteDB go into read-only mode?

YugabyteDB with 5 nodes:

  • Single-node failure: replication will redistribute among the four remaining nodes. If the re-replication succeeds, YugabyteDB enters 4-node behavior (see the 4-node example for how future failures are handled).
  • Dual-node failure (both failing within 60 seconds)? Replication will start between the 3 remaining nodes; if successful, YugabyteDB enters 3-node behavior (see the 3-node example for how future failures are handled) and is able to handle another failure.
  • Triple-node failure (all 3 failing within 60 seconds)? Does YugabyteDB go into read-only mode?

Is this understanding correct? If yes, please also update the documentation, because a lot of DBs like YugabyteDB and CRDB that talk about node failures do not really distinguish “instantaneous” failures from “slow” failures (with a chance to rebuild and enter a different node setup). And the effects of instant vs. slow failures on 3-, 4-, 5-, 6-, and 7-node infrastructures are mostly glossed over.

Hi @Ben_Jiro

Welcome to YugabyteDB Forum!

Are we assuming replication factor = 3 in all those cases?

Regards,
Dorian
Technical Support Engineer

Indeed. A replication factor of 3.

Assuming RF=3 in all scenarios.
Assuming free disk space in all scenarios.

  1. If a node fails, the tablets whose leaders resided on the failed node won’t accept writes until new leaders have been chosen on the other nodes, after --leader_failure_max_missed_heartbeat_periods (roughly 3 seconds with the defaults; see the timing sketch after this list).

  2. If a node is down for --follower_unavailable_considered_failed_sec (default 15 minutes), the node is considered dead and its data starts replicating to the other machines.

  3. If we lose 2 peers of a tablet, that tablet is at best read-only: reads are possible ONLY when the surviving peer was already the leader (it can’t elect a new leader because there’s no quorum) OR when using follower reads. If the lost peers come back online, they will resume replicating. If we lost them forever (15+ minutes), we must do manual recovery of those tablets.

  4. If we lose all 3 peers of a tablet, that tablet is unavailable for reads and writes; no replicas exist. You must resurrect at least one of those nodes. If only one is resurrected, the “lose 2 peers” case above applies.

  5. Generally, if RF is n, YugabyteDB can survive (n - 1) / 2 failures without compromising correctness or availability of data (see the quorum sketch after this list).

  6. Replication happens per tablet. Example: assume a transaction that needs data in 2 tablets; if both tablets have leaders, the transaction can continue.
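
To put rough numbers on points 1 and 2, here is a back-of-the-envelope Python sketch. The gflag names are the ones quoted above; the 500 ms Raft heartbeat interval and the 6-period leader-failure threshold are assumed defaults, so verify them on your own cluster (for example, in the yb-tserver web UI).

```python
# Back-of-the-envelope failover timing, derived from the gflags above.
# The values below are assumed defaults; check your cluster's settings.

RAFT_HEARTBEAT_INTERVAL_MS = 500    # raft_heartbeat_interval_ms (assumed default)
LEADER_FAILURE_MAX_MISSED = 6       # leader_failure_max_missed_heartbeat_periods (assumed default)
FOLLOWER_UNAVAILABLE_SEC = 15 * 60  # follower_unavailable_considered_failed_sec (default 900 s)

# Point 1: tablets whose leader was on the dead node pause writes until
# the surviving peers elect a new leader.
failover_window_s = RAFT_HEARTBEAT_INTERVAL_MS * LEADER_FAILURE_MAX_MISSED / 1000
print(f"writes to affected tablets pause for roughly {failover_window_s:.1f} s")

# Point 2: only after this much downtime is the node declared dead and
# its data re-replicated to the remaining servers.
print(f"re-replication starts after {FOLLOWER_UNAVAILABLE_SEC // 60} minutes of downtime")
```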
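And a minimal sketch of the per-tablet quorum arithmetic behind points 3 to 6. It is a simplification: as point 3 notes, a lone surviving peer can still serve reads if it happened to be the leader, or via follower reads.

```python
# Minimal model of per-tablet quorum availability (illustrative only).

def has_quorum(rf: int, failed_peers: int) -> bool:
    """A tablet stays read/write available while a majority of its
    rf peers are alive, i.e. survivors > rf // 2."""
    return (rf - failed_peers) > rf // 2

def max_tolerated_failures(rf: int) -> int:
    """Point 5: with replication factor n, a tablet tolerates
    (n - 1) // 2 simultaneous peer failures."""
    return (rf - 1) // 2

for rf in (3, 5, 7):
    print(f"RF={rf}: tolerates {max_tolerated_failures(rf)} failed peer(s)")

# Point 6: a transaction touching several tablets needs a live leader
# (i.e. a surviving quorum) for every tablet involved.
failed_peers_per_tablet = [1, 0]  # e.g. a transaction spanning 2 tablets, RF=3
print("transaction can proceed:",
      all(has_quorum(3, f) for f in failed_peers_per_tablet))
```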

3 nodes, single-node failure: see 1.

Also, there is no reason to have 2 replicas of the same tablet on one server (3 replicas distributed over 2 servers), so no re-replication is needed.
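
A quick way to see this: a re-replication target must be a live server that does not already hold a replica of the tablet, and on a 3-node RF=3 cluster no such server exists after a failure. A toy sketch (the helper is purely illustrative, not a YugabyteDB API):

```python
# Toy placement check: valid re-replication targets are live servers
# that do not already hold a replica of the tablet.

def rereplication_targets(live_servers: set, replica_locations: set) -> set:
    return live_servers - replica_locations

replicas = {"n1", "n2"}  # surviving replicas of one tablet after n3 dies

# RF=3 on a 3-node cluster: both survivors already hold a replica.
print(rereplication_targets({"n1", "n2"}, replicas))        # set(): nowhere to go

# Same failure on a 4-node cluster: n4 is a valid target.
print(rereplication_targets({"n1", "n2", "n4"}, replicas))  # {'n4'}
```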

3 nodes, dual-node failure: see 3 above.

4 nodes, single-node failure: see 1, 2.

4 nodes, dual-node failure: see 1, 2, 3.

5 nodes, single-node failure: see 1, 2.

5 nodes, dual-node failure: see 1, 2, 3.

5 nodes, triple-node failure: see 1, 2, 3, 4.

Reads won’t be served either until a new tablet leader is chosen for those tablets that had their leaders on the node that went down, right?

Yes, you won’t be able to read from those leaders since they are down.
In YCQL (soon in YSQL) you can also read from tablet peers.
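
A toy sketch of those two read paths (the function is purely illustrative, not a YugabyteDB API). Default reads go to the tablet leader; follower reads, available in YCQL as noted above, can be served by any live peer at the cost of possibly slightly stale data.

```python
# Illustrative model of the two read paths described above.

def can_read(leader_alive: bool, live_peers: int, follower_reads: bool) -> bool:
    if leader_alive:
        return True                           # normal, strongly consistent read
    return follower_reads and live_peers > 0  # possibly stale follower read

# The leader's node just died, 2 followers are still up. Without follower
# reads, reads pause only for the leader-failover window (a new leader is
# elected within seconds since a quorum survives).
print(can_read(leader_alive=False, live_peers=2, follower_reads=False))  # False
print(can_read(leader_alive=False, live_peers=2, follower_reads=True))   # True
```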