Achieving real zero-downtime YEDIS by using a proxy

Given a 3-node setup where every node runs YEDIS, you can use any of the TServers as a backend for clients. But when you update a node you have to restart it, which causes downtime (and a node can go down for other reasons too). If you'd rather not reconfigure clients to talk only to the 2 online nodes, you can use a proxy that takes care of this automatically.

Common choices are twemproxy by Twitter and dynomite by Netflix.

Each TServer runs one instance of either twemproxy or dynomite. The config files for all 3 proxies are nearly identical and list all 3 TServer YEDIS addresses (with the local node given a higher priority). I recommend setting distribution to random and preconnect to true. More experimentation with auto_eject_hosts and server_retry_timeout is needed before I can recommend specific settings (if you have values that work well, please report back).
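
To make the "nearly identical config" idea concrete, here is a rough sketch in Python that renders a twemproxy-style config for each node. Treat all of it as assumption rather than a tested setup: the IPs, the proxy listen port, the pool name `yedis` and the weights are placeholders I made up, and the YEDIS port is assumed to be 6379. Expressing "higher priority for the local node" through the weight field of the server entry is also my guess, so check whether your chosen distribution mode actually honors weights. auto_eject_hosts and server_retry_timeout are deliberately left unset.

```python
# Sketch: render one twemproxy (nutcracker) config per TServer.
# All IPs, ports, weights and the pool name are hypothetical placeholders.

TSERVERS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # hypothetical node addresses
YEDIS_PORT = 6379    # assumed YEDIS port; adjust if yours differs
PROXY_PORT = 22121   # hypothetical port the proxy listens on


def render_config(local_ip, local_weight=10, remote_weight=1):
    """Return nutcracker.yml text for the proxy running on local_ip."""
    lines = [
        "yedis:",
        f"  listen: 0.0.0.0:{PROXY_PORT}",
        "  redis: true",
        "  distribution: random",
        "  preconnect: true",
        "  # auto_eject_hosts / server_retry_timeout: untuned, set after testing",
        "  servers:",
    ]
    for ip in TSERVERS:
        # Entry format is host:port:weight; the local node gets the larger weight.
        weight = local_weight if ip == local_ip else remote_weight
        lines.append(f"    - {ip}:{YEDIS_PORT}:{weight}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the config for the proxy that runs on the first node.
    print(render_config("10.0.0.1"))
```

Calling `render_config` once per node gives three files that differ only in which server entry carries the higher weight.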

If one of the TServers goes down, the proxy automatically routes around it until the TServer is back online. Clients are completely unaware of the change, which keeps admin work to a minimum.

As an added benefit, this eliminates the startup latency of clients using authed instances, because the proxy can keep one connection alive indefinitely.

This is awesome @muehlio, thanks for sharing! :fire: :fire: :fire: