kubernetes disaster recovery

Deploying via git using argocd or flux makes disaster recovery fairly straightforward.

Using gitops means you can delete a kubernetes cluster, spin up a new one,  and have everything deployed back out in minutes.  But what about recovering the pvcs used before?

If you are using an infrastructure which implements csi, then you are able to allocate pvcs using storage managed outside of the cluster.  And, it turns out, reattaching to those pvcs is possible but you have to plan ahead.

Instead of writing yaml to spin up a pvc automatically, create the pv and pvc using manually set values.  Or spin up the pvcs automatically and then go back and modify the yaml to set recoverable values.  The howto is right up top in the csi documentation: https://kubernetes.io/blog/2019/01/15/container-storage-interface-ga/

Similarly, it is common for applications to spin up with randomly set admin passwords and such.  However, imagine a recovery scenario where a new cluster is stood up, you don’t want a new password stood up.  Use a vault with a password and reference the vault.

These two steps do add a little work, it’s the idea of taking a little more time to do things right, and in a production environment you want this.

Infrastructure side solution: https://velero.io/

Todo:  Create a video deleting a cluster and recovering all apps with a new cluster, recovering pvcs also (without any extra work on the recovery side).

Posted in Development, Infrastructure, Kubernetes.

Leave a Reply