mapping dns via argocd applicationset and external-dns

When using ArgoCD and an ApplicationSet to deploy external-dns to all clusters, as part of a group of addons common to every cluster, it can be useful to configure the DNS filters using template variables:

        helm:
          releaseName: "external-dns"
          parameters:
          - name: external-dns.domainFilters
            value: "{ {{name}}.k.home.net }"
          - name: external-dns.txtOwnerId
            value: '{{name}}'
          - name: external-dns.rfc2136.zone
            value: '{{name}}.k.home.net'
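
For context, the snippet above sits inside an ApplicationSet template driven by the cluster generator, which is where {{name}} comes from. A minimal sketch of the surrounding resource; the repo URL, chart path, namespaces, and Application name are placeholders rather than my actual setup:

    apiVersion: argoproj.io/v1alpha1
    kind: ApplicationSet
    metadata:
      name: addon-external-dns
      namespace: argocd
    spec:
      generators:
      - clusters: {}                 # {{name}}/{{server}} come from each registered cluster
      template:
        metadata:
          name: 'external-dns-{{name}}'
        spec:
          project: default
          source:
            repoURL: https://example.com/git/addons.git   # placeholder
            targetRevision: HEAD
            path: charts/external-dns                     # placeholder umbrella chart
            helm:
              releaseName: "external-dns"
              parameters:
              - name: external-dns.domainFilters
                value: "{ {{name}}.k.home.net }"
              - name: external-dns.txtOwnerId
                value: '{{name}}'
              - name: external-dns.rfc2136.zone
                value: '{{name}}.k.home.net'
          destination:
            server: '{{server}}'
            namespace: external-dns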

This places the cluster name in the DNS names used by external-dns, so clusters end up with FQDNs like these:

app.dev.k.home.net
app.test.k.home.net
app.prod.k.home.net

Though, for my core cluster, which hosts components used by all clusters, I like to leave out the cluster name so all core components sit at the k.<domain> level (one way to do the override is sketched below):

argocd.k.home.net
harbor.k.home.net
keycloak.k.home.net
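
One way to get there, assuming the core cluster gets its own Application (or an override in the ApplicationSet) instead of the templated values, is to drop the {{name}} prefix; the 'core' owner id is just a placeholder:

    helm:
      releaseName: "external-dns"
      parameters:
      - name: external-dns.domainFilters
        value: "{ k.home.net }"
      - name: external-dns.txtOwnerId
        value: 'core'
      - name: external-dns.rfc2136.zone
        value: 'k.home.net'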

git-based homedir folder structure (and git repos) using lessons learned

After reinstalling everything, including my main Linux workbench system, it became the right time to finally get my home directory into git. Taking all of the lessons learned up to this point, it seemed a good idea to clean up my git repo strategy as well. The revised strategy:

[Git repos]

Personal:
- workbench-<user>

Team (i for infrastructure):
- i-ansible
- i-jenkins (needed ?)
- i-kubernetes (needed?)
- i-terraform
- i-tanzu

Project related: (source code)
- p-lido (use tagging dev/test/prod)
    doc
    src

Jenkins project pipelines:
- j-lifecycle-cluster-decommission
- j-lifecycle-cluster-deploy
- j-lifecycle-cluster-update
- j-lido-dev
- j-lido-test
- j-lido-prod

Cluster app deployments:
- k-core
- k-dev
- k-exp
- k-prod

[Folder structure]

i-ansible (git repo)
  doc
  bin
  plays ( ~/a )

i-jenkins (git repo) (needed ?)
  doc
  bin
  pipelines ( ~/j )

i-kubernetes (git repo) (needed ?)
  doc
  bin
  manage ( ~/k )
  templates

i-terraform (git repo)
  doc
  bin
  plans (~/p)
    k-dev

i-tanzu (git repo)
  doc
  bin
  application.yaml (-> appofapps, see the Application sketch below)
  apps (~/t)
    appofapps/ (inc all clusters)
    k-dev/cluster.yaml

src
  <gitrepo>/<user> (~/mysrc) (these are each git repos)
  <gitrepo>/<team> (~/s) (these are each git repos)
    j-lifecycle-cluster-decommission
    j-lifecycle-cluster-deploy
    - deploy cluster
    - create git repo
    - create adgroups
    - register with argocd global
    j-lifecycle-cluster-update
    j-lido-dev
    j-lido-test
    j-lido-prod
    k-dev
      application.yaml (-> appofapps)
      apps
        appofapps/ (inc all apps)
    k-exp
      application.yaml (-> appofapps)
      apps
        appofapps/ (inc all apps)
    k-prod
      application.yaml (-> appofapps)
      apps
        appofapps/ (inc all apps)

workbench-<user> (git repo)
  doc
  bin
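
The application.yaml (-> appofapps) entries above are the root of ArgoCD's app-of-apps pattern: a single Application pointing at the apps/appofapps folder, which in turn holds one Application per cluster or per app. A minimal sketch, with a placeholder repo URL:

    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: appofapps
      namespace: argocd
    spec:
      project: default
      source:
        repoURL: https://example.com/git/i-tanzu.git   # placeholder repo URL
        targetRevision: HEAD
        path: apps/appofapps                           # folder holding the child Applications
      destination:
        server: https://kubernetes.default.svc
        namespace: argocd
      syncPolicy:
        automated:
          prune: true
          selfHeal: true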

kubernetes disaster recovery

Deploying via git using argocd or flux makes disaster recovery fairly straightforward.

Using gitops means you can delete a kubernetes cluster, spin up a new one, and have everything deployed back out in minutes. But what about recovering the PVCs that were in use before?

If your infrastructure implements CSI, you can allocate PVCs backed by storage managed outside of the cluster. And, it turns out, reattaching to those PVCs after a rebuild is possible, but you have to plan ahead.

Instead of writing YAML that provisions a PVC dynamically, create the PV and PVC with manually set values (see the sketch below). Or provision the PVCs dynamically and then go back and modify the YAML to set recoverable values. The how-to is right up top in the CSI documentation: https://kubernetes.io/blog/2019/01/15/container-storage-interface-ga/
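
A minimal sketch of the manually set variant (static provisioning), assuming a generic CSI driver; the driver name, volumeHandle, names, and sizes are placeholders to be filled in from the pre-existing backing volume:

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: app-data-pv
    spec:
      capacity:
        storage: 10Gi
      accessModes: ["ReadWriteOnce"]
      persistentVolumeReclaimPolicy: Retain      # keep the backing volume if the PVC is deleted
      storageClassName: ""
      csi:
        driver: csi.example.com                  # placeholder CSI driver name
        volumeHandle: vol-app-data-0001          # ID of the already-existing volume
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: app-data
      namespace: my-app
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: ""                       # empty string disables dynamic provisioning
      volumeName: app-data-pv                    # bind directly to the PV above
      resources:
        requests:
          storage: 10Gi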

Similarly, it is common for applications to spin up with randomly generated admin passwords and the like. But imagine a recovery scenario where a new cluster is stood up: you don’t want a new password generated along with it. Keep the password in a vault and reference the vault instead.
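
What that looks like depends on the chart, but many charts (the bitnami ones, for example) accept an existing Secret instead of generating a random password, and that Secret can be populated from the vault by hand or by a secrets operator. A hedged sketch with placeholder names and chart-dependent value keys:

    apiVersion: v1
    kind: Secret
    metadata:
      name: my-app-admin              # populated from the vault, not generated by the chart
      namespace: my-app
    stringData:
      admin-password: "<value pulled from the vault>"

    # values.yaml for the chart (the exact key names vary per chart)
    auth:
      existingSecret: my-app-admin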

These two steps do add a little work, but that is the idea of taking a little more time to do things right, and in a production environment you want this.

Infrastructure side solution: https://velero.io/
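
As a rough idea of what the velero side looks like, a hedged sketch of a Schedule that backs up selected namespaces nightly; namespace names and retention are placeholders:

    apiVersion: velero.io/v1
    kind: Schedule
    metadata:
      name: nightly-apps
      namespace: velero
    spec:
      schedule: "0 2 * * *"            # cron, nightly at 02:00
      template:
        includedNamespaces:
        - my-app                       # placeholder namespace list
        ttl: 720h                      # keep backups for 30 days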

Todo: Create a video deleting a cluster and recovering all apps with a new cluster, recovering PVCs as well (without any extra work on the recovery side).

fixing a wordpress pod after a brownout

Glad to have things back up, though I’m still looking into the root cause; I wouldn’t want to have to do this again.

Kubernetes utilities

So many utilities out there to explore:

https://collabnix.github.io/kubetools/

Love how every k8s-at-home helm chart can pipe an application through a VPN sidecar with a few lines of YAML. And said sidecar can have its own VPN connection or route through a single gateway pod that holds the VPN connection, so cool!

The bitnami team also maintains a great standard across their helm charts, for example consistent ways to specify a local repo and ingresses. Many works in progress.

Took a look at tor solutions just for fun: several proxies for kubernetes… as well as solutions ready to set up a server via an onion link using a tor kubernetes controller. Think I’ll give it a try.

This kube rabbit hole is so much fun!

xcp-ng & kubernetes

Decided to test out xcp-ng as my underlying infrastructure to setup kubernetes clusters.

Initially xcp-ng, the open source implementation of Citrix’s XenServer, appears very similar to vmware, yet the similarities disappear quickly. VMware implemented the CSI and CPI interfaces used by kubernetes integrations early on, and those implementations keep maturing, whereas xcp-ng is still looking for volunteers to begin them. Why start from scratch when vmware already has a developed solution?

What about a utility to spin up a cluster? Google, AWS, and VMware all have a CLI to spin up and work with clusters… xcp-ng not so much; you would need to use a third-party solution such as terraform.

xcp-ng is great on a budget but lacks the APIs needed for a fully integrated kubernetes solution. Clusters, and their storage solutions, must be built and maintained manually. At this point I wonder how anyone could choose a Citrix XenServer solution knowing everything is headed towards kubernetes. But for a free solution with two or three manually managed clusters, yes.

And then we have argocd, with its ability to restore an entire cluster, or even multiple clusters simultaneously, simply due to its inherently declarative nature. More kubernetes magic.

Helm chart checklist – workflow from dev to prod (always evolving)

Helm chart setup workflow

Dev

  1. Initially get things working with a default helm install into dev.
  2. Does it have LDAP or OIDC integration, or some other reason to need to verify an on-prem CA chain? If yes, figure out how to get the CA chain installed.
  3. Set up OIDC if possible; set up LDAP if needed and OIDC is not available.
  4. Is it possible to set values in the helm chart to get the CA chain installed as part of the helm install? If yes, modify the YAML (see the CA-chain sketch after this list); if no, fork the helm chart, add the needed steps, and see whether the project owner is open to a pull request (better to integrate with the helm chart than to manage a fork).
  5. Configure something on the server that you want to persist.
  6. Helm uninstall and reinstall: did you lose the setting? If so, figure out the steps needed to get the data to persist (see the persistence sketch after this list).
  7. Try increasing the storage size of PVCs which might need to grow in production. Is this possible without taking down the application? Figure out what is required for this use case, which will inevitably come up in the future; document it and be ready (see the volume-expansion sketch after this list). It may be wise to implement a pipeline for this purpose.
  8. Cordon and drain the involved node(s), then uncordon them. Was data lost when things came back up? Figure out how to ensure data persists.
  9. Does the server have a method to export its configuration or otherwise back itself up? Configure automated backups.
  10. Is it possible to configure the server as HA? Can it be configured for HA later, or must it be configured from the initial setup? Can a single instance be migrated to HA? Decide whether HA needs to be set up or a single instance is good enough. If HA is desired, figure out how to set it up and go through this list again.
  11. Are there options to configure metrics for the application? Helm charts often expose these. (lower priority when initially working to get something up)
  12. If there is an option to use a log aggregator, set that up, or possibly set up a logging sidecar. (lower priority when initially working to get something up)
  13. The server is now ready to release into test.
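
For step 4, when a chart exposes extra volume hooks, the usual pattern is to put the CA chain in a ConfigMap and mount it into the system trust path. The extraVolumes / extraVolumeMounts keys below are common but chart-dependent, so treat the whole thing as a sketch:

    # ConfigMap holding the on-prem CA chain
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: onprem-ca
      namespace: my-app
    data:
      ca-certificates.crt: |
        -----BEGIN CERTIFICATE-----
        ...on-prem CA chain...
        -----END CERTIFICATE-----

    # chart values (key names vary per chart)
    extraVolumes:
    - name: onprem-ca
      configMap:
        name: onprem-ca
    extraVolumeMounts:
    - name: onprem-ca
      mountPath: /etc/ssl/certs/onprem-ca.crt
      subPath: ca-certificates.crt
      readOnly: true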
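
For step 6, a common way to keep data across a helm uninstall/reinstall is to own the PVC yourself and point the chart at it; many charts expose a persistence.existingClaim value for this, though the key name varies per chart:

    # chart values (key names vary per chart)
    persistence:
      enabled: true
      existingClaim: my-app-data   # a PVC created and managed outside the chart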
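
For step 7, online expansion is possible when the StorageClass (and the CSI driver behind it) allows it; growing the volume is then just a change to the PVC's requested size. Names and sizes are placeholders:

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: expandable
    provisioner: csi.example.com       # placeholder CSI driver
    allowVolumeExpansion: true
    ---
    # later, to grow a PVC bound to that class, bump the requested size:
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: my-app-data
      namespace: my-app
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: expandable
      resources:
        requests:
          storage: 20Gi                # raised from the original 10Gi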

Test

  1. Configure permissions for those with accounts accessed via OIDC / LDAP. Note that a program which supports LDAP but not OIDC is not as evolved; check for a plugin/extension if OIDC does not appear to be available. OIDC enables almost any identity provider & SSO, and is always preferred.
  2. Does the minimum requested CPU and memory match what is actually needed?
  3. Someone needs to perform some manual testing or work on automated testing.
  4. If no one ever tries restoring from a backup, there is a good chance the process will not work; try that out before there is a fire.
  5. No system may be released into production without an automated method of registering its IP in DNS (e.g. external-dns) and an automated method of updating its SSL certificates (e.g. cert-manager); verify both work (see the Ingress sketch after this list).
  6. Be sure to test rolling forward to the next release of the helm chart as well as rolling back (and that all tests still pass).
  7. If all testing passes, then it is ready for production.
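
For step 5, both of those usually come down to annotations on the Ingress; a hedged sketch with placeholder host, issuer, and service names:

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: my-app
      namespace: my-app
      annotations:
        external-dns.alpha.kubernetes.io/hostname: my-app.dev.k.home.net   # external-dns manages this record
        cert-manager.io/cluster-issuer: onprem-issuer                      # placeholder ClusterIssuer
    spec:
      rules:
      - host: my-app.dev.k.home.net
        http:
          paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app
                port:
                  number: 80
      tls:
      - hosts:
        - my-app.dev.k.home.net
        secretName: my-app-tls          # cert-manager writes the certificate into this Secret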

Production

  1. An update strategy needs to be established and followed just prior to release into production. Schedule: monthly, quarterly, every 6 months, or upon release of a new version. Version: always run the latest, or the version just prior to the latest major release (with all of its updates). Some programs such as WordPress can/will update plugins automatically… is this OK?
  2. Generally, automation is desired to roll something out into production. When an update is ready, automation should be used to update dev first and perform automated testing, then roll out into prod with someone's OK (or roll out into prod automatically if all tests passed and it is decided that is good enough).
  3. Also, a pipeline for rolling back to a previous version is a good idea, in case a deployment to production fails.

(pull request) Contributing to open source, helm chart for taiga, ability to import an on prem certificate authority certificate chain

OIDC is always preferred if possible.  At this time in history not all projects have OIDC support, though some can be extended via an extension or plugin to accomplish the goal.  I’ve got enough experience to help projects get over this hurdle and get OIDC working.  If I could be paid just to help out open source projects I might go for it.

Here’s a pull request for a taiga helm chart I’ve been using. I’ve been using taiga for years via docker and am happy to be able to help out in this way now that I’m using kubernetes and helm charts. In this case I borrowed a technique from a nextcloud helm chart, and it works perfectly for this taiga helm chart: https://github.com/nemonik/taiga-helm/pull/6