How did I ever live without … Palo Alto Firewall

Every now and then you come across a technology that you didn’t know about before, and you just can’t believe you haven’t been using the technology all this time.

Of course, at least you did find it … eventually.  Thank goodness.

I find myself just shaking my head in almost disbelief at all the things this thing can do.

It’s interesting to see what countries your apps are connecting to, who is accessing your website, logs of access, etc… so much data to geek out about.

(storage) ceph is amazing

If you haven’t tried out ceph yet, and are not yet completely satisfied with your onprem storage system, I recommend giving it a try.  (Note, it does want a lot of cpu, so heads up on that.)

** I acknowledge that I am currently excited and completely captivated by ceph.  I’m still fairly new to using ceph, so you might want to check my facts, a.k.a. I’m tempting you to start investigating. 😉

Ceph is able to use multiple disks on multiple servers and spread out the load, maintaining two or three copies of data to avoid data loss.  Just look at my humble 4 disk system with the data spread out nearly perfectly across the disks.  (2 nodes with 2 disks each)

Ceph provides cephfs (ceph filesystem), cephrbd (ceph block storage).

Block storage gives you something like a disk, it’s useful for say creating a vm that you later want to hot-swap between servers.

Otherwise, the ceph file system is what you want to use, though not directly.

You’ll end up creating a pool which is based on the ceph filesystem (cephfs), then create pvcs that come from the pool. Also you can use iscsi (which uses cephrbd) or nfs, (which uses cephfs), if you have a consumer that is not able to connect with cephfs or cephrbd directly.

In my experiences working with storage systems and kubernetes persistent volumes, I was not having luck with RWX (read write many), even when the providers claimed they worked (nfs using my own linux server, longhorn which uses nfs for rwx). I found two apps ‘plex’ and ‘nextcloud’ would consistently experience database corruption after only a few minutes.

People continually told me “NFS supports RWX” and “Just use iSCSI, that’s perfect for apps that use SQLite”. I tested these claims and did not get the same result using a TrueNAS Core server.

However, with ceph you can allocate a pvc using cephfs, and this works perfectly with RWX! Awesome! And super fast! All my jenkins builds which run in kubernetes sped up by 15 seconds vs TrueNAS iSCSI, of course … this could just mean the physical disks running with ceph are faster than the physical disks running with my truenas server, can’t be sure.

Now I suspect if I created pvcs using NFS and iSCSI, which work on top of cephfs, that they might also support RWX. I’ve very curious, but given how fast cephfs and that everything is working perfectly, I can’t see any reason to use NFS or iSCSI (except for vm disks).

Using the helm install of ceph you end up with a tools pod that you can exec into and run the ‘ceph’ cli. The dashboard gui is pretty great, but the real interaction with ceph happens at the command line level. I’ve been in there breaking & fixing things and the experience feels like a fully fledged product ready for production. There is so much there I can see someone managing ceph as a career with an available deep dive as far as you are interested in going. If you break it enough and then fix it you get to watch it moving data around recovering things which is as cool as can be to watch, from the most geeky perspective.

In any case, given how long cephfs has been around and the popularity it has in use in production environments, I think I’ve found my storage solution for the foreseeable future.

Ceph – Getting Started

(centos) k8s-update.sh – script to upgrade a kubernetes cluster

Script to update a kubernetes cluster to the next patch or minor version.

#!/bin/bash

# if no parameter, show versions and syntax
if [ -z $1 ]; then
  # show available versions
  yum list --showduplicates kubeadm --disableexcludes=kubernetes

  # show syntax
  echo ""
  echo "Syntax:"
  echo "$0 <version>, e.g. $0 1.26.x-0"
  exit 1
fi

# remember version
export TARGET_VERSION=$1


# configure kubectl to use admin config
export KUBECONFIG=/etc/kubernetes/admin.conf

# track first control plane node
export IS_FIRST=1

# loop through control plane nodes
#kubectl get nodes --no-headers | xargs -n 5 echo
NODES=`kubectl get nodes --no-headers | awk '{print $1}'`
for NODE in $NODES; do
  # parse kubectl node output into parameters
  NODE_HOSTNAME=`kubectl get node $NODE --no-headers | xargs -n 5 bash -c 'echo $0'`
  NODE_TYPE=`kubectl get node $NODE --no-headers | xargs -n 5 bash -c 'echo $2'`
  NODE_VERSION=`kubectl get node $NODE --no-headers | xargs -n 5 bash -c 'echo $4'`

  # only work on control plane nodes in this loop
  if [ $NODE_TYPE != "control-plane" ]; then
     #echo ""
     #echo "skipping worker node"
     continue
  fi

  echo ""
  echo "***"
  echo "* Next: $NODE_HOSTNAME"

  # upgrade kubeadm
  echo "upgrade to: $TARGET_VERSION"
  ssh root@$NODE_HOSTNAME yum install -y kubeadm-$TARGET_VERSION --disableexcludes=kubernetes

  # verify the download works and has the expected version
  #ssh root@$NODE_HOSTNAME kubeadm version

  # verify the upgrade plan
  #ssh root@$NODE_HOSTNAME kubeadm upgrade plan

  # perform the update
  if [ $IS_FIRST == "0" ]; then
    ssh root@$NODE_HOSTNAME kubeadm upgrade node
  else
    # if this is the first control plane node its command is a little different
    ssh root@$NODE_HOSTNAME kubeadm upgrade apply --yes v$TARGET_VERSION

    # adjust tracking now that we've completed the first control plane node
    export IS_FIRST=0
  fi

  # drain node & prepare for updating
  kubectl drain $NODE_HOSTNAME --delete-emptydir-data --ignore-daemonsets

  # update kubelet & kubectl
  ssh root@$NODE_HOSTNAME yum install -y kubelet-$TARGET_VERSION kubectl-$TARGET_VERSION --disableexcludes=kubernetes

  # restart kubelet
  ssh root@$NODE_HOSTNAME systemctl daemon-reload
  ssh root@$NODE_HOSTNAME systemctl restart kubelet

  # uncordon the node
  kubectl uncordon $NODE_HOSTNAME
  
done


# loop through worker nodes
NODES=`kubectl get nodes --no-headers | awk '{print $1}'`
for NODE in $NODES; do
  # parse kubectl node output into parameters
  NODE_HOSTNAME=`kubectl get node $NODE --no-headers | xargs -n 5 bash -c 'echo $0'`
  NODE_TYPE=`kubectl get node $NODE --no-headers | xargs -n 5 bash -c 'echo $2'`
  NODE_VERSION=`kubectl get node $NODE --no-headers | xargs -n 5 bash -c 'echo $4'`

  # only work on control plane nodes in this loop
  if [ $NODE_TYPE == "control-plane" ]; then
     #echo ""
     #echo "skipping control plane node"
     continue
  fi

  echo ""
  echo "***"
  echo "* Next: $NODE_HOSTNAME"

  # upgrade kubeadm
  echo "upgrade to: $TARGET_VERSION"
  ssh root@$NODE_HOSTNAME yum install -y kubeadm-$TARGET_VERSION --disableexcludes=kubernetes

  # verify the download works and has the expected version
  #ssh root@$NODE_HOSTNAME kubeadm version

  # verify the upgrade plan
  #ssh root@$NODE_HOSTNAME kubeadm upgrade plan

  # perform the update
  if [ $IS_FIRST == "0" ]; then
    ssh root@$NODE_HOSTNAME kubeadm upgrade node
  fi

  # drain node & prepare for updating
  kubectl drain $NODE_HOSTNAME --delete-emptydir-data --ignore-daemonsets

  # update kubelet & kubectl
  ssh root@$NODE_HOSTNAME yum install -y kubelet-$TARGET_VERSION kubectl-$TARGET_VERSION --disableexcludes=kubernetes

  # restart kubelet
  ssh root@$NODE_HOSTNAME systemctl daemon-reload
  ssh root@$NODE_HOSTNAME systemctl restart kubelet

  # uncordon the node
  kubectl uncordon $NODE_HOSTNAME
  
done

cephfs via argocd

Wahoo! Finally got cephfs working in my home lab the way I want it, accessing a source server (running in k8s) via external k8s clusters, all managed by argocd & working with my vclusters.

Looking forward to playing around with a reliable RWX environment. Can finally update pods with 0 downtime, awesome …

Had to basically hack another guys script to get the resources for my argocd deployment, lol, https://github.com/rook/rook/issues/11157, hopefully they can get the changes implemented so the next person doesn’t have to. Hard to imagine anyone with a gitops setup without it, but I can’t be the first to do this?

A little script to roll a cluster, useful if managing your own.

#!/bin/bash
# roll-cluster 

# get list of nodes
NODES=(`kubectl get nodes -o jsonpath='{.items[*].status.addresses[1].address}'`)
for NODE in "${NODES[@]}"
do
  echo ""
  echo "[$NODE]"
  TMP=`kubectl get node $NODE | grep Ready | grep -v 'control-plane'`
  if [ "$TMP" == "" ]; then
    continue
  fi

  echo "- draining $NODE"
  kubectl drain $NODE --ignore-daemonsets --delete-emptydir-data
  echo "- sending reboot command, enter password if prompted by sudo"
  ssh root@$NODE reboot
  echo "- waiting for node go down (no ping)"
  while [ "$(ping $NODE -c 4 | grep packet | grep -c ' 0\% packet loss')" == 1 ]; do
    sleep 1;
    done
  echo "- wait for node to show as ' Ready'"
  while (true); do
    a=`kubectl get node $NODE | grep " Ready"`
    if [ "$a" != "" ];
      then break;
      else sleep 1;
    fi
    done;
  echo "- uncordon $NODE"
  kubectl uncordon $NODE
done