A little script to roll a cluster (drain, reboot, and uncordon each worker node in turn), useful if managing your own.

#!/bin/bash
# roll-cluster: drain, reboot, and uncordon each worker node, one at a time

# get list of node hostnames (select by address type rather than array position)
NODES=($(kubectl get nodes -o jsonpath='{.items[*].status.addresses[?(@.type=="Hostname")].address}'))
for NODE in "${NODES[@]}"
do
  echo ""
  echo "[$NODE]"
  # skip control-plane nodes and nodes that are not Ready (' Ready' avoids matching NotReady)
  TMP=$(kubectl get node "$NODE" | grep ' Ready' | grep -v 'control-plane')
  if [ -z "$TMP" ]; then
    continue
  fi

  echo "- draining $NODE"
  # stop here rather than reboot a node that failed to drain
  kubectl drain "$NODE" --ignore-daemonsets --delete-emptydir-data || exit 1
  echo "- sending reboot command, enter password if prompted"
  ssh "root@$NODE" reboot
  echo "- waiting for node to go down (no ping)"
  while ping -c 1 -W 1 "$NODE" > /dev/null 2>&1; do
    sleep 1
  done
  echo "- waiting for node to show as ' Ready'"
  until kubectl get node "$NODE" | grep -q ' Ready'; do
    sleep 1
  done
  echo "- uncordon $NODE"
  kubectl uncordon "$NODE"
done

Architecting teamtask / list: As a kubernetes controller

See previous posting for more details on the Teamtask / List algorithm.

Quick summary:  The algorithm is called List because it covers the most basic, but frequently needed, use case of performing an action on multiple items.  Given a list of 100 hostnames, for example, ping each one to see if it responds, and track whether each hostname has been processed so that if the script has to be restarted we don’t have to repeat work.  It turns out, though, that if the implementation correctly provides a mutex, such as through the correct use of a database, multiple clients can help to process the list, resulting in a well orchestrated distributed processing engine (hence, Teamtask).
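As a rough sketch of that basic use case (not the actual List implementation), the function below walks a list once and records each result in a status file, so a restarted run skips whatever is already done.  The names process_list and check_host and the "item result" file layout are made up for illustration, items are assumed to contain no whitespace, and there is no mutex here, so this is single client only.

```shell
#!/bin/bash
# Hypothetical sketch: resumable processing of a list of items.
# process_list LIST_FILE STATUS_FILE ACTION
#   LIST_FILE   one item per line (no whitespace inside an item)
#   STATUS_FILE "item result" lines, appended as items finish
#   ACTION      command or function called with the item as its only argument
process_list() {
  local list_file=$1 status_file=$2 action=$3
  local item result
  touch "$status_file"
  # Read the list on fd 3 so the action is free to use stdin.
  while IFS= read -r item <&3 || [ -n "$item" ]; do
    [ -z "$item" ] && continue
    # Skip items that already have a recorded result (string comparison).
    if awk -v k="$item" '($1 "") == (k "") { found = 1 } END { exit !found }' "$status_file"; then
      continue
    fi
    if "$action" "$item"; then result=ok; else result=failed; fi
    printf '%s %s\n' "$item" "$result" >> "$status_file"
  done 3< "$list_file"
}

# Example action: a single ping with a one second timeout.
check_host() {
  ping -c 1 -W 1 "$1" > /dev/null 2>&1
}

# Usage (hosts.txt and hosts.status are placeholder file names):
#   process_list hosts.txt hosts.status check_host
```

Running the same command again after an interruption picks up where the status file left off.  Letting several clients share the work safely is where the database (or some other real lock) comes in.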

List has been implemented most recently as a webapi covering all the use cases one would expect, and placing this into a container as a microservice along with a database and respective helm chart is the next step.  But what about beyond that?

List works by breaking up a list of items into blocks for processing.  With an appropriately sized block the server will not carry a high cpu load or stay busy.  Additionally, a main List implementation can hand out very large blocks, which secondary List servers consume and then break up into smaller chunks for their clients, further reducing the load on the main server.  A misconfigured Job though, say one with a blocksize of one, millions of items to process, and hundreds of clients, could produce unnecessary peaks in resource consumption.  With this in mind, what if we were to use a controller in Kubernetes itself?

Kubernetes at its core is a controller engine.  Controllers recognize yaml defined objects such as deployments, services, and ingresses.  Kubernetes is mostly used with a certain set of controllers related to container management, but really we can create controllers for almost anything.  We could create a controller that knows how to play Tic-Tac-Toe, for example, defining a game with a current state in a yaml.  A controller recognizing the yaml could see that the state indicates a move needs to be made, make the move, and update the status of the object / yaml.  One could see how such a system could be used to coordinate many games, and through the nature of kubernetes everything would scale:  in the same way kubernetes manages the state of hundreds of deployments, it could manage the state of multiple instances of a game.

As an excuse to write a kubernetes controller just for fun we could implement List.  A yaml with an api could define a List to process.  The controller could then create a block, or a few blocks, for clients to process.  The original List could define how many items to process, the size of a block to use, who has permissions to work on the blocks, etc …  The “client” could also be a controller within kubernetes.  This could be implemented much like how argocd recognizes an applicationset and upon processing creates one or more applications, which it also knows how to process.  By using kubernetes we could use its database and not have to deploy our own.  Potentially, resources could get out of hand if misconfigured, so safety checks would need to be put in place, but with our merging algorithm as described before, both use of the “behind the scenes kubernetes database” and cpu use should be minimal on the list management side.

Such a controller would get all the benefits of using kubernetes:  we could take advantage of built in error checking and status of the type we see with pods and scaling.  Such an implementation would lead to some fun investigations.  How exactly does kubernetes manage all the pods it manages?  Is it checking them one at a time, all at once, or a few at a time?  Whatever algorithm kubernetes uses to manage pods would be the same algorithm used to manage the list blocks.  Probably there are some built in limits to keep things sane, and perhaps we could take advantage of those.

Maybe controllers to process blocks wouldn’t bring any benefit.  Perhaps it would be better to implement just the server side as a controller, and clients could be run as Kubernetes Jobs, or just deployments set up to scale as desired, perhaps within resource quotas.  Still, in either case, it might make sense to define a Block type which upon processing would get an index & size added.  The Block could show as pending, in the same way an Ingress does while waiting for a loadbalancer ip, and consumers could wait for the status to change and then work on the block.  When finished, the consumer could advance the status to ‘completed’, or something similar, to communicate to the controller that the block is done.

How cool would it be to do a ‘kubectl -n <namespace> get blocks’ and get the blocks currently being worked on displayed in the familiar kubectl style with current status?

$ k get blocks -o wide
NAME                            READY   STATUS    RESTARTS   INDEX     SIZE
primenumber-578b4958fc-cvtbm    1/1     Ready     2          0         1000
primenumber-578b4958fc-segcfs   1/1     Ready     0          1000      1000
primenumber-578b4958fc-wersw    0/1     Pending   0          Pending   Pending

If the List implementation were implemented as a controller within Kubernetes, we could still use it outside of the kubernetes cluster without having to implement a webapi because kubernetes itself can be accessed and used via a webapi, no kubectl required.  Sweet!!!  (course, we might not want users to access the kubeapi directly, wrap that api!)
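A small sketch of what “no kubectl required” could look like.  The path layout /apis/<group>/<version>/namespaces/<namespace>/<plural> is how Kubernetes serves namespaced custom resources, but the group teamtask.example.com, the version v1, and the blocks resource are placeholders, since no such controller exists yet.  The token and CA certificate arguments are likewise assumptions about how a caller would authenticate.

```shell
#!/bin/bash
# Hypothetical sketch: list Block objects straight from the kube api with curl.

# blocks_url APISERVER NAMESPACE [GROUP] [VERSION]
# Builds the REST path for a namespaced custom resource named "blocks".
blocks_url() {
  local apiserver=$1 namespace=$2
  local group=${3:-teamtask.example.com} version=${4:-v1}   # placeholder group/version
  printf '%s/apis/%s/%s/namespaces/%s/blocks\n' \
    "$apiserver" "$group" "$version" "$namespace"
}

# list_blocks APISERVER NAMESPACE TOKEN CA_CERT
# Fetches the list as JSON using a bearer token and the cluster CA file.
list_blocks() {
  local apiserver=$1 namespace=$2 token=$3 ca_cert=$4
  curl --silent --show-error --cacert "$ca_cert" \
    --header "Authorization: Bearer $token" \
    "$(blocks_url "$apiserver" "$namespace")"
}

# Usage (endpoint, namespace, and file names are placeholders):
#   list_blocks https://<put_endpoint_here>:6443 primes "$TOKEN" ca.crt
```

A wrapper api in front of this would only need to forward a narrow set of such calls, which keeps users away from the rest of the kubeapi.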

Architecting teamtask / list: The early years

Teamtask (a.k.a. List) is a pet project / algorithm I developed back in 2000 as part of a brute force password cracking experiment.  Actually though, now that I think about it, I originally started working on the algorithm in 5th grade.

Back in my early years I wanted to password protect my computer, which wasn’t a thing back then.  I set about writing a program with a prompt for a username and password.  It worked.  I could start it up when the computer started, and though you could just ctrl-c out of it (not super sophisticated), my next thought was how could someone get around it.  I began investigating how to generate all passwords so they all could be tested one after another until the password was guessed.  I figured out the following two algorithms given a string of 3 characters ‘abc’ and length ‘3’:

aaa 111 000
aab 112 001
aac 113 002

abc 123
acb 132
bac 213
bca 231
cab 312
cba 321

The algorithms were thus: one generating all combinations with reusing characters and one without reusing characters.  The first worked best for brute force password cracking (though I didn’t know the term at the time, if it even existed).  But, in 5th grade I wasn’t able to create the algorithm to generate the strings.  Later in life though, I was able to create an algorithm for both using iteration with a base equal to the number of characters, rather than base 10, along with factorials(!).

With these two algorithms the following became possible:  If there were 6 possible arrangements of characters I could give a client a number, such as 0 (counting from zero), along with the string of characters ‘abc’, and the client could translate that into ‘abc’ and do something with it (test if the password works).  Since the client only needed the index and the string of characters, we could also give out a block of indexes, such as index=0, size=3.  With 6 arrangements this would result in two blocks that two different clients could work on simultaneously.  Each client would take a block, process three arrangements, then report back the result.
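Here is a minimal sketch of both translations, assuming zero-based indexes so that index 0 is the first string in each table above.  The function names are invented for illustration and this is not the original code:  the first function counts in a base equal to the number of characters, and the second uses factorials to pick each character without reuse.

```shell
#!/bin/bash
# Hypothetical sketch: turn an index into the string it stands for.

# index_to_string INDEX CHARS LENGTH  (characters may repeat)
# Treats INDEX as a number in base ${#CHARS} and maps each digit to a character.
index_to_string() {
  local index=$1 chars=$2 length=$3
  local base=${#chars} out="" digit i
  for (( i = 0; i < length; i++ )); do
    digit=$(( index % base ))
    out="${chars:digit:1}$out"      # least significant digit goes on the right
    index=$(( index / base ))
  done
  printf '%s\n' "$out"
}

# index_to_permutation INDEX CHARS  (each character used exactly once)
# Uses the factorial number system: each step picks one of the remaining characters.
index_to_permutation() {
  local index=$1 pool=$2
  local n=${#pool} out="" fact=1 pos i
  for (( i = 2; i < n; i++ )); do fact=$(( fact * i )); done   # fact = (n-1)!
  for (( i = n - 1; i >= 0; i-- )); do
    pos=$(( index / fact ))
    index=$(( index % fact ))
    out+=${pool:pos:1}
    pool=${pool:0:pos}${pool:pos+1}  # remove the character just used
    if (( i > 0 )); then fact=$(( fact / i )); fi
  done
  printf '%s\n' "$out"
}

# block_permutations INDEX SIZE CHARS
# What a client would do with a block: translate SIZE consecutive indexes.
block_permutations() {
  local index=$1 size=$2 pool=$3 i
  for (( i = index; i < index + size; i++ )); do
    index_to_permutation "$i" "$pool"
  done
}
```

With ‘abc’, the two blocks (index=0, size=3) and (index=3, size=3) cover all 6 arrangements between them, so two clients never test the same string.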

When implementing the algorithm there’s one more bit of magic that occurs.  One might initially implement the algorithm above in the following way:  given 100 items to complete, broken into chunks of 10, you could add 10 records to a database to reflect these blocks pending completion:

index = 0, size = 10, status = pending
index = 10, size = 10, status = pending

index = 90, size = 10, status = pending

After each completes you could mark the status as ‘completed’ and once all blocks are completed flag the Job as done.

However, for more extreme lists with thousands or millions of blocks (think testing for the largest prime number ever found), you wouldn’t want to hold the status of every block.  With one more algorithm this concern disappears:  merge sibling blocks.  Writing each block as (index, size, status), with status 0 for not yet completed and 1 for completed, say you have three blocks in a row that clients are working on: (0, 10, 0), (10, 10, 0), (20, 10, 0).  If the second two complete, giving (0, 10, 0), (10, 10, 1), (20, 10, 1), you can merge them for tracking purposes: (0, 10, 0), (10, 20, 1).  If the first then completes you can merge again into (0, 30, 1), indicating that from position 0, for a size of 30, everything has been completed.  This conveniently means that when the whole list has been processed you will have one block with the whole size in completed status, such as (0, 100000000, 1).
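The merge step can be sketched in a few lines, assuming blocks arrive as “index size status” rows on stdin with status 1 meaning completed.  The function name merge_blocks and the row format are illustrative only, and a real implementation would do this inside the database rather than over text.

```shell
#!/bin/bash
# Hypothetical sketch: merge adjacent completed blocks into one tracking row.
# Input and output rows are "index size status"; only status 1 rows are merged.
merge_blocks() {
  sort -n -k1,1 | awk '
    # Current row is completed and starts exactly where the held completed row ends:
    # grow the held row instead of emitting a new one.
    NR > 1 && $3 == 1 && state == 1 && $1 == start + len { len += $2; next }
    # Otherwise emit the held row and hold the current one.
    NR > 1 { print start, len, state }
    { start = $1; len = $2; state = $3 }
    END { if (NR > 0) print start, len, state }
  '
}

# Usage:
#   printf "0 10 0\n10 10 1\n20 10 1\n" | merge_blocks
```

Blocks that are not yet completed are left alone, since each one still belongs to a client.  Completed blocks separated by a gap also stay separate until the block between them finishes.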

The algorithm has evolved to include timeouts on blocks (to handle the use case of a client disappearing or crashing while working on a block), a limit on the number of blocks a client can hold at one time (to make it harder for someone to interfere with processing by requesting blocks and not working on them), and OIDC support so it can work within an enterprise infrastructure.

Roadmap:
– implement teamtask (a.k.a. list) as a container
– implement webapp gui & mobile gui, both with single implementation using flutter

Using kubeadm to set up a cluster on centos 9 stream.

Is centos 9 stream a good choice? (sure)

I may end up switching to Sidero to setup and manage my onprem clusters, but for now I am continuing with centos, and moving from 8 to 9 so that I can use the wireguard module that comes with 9.  After several failures I have tracked down the few steps different from a centos 8 stream install.  Hopefully this will save someone a lot of days (and days and days, weeks?) of troubleshooting.

The key differences are:

1. In centos 8 stream you only needed a small change to the containerd configuration so that containerd was no longer disabled.  In centos 9 stream you need to copy the whole default configuration and change it to use the systemd cgroup driver.  This script is currently working for me:

# make a copy of the default containerd configuration
containerd config default | sudo tee /etc/containerd/config.toml > /dev/null
# set to use the systemd cgroup driver (the file is root owned, so sed needs sudo)
sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/g' /etc/containerd/config.toml
# adjust pause image to what's actually installed
PAUSE_IMAGE=$(kubeadm config images list | grep pause)
sudo sed -i "s,sandbox_image = .*,sandbox_image = \"$PAUSE_IMAGE\",g" /etc/containerd/config.toml

# enable the containerd service at boot, then restart it to pick up the new config
sudo systemctl enable containerd
sudo systemctl restart containerd

2. There is something odd happening when performing the ‘kubeadm init’ which I was able to get around by doing the following:

# skip a couple of phases when performing kubeadm init
sudo kubeadm init --control-plane-endpoint="<put_endpoint_here>:6443" --upload-certs --pod-network-cidr=<put_cni_cidr_here> \
--skip-phases=addon/kube-proxy \
--skip-phases=addon/coredns

# wait about 40 seconds then run the following to run the previously skipped phases
sudo kubeadm init phase addon all \
--control-plane-endpoint="<put_endpoint_here>:6443" \
--pod-network-cidr=<put_cni_cidr_here>

If I get a chance I’ll put together a video for this since there doesn’t seem to be one out there in the wild yet.