Architecting teamtask / list: As a kubernetes controller

See previous posting for more details on the Teamtask / List algorithm.

Quick summary:  The algorithm has been called List because it covers the most basic, but frequently desired, use case: performing an action on multiple items.  Given a list of 100 hostnames, for example, ping each one to see if it responds, and track whether each hostname has been processed so that if the script has to be restarted we don’t have to repeat work.  It turns out, though, that if the implementation provides a working mutex, such as through the correct use of a database, multiple clients can help to process the list, resulting in a well orchestrated distributed processing engine (hence, Teamtask).

List has been implemented most recently as a webapi covering all the use cases one would expect, and placing this into a container as a microservice along with a database and respective helm chart is the next step.  But what about beyond that?

List works by breaking up a list of items into blocks for processing.  By selecting an appropriate block size, the server will not carry a high cpu load or be busy.  Additionally, a main List implementation can be used to hand out very large blocks, which secondary List servers can consume and then break up into smaller chunks for their clients, further reducing the load on the main server.  A misconfigured Job though, say with a blocksize of one, millions of items to process and hundreds of clients, could produce some unnecessary peaks in resource consumption.  With this in mind, what if we were to use a controller in Kubernetes itself?
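
To make the block idea concrete, here is a minimal sketch of how a secondary server might cut one large block into smaller ones for its clients.  The function name split_block and the (index, size) tuple shape are my own assumptions for illustration, not the actual List code.

```python
from typing import Iterator, Tuple


def split_block(index: int, size: int, chunk: int) -> Iterator[Tuple[int, int]]:
    """Yield (index, size) sub-blocks covering the range [index, index + size).

    Hypothetical helper: a secondary server could call this on a large
    block received from the main server. The last sub-block may be
    smaller than `chunk` when `size` is not an exact multiple.
    """
    if chunk <= 0:
        raise ValueError("chunk must be positive")
    end = index + size
    for start in range(index, end, chunk):
        yield start, min(chunk, end - start)


if __name__ == "__main__":
    # One block of 1000 items starting at 1000, handed out in chunks of 250.
    for sub_block in split_block(1000, 1000, 250):
        print(sub_block)
```

A blocksize of one would turn a million items into a million sub-blocks, which is the misconfiguration described above.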

Kubernetes at its core is a controller engine.  Controllers recognize yaml-defined objects such as deployments, services, and ingresses.  Kubernetes is built around a certain set of controllers related to container management, but really we can create controllers for almost anything.  We could create a controller that knows how to play Tic-Tac-Toe, for example, defining a game with a current state in a yaml.  A controller recognizing the yaml could see that the state indicates a move needs to be made, make a move, and update the status of the object / yaml.  One could see how such a system could be used to coordinate many games, and through the nature of kubernetes everything would scale: in the same way kubernetes manages the state of hundreds of deployments, it could manage the state of multiple instances of a game.

As an excuse to write a kubernetes controller just for fun we could implement List.  A yaml with an api could define a List to process.  The controller could then create a block, or a few blocks, for clients to process.  The original List could define how many items to process, the size of a block to use, who has permissions to work on the blocks, etc …  The idea of a “client” could also be a controller within kubernetes.  This could be implemented in a way similar to how argocd recognizes an applicationset and upon processing creates one or more applications, which it also knows how to process.  By using kubernetes we could use its database and not have to deploy our own.  Potentially, resources could get out of hand if misconfigured, so safety checks would need to be put in place, but with our merging algorithm as described before, use of the “behind the scenes kubernetes database” and cpu use should be minimal on the list management side.

Such a controller would get all the benefits of using kubernetes: we could take advantage of built-in error checking and status of the type we see with pods and scaling.  Such an implementation would lead to some fun investigations.  How exactly does kubernetes manage all the pods it manages?  Is it checking them one at a time, all at once, or a few at a time?  Whatever algorithm kubernetes uses to manage pods would be the same algorithm used to manage the list blocks.  Probably there are some built-in limits to keep things sane, and perhaps we could take advantage of those.

Maybe controllers to process blocks wouldn’t bring any benefit.  Perhaps it would be better to implement just the server side as a controller, and clients could be run as Kubernetes Jobs, or as deployments set up to scale as desired, perhaps within resource quotas.  Still, in either case, it might make sense to define a Block type which upon processing would get an index and size added.  The Block could show as pending, in the same way an Ingress does while waiting for a loadbalancer ip; consumers could wait for the status to change and then work on the block.  Upon completion the status could be advanced to ‘completed’, or something similar, to communicate to the controller that the block is done.

How cool would it be to do a ‘kubectl -n <namespace> get blocks’ and get the blocks currently being worked on displayed in the familiar kubectl style with current status?

$ k get blocks -o wide
NAME                            READY     STATUS        RESTARTS        INDEX      SIZE
primenumber-578b4958fc-cvtbm    1/1       Ready         2               0          1000
primenumber-578b4958fc-segcfs   1/1       Ready         0               1000       1000
primenumber-578b4958fc-wersw    0/1       Pending       0               Pending    Pending

If the List implementation were implemented as a controller within Kubernetes, we could still use it from outside of the kubernetes cluster without having to implement a webapi, because kubernetes itself can be accessed and used via a webapi, no kubectl required.  Sweet!!!  (Of course, we might not want users to access the kubeapi directly, so wrap that api!)
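
As a rough sketch of what “no kubectl required” could look like: namespaced custom resources are served by the Kubernetes API under /apis/<group>/<version>/namespaces/<namespace>/<plural>, so a plain HTTP client is enough.  The group teamtask.example.com, the version v1alpha1, and the helper names below are hypothetical, since no such Block resource exists yet.

```python
import urllib.request

# Hypothetical identifiers for a Block custom resource; nothing here exists yet.
GROUP = "teamtask.example.com"
VERSION = "v1alpha1"
PLURAL = "blocks"


def blocks_url(api_server: str, namespace: str) -> str:
    """Build the REST URL that lists Block objects in one namespace."""
    base = api_server.rstrip("/")
    return f"{base}/apis/{GROUP}/{VERSION}/namespaces/{namespace}/{PLURAL}"


def list_blocks_request(api_server: str, namespace: str, token: str) -> urllib.request.Request:
    """Prepare (but do not send) an authenticated GET for the block list."""
    return urllib.request.Request(
        blocks_url(api_server, namespace),
        headers={
            "Authorization": f"Bearer {token}",  # e.g. a service account token
            "Accept": "application/json",
        },
    )


if __name__ == "__main__":
    request = list_blocks_request("https://kube.example:6443", "demo", "<token>")
    print(request.full_url)
```

Sending it would be a matter of passing the request to urllib.request.urlopen along with an ssl context that trusts the cluster CA.  A thin wrapper service in front of this is probably still the safer choice for end users.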

Architecting teamtask / list: The early years

Teamtask (a.k.a. List) is a pet project / algorithm I developed back in 2000 as part of a brute force password cracking experiment.  Actually though, now that I think about it, I originally started working on the algorithm in 5th grade.

Back in my early years I wanted to password protect my computer, which wasn’t a thing back then.  I set about writing a program with a prompt for a username and password.  It worked.  I could start it up when the computer started, and though you could just ctrl-c out of it (not super sophisticated), my next thought was how could someone get around it.  I began investigating how to generate all passwords so they all could be tested one after another until the password was guessed.  I figured out the following two algorithms given a string of 3 characters ‘abc’ and length ‘3’:

aaa 111 000
aab 112 001
aac 113 002

abc 123
acb 132
bac 213
bca 231
cab 312
cba 321

The algorithms were thus: one generating all combinations with reusing characters and one without reusing characters.  The first worked best for brute force password cracking (though I didn’t know the term at the time, if it even existed).  But, in 5th grade I wasn’t able to create the algorithm to generate the strings.  Later in life though, I was able to create an algorithm for both using iteration with a base equal to the number of characters, rather than base 10, along with factorials(!).

With these two algorithms the following became possible:  If there were 6 possible arrangements of characters I could give a client the number, such as 1, along with a string of characters ‘abc’, and the client could translate that into ‘abc’ and do something with it (test if the password works).  Since the client only needed the index and the string of characters, we could also give out a block of indexes, such as index=0, size=3.  This would result in two blocks that two different clients could work on simultaneously.  Each client would take a block, process three combinations, then report back the result.
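
Here is a minimal sketch of both translations, written from the description above rather than copied from my original code.  The function names are made up for this example, and I am assuming zero-based indexes, so index 0 is the first entry in each listing.

```python
from math import factorial


def index_to_string(index: int, chars: str, length: int) -> str:
    """With reuse: write `index` in base len(chars), padded to `length` digits."""
    base = len(chars)
    if not 0 <= index < base ** length:
        raise ValueError("index out of range")
    digits = []
    for _ in range(length):
        index, digit = divmod(index, base)
        digits.append(chars[digit])
    return "".join(reversed(digits))  # most significant digit first


def index_to_permutation(index: int, chars: str) -> str:
    """Without reuse: decode `index` with factorials (factorial number system)."""
    pool = list(chars)
    if not 0 <= index < factorial(len(pool)):
        raise ValueError("index out of range")
    result = []
    for remaining in range(len(pool), 0, -1):
        # Each choice for this position covers (remaining - 1)! arrangements.
        position, index = divmod(index, factorial(remaining - 1))
        result.append(pool.pop(position))
    return "".join(result)


if __name__ == "__main__":
    # A client handed the block index=3, size=3 works through these three.
    for i in range(3, 3 + 3):
        print(i, index_to_permutation(i, "abc"))
```

The client never needs the full list, only the character string plus the index and size of its block.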

Implementing the algorithm, there’s one more bit of magic that occurs.  One might initially implement the algorithm above in the following way: given 100 items to complete, broken into chunks of 10, you could add 10 records to a database to reflect these blocks pending completion:

index = 0, size = 10, status = pending
index = 10, size = 10, status = pending

index = 90, size = 10, status = pending

After each completes you could mark the status as ‘completed’ and once all blocks are completed flag the Job as done.

However, when processing more extreme lists with thousands of blocks (think testing for the largest prime number ever found), you wouldn’t want to hold the status of every block.  With one more algorithm this concern disappears: merge sibling blocks.  Writing each block as (index, size, status), with status 0 for not yet completed and 1 for completed, say you have three blocks in a row that clients are working on: (0, 10, 0), (10, 10, 0), (20, 10, 0).  If the second two complete, giving (0, 10, 0), (10, 10, 1), (20, 10, 1), you can merge them for tracking purposes: (0, 10, 0), (10, 20, 1).  If the first then completes you can merge again into (0, 30, 1), indicating that everything from position 0 for a size of 30 has been completed.  Conveniently, this means that when the whole list has been processed you will have one block with the whole size in completed status, such as (0, 100000000, 1).
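
The merge step can be sketched in a few lines.  This is an in-memory illustration under my own assumptions (blocks held as (index, size, status) tuples in a list, a function called merge_completed); a real implementation would do the equivalent inside the database.

```python
from typing import List, Tuple

# (index, size, status): status 0 = not yet completed, 1 = completed
Block = Tuple[int, int, int]


def merge_completed(blocks: List[Block]) -> List[Block]:
    """Collapse runs of adjacent completed blocks into single blocks."""
    merged: List[Block] = []
    for index, size, status in sorted(blocks):
        if merged:
            prev_index, prev_size, prev_status = merged[-1]
            touching = prev_index + prev_size == index
            if touching and prev_status == 1 and status == 1:
                # Extend the previous completed block instead of adding a row.
                merged[-1] = (prev_index, prev_size + size, 1)
                continue
        merged.append((index, size, status))
    return merged


if __name__ == "__main__":
    print(merge_completed([(0, 10, 0), (10, 10, 1), (20, 10, 1)]))
    print(merge_completed([(0, 10, 1), (10, 20, 1)]))
```

Only completed neighbours are merged, so blocks still being worked on keep their own rows and can be timed out or reassigned individually.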

The algorithm has evolved to include timeouts on blocks, to handle the case of a client disappearing (or crashing) while working on a block; a limit on the number of blocks a client can hold at one time (to guard, to some degree, against someone trying to interfere with processing by requesting blocks and not working on them); and OIDC support, so it can work within an enterprise infrastructure.
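
To show how timeouts and the per-client limit could fit together, here is a small in-memory sketch.  The class BlockDispenser and its method names are invented for this example, and a threading lock stands in for the mutex that the database provides in the real thing; OIDC is left out entirely.

```python
import threading
import time
from typing import Dict, Optional, Tuple


class BlockDispenser:
    """Hands out blocks with a lease timeout and a per-client cap (sketch only)."""

    def __init__(self, total: int, block_size: int, timeout: float, max_per_client: int):
        self.total = total
        self.block_size = block_size
        self.timeout = timeout
        self.max_per_client = max_per_client
        self.next_index = 0
        # block index -> (size, owning client, lease deadline)
        self.leases: Dict[int, Tuple[int, str, float]] = {}
        self.lock = threading.Lock()

    def claim(self, client: str, now: Optional[float] = None) -> Optional[Tuple[int, int]]:
        """Return an (index, size) block for `client`, or None if none is available."""
        now = time.monotonic() if now is None else now
        with self.lock:
            held = sum(1 for _, owner, deadline in self.leases.values()
                       if owner == client and deadline > now)
            if held >= self.max_per_client:
                return None  # client must finish what it already holds
            # Reissue a block whose lease expired (its client vanished or crashed).
            for index, (size, _, deadline) in self.leases.items():
                if deadline <= now:
                    self.leases[index] = (size, client, now + self.timeout)
                    return index, size
            if self.next_index >= self.total:
                return None  # nothing left to hand out
            index = self.next_index
            size = min(self.block_size, self.total - index)
            self.next_index += size
            self.leases[index] = (size, client, now + self.timeout)
            return index, size

    def complete(self, client: str, index: int) -> bool:
        """Mark a block done; only the current lease holder may do so."""
        with self.lock:
            lease = self.leases.get(index)
            if lease is None or lease[1] != client:
                return False
            del self.leases[index]
            return True
```

The optional now argument is only there so the behaviour can be exercised without sleeping; by default the monotonic clock is used.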

Roadmap:
– implement teamtask (a.k.a. list) as a container
– implement webapp gui & mobile gui, both with single implementation using flutter

Using kubeadm to setup cluster using centos 9 stream.

Is centos 9 stream a good choice? (sure)

I may end up switching to Sidero to setup and manage my onprem clusters, but for now I am continuing with centos, and moving from 8 to 9 so that I can use the wireguard module that comes with 9.  After several failures I have tracked down the few steps different from a centos 8 stream install.  Hopefully this will save someone a lot of days (and days and days, weeks?) of troubleshooting.

The key differences are:

1. In centos 8 stream you only needed to edit the existing containerd configuration so that it no longer disabled what kubeadm needs.  In centos 9 stream you need to copy the whole default configuration and change it to use the systemd cgroup driver.  This script is currently working for me:

# make a copy of the default containerd configuration
containerd config default | sudo tee /etc/containerd/config.toml
# set to use the systemd cgroup driver
sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/g' /etc/containerd/config.toml
# adjust pause image to what's actually installed
PAUSE_IMAGE=$(kubeadm config images list | grep pause)
sudo -E sed -i "s,sandbox_image = .*,sandbox_image = \"$PAUSE_IMAGE\",g" /etc/containerd/config.toml

# enable and restart the containerd service
sudo systemctl enable containerd
sudo systemctl restart containerd

2. There is something odd happening when performing the ‘kubeadm init’ which I was able to get around by doing the following:

# skip a couple of phases when performing kubeadm init
sudo kubeadm init --control-plane-endpoint="<put_endpoint_here>:6443" --upload-certs --pod-network-cidr=<put_cni_cidr_here> \
--skip-phases=addon/kube-proxy \
--skip-phases=addon/coredns

# wait about 40 seconds then run the following to run the previously skipped phases
sudo kubeadm init phase addon all \
--control-plane-endpoint="<put_endpoint_here>:6443" \
--pod-network-cidr=<put_cni_cidr_here>

If I get a chance I’ll put together a video for this since there doesn’t seem to be one out there in the wild yet.

tether laptop via android phone in such a way as to give the laptop access to the phone’s vpn

Android tether how to:

  1. Setup wireguard on your phone to your network, this allows you to access your hosted webapps, and your laptop will also get to have access
  2. On phone install sshd such as simplesshd, by default it uses port 2222
  3. On laptop install adb, plug phone into usb, then run “adb forward tcp:2222 tcp:2222”, this will forward localhost:2222 on your laptop to the sshd running on the attached phone
  4. On laptop connect to ssh server using putty @ localhost on port 2222, also setup a port forwarding tunnel such as ‘Source port = 9999’, select ‘Dynamic’, nothing for destination
  5. This sets up a socks proxy, now on the laptop run chrome pointing it at the socks proxy: c:\…\chrome.exe --proxy-server="socks5://localhost:9999" (easiest to just edit the shortcut).  ** Before launching chrome in this way you have to close all chrome instances, otherwise it will appear not to work

There you go, now on your laptop you’ll be able to reach your home webapps as it will proxy through the phone which is running the vpn.  Note: you’ll also have to enable developer access in order to use adb.

This is perhaps not as secure as just using your phone’s built-in tether options, mine seems to put tethered connections behind a NAT which is a good idea, but in my case I wanted my laptop to use the vpn the phone was using.  Good luck!

Additional, openvpn:

Instead of configuring chrome to use the socks5 proxy, you can setup openvpn to use the socks5 proxy, then all of your networking will work, not just chrome. Just add the following to your openvpn client config (you’ll also need to setup an openvpn server of course, the above setup does not require it to be open to the internet, we access it via the ssh tunnel):

proto tcp
socks-proxy localhost 9999
connect-retry-max 1

remote <openvpn_ip> <openvpn_port> tcp

Actually, maybe the socks5 proxy isn’t needed at all if using openvpn, it just needs its port forwarded, no?

k8s-at-home, another project deprecated while at its prime

The open source community is capable of incredible things, with everyone working their main jobs and then building things outside of work … but it is all too common for people to also want to have a life outside of work.

We’ve lost another project just due to a lack of maintainers / availability.  It’s sad when it happens.  Even I didn’t have time to help out, and now it’s gone.

If only there were a way to pay people to maintain open source projects.

Think I’ll have a drink tonight in celebration of the k8s-at-home folks, thanks for everything you did.

Making learning fun by creating a “gray area” niche goal

One time, as a way to motivate a coworker who was working to learn scripting, I shared with him an algorithm to generate all possible string combinations given a string of possible characters.  This, of course, could potentially be used to try all possible passwords for nefarious means.

Within about 10 minutes I had a manager in the office, looking at the whiteboard, telling me they weren’t stupid, that they knew what all that was on the board and that I need to not be teaching the employees how to hack computers.  I noticed a coworker sneaking out of the room trying to hide their laughter.  I think we know who called the manager.

It still makes me laugh to this day.  I suppose they weren’t completely wrong, but in my mind I figure if you tell someone to make every possible key for a type of car, and try them all, at some point you’ll get into that car.  Is that teaching a master class on how to break into cars?  I guess, but it sure isn’t efficient; master class it is not.  However, for someone new to scripting, it’s a fun algorithm to write and learn with.

Along those same “gray area” lines I share with you a goal to help motivate one to learn kubernetes.  I’ll give you the exact steps necessary to make it happen.  The goal is this: “Let’s put together a micro-services architected solution to help you keep track of movies or tv shows you’d like to get around to watching some day”.  I used to use a physical notepad for this back in the day, then a notepad app on my phone, but modern days allow for modern solutions.  No longer do you have to type the full name of a movie or tv show; you can now type a little and search for it, then when it’s found click on it to add it to your queue.

Here’s how to do it and the applications you’ll need (this is for an onprem setup, you are on your own if you are deploying apps the world is able to see from the internet, try not to do that):

  1. If you don’t know kubernetes sign up for the udemy class “Certified Kubernetes Administrator (CKA) with Practice Tests”, then go ahead and take the exam and get the CKA certification.
  2. Decide which extra computer you have laying around is to be your NAS, where your network storage will live, and install truenas core on it.  Then configure iscsi to work with kubernetes provisioned storage.  You’ll need to deploy democratic-csi into your cluster and install scsi-related utilities on to your worker nodes in the next step.
  3. Spin up a cluster using kubeadm then deploy Plex Media Server into your cluster.  Plex is used to give a netflix-like experience around viewing your home media collection.  I recommend the kube-plex helm chart and to deploy using gitops using argocd.  kube-plex will deploy a pms instance that will spin up extra streaming processes on the fly, but I recommend disabling that feature in the values.yaml, to avoid any potential file locking issues… at least in the beginning.  Also, instead of using a network share for your configuration, set up taints and tolerations to direct the plex application to only spin up on one node, and use local storage there.  (Plex tends to have database corruption when using network storage, go for it later if you want, but set yourself up for success initially.)
  4. Next deploy sonarr and radarr using the helm charts out at k8s-at-home, these are similar apps that work with tv and movies respectively.  k8s-at-home is a cool group in that they have a common library among all their helm charts, so for example, if you wanted to setup a vpn on any particular application as a side pod you could, or using a single vpn pod that multiple pods can route through, or add an ingress, or mount an additional volume, etc.
  5. Now, you are done, you can browse to your sonarr and radarr installation and search for movies and tv shows you’d like to watch someday.  Be sure to setup cert-manager and external-dns as well to register them in your local onprem dns and configure a valid certificate.
  6. You might also want to setup a vpn such as wireguard in your cluster as well as forwarding the needed vpn-related port through your router so that you can browse to these apps on your phone while you are out and about, that way, if you hear of a movie you want to queue up for later viewing you can do so in the moment.
  7. Interestingly, you can also click on Connect in both sonarr and radarr to configure your plex server for some reason.