homelab: planning next incarnation

Thinking about redeploying my homelab from scratch, perhaps switching from XenServer back to VMware. I’d like to start out with external-secrets and have all secrets in a vault right from the beginning. I’m also curious what a 100% open source, 100% Kubernetes environment would look like. Maybe two networks: one 100% Kubernetes, and a second for Windows client systems. Here’s the k8s plan so far:

k-seed:
- manual setup of seed cluster
  - helm install argocd
  - argocd install clusterapi/crossplane/etc...
- seed-argocd deploy non-production cluster using vcluster or clusterapi/crossplane/etc...
  - deploy metallb & configure loadbalancer ip range (can we automate this w/ cluster deploy? see the sketch after this list)
  - add cluster to seed-argocd instance
- seed-argocd deploy production cluster using vcluster or clusterapi/crossplane/etc...
  - deploy metallb & configure loadbalancer ip range (can we automate this w/ cluster deploy?)
  - add cluster to seed-argocd instance
- seed-argocd deploy argocd to production cluster (k-prod)
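
One way the “can we automate this w/ cluster deploy?” question might be answered: keep each cluster’s MetalLB install and address pool in git and let seed-argocd sync them to the new cluster as soon as it’s registered. A rough sketch (repo URL, paths, cluster name, and IP range are all placeholders):

```yaml
# Rough sketch only: an Application on the seed cluster that syncs MetalLB and its
# per-cluster address pool to a newly registered workload cluster.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: k-nonprod-metallb
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/homelab-gitops   # placeholder repo
    targetRevision: main
    path: clusters/k-nonprod/metallb                     # MetalLB manifests + the pool below
  destination:
    name: k-nonprod              # as registered with `argocd cluster add`
    namespace: metallb-system
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
---
# Lives in clusters/k-nonprod/metallb/: the loadbalancer ip range for this cluster.
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: default-pool
  namespace: metallb-system
spec:
  addresses:
    - 192.168.20.200-192.168.20.220   # placeholder range
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: default-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - default-pool
```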

k-prod:
- argocd configure storageclass
- argocd deploy hashicorp vault
  - configure as certificate authority
  - configure as keyvault
- argocd deploy external-secrets
  - configure to use keyvault
  - add secret 'ca-bundle.crt': public certificate authority certificate in DER format
  - *from now on all secrets get their values via external-secrets (see the ExternalSecret sketch after this list)
- argocd deploy cert-manager
  - configure to use hashicorp vault as certificate authority
- argocd deploy pihole
  - configure dns1 & dns2
- argocd deploy external-dns
  - configure to use pihole as dns
- update with annotations to use external-dns & cert-manager:
  - argocd
  - vault
  - pihole
  - *from now on all ingress yaml includes annotations for external-dns & cert-manager (see the ingress sketch after this list)
    - recommended: have the annotations from the beginning; at this point they will start working
- argocd deploy keycloak
  - configure realm: create or import from backup
  - add secret 'default_oidc_client_secret': secret part of oidc client/secret
  - configure a user account (or configure federation via AD, openldap, etc...)
- deploy all other apps
  - oidc client_secret should come from external-secrets in all apps configured with oidc
    - this might require an init container for some apps
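
For the “all secrets get their values via external-secrets” rule, something like this is what I have in mind, using the Keycloak OIDC client secret as the example. Sketch only: the ClusterSecretStore name, Vault KV path, and key names are placeholders.

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: default-oidc-client-secret
  namespace: keycloak
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: vault-kv                      # placeholder store backed by the vault deployed above
  target:
    name: default-oidc-client-secret    # plain Kubernetes Secret that apps consume
  data:
    - secretKey: client_secret
      remoteRef:
        key: kv/keycloak                # placeholder vault path
        property: default_oidc_client_secret
```

Apps that only take the client secret from a config file are where the init container mentioned above would come in.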
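
And for the “all ingress yaml includes the annotations” rule, a sketch of what each ingress would carry. Hostname, issuer name, ingress class, and backend service are placeholders; the issuer would be the cert-manager issuer backed by the vault CA above.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: vault
  namespace: vault
  annotations:
    cert-manager.io/cluster-issuer: vault-issuer               # vault-backed CA issuer
    external-dns.alpha.kubernetes.io/hostname: vault.home.lab  # record created in pihole
spec:
  ingressClassName: nginx              # placeholder ingress class
  rules:
    - host: vault.home.lab
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: vault
                port:
                  number: 8200
  tls:
    - hosts:
        - vault.home.lab
      secretName: vault-tls            # cert-manager stores the issued cert here
```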

k-ceph:
- pvc storage for all clusters
- block storage can be used for vm disks (making for easy hotswap)
- upgrade to two 10Gb ports on each host system

wdc: (kubevirt in theory but think i'll stick w/ a vm)
- domain controller
- user management
- dhcp
- wds
- wsus using dev sqlserver & data stored on e drive

Was thinking about writing a lunch voting app for work

Something similar to https://lunch.pink but specific to work, so running in a container with OIDC set up for coworkers. A coworker could then create a lunch event, public or private, and invite folks; invitees could respond with interest and vote for their favorite restaurants, or just keep their same favorites from last time.

As a web API, it could back integrations with Teams or a mobile app.

Could be a good excuse to write a controller and some custom resource definitions (see the CRD-style sketch after the listing).  Some initial ideas captured:

---
New-Restaurant

  Add new restaurant to favorites list
 
Get-Restaurant

  Get listing of preferred restaurants

Set-Restaurant

  Enable/disable restaurant as favorite to include with voting
 
Remove-Restaurant

  Remove restaurant from favorites list

---
New-Event

  Create new event
  -Description ""
  -IsViewable <true>
  -IsInviteOnly <false>
  -Timeout <when_voting_ends, default 5 minutes>

Get-Event

  Show all events if admin, otherwise only show public events, or events you've been invited to

Set-Event

  Adjust attributes 'IsViewable', 'IsInviteOnly', and reopen voting.

Remove-Event

  Remove event

---
New-Invite

  Create new invitation to event
  - only owner can create invitation
  - invite id is a guid

  -Username <username> // <optional> send through teams, or email, etc...
  -EventId

Get-Invite

  View current/expired invitations

Set-Invite

  Accept / Reject an invitation

Remove-Invite

  Withdraw an invite
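
Purely hypothetical sketch: the group, kind, and field names below are invented to illustrate how the verbs above could map onto a custom resource handled by a small controller.

```yaml
apiVersion: lunch.example.com/v1alpha1
kind: LunchEvent
metadata:
  name: friday-tacos
spec:
  description: "Friday lunch run"
  isViewable: true          # New-Event -IsViewable
  isInviteOnly: false       # New-Event -IsInviteOnly
  votingTimeout: 5m         # New-Event -Timeout (default 5 minutes)
  restaurants:              # favorites enabled via Set-Restaurant
    - taco-spot
    - pho-place
# The controller would track invites and votes in the resource status and close
# voting once the timeout passes.
```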

How did I ever live without … Palo Alto Firewall

Every now and then you come across a technology that you didn’t know about before, and you just can’t believe you haven’t been using it all this time.

Of course, at least you did find it … eventually.  Thank goodness.

I find myself just shaking my head in almost disbelief at all the things this thing can do.

It’s interesting to see what countries your apps are connecting to, who is accessing your website, logs of access, etc… so much data to geek out about.

(storage) ceph is amazing

If you haven’t tried out ceph yet, and are not yet completely satisfied with your onprem storage system, I recommend giving it a try.  (Note, it does want a lot of cpu, so heads up on that.)

** I acknowledge that I am currently excited and completely captivated by ceph.  I’m still fairly new to using ceph, so you might want to check my facts, a.k.a. I’m tempting you to start investigating. 😉

Ceph is able to use multiple disks on multiple servers and spread out the load, maintaining two or three copies of data to avoid data loss.  Just look at my humble 4 disk system with the data spread out nearly perfectly across the disks.  (2 nodes with 2 disks each)

Ceph provides cephfs (the ceph filesystem) and cephrbd (ceph block storage).

Block storage gives you something like a disk; it’s useful for, say, creating a vm that you later want to hot-swap between servers.

Otherwise, the ceph file system is what you want to use, though not directly.

You’ll end up creating a pool backed by the ceph filesystem (cephfs), then creating pvcs that come from the pool. You can also use iscsi (which uses cephrbd) or nfs (which uses cephfs) if you have a consumer that isn’t able to connect to cephfs or cephrbd directly.
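
For the kubernetes side, here’s a rough sketch of that pool → storageclass chain. I’m assuming a Rook-style operator and its CSI driver here; the names and replica counts are placeholders.

```yaml
# A CephFilesystem becomes the pool, and a StorageClass hands out pvcs from it.
apiVersion: ceph.rook.io/v1
kind: CephFilesystem
metadata:
  name: homelab-fs
  namespace: rook-ceph
spec:
  metadataPool:
    replicated:
      size: 2            # two copies, spread across my two nodes
  dataPools:
    - name: replicated
      replicated:
        size: 2
  metadataServer:
    activeCount: 1
    activeStandby: true
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cephfs
provisioner: rook-ceph.cephfs.csi.ceph.com
parameters:
  clusterID: rook-ceph
  fsName: homelab-fs
  pool: homelab-fs-replicated
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-cephfs-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
reclaimPolicy: Delete
```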

In my experience working with storage systems and kubernetes persistent volumes, I had no luck with RWX (read write many), even when the providers claimed it worked (nfs using my own linux server, and longhorn, which uses nfs for rwx). I found two apps, ‘plex’ and ‘nextcloud’, would consistently experience database corruption after only a few minutes.

People continually told me “NFS supports RWX” and “Just use iSCSI, that’s perfect for apps that use SQLite”. I tested these claims with a TrueNAS Core server and did not get the promised results.

However, with ceph you can allocate a pvc using cephfs, and this works perfectly with RWX! Awesome! And super fast! All my jenkins builds that run in kubernetes sped up by 15 seconds versus TrueNAS iSCSI. Of course, this could just mean the physical disks backing ceph are faster than the physical disks backing my truenas server; I can’t be sure.
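
The claim itself is just a normal pvc asking for ReadWriteMany against that cephfs storageclass (names here match the hypothetical sketch above):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nextcloud-data
  namespace: nextcloud
spec:
  accessModes:
    - ReadWriteMany      # the mode that kept corrupting databases on NFS/Longhorn for me
  storageClassName: cephfs
  resources:
    requests:
      storage: 50Gi
```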

Now I suspect that if I created pvcs using NFS or iSCSI (which sit on top of cephfs and cephrbd respectively), they might also support RWX. I’m very curious, but given how fast cephfs is and that everything is working perfectly, I can’t see any reason to use NFS or iSCSI (except for vm disks).

Using the helm install of ceph, you end up with a tools pod that you can exec into to run the ‘ceph’ cli. The dashboard gui is pretty great, but the real interaction with ceph happens at the command line. I’ve been in there breaking & fixing things, and the experience feels like a fully fledged product ready for production. There is so much there that I can see someone managing ceph as a career, with a deep dive available as far as you are interested in going. If you break it enough and then fix it, you get to watch it move data around and recover itself, which is as cool as can be, from the most geeky perspective.

In any case, given how long cephfs has been around and how widely it’s used in production environments, I think I’ve found my storage solution for the foreseeable future.

Ceph – Getting Started