This guide shows how to build a 3-node Kubernetes cluster on AWS EC2 where:
- All 3 nodes are both control-plane and worker (stacked etcd).
- Nodes live only in a private subnet (no public IPs).
- An existing OPNsense instance in the same VPC acts as:
  - Default gateway / NAT
  - Firewall
  - Load balancer (HAProxy) for:
    - Kubernetes API (internal only)
    - Traefik ingress (public apps)
  - DNS server (BIND) for internal names
We do not use:
- EKS (no managed control-plane cost)
- AWS NLB / ALB (no LB hourly + data charges)
- MetalLB inside the cluster
Instead, we treat OPNsense as the “edge” load balancer in front of a private kubeadm cluster.
This is still a lab: OPNsense is a single point of ingress, but the internal Kubernetes design is very close to a production cluster.
Why not MetalLB or kube-vip on AWS?
In my bare-metal labs, I usually use:
- kube-vip for the Kubernetes API VIP, and
- MetalLB to implement type: LoadBalancer Services.
On AWS in this design, we don’t use either, for a few reasons:
- MetalLB + AWS VPC is awkward
MetalLB’s simple Layer-2 mode relies on answering ARP/NDP for arbitrary IPs on the LAN. In an AWS VPC, you don’t control L2 like that: IP ownership is tied to ENIs and routing tables, not raw ARP. MetalLB can run in BGP mode, but that requires a BGP-capable router setup that we’re not doing here.
- OPNsense is already our “cloud load balancer”
OPNsense terminates the public EIP, does NAT, runs HAProxy, and sits at 10.0.128.4 on the LAN. It already:
  - load balances the Kubernetes API across 10.0.128.7/8/9, and
  - load balances Traefik’s NodePorts (30080/30443) for app traffic.
Adding MetalLB inside the cluster would just create a second, unnecessary LB layer.
- kube-vip would duplicate what HAProxy already does
kube-vip is mainly used to provide a virtual IP for the control-plane nodes. In this setup, the control-plane VIP is effectively:
k8s-aws.maksonlee.com → 10.0.128.4 (OPNsense) → 6443 on all 3 nodes
That’s already highly available at the edge. Putting kube-vip inside the cluster would add extra moving parts without improving availability for this AWS + OPNsense design.
The result is: the cluster stays simple (kubeadm + Calico + Traefik), and OPNsense plays the role that MetalLB/kube-vip normally play in a bare-metal lab.
- Lab Topology
Assumed AWS environment
You already have:
- VPC: 10.0.0.0/16
- Public subnet (example): 10.0.0.0/20
  - OPNsense (firewall / LB)
    - WAN IP: 10.0.0.4
    - Elastic IP (EIP): 3.109.96.219 (public)
- Private subnet: 10.0.128.0/20
  - OPNsense LAN IP: 10.0.128.4
Routing:
- The route table for 10.0.128.0/20 sends 0.0.0.0/0 to OPNsense as an instance target, with source/destination check disabled on that EC2 instance.
- OPNsense does NAT for instances in 10.0.128.0/20.
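If you prefer to wire this up with the AWS CLI instead of the console, it looks roughly like this (the instance and route-table IDs below are placeholders for your own):
# Disable source/destination check on the OPNsense instance
aws ec2 modify-instance-attribute --instance-id i-0123456789abcdef0 --no-source-dest-check
# Send the private subnet's default route to the OPNsense instance
aws ec2 create-route --route-table-id rtb-0123456789abcdef0 \
  --destination-cidr-block 0.0.0.0/0 --instance-id i-0123456789abcdef0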
OPNsense provides:
- Firewall
- BIND DNS
- HAProxy (SSL offload and/or TCP passthrough for services)
Kubernetes Nodes & Hostnames
We’ll run three EC2 instances in the private subnet:
| Hostname | IP | Role |
|---|---|---|
| k8s-aws-1.maksonlee.com | 10.0.128.7 | control-plane + worker |
| k8s-aws-2.maksonlee.com | 10.0.128.8 | control-plane + worker |
| k8s-aws-3.maksonlee.com | 10.0.128.9 | control-plane + worker |
We’ll also use these logical names:
- Kubernetes API VIP (internal only):
  k8s-aws.maksonlee.com → 10.0.128.4 (OPNsense LAN IP)
- Ingress hostnames (apps exposed via Traefik NodePort through HAProxy):
  app1-aws.maksonlee.com, app2-aws.maksonlee.com
From the Internet:
- app1-aws.maksonlee.com, app2-aws.maksonlee.com → EIP 3.109.96.219 → OPNsense WAN
From inside the VPC / VPN / internal clients:
- k8s-aws.maksonlee.com → 10.0.128.4
- app1-aws.maksonlee.com, app2-aws.maksonlee.com → 10.0.128.4
- Nodes → 10.0.128.7/8/9
Important: Kubernetes API is not exposed on a public DNS name. k8s-aws.maksonlee.com only exists in the internal DNS view.
- DNS Plan (BIND on OPNsense + Public DNS)
Internal zone (maksonlee.com, BIND on OPNsense)
In your internal maksonlee.com zone:
; OPNsense LAN – API VIP & ingress VIP inside VPC
k8s-aws IN A 10.0.128.4
; Kubernetes nodes
k8s-aws-1 IN A 10.0.128.7
k8s-aws-2 IN A 10.0.128.8
k8s-aws-3 IN A 10.0.128.9
; App hostnames – also go to OPNsense LAN for internal clients
app1-aws IN A 10.0.128.4
app2-aws IN A 10.0.128.4
So from VPC / VPN:
- k8s-aws.maksonlee.com → 10.0.128.4
- app1-aws.maksonlee.com, app2-aws.maksonlee.com → 10.0.128.4
Public DNS (Cloudflare)
On public DNS, you only expose app hostnames:
; No public record for k8s-aws (API stays internal)
app1-aws IN A 3.109.96.219
app2-aws IN A 3.109.96.219
From the Internet:
- Users hit app1-aws.maksonlee.com / app2-aws.maksonlee.com → EIP 3.109.96.219 → OPNsense → HAProxy → Traefik NodePort → apps.
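To sanity-check the split-horizon DNS before building anything, compare what the internal and public resolvers return (the addresses are the lab values from above):
# From a host inside the VPC (BIND on OPNsense)
dig +short k8s-aws.maksonlee.com @10.0.128.4      # expect 10.0.128.4
dig +short app1-aws.maksonlee.com @10.0.128.4     # expect 10.0.128.4
# From your PC on the Internet (public DNS)
dig +short app1-aws.maksonlee.com                 # expect 3.109.96.219
dig +short k8s-aws.maksonlee.com                  # expect no answer (internal-only name)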
- Launch the Three Kubernetes Nodes (t4g.small)
Use ARM-based Graviton instances:
- Instance type: t4g.small
  - 2 vCPUs
  - 2 GiB RAM
  - EBS only
  - Up to 5 Gbps network bandwidth
- AMI: Ubuntu Server 24.04 LTS (ARM64)
- Subnet: 10.0.128.0/20 (private)
- Auto-assign public IP: Disabled
Set private IPs:
- k8s-aws-1.maksonlee.com → 10.0.128.7
- k8s-aws-2.maksonlee.com → 10.0.128.8
- k8s-aws-3.maksonlee.com → 10.0.128.9
Set hostnames:
# On k8s-aws-1
sudo hostnamectl set-hostname k8s-aws-1.maksonlee.com
# On k8s-aws-2
sudo hostnamectl set-hostname k8s-aws-2.maksonlee.com
# On k8s-aws-3
sudo hostnamectl set-hostname k8s-aws-3.maksonlee.com
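Once the instances are up, a quick check on each node confirms there is no public IP and that outbound traffic works through OPNsense NAT (the metadata calls below are the standard EC2 IMDSv2 endpoints):
# Instance metadata: a 404 here means no public IPv4 is assigned
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 60")
curl -s -o /dev/null -w '%{http_code}\n' \
  -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/public-ipv4
# Outbound Internet (NAT via OPNsense) should still work, e.g.:
curl -sI https://pkgs.k8s.io | head -n 1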
Security group for Kubernetes nodes
For this lab, all the instances (OPNsense and the three Kubernetes nodes) use the same default security group.
The default SG is configured as:
- Inbound: All protocols / All ports from the same security group.
- Outbound: All protocols / All ports to 0.0.0.0/0.
This means:
- Nodes can talk to each other on all ports (required for kubelet / etcd / Calico).
- OPNsense (same SG) can reach:
  - 6443 on all nodes (Kubernetes API)
  - the NodePort range, including 30080 and 30443 (Traefik)
- Nothing outside this SG can directly reach the nodes.
Because the nodes are in a private subnet and have no public IPs, they are still only reachable via OPNsense, so this is acceptable for a lab.
If you want a stricter, more production-like setup, you could:
- Create a dedicated SG for the nodes.
- Allow (see the AWS CLI sketch below):
  - TCP 6443 from the OPNsense SG (for API)
  - TCP 30080 and 30443 from the OPNsense SG (for Traefik NodePorts)
  - All traffic within the node SG itself.
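A rough AWS CLI sketch of those stricter rules, assuming a dedicated node SG and an OPNsense SG (both group IDs below are placeholders):
# Kubernetes API from OPNsense
aws ec2 authorize-security-group-ingress --group-id sg-NODES \
  --protocol tcp --port 6443 --source-group sg-OPNSENSE
# Traefik NodePorts from OPNsense
aws ec2 authorize-security-group-ingress --group-id sg-NODES \
  --protocol tcp --port 30080 --source-group sg-OPNSENSE
aws ec2 authorize-security-group-ingress --group-id sg-NODES \
  --protocol tcp --port 30443 --source-group sg-OPNSENSE
# All traffic between nodes in the same SG (-1 = all protocols)
aws ec2 authorize-security-group-ingress --group-id sg-NODES \
  --protocol=-1 --source-group sg-NODES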
- Prepare All Nodes for Kubernetes
Run this on all three nodes (k8s-aws-1, k8s-aws-2, k8s-aws-3).
- Disable swap & configure kernel
sudo swapoff -a
sudo sed -i '/ swap / s/^/#/' /etc/fstab
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
sudo modprobe overlay
sudo modprobe br_netfilter
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF
sudo sysctl --system
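Optionally verify that the modules and sysctls took effect before moving on:
lsmod | grep -E 'overlay|br_netfilter'
sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables net.ipv4.ip_forward
# all three sysctls should report 1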
- Install containerd
sudo apt update && sudo apt install -y ca-certificates curl gnupg lsb-release
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] \
https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo "${UBUNTU_CODENAME:-$VERSION_CODENAME}") stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update && sudo apt install -y containerd.io
- Configure containerd for SystemdCgroup
sudo mkdir -p /etc/containerd
sudo containerd config default | sudo tee /etc/containerd/config.toml >/dev/null
Edit:
sudo vi /etc/containerd/config.toml
Find:
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
SystemdCgroup = false
Change to:
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
SystemdCgroup = true
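If you prefer a non-interactive edit, a sed one-liner makes the same change (it only flips that single setting; review the file afterwards if your defaults differ):
sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml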
Restart:
sudo systemctl restart containerd
sudo systemctl enable containerd
- Install kubeadm, kubelet, kubectl (v1.34)
Add Kubernetes apt repo (v1.34)
sudo apt update
sudo apt install -y apt-transport-https ca-certificates curl gpg
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.34/deb/Release.key \
| sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
cat <<EOF | sudo tee /etc/apt/sources.list.d/kubernetes.list
deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.34/deb/ /
EOF
Install tools
sudo apt update
sudo apt install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl
sudo systemctl enable --now kubelet
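Quick check that the tools are installed and held at v1.34:
kubeadm version -o short
kubectl version --client
apt-mark showhold    # should list kubeadm, kubectl, kubelet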
- HAProxy on OPNsense for API and Traefik
On OPNsense, you already have frontends/backends for other services (maksonlee.com, keycloak, etc.).
This section shows only the Kubernetes-related part of the config.
frontend frontend-https
    bind 0.0.0.0:443 name 0.0.0.0:443
    mode tcp
    tcp-request inspect-delay 5s
    tcp-request content accept if { req.ssl_hello_type 1 }
    acl is_k8s_traefik_tls req.ssl_sni -i app1-aws.maksonlee.com
    acl is_k8s_traefik_tls req.ssl_sni -i app2-aws.maksonlee.com
    use_backend k8s_traefik_https if is_k8s_traefik_tls

frontend frontend-http
    bind 0.0.0.0:80 name 0.0.0.0:80
    mode http
    option http-keep-alive
    acl is_k8s_traefik_http hdr(host) -i app1-aws.maksonlee.com
    acl is_k8s_traefik_http hdr(host) -i app2-aws.maksonlee.com
    use_backend k8s_traefik_http if is_k8s_traefik_http

frontend frontend-k8s-api
    bind 10.0.128.4:6443 name 10.0.128.4:6443
    mode tcp
    default_backend k8s_api

backend k8s_api
    mode tcp
    balance roundrobin
    stick-table type ip size 50k expire 30m
    server k8s-aws-1-api 10.0.128.7:6443
    server k8s-aws-2-api 10.0.128.8:6443
    server k8s-aws-3-api 10.0.128.9:6443

backend k8s_traefik_http
    mode http
    balance roundrobin
    stick-table type ip size 50k expire 30m
    http-reuse safe
    server k8s-aws-1-traefik-http 10.0.128.7:30080
    server k8s-aws-2-traefik-http 10.0.128.8:30080
    server k8s-aws-3-traefik-http 10.0.128.9:30080

backend k8s_traefik_https
    mode tcp
    balance roundrobin
    stick-table type ip size 50k expire 30m
    server k8s-aws-1-traefik-https 10.0.128.7:30443
    server k8s-aws-2-traefik-https 10.0.128.8:30443
    server k8s-aws-3-traefik-https 10.0.128.9:30443
Explanation:
- frontend-https (TCP, port 443): peeks at TLS SNI and sends app1-aws.maksonlee.com / app2-aws.maksonlee.com to k8s_traefik_https (TCP passthrough to Traefik’s websecure NodePort, 30443).
- frontend-http (HTTP, port 80): routes by Host header and sends those same hosts to k8s_traefik_http (NodePort 30080).
- frontend-k8s-api (TCP, 10.0.128.4:6443): internal API VIP → k8s_api backend, round-robin across all 3 control-planes.
Your existing frontends/backends for other apps remain unchanged; you just add these Kubernetes sections.
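Before moving on, you can let HAProxy validate the generated config and confirm it is listening on the API VIP. The config path below is where the OPNsense HAProxy plugin typically writes it; adjust if yours differs:
# On the OPNsense shell
haproxy -c -f /usr/local/etc/haproxy.conf    # expect "Configuration file is valid"
sockstat -4 -l | grep 6443                   # HAProxy bound to 10.0.128.4:6443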
- kubeadm Init on k8s-aws-1
On k8s-aws-1, create kubeadm-config.yaml:
apiVersion: kubeadm.k8s.io/v1beta4
kind: ClusterConfiguration
clusterName: aws-selfmanaged
controlPlaneEndpoint: "k8s-aws.maksonlee.com:6443"
apiServer:
  certSANs:
    - k8s-aws.maksonlee.com
    - k8s-aws-1.maksonlee.com
    - k8s-aws-2.maksonlee.com
    - k8s-aws-3.maksonlee.com
networking:
  podSubnet: 10.244.0.0/16
  serviceSubnet: 10.96.0.0/12
Init the cluster:
sudo kubeadm init --config kubeadm-config.yaml --upload-certs
Configure kubectl on k8s-aws-1:
mkdir -p $HOME/.kube
sudo cp /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown "$(id -u):$(id -g)" $HOME/.kube/config
- Install Calico
On k8s-aws-1:
curl -LO https://raw.githubusercontent.com/projectcalico/calico/v3.31.2/manifests/calico.yaml
kubectl apply -f calico.yaml
You do not need to edit calico.yaml for the pod CIDR:
- In this manifest, the CALICO_IPV4POOL_CIDR block is commented out.
- With kubeadm, Calico auto-detects the pod CIDR from the cluster configuration (podSubnet: 10.244.0.0/16).
Verify:
kubectl get pods -n kube-system
Wait until all calico-node pods are Running.
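If you want to confirm that Calico really picked up 10.244.0.0/16, inspect the default IP pool it created (the resource and pool names below are the Calico defaults for this manifest; list the pools first if yours differ):
kubectl get ippools.crd.projectcalico.org
kubectl get ippools.crd.projectcalico.org default-ipv4-ippool \
  -o jsonpath='{.spec.cidr}{"\n"}'    # expect 10.244.0.0/16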
- Join k8s-aws-2 and k8s-aws-3 as Control-Planes
After kubeadm init --upload-certs, kubeadm prints two kubeadm join commands:
- one for worker nodes
- one for additional control-plane nodes, which already includes --control-plane and --certificate-key <CERT_KEY>
On k8s-aws-2 and k8s-aws-3, run the control-plane join command that kubeadm printed. It looks like:
sudo kubeadm join k8s-aws.maksonlee.com:6443 \
--token <TOKEN> \
--discovery-token-ca-cert-hash sha256:<CA_HASH> \
--control-plane \
--certificate-key <CERT_KEY>
Check all nodes:
kubectl get nodes -o wide
You should see three control-plane nodes with internal IPs 10.0.128.7, .8, .9.
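With all three API servers behind HAProxy, you can also confirm the VIP path from any internal client; on a default kubeadm cluster, /version is readable without credentials:
curl -k https://k8s-aws.maksonlee.com:6443/version
# expect a JSON version blob ("major": "1", "minor": "34")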
Allow Control-Planes to Run Workloads
For this lab, you want all three nodes to schedule workloads:
kubectl taint nodes k8s-aws-1.maksonlee.com node-role.kubernetes.io/control-plane-
kubectl taint nodes k8s-aws-2.maksonlee.com node-role.kubernetes.io/control-plane-
kubectl taint nodes k8s-aws-3.maksonlee.com node-role.kubernetes.io/control-plane-
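Verify the taint is gone on all three nodes:
kubectl get nodes -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints'
# TAINTS should show <none> for every node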
- Install Traefik via Helm
On k8s-aws-1:
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
helm repo add traefik https://traefik.github.io/charts
helm repo update
Create traefik-values.yaml:
deployment:
  replicas: 3
service:
  type: NodePort
  spec:
    externalTrafficPolicy: Local
ports:
  web:
    port: 80
    nodePort: 30080
  websecure:
    port: 443
    nodePort: 30443
Install / upgrade Traefik:
helm upgrade --install traefik traefik/traefik \
--namespace traefik --create-namespace \
-f traefik-values.yaml
Check pods & svc:
kubectl get pods -n traefik -o wide
kubectl get svc -n traefik traefik
- Deploy a Sample App (whoami) and Ingress
Create namespace + deployment + service:
kubectl create namespace demo
kubectl create deployment whoami \
--namespace demo \
--image traefik/whoami \
--replicas 3
kubectl expose deployment whoami \
--namespace demo \
--port 80 --target-port 80
Create whoami-ingress.yaml:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: whoami
  namespace: demo
  annotations:
    traefik.ingress.kubernetes.io/router.entrypoints: web
spec:
  ingressClassName: traefik
  rules:
    - host: app1-aws.maksonlee.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: whoami
                port:
                  number: 80
Apply:
kubectl apply -f whoami-ingress.yaml
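Confirm that Traefik admitted the Ingress:
kubectl get ingress -n demo whoami
# HOSTS should show app1-aws.maksonlee.com with CLASS traefik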
- Test from inside the cluster
On k8s-aws-1:
kubectl get svc -n traefik traefik
# Suppose ClusterIP is 10.103.26.76
# Test via ClusterIP:
curl -H "Host: app1-aws.maksonlee.com" http://10.103.26.76/
# Test via NodePort on each node:
curl -H "Host: app1-aws.maksonlee.com" http://10.0.128.7:30080/
curl -H "Host: app1-aws.maksonlee.com" http://10.0.128.8:30080/
curl -H "Host: app1-aws.maksonlee.com" http://10.0.128.9:30080/
You should see the whoami response each time.
- Test from OPNsense (direct to NodePort)
On the OPNsense shell:
curl -v -H "Host: app1-aws.maksonlee.com" http://10.0.128.7:30080/
If you get a whoami response, it confirms:
- NodePort is reachable
- Security Group rules are correct
- K8s / Traefik / Ingress path is healthy
- End-to-end test from your PC (through HAProxy)
From your PC:
curl http://app1-aws.maksonlee.com/
You should see something like:
Hostname: whoami-b85fc56b4-6mg8j
IP: 127.0.0.1
IP: ::1
IP: 10.244.166.198
...
X-Forwarded-For: 10.0.128.4
X-Forwarded-Host: app1-aws.maksonlee.com
X-Forwarded-Port: 80
X-Forwarded-Proto: http
X-Forwarded-Server: traefik-65cc567666-xxxxx
X-Real-Ip: 10.0.128.4
This shows:
- External client → EIP (3.109.96.219) → OPNsense HAProxy
- HAProxy frontend matches Host: app1-aws.maksonlee.com → k8s_traefik_http backend
- Backend load balances across the three NodePorts (:30080)
- Traefik forwards to the whoami pods