feat(network): apply host firewall (inet host) during host provisioning #802
brunodam wants to merge 1 commit into
✅ Snyk checks have passed. No issues have been found so far.
Pull request overview
This PR wires the node-agnostic host firewall (inet host nftables table) into the kube cluster install host-provisioning workflow so it is applied once per machine (create-if-missing) before Kubernetes is brought up, and introduces the corresponding host-level config + CLI/prompt surface. It also centralizes the cluster pod CIDR into a single constant that is reused by kubeadm init template rendering and as the firewall’s default pod CIDR.
Changes:
- Add a new `NetworkFirewallCreate` workflow step (with rollback behavior) and register it in `systemSetupWorkflow()` right after enabling nftables.
- Introduce `models.HostConfig` (+ validation), global config override plumbing, and CLI flags + interactive prompts for management CIDRs / SSH port / pod CIDR / in-cluster ports.
- Replace the kubeadm init template's literal `podSubnet` with a rendered value sourced from `models.DefaultClusterPodCIDR`.
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| pkg/software/kubeadm_installer.go | Passes models.DefaultClusterPodCIDR into kubeadm init template data. |
| pkg/models/host_config_test.go | Adds unit tests for HostConfig.Validate() and ensures Config.Validate() surfaces host config errors. |
| pkg/models/config.go | Adds DefaultClusterPodCIDR, HostConfig, and host config validation; adds Host to global Config. |
| pkg/config/global.go | Adds OverrideHostConfig to mutate global host firewall config. |
| internal/workflows/steps/step_network_firewall.go | New workflow step that applies the host firewall and rolls back only if it created the table. |
| internal/workflows/steps/step_network_firewall_test.go | Unit tests for skip/create/rollback behavior of the new workflow step using a stubbed firewall manager. |
| internal/workflows/steps/const.go | Adds FirewallCreatedByThisStep local-state key for rollback decisions. |
| internal/workflows/setup.go | Registers NetworkFirewallCreate() in systemSetupWorkflow() after nftables service setup. |
| internal/ui/prompt/cluster.go | Adds validators and prompts for mgmt CIDRs / SSH port / pod CIDR / in-cluster ports + port-list parsing. |
| internal/ui/prompt/cluster_test.go | Adds tests covering the new prompt validators and port list parsing. |
| internal/templates/files/kubeadm/kubeadm-init.yaml | Renders networking.podSubnet from template data instead of a literal. |
| internal/templates/embed.go | Adds PodSubnet to KubeadmInitData. |
| docs/quickstart.md | Documents host firewall behavior and adds new flags to the quickstart install examples and flags table. |
| cmd/cli/commands/kube/cluster/install.go | Adds flags and resolves effective host firewall config (flags/prompt/config/defaults) for use by the workflow step. |
| cmd/cli/commands/kube/cluster/cluster.go | Adds flag backing vars for host firewall options. |
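The port-list parsing listed for `internal/ui/prompt/cluster.go` could look roughly like the sketch below. This is an illustration only: `parsePortList` is a hypothetical stand-in for the PR's `ParsePortList`, and the accepted syntax (comma-separated, whitespace-tolerant, duplicates dropped, empty input meaning no ports) is an assumption rather than the PR's confirmed behavior.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parsePortList is a hypothetical stand-in for the PR's ParsePortList helper.
// Assumed input format: a comma-separated list such as "6443,4244,7472,10250".
func parsePortList(s string) ([]uint16, error) {
	if strings.TrimSpace(s) == "" {
		return nil, nil // assumption: empty input means "no in-cluster ports"
	}
	seen := make(map[uint16]bool)
	var ports []uint16
	for _, field := range strings.Split(s, ",") {
		field = strings.TrimSpace(field)
		// bitSize 16 makes ParseUint reject anything above 65535.
		n, err := strconv.ParseUint(field, 10, 16)
		if err != nil || n == 0 {
			return nil, fmt.Errorf("invalid port %q: must be an integer in 1-65535", field)
		}
		p := uint16(n)
		if !seen[p] { // drop duplicates, keep first-seen order
			seen[p] = true
			ports = append(ports, p)
		}
	}
	return ports, nil
}

func main() {
	ports, err := parsePortList("6443, 4244,7472,10250")
	fmt.Println(ports, err)
}
```

Returning `uint16` values keeps the range check in one place; whether the real helper returns integers or strings is not stated in this review.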
…kflow

Wire `network firewall create` into the block-node workflow (`block node install`, always; `internal/bll/blocknode/install_handler.go`) so every block node gets the node-level `inet host` nftables table (SSH/mgmt allowlist, ICMP policy, in-cluster host-service ports) applied before its Helm chart deploys. Create-if-missing; re-running install is a no-op.

The firewall is deliberately NOT wired into the generic `kube cluster install` / `systemSetupWorkflow()` path: that command provisions a Kubernetes cluster independent of any specific node type, and unconditionally applying node-specific firewall rules there would be too invasive for deployments that use it for other purposes. The block-node workflow is the correct owner since it's the node type that actually needs this protection today.

Add the management-CIDR config surface needed to apply it safely: a `host` config block (managementCidrs, sshPort, podCidr, inClusterPorts, disabled) plus --mgmt-cidrs / --ssh-port / --pod-cidr / --in-cluster-ports / --firewall-enabled flags with interactive prompts (pre-filled, confirm with Enter) for any value not passed on the CLI. The input chain is default-drop, so an empty management allowlist would lock the host out of SSH; the step skips firewall creation in that case rather than rendering a lock-out ruleset. --firewall-enabled=false lets an operator opt out entirely (e.g. hosts managed by an external firewall).

`block node install` always resolves the host firewall config, regardless of whether it bootstraps a bare-metal host or deploys onto an already-existing cluster: nftables is guaranteed already installed in both cases (either from a prior `kube cluster install`, or from the cluster bootstrap this same install just ran), so it's always safe to apply here.

Promote the cluster pod subnet to a single source of truth (models.DefaultClusterPodCIDR) shared by the kubeadm podSubnet template and the firewall --pod-cidr default, so the in-cluster host-service ports rule opens exactly the range kubeadm assigns pods.

Closes #778
Closes #779

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Bruno Marques <bruno.marques@swirldslabs.com>
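The skip, create-if-missing, and rollback rules described in this commit message can be sketched as a small decision function. This is a minimal illustration under stated assumptions, not the PR's code: the `firewallManager` interface, the `outcome` type, and `fakeManager` are hypothetical, and the Go field names on `HostConfig` are guessed from the config keys named above.

```go
package main

import (
	"errors"
	"fmt"
)

// HostConfig mirrors the config keys named in the commit message.
// The Go field names are assumptions.
type HostConfig struct {
	ManagementCIDRs []string
	SSHPort         int
	PodCIDR         string
	InClusterPorts  []int
	Disabled        bool
}

// firewallManager is a hypothetical interface, not the PR's firewall package.
type firewallManager interface {
	Exists() (bool, error)
	Create(cfg HostConfig) error
	Delete() error
}

type outcome string

const (
	skippedDisabled    outcome = "skipped: disabled"
	skippedNoAllowlist outcome = "skipped: no management CIDRs"
	skippedExists      outcome = "skipped: table already exists"
	created            outcome = "created"
)

// applyHostFirewall never creates a default-drop table without an allowlist,
// and never touches a table that already exists (create-if-missing).
func applyHostFirewall(m firewallManager, cfg HostConfig) (outcome, error) {
	if cfg.Disabled {
		return skippedDisabled, nil // explicit opt-out
	}
	if len(cfg.ManagementCIDRs) == 0 {
		return skippedNoAllowlist, nil // avoid an SSH lock-out ruleset
	}
	exists, err := m.Exists()
	if err != nil {
		return "", err
	}
	if exists {
		return skippedExists, nil // re-running install is a no-op
	}
	if err := m.Create(cfg); err != nil {
		return "", err
	}
	return created, nil
}

// rollbackHostFirewall deletes the table only if this run created it.
func rollbackHostFirewall(m firewallManager, o outcome) error {
	if o != created {
		return nil
	}
	return m.Delete()
}

// fakeManager is an in-memory stand-in so the sketch runs without nftables.
type fakeManager struct{ present bool }

func (f *fakeManager) Exists() (bool, error) { return f.present, nil }
func (f *fakeManager) Create(HostConfig) error {
	if f.present {
		return errors.New("table already exists")
	}
	f.present = true
	return nil
}
func (f *fakeManager) Delete() error { f.present = false; return nil }

func main() {
	m := &fakeManager{}
	cfg := HostConfig{ManagementCIDRs: []string{"10.0.0.0/8"}, SSHPort: 22}
	first, _ := applyHostFirewall(m, cfg)
	second, _ := applyHostFirewall(m, cfg)
	fmt.Println(first, "/", second) // prints "created / skipped: table already exists"
}
```

Carrying the outcome into the rollback is one way to ensure a pre-existing table is never deleted by a failed install; the PR records the equivalent fact in workflow state.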
Summary
Wires `network firewall create` (#757) into the block-node workflow so every block node gets the node-level `inet host` nftables table (SSH/management allowlist, ICMP policy, and in-cluster host-service ports) laid down before its Helm chart deploys. It is create-if-missing (re-running install is a no-op).

This PR closes #778 and #779. #779 (the management-CIDR config surface) is folded in because the `inet host` input chain is default-drop: applying it with an empty SSH allowlist would permanently lock a host out of new SSH connections. Since this branch targets `main` directly (no epic rollup branch), #778 must be independently safe there, which requires #779's allowlist plumbing.

Design correction from the original approach
The first version of this PR wired the firewall step into the generic `systemSetupWorkflow()` (shared by `kube cluster install` and bare-metal `block node install`), following epic #777's original framing that this is "node-agnostic" host provisioning. Feedback surfaced that this is too invasive for `kube cluster install`, which is used as a general-purpose cluster-provisioning tool for varying deployments: forcing node-specific firewall rules onto every generic cluster install doesn't fit that use case. Having it live in the block-node (and, in future, consensus-node) workflow instead makes more sense, since that's the actual node type that needs the protection today.

This version moves the wiring accordingly:

- `internal/workflows/setup.go`: `NetworkFirewallCreate()` removed from `systemSetupWorkflow()` (which still installs/enables nftables itself, since other things may depend on the package being present; it just no longer applies the `inet host` table there).
- `cmd/cli/commands/kube/cluster/install.go`: host-firewall flags/resolution removed. `kube cluster install` no longer touches the firewall at all.
- `internal/bll/blocknode/install_handler.go`: `NetworkFirewallCreate()` prepended to both branches of `InstallHandler.BuildWorkflow` (bare-metal bootstrap and deploy-onto-existing-cluster). Both branches are guaranteed to already have nftables installed by the time this step runs (either from a prior `kube cluster install`, or from the cluster bootstrap this same `block node install` just ran).
- `cmd/cli/commands/block/node/install.go`: now always resolves the host firewall config (previously gated on "bootstrapping bare metal only" via a `state.ReadClusterCreatedFromDisk()` check, since the firewall was expected to already exist for the existing-cluster case; that gate and its now-dead reader are removed).

The corresponding traffic-shaper design doc and epic #777 are being updated to match (tracked separately).
What changed
- `internal/workflows/steps/step_network_firewall.go`: `NetworkFirewallCreate` step. Reads the resolved host config, calls `firewall.NewManager().Create(...)`. Skips with a loud warning when no management CIDRs are configured rather than rendering a lock-out ruleset, and skips (info-level) when explicitly disabled via `--firewall-enabled=false`. Rollback deletes the table only if this step created it.
- `pkg/models/config.go`, `pkg/config/global.go`: new `host` config block (`managementCidrs`, `sshPort`, `podCidr`, `inClusterPorts`, `disabled`) with validation, plus `OverrideHostConfig` (full-replace semantics, since the resolver already computes the complete effective state before calling it).
- `cmd/cli/commands/common/host_firewall.go`: `--firewall-enabled`, `--mgmt-cidrs`, `--ssh-port`, `--pod-cidr`, `--in-cluster-ports`, registered via `RegisterHostFirewallFlags` and resolved via `ResolveHostFirewallConfig`. Precedence per value: CLI flag > interactive prompt > config file > built-in default. Interactive installs prompt (pre-filled, confirm with Enter) for any value not passed on the CLI; `--non-interactive` / non-TTY skips prompting. `--firewall-enabled=false` opts out entirely and skips resolving the other fields.
- `block node install` wired: always resolves and applies, regardless of bare-metal vs. existing-cluster.
- The literal `10.4.0.0/14` is promoted to `models.DefaultClusterPodCIDR`, now consumed by both the kubeadm `podSubnet` template (`KubeadmInitData.PodSubnet`) and the firewall `--pod-cidr` default, so the in-cluster host-service ports rule opens exactly the range kubeadm assigns pods (the per-node CIDR can't be auto-detected before the cluster exists).

Scope boundaries (epic #777)
- `inet weaver` workload plane (#762, Story 2.3: `block node install` orchestrates the static plane via `network … create`), teardown on `kube cluster uninstall` (#791, Story 0.4: Host firewall teardown on `kube cluster uninstall` (node-agnostic); itself likely needs the same block-node-vs-cluster ownership correction). Reboot persistence of `network-host.nft` + the shared oneshot unit is already handled by the `Create` path of #757 (Story 1.1: Implement `network firewall` verbs on the `inet host` table); #780 (Story 0.3: `inet host` reboot persistence (network-host.nft + oneshot)) refines it.
- `block node reconfigure` / `upgrade` (opt-in, not automatic) so hosts already running a pre-#802-era binary can retrofit the firewall without a fresh install.

Test plan
- `task lint:check`: 0 issues
- `go build` / `go vet` (`GOOS=linux`): clean across all touched packages
- `go test ./pkg/models/... ./internal/ui/prompt/... ./pkg/config/... ./internal/state/...`: `HostConfig.Validate` cases; prompt validators (mgmt-cidrs/ssh-port/pod-cidr/in-cluster-ports) + `ParsePortList`; `OverrideHostConfig` full-replace semantics
- `task vm:test:unit` covering `internal/workflows/steps`: new step tests (skip-on-empty, skip-on-disabled, create-on-set + rollback-deletes, rollback-skips-when-preexisting, explicit-empty overrides defaults)
- `task vm:test:integration`: cluster-install / block-node-install path

Manual UAT (host with real nftables)
Run on a provisioning host (or the UTM VM). Keep a second SSH session open as a safety net before applying default-drop.
1. Happy path — firewall applied with a management allowlist, on top of an existing cluster
Then inspect the live table:
Expected (abridged): SSH allow scoped to the allowlist, the in-cluster-ports rule for the pod CIDR, and `policy drop`:
Confirm the on-disk artifact and the boot unit exist, and that SSH from a management source still connects:
2. `kube cluster install` alone never touches the firewall
Expected: `Error: No such file or directory`. The table is never created by `kube cluster install` on its own, confirming the ownership move.
3. Bare-metal bootstrap via `block node install`
On a machine with no cluster yet:
```
sudo solo-provisioner block node install --profile=local \
  --non-interactive --mgmt-cidrs=10.0.0.0/8
sudo nft list table inet host   # same ruleset as scenario 1
```
Expected: the bootstrap path (cluster + block node in one command) applies the firewall.
4. Idempotency — re-running install is a no-op
Expected: a warning that the table already exists and flags were not re-applied (create-if-missing); the step reports success without changing the table.
5. Skip path — no allowlist, no lockout
Expected: the step is skipped with a warning, and no table is created:
6. Explicit opt-out
Expected: the step is skipped (info-level, not a warning: this is an intentional choice, not a safety fallback), and no table is created even though `--mgmt-cidrs` was supplied.
7. Interactive prompt: values pre-filled, confirm with Enter
Run without `--non-interactive` and without the firewall flags on a TTY:
Expected: four prompts appear (Management CIDRs, SSH port `22`, Pod CIDR `10.4.0.0/14`, In-cluster ports `6443,4244,7472,10250`), each pre-filled and acceptable with Enter; a "Host Firewall" summary of the chosen values is printed before the workflow runs.
8. kubeadm podSubnet unchanged
```
grep podSubnet /opt/solo-provisioner/.../etc/weaver/kubeadm-init.yaml
# -> podSubnet: 10.4.0.0/14
```
Expected: identical to the previous literal; the constant change is behavior-neutral.
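Why the rendered value stays identical can be illustrated with a small `text/template` sketch. The names `DefaultClusterPodCIDR` and `KubeadmInitData.PodSubnet` come from this PR; the template fragment and the `renderKubeadmInit` helper are assumptions made for illustration, since the real `kubeadm-init.yaml` template contains more than this.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// DefaultClusterPodCIDR stands in for models.DefaultClusterPodCIDR.
const DefaultClusterPodCIDR = "10.4.0.0/14"

// KubeadmInitData carries only the field relevant here; the real struct has more.
type KubeadmInitData struct {
	PodSubnet string
}

// kubeadmInitFragment is an assumed excerpt of the kubeadm init template.
const kubeadmInitFragment = `networking:
  podSubnet: {{ .PodSubnet }}
`

// renderKubeadmInit is a hypothetical helper that renders the fragment.
func renderKubeadmInit(data KubeadmInitData) (string, error) {
	tmpl, err := template.New("kubeadm-init").Parse(kubeadmInitFragment)
	if err != nil {
		return "", err
	}
	var buf strings.Builder
	if err := tmpl.Execute(&buf, data); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	out, err := renderKubeadmInit(KubeadmInitData{PodSubnet: DefaultClusterPodCIDR})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
	// Output:
	// networking:
	//   podSubnet: 10.4.0.0/14
}
```

Because the firewall's `--pod-cidr` default reads the same constant, a future change to the pod range only has to be made in one place.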
Risks / rollback
- Empty `--mgmt-cidrs` → step skips (no drop policy applied); the SSH allow rule and the drop policy commit in one atomic `nft -f` transaction (no window).
- The step runs after `InstallClusterWorkflow`'s cluster bootstrap but before `SetupBlockNode`. Loopback and established/related are accepted, so single-node bootstrap (predominantly loopback) is unaffected. Multi-node control-plane/worker traffic to `6443` originates from node IPs, so those node subnets must be included in `--mgmt-cidrs` (the in-cluster-ports rule opens the pod CIDR only). Reviewer attention welcome here.
- `podSubnet` now renders from `models.DefaultClusterPodCIDR` (value identical to the prior literal `10.4.0.0/14`); behavior-neutral. Startup migrations do not re-render kubeadm config, so existing clusters are unaffected.
- `install_handler.go` step insertion; the step's own rollback deletes a table it created. Additive otherwise: no existing path invoked it before.

Closes #778
Closes #779
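For the multi-node risk noted above (node IPs must fall inside `--mgmt-cidrs`), a pre-flight check along the following lines could flag uncovered nodes before the default-drop table is applied. This is a sketch only: `uncoveredNodes` is a hypothetical helper, not part of this PR, and the example addresses are made up.

```go
package main

import (
	"fmt"
	"net/netip"
)

// uncoveredNodes returns the node IPs that no management CIDR contains.
// Hypothetical helper for illustration; not part of the PR.
func uncoveredNodes(mgmtCIDRs, nodeIPs []string) ([]string, error) {
	prefixes := make([]netip.Prefix, 0, len(mgmtCIDRs))
	for _, c := range mgmtCIDRs {
		p, err := netip.ParsePrefix(c)
		if err != nil {
			return nil, fmt.Errorf("invalid CIDR %q: %w", c, err)
		}
		prefixes = append(prefixes, p.Masked()) // normalize host bits
	}
	var missing []string
	for _, s := range nodeIPs {
		addr, err := netip.ParseAddr(s)
		if err != nil {
			return nil, fmt.Errorf("invalid node IP %q: %w", s, err)
		}
		covered := false
		for _, p := range prefixes {
			if p.Contains(addr) {
				covered = true
				break
			}
		}
		if !covered {
			missing = append(missing, s)
		}
	}
	return missing, nil
}

func main() {
	missing, err := uncoveredNodes(
		[]string{"10.0.0.0/8"},
		[]string{"10.1.2.3", "192.168.1.10"},
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(missing) // prints [192.168.1.10]
}
```

Any address the check reports would be dropped on port `6443` under the default-drop input chain unless its subnet is added to the allowlist.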