
feat(network): apply host firewall (inet host) during host provisioning#802

Open
brunodam wants to merge 1 commit into main from 00778-wire-firewall-create-kube-install

@brunodam brunodam commented Jul 2, 2026

Summary

Wires network firewall create (#757) into the block-node workflow so every block node gets the node-level inet host nftables table — SSH/management allowlist, ICMP policy, and in-cluster host-service ports — laid down before its Helm chart deploys. It is create-if-missing (re-running install is a no-op).

This PR closes #778 and #779. #779 (the management-CIDR config surface) is folded in because the inet host input chain is default-drop: applying it with an empty SSH allowlist would permanently lock a host out of new SSH connections. Since this branch targets main directly (no epic rollup branch), #778 must be independently safe there, which requires #779's allowlist plumbing.

Design correction from the original approach

The first version of this PR wired the firewall step into the generic systemSetupWorkflow() (shared by kube cluster install and bare-metal block node install), following epic #777's original framing that this is "node-agnostic" host provisioning. Feedback surfaced that this is too invasive for kube cluster install, which is used as a general-purpose cluster-provisioning tool for varying deployments — forcing node-specific firewall rules onto every generic cluster install doesn't fit that use case. Having it live in the block-node (and, in future, consensus-node) workflow instead makes more sense, since that's the actual node type that needs the protection today.

This version moves the wiring accordingly:

  • internal/workflows/setup.go: NetworkFirewallCreate() removed from systemSetupWorkflow(). The workflow still installs and enables nftables itself, since other components may depend on the package being present; it just no longer applies the inet host table there.
  • cmd/cli/commands/kube/cluster/install.go: host-firewall flags and resolution removed. kube cluster install no longer touches the firewall at all.
  • internal/bll/blocknode/install_handler.go: NetworkFirewallCreate() prepended to both branches of InstallHandler.BuildWorkflow (bare-metal bootstrap and deploy-onto-existing-cluster). Both branches are guaranteed to have nftables installed by the time this step runs (either from a prior kube cluster install, or from the cluster bootstrap this same block node install just ran).
  • cmd/cli/commands/block/node/install.go: now always resolves the host firewall config. Previously this was gated on bare-metal bootstrapping only, via a state.ReadClusterCreatedFromDisk() check, because the firewall was expected to already exist in the existing-cluster case; that gate and its now-dead reader are removed.
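A minimal Go sketch of the resulting step ordering in the block-node workflow. The Step type and buildWorkflow function are hypothetical stand-ins; only the step names come from this PR. The order inside the bare-metal branch follows the Risks section (firewall after the cluster bootstrap, before SetupBlockNode), so treat it as an illustration of the described ordering, not the handler's actual code.

```go
package main

import "fmt"

// Step is a hypothetical stand-in for a workflow step; only the names used
// below come from this PR.
type Step struct{ Name string }

// buildWorkflow sketches the two branches of InstallHandler.BuildWorkflow.
// The bare-metal branch bootstraps the cluster first; both branches then apply
// the host firewall before the block node (and its Helm chart) is set up.
func buildWorkflow(bareMetal bool) []Step {
	var steps []Step
	if bareMetal {
		steps = append(steps, Step{Name: "InstallClusterWorkflow"})
	}
	return append(steps, Step{Name: "NetworkFirewallCreate"}, Step{Name: "SetupBlockNode"})
}

func main() {
	for _, bareMetal := range []bool{true, false} {
		var names []string
		for _, s := range buildWorkflow(bareMetal) {
			names = append(names, s.Name)
		}
		fmt.Println(bareMetal, names)
	}
}
```

The point of the shape is that the firewall step is unconditional in both branches, which is why the CLI no longer needs the state.ReadClusterCreatedFromDisk() gate.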

The corresponding traffic-shaper design doc and epic #777 are being updated to match (tracked separately).

What changed

  • Host firewall step (internal/workflows/steps/step_network_firewall.go): NetworkFirewallCreate step. Reads the resolved host config, calls firewall.NewManager().Create(...). Skips with a loud warning when no management CIDRs are configured rather than rendering a lock-out ruleset, and skips (info-level) when explicitly disabled via --firewall-enabled=false. Rollback deletes the table only if this step created it.
  • Config surface (pkg/models/config.go, pkg/config/global.go): new host config block — managementCidrs, sshPort, podCidr, inClusterPorts, disabled — with validation, plus OverrideHostConfig (full-replace semantics, since the resolver already computes the complete effective state before calling it).
  • CLI flags + prompts + shared resolver (cmd/cli/commands/common/host_firewall.go): --firewall-enabled, --mgmt-cidrs, --ssh-port, --pod-cidr, --in-cluster-ports, registered via RegisterHostFirewallFlags and resolved via ResolveHostFirewallConfig. Precedence per value: CLI flag > interactive prompt > config file > built-in default. Interactive installs prompt (pre-filled, confirm with Enter) for any value not passed on the CLI; --non-interactive / non-TTY skips prompting. --firewall-enabled=false opts out entirely and skips resolving the other fields.
  • block node install wired: always resolves and applies, regardless of bare-metal vs. existing-cluster.
  • Single source of truth for the pod subnet: promoted 10.4.0.0/14 to models.DefaultClusterPodCIDR, now consumed by both the kubeadm podSubnet template (KubeadmInitData.PodSubnet) and the firewall --pod-cidr default — so the in-cluster host-service ports rule opens exactly the range kubeadm assigns pods (the per-node CIDR can't be auto-detected before the cluster exists).
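The skip / create-if-missing / rollback rules of the NetworkFirewallCreate step can be sketched as follows. The Manager interface (Exists, Create, Delete), the Outcome values and the execute/rollback functions are hypothetical; the PR only states that the step calls firewall.NewManager().Create(...), so the real API may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// HostConfig is a hypothetical subset of the host config block described above.
type HostConfig struct {
	ManagementCIDRs []string
	Disabled        bool // set by --firewall-enabled=false
}

// Manager is a hypothetical stand-in for the firewall manager.
type Manager interface {
	Exists() (bool, error)
	Create(cfg HostConfig) error
	Delete() error
}

// Outcome records what the step did so rollback knows whether it owns the table.
type Outcome int

const (
	Failed          Outcome = iota // zero value: nothing was applied
	SkippedDisabled                // info-level: operator opted out
	SkippedNoCIDRs                 // warn-level: default-drop would lock out SSH
	AlreadyPresent                 // create-if-missing: existing table untouched
	Created                        // this step created the table
)

func execute(m Manager, cfg HostConfig) (Outcome, error) {
	if cfg.Disabled {
		return SkippedDisabled, nil
	}
	if len(cfg.ManagementCIDRs) == 0 {
		return SkippedNoCIDRs, nil
	}
	exists, err := m.Exists()
	if err != nil {
		return Failed, err
	}
	if exists {
		return AlreadyPresent, nil
	}
	if err := m.Create(cfg); err != nil {
		return Failed, err
	}
	return Created, nil
}

// rollback deletes the table only if this step created it.
func rollback(m Manager, o Outcome) error {
	if o != Created {
		return nil
	}
	return m.Delete()
}

// fakeManager is an in-memory test double.
type fakeManager struct{ present bool }

func (f *fakeManager) Exists() (bool, error) { return f.present, nil }

func (f *fakeManager) Create(HostConfig) error {
	if f.present {
		return errors.New("table already exists")
	}
	f.present = true
	return nil
}

func (f *fakeManager) Delete() error {
	f.present = false
	return nil
}

func main() {
	m := &fakeManager{}
	cfg := HostConfig{ManagementCIDRs: []string{"10.0.0.0/8"}}
	first, _ := execute(m, cfg)
	second, _ := execute(m, cfg) // re-run: create-if-missing makes this a no-op
	fmt.Println(first == Created, second == AlreadyPresent) // true true
	_ = rollback(m, second) // preexisting from this step's view: left in place
	_ = rollback(m, first)  // created by this step: deleted
	fmt.Println(m.present) // false
}
```

Tracking the outcome explicitly (rather than re-checking for the table at rollback time) is what lets rollback leave a table that existed before the step ran.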

Scope boundaries (epic #777)

Test plan

  • task lint:check — 0 issues
  • go build / go vet (GOOS=linux) — clean across all touched packages
  • Unit (macOS): go test ./pkg/models/... ./internal/ui/prompt/... ./pkg/config/... ./internal/state/...
    • HostConfig.Validate cases; prompt validators (mgmt-cidrs / ssh-port / pod-cidr / in-cluster-ports) + ParsePortList; OverrideHostConfig full-replace semantics
  • Unit (VM, Linux-only pkg): task vm:test:unit covering internal/workflows/steps — new step tests (skip-on-empty, skip-on-disabled, create-on-set + rollback-deletes, rollback-skips-when-preexisting, explicit-empty overrides defaults)
  • Integration (VM): task vm:test:integration — cluster-install / block-node-install path
  • Manual UAT on a host with real nft (see step-by-step below)
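The prompt validators and ParsePortList exercised by the unit tests above could look roughly like this hypothetical sketch. The function names, signatures and exact rules (for example, tolerating stray commas in port lists) are assumptions, not the PR's code.

```go
package main

import (
	"fmt"
	"net/netip"
	"strconv"
	"strings"
)

// parsePortList is a hypothetical parser in the spirit of ParsePortList: it
// accepts a comma-separated list such as "6443,4244,7472,10250".
func parsePortList(s string) ([]uint16, error) {
	var ports []uint16
	for _, field := range strings.Split(s, ",") {
		field = strings.TrimSpace(field)
		if field == "" {
			continue // tolerate stray or trailing commas
		}
		// bitSize 16 rejects anything above 65535.
		n, err := strconv.ParseUint(field, 10, 16)
		if err != nil || n == 0 {
			return nil, fmt.Errorf("invalid port %q", field)
		}
		ports = append(ports, uint16(n))
	}
	return ports, nil
}

// validateCIDRs checks a comma-separated management CIDR list. An empty value
// is accepted because the firewall step itself skips (with a warning) when no
// management CIDRs are configured.
func validateCIDRs(s string) error {
	if strings.TrimSpace(s) == "" {
		return nil
	}
	for _, field := range strings.Split(s, ",") {
		field = strings.TrimSpace(field)
		if _, err := netip.ParsePrefix(field); err != nil {
			return fmt.Errorf("invalid CIDR %q: %w", field, err)
		}
	}
	return nil
}

// validateSSHPort checks a single value entered at the SSH port prompt.
func validateSSHPort(s string) error {
	n, err := strconv.ParseUint(strings.TrimSpace(s), 10, 16)
	if err != nil || n == 0 {
		return fmt.Errorf("invalid SSH port %q", s)
	}
	return nil
}

func main() {
	ports, err := parsePortList("6443,4244,7472,10250")
	fmt.Println(ports, err) // [6443 4244 7472 10250] <nil>
	fmt.Println(validateCIDRs("10.0.0.0/8") == nil, validateSSHPort("22") == nil)
}
```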

Manual UAT (host with real nftables)

Run on a provisioning host (or the UTM VM). Keep a second SSH session open as a safety net before applying default-drop.

1. Happy path — firewall applied with a management allowlist, on top of an existing cluster

sudo solo-provisioner kube cluster install --profile=local --node-type=block --non-interactive
sudo solo-provisioner block node install --profile=local \
  --non-interactive --mgmt-cidrs=10.0.0.0/8 --pod-cidr=10.4.0.0/14

Then inspect the live table:

sudo nft list table inet host

Expected (abridged) — SSH allow scoped to the allowlist, the in-cluster-ports rule for the pod CIDR, and policy drop:

table inet host {
        set mgmt_addrs { type ipv4_addr; flags interval; elements = { 10.0.0.0/8 } }
        set in_cluster_ports { type inet_service; elements = { 4244, 6443, 7472, 10250 } }
        chain input {
                type filter hook input priority filter; policy drop;
                ...
                ip saddr @mgmt_addrs tcp dport 22 accept
                ip saddr 10.4.0.0/14 tcp dport @in_cluster_ports accept
        }
}

Confirm the on-disk artifact and the boot unit exist, and that SSH from a management source still connects:

cat /etc/solo-provisioner/network-host.nft        # same ruleset as above
systemctl is-enabled solo-provisioner-network-nft.service   # -> enabled

2. kube cluster install alone never touches the firewall

sudo solo-provisioner kube cluster install --profile=local --node-type=block --non-interactive
sudo nft list table inet host

Expected: Error: No such file or directory — the table is never created by kube cluster install on its own, confirming the ownership move.

3. Bare-metal bootstrap via block node install

On a machine with no cluster yet:

sudo solo-provisioner block node install --profile=local \
  --non-interactive --mgmt-cidrs=10.0.0.0/8
sudo nft list table inet host     # same ruleset as scenario 1

Expected: the bootstrap path (cluster + block node in one command) applies the firewall.

4. Idempotency — re-running install is a no-op

sudo solo-provisioner block node install --profile=local \
  --non-interactive --mgmt-cidrs=10.0.0.0/8

Expected: a warning that the table already exists and flags were not re-applied (create-if-missing); the step reports success without changing the table.

5. Skip path — no allowlist, no lockout

sudo solo-provisioner block node install --profile=local --non-interactive

Expected: the step is skipped with a warning, and no table is created:

WARN host firewall not applied: no management CIDRs configured ...
$ sudo nft list table inet host
Error: No such file or directory   # table was never created -> host not locked out

6. Explicit opt-out

sudo solo-provisioner block node install --profile=local \
  --non-interactive --mgmt-cidrs=10.0.0.0/8 --firewall-enabled=false

Expected: the step is skipped (info-level, not a warning — this is an intentional choice, not a safety fallback), and no table is created even though --mgmt-cidrs was supplied.

7. Interactive prompt — values pre-filled, confirm with Enter

Run without --non-interactive and without the firewall flags on a TTY:

sudo solo-provisioner block node install --profile=local

Expected: four prompts appear (Management CIDRs, SSH port 22, Pod CIDR 10.4.0.0/14, In-cluster ports 6443,4244,7472,10250), each pre-filled and acceptable with Enter; a "Host Firewall" summary of the chosen values is printed before the workflow runs.
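The pre-fill behaviour in this scenario follows the per-value precedence stated in the CLI flags bullet (CLI flag > interactive prompt > config file > built-in default). A minimal sketch of that rule for one string value, assuming a hypothetical resolveString helper and assuming the prompt is pre-filled with the config-file value when present (otherwise the built-in default); the real ResolveHostFirewallConfig may be structured differently.

```go
package main

import "fmt"

// resolveString is a hypothetical per-value resolver:
// CLI flag > interactive prompt > config file > built-in default.
// flagSet distinguishes "flag passed (even as empty)" from "flag absent",
// so an explicit empty flag still overrides config and defaults.
func resolveString(flagSet bool, flagVal, cfgVal, def string,
	interactive bool, prompt func(prefill string) string) string {
	if flagSet {
		return flagVal // explicit CLI value wins and suppresses the prompt
	}
	fallback := def
	if cfgVal != "" {
		fallback = cfgVal
	}
	if interactive {
		// Pre-filled prompt: pressing Enter returns the fallback unchanged.
		return prompt(fallback)
	}
	return fallback // --non-interactive / non-TTY: no prompting
}

func main() {
	pressEnter := func(prefill string) string { return prefill }
	fmt.Println(resolveString(false, "", "", "22", true, pressEnter))          // 22
	fmt.Println(resolveString(false, "", "10.0.0.0/8", "", true, pressEnter)) // 10.0.0.0/8
}
```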

8. kubeadm podSubnet unchanged

grep podSubnet /opt/solo-provisioner/.../etc/weaver/kubeadm-init.yaml   # -> podSubnet: 10.4.0.0/14

Expected: identical to the previous literal — the constant change is behavior-neutral.

Risks / rollback

  • SSH lockout — mitigated: empty --mgmt-cidrs → step skips (no drop policy applied); the SSH allow rule and the drop policy commit in one atomic nft -f transaction (no window).
  • Default-drop active before kubeadm init (bare-metal bootstrap path) — the step runs after InstallClusterWorkflow's cluster bootstrap but before SetupBlockNode. Loopback and established/related are accepted, so single-node bootstrap (predominantly loopback) is unaffected. Multi-node control-plane/worker traffic to 6443 originates from node IPs, so those node subnets must be included in --mgmt-cidrs (the in-cluster-ports rule opens the pod CIDR only). Reviewer attention welcome here.
  • kubeadm template change: podSubnet now renders from models.DefaultClusterPodCIDR (value identical to the prior literal 10.4.0.0/14); behavior-neutral. Startup migrations do not re-render kubeadm config, so existing clusters are unaffected.
  • Rollback: revert the install_handler.go step insertion; the step's own rollback deletes a table it created. Additive otherwise — no existing path invoked it before.
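The multi-node caveat in the second risk (node IPs reaching 6443 must fall inside --mgmt-cidrs) lends itself to a quick preflight check. The sketch below is a hypothetical helper, not part of this PR: it reports which node IPs no management CIDR covers, using only the standard net/netip package.

```go
package main

import (
	"fmt"
	"net/netip"
)

// uncoveredNodes returns the node IPs not contained in any management CIDR.
// Per the risk note, such nodes' traffic would hit the default-drop input
// chain. Hypothetical helper for illustration only.
func uncoveredNodes(mgmtCIDRs, nodeIPs []string) ([]string, error) {
	prefixes := make([]netip.Prefix, 0, len(mgmtCIDRs))
	for _, c := range mgmtCIDRs {
		p, err := netip.ParsePrefix(c)
		if err != nil {
			return nil, fmt.Errorf("invalid CIDR %q: %w", c, err)
		}
		prefixes = append(prefixes, p.Masked())
	}
	var missing []string
	for _, s := range nodeIPs {
		ip, err := netip.ParseAddr(s)
		if err != nil {
			return nil, fmt.Errorf("invalid node IP %q: %w", s, err)
		}
		covered := false
		for _, p := range prefixes {
			if p.Contains(ip) {
				covered = true
				break
			}
		}
		if !covered {
			missing = append(missing, s)
		}
	}
	return missing, nil
}

func main() {
	missing, err := uncoveredNodes([]string{"10.0.0.0/8"}, []string{"10.1.2.3", "192.168.1.10"})
	fmt.Println(missing, err) // [192.168.1.10] <nil>
}
```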

Closes #778
Closes #779

Copilot AI review requested due to automatic review settings July 2, 2026 21:28
@brunodam brunodam requested a review from a team as a code owner July 2, 2026 21:28
@brunodam brunodam requested a review from JeffreyDallas July 2, 2026 21:28

swirlds-automation commented Jul 2, 2026

Snyk checks have passed. No issues have been found so far.

Scan Engine            Critical  High  Medium  Low  Total (0)
Open Source Security   0         0     0       0    0 issues
Licenses               0         0     0       0    0 issues

Copilot AI left a comment

Pull request overview

This PR wires the node-agnostic host firewall (inet host nftables table) into the kube cluster install host-provisioning workflow so it is applied once per machine (create-if-missing) before Kubernetes is brought up, and introduces the corresponding host-level config + CLI/prompt surface. It also centralizes the cluster pod CIDR into a single constant that is reused by kubeadm init template rendering and as the firewall’s default pod CIDR.

Changes:

  • Add a new NetworkFirewallCreate workflow step (with rollback behavior) and register it in systemSetupWorkflow() right after enabling nftables.
  • Introduce models.HostConfig (+ validation), global config override plumbing, and CLI flags + interactive prompts for management CIDRs / SSH port / pod CIDR / in-cluster ports.
  • Replace the kubeadm init template’s literal podSubnet with a rendered value sourced from models.DefaultClusterPodCIDR.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 3 comments.

File Description
pkg/software/kubeadm_installer.go Passes models.DefaultClusterPodCIDR into kubeadm init template data.
pkg/models/host_config_test.go Adds unit tests for HostConfig.Validate() and ensures Config.Validate() surfaces host config errors.
pkg/models/config.go Adds DefaultClusterPodCIDR, HostConfig, and host config validation; adds Host to global Config.
pkg/config/global.go Adds OverrideHostConfig to mutate global host firewall config.
internal/workflows/steps/step_network_firewall.go New workflow step that applies the host firewall and rolls back only if it created the table.
internal/workflows/steps/step_network_firewall_test.go Unit tests for skip/create/rollback behavior of the new workflow step using a stubbed firewall manager.
internal/workflows/steps/const.go Adds FirewallCreatedByThisStep local-state key for rollback decisions.
internal/workflows/setup.go Registers NetworkFirewallCreate() in systemSetupWorkflow() after nftables service setup.
internal/ui/prompt/cluster.go Adds validators and prompts for mgmt CIDRs / SSH port / pod CIDR / in-cluster ports + port-list parsing.
internal/ui/prompt/cluster_test.go Adds tests covering the new prompt validators and port list parsing.
internal/templates/files/kubeadm/kubeadm-init.yaml Renders networking.podSubnet from template data instead of a literal.
internal/templates/embed.go Adds PodSubnet to KubeadmInitData.
docs/quickstart.md Documents host firewall behavior and adds new flags to the quickstart install examples and flags table.
cmd/cli/commands/kube/cluster/install.go Adds flags and resolves effective host firewall config (flags/prompt/config/defaults) for use by the workflow step.
cmd/cli/commands/kube/cluster/cluster.go Adds flag backing vars for host firewall options.

@brunodam brunodam force-pushed the 00778-wire-firewall-create-kube-install branch from 89b0623 to 55a1294 Compare July 2, 2026 21:48
@brunodam brunodam changed the title feat(network): apply host firewall (inet host) during kube cluster install feat(network): apply host firewall (inet host) during host provisioning Jul 2, 2026
@brunodam brunodam force-pushed the 00778-wire-firewall-create-kube-install branch 2 times, most recently from 6c3af2b to c34e990 Compare July 3, 2026 01:41
@brunodam brunodam force-pushed the 00778-wire-firewall-create-kube-install branch from c34e990 to 4894db7 Compare July 3, 2026 04:03
@brunodam brunodam force-pushed the 00778-wire-firewall-create-kube-install branch 2 times, most recently from f529b0a to d441dc7 Compare July 3, 2026 04:20
…kflow

Wire `network firewall create` into the block-node workflow (`block node
install`, always; `internal/bll/blocknode/install_handler.go`) so every block
node gets the node-level `inet host` nftables table (SSH/mgmt allowlist, ICMP
policy, in-cluster host-service ports) applied before its Helm chart deploys.
Create-if-missing; re-running install is a no-op.

The firewall is deliberately NOT wired into the generic `kube cluster install`
/ `systemSetupWorkflow()` path: that command provisions a Kubernetes cluster
independent of any specific node type, and unconditionally applying
node-specific firewall rules there would be too invasive for deployments that
use it for other purposes. The block-node workflow is the correct owner since
it's the node type that actually needs this protection today.

Add the management-CIDR config surface needed to apply it safely: a `host`
config block (managementCidrs, sshPort, podCidr, inClusterPorts, disabled)
plus --mgmt-cidrs / --ssh-port / --pod-cidr / --in-cluster-ports /
--firewall-enabled flags with interactive prompts (pre-filled, confirm with
Enter) for any value not passed on the CLI. The input chain is default-drop,
so an empty management allowlist would lock the host out of SSH; the step
skips firewall creation in that case rather than rendering a lock-out
ruleset. --firewall-enabled=false lets an operator opt out entirely (e.g.
hosts managed by an external firewall).

`block node install` always resolves the host firewall config, regardless of
whether it bootstraps a bare-metal host or deploys onto an already-existing
cluster — nftables is guaranteed already installed in both cases (either from
a prior `kube cluster install`, or from the cluster bootstrap this same
install just ran), so it's always safe to apply here.

Promote the cluster pod subnet to a single source of truth
(models.DefaultClusterPodCIDR) shared by the kubeadm podSubnet template and
the firewall --pod-cidr default, so the in-cluster host-service ports rule
opens exactly the range kubeadm assigns pods.

Closes #778
Closes #779

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Bruno Marques <bruno.marques@swirldslabs.com>
@brunodam brunodam force-pushed the 00778-wire-firewall-create-kube-install branch from d441dc7 to d84e03e Compare July 3, 2026 07:30