Skip to content

fix(libutil): don't over-encode sub-delims in URL paths#15510

Draft
lovesegfault wants to merge 3 commits intomasterfrom
fix-functional-file-urls
Draft

fix(libutil): don't over-encode sub-delims in URL paths#15510
lovesegfault wants to merge 3 commits intomasterfrom
fix-functional-file-urls

Conversation

@lovesegfault
Copy link
Copy Markdown
Member

@lovesegfault lovesegfault commented Mar 18, 2026

Motivation

encodeUrlPath encodes everything outside unreserved + ":@", which percent-encodes sub-delims like + even though RFC 3986 §3.3 allows them unencoded in path segments:

pchar      = unreserved / pct-encoded / sub-delims / ":" / "@"
sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="

So parseURL("file:///a+b").to_string() produces file:///a%2Bb. Both are valid URLs, but every consumer that strips file:// and treats the remainder as a filesystem path then needs an explicit percentDecode224c818 and c8e56b3 each fixed one such site. Downstream, nixpkgs had to use .pN instead of +N in patched version strings to keep build tmpdir paths URL-safe.

Fix

Use boost::urls::pchars — exactly the RFC charset, already available via the existing <boost/url.hpp> include. Sub-delims now round-trip unchanged. / inside a segment still encodes to %2F (pchars excludes it), so ParsedURL::path vector semantics are preserved.

Parallels src/libutil/include/nix/util/url-parts.hh where pcharRegex already correctly includes sub-delims for parsing; this makes rendering match.

Scope

Path encoding only. encodeQuery is unchanged — & and = are structural separators there, and + in queries has application/x-www-form-urlencoded baggage.

Backwards compat

Input::checkLocks compares final-input attrs with raw string equality. Lock files written by older nix have %2B in url fields; a fresh fetch now renders + → spurious mismatch. The second commit normalizes url through parseURL().to_string() on both sides before comparison, mirroring the existing narHash normalization hack in the same function.

Other affected tests/behavior

  • nix store info --json for file:// stores now shows + instead of %2B in .url
  • NarInfoDiskCache key changes for store URLs with + in the path → one-time cache miss on upgrade (self-healing)
  • flakeref.cc:doesntReencodeUrl was added in 1d7a57c to catch double-encoding but its expected value (%2B) documented the old single-encoding. Updated to assert true round-trip, which now matches its name.

encodeUrlPath was encoding everything outside unreserved + ":@", which
percent-encodes sub-delims like `+` even though RFC 3986 §3.3 allows
them unencoded in path segments (pchar = unreserved / pct-encoded /
sub-delims / ":" / "@").

This meant parseURL("file:///a+b").to_string() produced file:///a%2Bb.
Every consumer that strips the file:// prefix and treats the remainder
as a filesystem path then needed an explicit percentDecode (224c818,
c8e56b3 fixed two such sites). Downstream, nixpkgs had to use `.pN`
instead of `+N` in patched version strings to keep build tmpdir paths
URL-safe.

Use boost::urls::pchars — the RFC-correct charset, already available
via the existing <boost/url.hpp> include. Sub-delims now round-trip
unchanged; forgetting a decode at a future call site becomes harmless
for these characters. `/` inside a segment still encodes to %2F
(pchars excludes it).

The flakeref.cc test `doesntReencodeUrl` was added in 1d7a57c to
check "doesn't percent-encode twice" but its expected value documented
the old single-encoding (`+` → `%2B`). Updated to assert round-trip,
which now actually matches the test name.
@github-actions github-actions Bot added documentation with-tests Issues related to testing. PRs with tests have some priority labels Mar 18, 2026
@lovesegfault lovesegfault force-pushed the fix-functional-file-urls branch from 964b31d to 5c78d94 Compare March 18, 2026 00:51
@xokdvium
Copy link
Copy Markdown
Contributor

One thing to note: this might break flake lockfiles downstream.

…nces

When a final input's attrs are compared against a fresh fetch result,
the comparison at checkLocks is a raw string equality over each field.
The preceding commit changed URL path rendering to stop over-encoding
sub-delims, so lock files written by older nix have e.g. `%2B` where
current nix renders `+` — same URL, different string.

Normalize the `url` field through parseURL().to_string() on both sides
before comparison, mirroring the existing narHash normalization hack
a few lines above. Unparsable values are left as-is so the original
error still fires for genuine mismatches.
@lovesegfault lovesegfault force-pushed the fix-functional-file-urls branch from 5c78d94 to 64947e9 Compare March 18, 2026 01:47
@github-actions github-actions Bot added the fetching Networking with the outside (non-Nix) world, input locking label Mar 18, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation fetching Networking with the outside (non-Nix) world, input locking with-tests Issues related to testing. PRs with tests have some priority

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants