fix(libutil): don't over-encode sub-delims in URL paths#15510
Draft
lovesegfault wants to merge 3 commits intomasterfrom
Draft
fix(libutil): don't over-encode sub-delims in URL paths#15510lovesegfault wants to merge 3 commits intomasterfrom
lovesegfault wants to merge 3 commits intomasterfrom
Conversation
encodeUrlPath was encoding everything outside unreserved + ":@", which
percent-encodes sub-delims like `+` even though RFC 3986 §3.3 allows
them unencoded in path segments (pchar = unreserved / pct-encoded /
sub-delims / ":" / "@").
This meant parseURL("file:///a+b").to_string() produced file:///a%2Bb.
Every consumer that strips the file:// prefix and treats the remainder
as a filesystem path then needed an explicit percentDecode (224c818,
c8e56b3 fixed two such sites). Downstream, nixpkgs had to use `.pN`
instead of `+N` in patched version strings to keep build tmpdir paths
URL-safe.
Use boost::urls::pchars — the RFC-correct charset, already available
via the existing <boost/url.hpp> include. Sub-delims now round-trip
unchanged; forgetting a decode at a future call site becomes harmless
for these characters. `/` inside a segment still encodes to %2F
(pchars excludes it).
The flakeref.cc test `doesntReencodeUrl` was added in 1d7a57c to
check "doesn't percent-encode twice" but its expected value documented
the old single-encoding (`+` → `%2B`). Updated to assert round-trip,
which now actually matches the test name.
964b31d to
5c78d94
Compare
Contributor
|
One thing to note: this might break flake lockfiles downstream. |
…nces When a final input's attrs are compared against a fresh fetch result, the comparison at checkLocks is a raw string equality over each field. The preceding commit changed URL path rendering to stop over-encoding sub-delims, so lock files written by older nix have e.g. `%2B` where current nix renders `+` — same URL, different string. Normalize the `url` field through parseURL().to_string() on both sides before comparison, mirroring the existing narHash normalization hack a few lines above. Unparsable values are left as-is so the original error still fires for genuine mismatches.
5c78d94 to
64947e9
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Motivation
encodeUrlPathencodes everything outsideunreserved + ":@", which percent-encodes sub-delims like+even though RFC 3986 §3.3 allows them unencoded in path segments:So
parseURL("file:///a+b").to_string()producesfile:///a%2Bb. Both are valid URLs, but every consumer that stripsfile://and treats the remainder as a filesystem path then needs an explicitpercentDecode— 224c818 and c8e56b3 each fixed one such site. Downstream, nixpkgs had to use.pNinstead of+Nin patched version strings to keep build tmpdir paths URL-safe.Fix
Use
boost::urls::pchars— exactly the RFC charset, already available via the existing<boost/url.hpp>include. Sub-delims now round-trip unchanged./inside a segment still encodes to%2F(pchars excludes it), soParsedURL::pathvector semantics are preserved.Parallels
src/libutil/include/nix/util/url-parts.hhwherepcharRegexalready correctly includes sub-delims for parsing; this makes rendering match.Scope
Path encoding only.
encodeQueryis unchanged —&and=are structural separators there, and+in queries hasapplication/x-www-form-urlencodedbaggage.Backwards compat
Input::checkLockscompares final-input attrs with raw string equality. Lock files written by older nix have%2Binurlfields; a fresh fetch now renders+→ spurious mismatch. The second commit normalizesurlthroughparseURL().to_string()on both sides before comparison, mirroring the existingnarHashnormalization hack in the same function.Other affected tests/behavior
nix store info --jsonforfile://stores now shows+instead of%2Bin.url+in the path → one-time cache miss on upgrade (self-healing)flakeref.cc:doesntReencodeUrlwas added in 1d7a57c to catch double-encoding but its expected value (%2B) documented the old single-encoding. Updated to assert true round-trip, which now matches its name.