Skip to content

fix(core): normalize trailing dots and spaces in Win32 URL paths#164

Open
kali834x wants to merge 1 commit into
lighttpd:masterfrom
kali834x:win32-path-normalization-security-fix
Open

fix(core): normalize trailing dots and spaces in Win32 URL paths#164
kali834x wants to merge 1 commit into
lighttpd:masterfrom
kali834x:win32-path-normalization-security-fix

Conversation

@kali834x

Copy link
Copy Markdown
Contributor

On Windows/Cygwin, NTFS path resolution normalizes trailing dots and spaces in path components. Intermediate components such as admin. are resolved as admin, and final components with trailing spaces or dots (e.g. admin ) are similarly normalized. In addition, components consisting only of dots and spaces (e.g. .. or ...) may resolve to the current directory.

This difference between URL routing logic and filesystem resolution can allow attackers to craft URLs such as /admin./index.html that bypass URL-based access controls (e.g. $HTTP["url"] =~ "^/admin/") while still resolving to valid filesystem paths on Windows.

This PR introduces Win32/Cygwin-specific URL path component normalization inside burl_normalize() to align URL parsing with platform path resolution behavior:

Strips trailing dots and spaces (including %20) from components containing other characters.
Normalizes dot/space-only components (e.g. .. , ..., %2e%2e%20) to . (current directory equivalent).
Preserves standard traversal components (. and .., including %2e and %2e%2e) without modification.

  • Added extensive Windows-specific unit tests covering trailing dot/space cases, percent-encoded variants, and dot/space-only segments (e.g. .. , ...).
  • Tests executed successfully on MinGW (exit code: 0).

@gstrauss

Copy link
Copy Markdown
Member

Please see code which already performs some WIN32/Cygwin path normalization in src/response.c:http_response_prepare_physical(). There, path normalization is performed on the physical path to the filesystem, and not on the URL since not all URLs map to filesystems. Note the comment about the limitation that src/response.c:http_response_prepare_physical() might also affect path-info component of the URL.

Yes, I fully understand your point that if someone has written URL matching rules, then alternatives with trailing '.' and ' ' might bypass the rules. That is, of course, among the reasons burl.c was written, and why I'll consider this patch.

Aside: Generally speaking, when writing matching rules it is important for the admin to understand security best practices. The admin ought to write rules which explicitly allow recognized paths, and then deny everything else. Unfortunately, many people do not follow this discipline of explicit allow, default deny.

I do see that this PR adds code to run only if HTTP_PARSEOPT_URL_NORMALIZE_PATH_DOTSEG_REMOVE flag, but that flag is enabled by default. On such a hot code path, please consider following the code patterns in burl.c where a routine walks the URL and does nothing if no action is needed. If action is needed, a subroutine is called with the current position, and the subroutine copies characters, again continuing to walk the URL once. Complexity O(n). Your patch, on the other hand, copies the query string twice, and the query string can be very long. After your patch does burl_normalize_win32_dots_spaces(), instead of keeping track of the length, your patch performs an additional strlen() which would not be necessary if you had walked the path once, beginning to end, and kept track of the position.

I do not see where your patch extends HTTP_PARSEOPT_URL_NORMALIZE_PATH_DOTSEG_REJECT (reject instead of remove)

Note: this was a quick look; I have not looked closely at this patch.

@kali834x

Copy link
Copy Markdown
Contributor Author

Thank you for the review and feedback. I have updated the branch to address the points raised: reworked the Win32 normalization flow to avoid extra URL/query string copies, integrated the dot/space handling into burl_normalize_path(), removed the additional strlen() by returning the new length directly, added support for HTTP_PARSEOPT_URL_NORMALIZE_PATH_DOTSEG_REJECT, and added corresponding unit tests. Please let me know if you have any further suggestions.

@gstrauss gstrauss left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick review; not complete. I was slowed down by seemingly random changes (removing a function attribute without explanation) and code inconsistencies (sloppily adding two blank lines for no apparent reason), and code style clashes (in this case, cuddled 'else's do not match any other code in the file).

While the content of the code is more important, the code smells -- whether accidental or not -- indicate that more time needs to be sunk into code reviews, questioning everything and lowering the assessment of code quality.

Comment thread src/burl.c
Comment thread src/burl.c Outdated
Comment thread src/burl.c Outdated
Comment thread src/burl.c Outdated
Comment thread src/burl.c
@kali834x kali834x force-pushed the win32-path-normalization-security-fix branch from bb781f1 to d548ebb Compare June 21, 2026 17:59
@kali834x

Copy link
Copy Markdown
Contributor Author

Pushed a reworked version, squashed to a single commit.

Main changes from the review: dropped the redundant %2e/%2E handling (already decoded by burl_normalize_basic_*() before this runs), reverted the unrelated edits (function attribute, stray blank lines), matched the file's coding style, and moved the win32 detection onto a path_simplify bit so the no-op case stays on the existing fast path and the rewrite is a single in-place pass. Builds clean and the burl unit tests pass, including the win32 cases.

@gstrauss

gstrauss commented Jun 24, 2026

Copy link
Copy Markdown
Member

If burl_normalize_win32_dots_spaces() is moved after buffer_path_simplify(), would you need to check for "." and ".." in burl_normalize_win32_dots_spaces()? Add a comment that buffer_path_simplify() has already removed any "." or ".." path segments.

Can burl_normalize_win32_dots_spaces() simply take (buffer *) and call buffer_truncate() before returning?

Inside burl_normalize_win32_dots_spaces(), instead of memmove() of the string segment, would it make sense on this "modification" code path to copy char-by-char and back up a few chars when needed? Also, if a segment is ever all dots, you'll want to remove the segment, including the '/' to avoid double-slashes in the normalized result. This is necessary if burl_normalize_win32_dots_spaces() is moved after buffer_path_simplify().

Comment thread src/burl.c Outdated
Comment on lines +305 to +306
if (end > beg && (s[end-1] == '.' || s[end-1] == ' '))
return 1;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If s[end-1] == '.', then check if "." or ".." (and skip) instead of always checking for "." or ".." in burl_normalize_win32_needs().

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. burl_normalize_win32_needs() now reads the trailing run first and only does the "."/".." check when that run is a single trailing '.', since that's the only shape a real dot-segment can take. Same ordering in the rewrite pass.

Comment thread src/burl.c Outdated
@gstrauss

Copy link
Copy Markdown
Member

Your patch adds code to remove/reject trailing . and %20 on _WIN32 and Cygwin under HTTP_PARSEOPT_URL_NORMALIZE_PATH_DOTSEG_REMOVE/HTTP_PARSEOPT_URL_NORMALIZE_PATH_DOTSEG_REJECT.

Should that be a separate option? It is not a dot or dot-dot path segment.

I have not made up my mind yet, but I am leaning towards a separate option (for _WIN32 and Cygwin, and a no-op on other platforms) because URLs are not solely mapped to the filesystem, and these characters may legitimately be present at the ends of path segments in URLs which are not intended to be mapped to the filesystem.

Windows and Cygwin strip trailing '.' and ' ' from each path component
when resolving a path on the filesystem, so "/admin./", "/admin%20/", and
"/admin/" all map to the same directory. http_response_prepare_physical()
already strips these, but only from the final component of the physical
path. Mirror that stripping for every URL path component in
burl_normalize_path() so that URL-based access controls are not bypassed
by these alternate spellings.

The scan walks the path once and only flags the path for action when a
component (other than "." or "..") has a trailing '.' or "%20"; this sets
a bit in path_simplify so the common case stays on the existing fast path.
When action is needed, the path is rewritten in a single in-place pass
before buffer_path_simplify(). "." and ".." are left intact for dot-segment
removal, and a component that strips down to nothing becomes "." so
buffer_path_simplify() collapses it. "%2e"/"%2E" and a literal space need
no handling here since burl_normalize_basic_*() has already decoded "%2e"
to '.' and encoded space to "%20".
HTTP_PARSEOPT_URL_NORMALIZE_PATH_DOTSEG_REJECT now also rejects such paths.
@kali834x kali834x force-pushed the win32-path-normalization-security-fix branch from d548ebb to 6955f59 Compare June 29, 2026 06:11
@kali834x

Copy link
Copy Markdown
Contributor Author

Pushed an update.

burl_normalize_win32_dots_spaces() now takes the buffer and does its own buffer_truncate(), and the rewrite copies char by char and backs up over the trailing run instead of memmove'ing the segment.

On moving it after buffer_path_simplify(): I looked at that. The strip is greedy over both '.' and "%20", so a component can never reduce down to a bare "." or ".." that would still need collapsing. It either keeps a non-dot/space tail or empties out entirely, so the order doesn't change the result. I left it before simplify because an emptied component can then just become "." and the existing simplify pass folds it away, which avoids the "remove the component and a '/'" handling (and the "/.../" -> "/" style edge cases) that running after would need. Happy to flip it if you'd rather drop the dot handling instead.

On a separate option: agreed, that's cleaner. Trailing '.'/space only matter once the path maps to the filesystem and can be legitimate on URLs that never do, so folding it into dotseg-remove/reject overreaches. I can put it behind its own _WIN32/Cygwin parseopt that's a no-op elsewhere. The open call is the default: on by default keeps the bypass closed out of the box, opt-in avoids touching non-filesystem URLs. Which would you prefer and I'll wire it up.

@gstrauss

Copy link
Copy Markdown
Member

On moving it after buffer_path_simplify(): I looked at that. The strip is greedy over both '.' and "%20", so a component can never reduce down to a bare "." or ".." that would still need collapsing. It either keeps a non-dot/space tail or empties out entirely, so the order doesn't change the result.

'..' needs to be reserved so that buffer_path_simplify() running afterwards would remove the '..' and the prior path segment.

I left it before simplify because an emptied component can then just become "." and the existing simplify pass folds it away, which avoids the "remove the component and a '/'" handling (and the "/.../" -> "/" style edge cases) that running after would need.

That reads as somewhat "action at a distance" and suggests that a comment should have explained that dependency.

In any case, I think you should reconsider. buffer_path_simplify() removes '.' and '..' segments. If the _WIN32 additional normalization steps for trailing '.' and ' ' occurred after, then the _WIN32 additional normalization steps would then not have to check for and special-case '..', and the code would be better isolated and slightly simpler.

@gstrauss

Copy link
Copy Markdown
Member

I won't have time to return to these PRs for at least a week.

However, I'll checkpoint with a note that this PR is incomplete.

In burl_normalize(), HTTP_PARSEOPT_URL_NORMALIZE_PATH_BACKSLASH_TRANS does not currently translate %5C to %2F and perhaps it should. That flag is not enabled by default. URLs are valid with these encodings and might not be mapped to the filesystem. If mapped to the filesystem, then full normalization is recommended. If not mapped to the filesystem, then normalizing the URLs for the filesystem would CORRUPT the URL.

buffer_path_simplify() might be an additional location to check for trailing '.' and ' ' on _WIN32, as buffer_path_simplify() is also called on paths after buffer_urldecode_path(). Possibly better would be to review locations where buffer_path_simplify() is called, and if those are filesystem paths (instead of URLs), add additional normalization for _WIN32. However, sometimes a url path is later appended to physical path, and would need the additional _WIN32 filesystem normalization applied then.

Just some checkpoint notes of incomplete thoughts.

@kali834x

Copy link
Copy Markdown
Contributor Author

You're right, I had that backwards. Stripping can reveal a ".." that still needs collapsing: "/a/..%20/b" is "/a/.. /b" on Windows, the trailing space drops, and it resolves to "/b", not "/a/b". So the current code (and that test) is wrong, it folds "..%20" down to "." instead of reserving the "..".

That's actually why I think the strip needs to stay before buffer_path_simplify() rather than after. simplify only acts on literal "." and "..", so it has to see the revealed ".." to drop it and the prior segment. If the strip runs after simplify, "..%20" is still "..%20" when simplify passes, and the ".." we reveal afterwards gets left in the path unless we simplify a second time. Running before lets the single existing simplify pass handle it. Downside is the ordering dependency you flagged, which I'll spell out in a comment so it isn't action-at-a-distance.

If you'd rather have it after simplify, the correct way is a second simplify pass after the strip. Happy to go that way instead, just say which you prefer and I'll fix the reveal logic + tests to match.

On the separate option: agreed, I'll gate it behind its own _WIN32/Cygwin parseopt that's a no-op elsewhere. The only open call is the default (on by default keeps the bypass closed out of the box vs opt-in to avoid touching non-filesystem URLs). I'll wire it up once you pick.

For %5C -> %2F under BACKSLASH_TRANS and the other buffer_path_simplify() callers, those read as wider than this PR, and %5C can corrupt URLs that never map to the filesystem like you said. I'd keep this one to the URL-path strip and do those separately unless you'd rather fold them in. No rush, I know you're tied up for a bit.

@gstrauss

Copy link
Copy Markdown
Member

Stripping can reveal a ".." that still needs collapsing: "/a/..%20/b" is "/a/.. /b" on Windows, the trailing space drops, and it resolves to "/b", not "/a/b".

Good point. It might make sense to do this before buffer_path_simplify(), or to perform the equivalent of buffer_path_simplify() while normalizing Windows inanities.

I think these Windows inanities should be a separate flag, but should be done along with HTTP_PARSEOPT_URL_NORMALIZE_PATH_DOTSEG_REMOVE.

@gstrauss

Copy link
Copy Markdown
Member

You keep mentioning the vague phrase "the bypass". While I understand to what you are referring, the average reader will not. Also, I think you have misunderstand the potential scope.

In the lighttpd configuration, mod_rewrite (url.rewrite et al) and mod_redirect (url.redirect) operate on the percent-encoded URL. On the other hand, the lighttpd.conf conditional blocks such as $HTTP["url"] =~ "..." { ... } operate on internal r->url.path which has been percent-decoded. r->url.path may be used by many modules as a virtual path (and might not be present in the filesytem), and also may be used to construct a filesystem path if the module handling the request uses the filesystem.

In the early phases of request processing, lighttpd can no a priori know whether or not the URL is intended as a virtual path or is to be mapped into a filesystem path. YOU might assume everything goes to the filesystem and therefore be concerned about a malicious URL bypassing YOUR mod_rewrite or mod_redirect rules. This is not a bypass in lighttpd. lighttpd is a web server and handles the virtual URL path. lighttpd mod_staticfile and a few other lighttpd modules might ultimately then map the URL path to the filesystem. Other lighttpd modules might not.

Windows behavior remains obnoxious and necessitates multiple passes to normalize the URL and filesystem paths at multiple places during request processing. %2F and %5C might be intentionally encoded in the virtual URL path. However, if mapped to the filesystem, these now both become relevant and equivalent in _WIN32 paths.

@gstrauss

Copy link
Copy Markdown
Member

Aside: there may be places in lighttpd which have to take precautions to preserve leading // or \\ in Windows paths since normalization might reduce them to a single /.

@gstrauss

Copy link
Copy Markdown
Member

Windows filesystems (at least some) support trailing '.' or ' '. It appears that Windows command line tools are the ones which remove trailing '.' and ' '.

=> Have you tested and verified on Windows that lighttpd is returning the same file if a URL contains extra trailing '.' or ' '. Is lighttpd mapping it to the same file?

@kali834x

kali834x commented Jul 1, 2026

Copy link
Copy Markdown
Contributor Author

On the same-file question: the Win32 fs layer in src/fs_win32.c opens and stats through _wopen/_wstati64/CreateFileW on plain paths with no "\?" prefix, so those calls go through Win32 path canonicalization, which strips trailing "." and " " from the final component. That is why "/admin./index.html" and "/admin/index.html" land on the same file, and intermediate components get the same treatment as the path is walked. What I have actually run is the burl unit tests, which exercise the normalization logic; I have not done a live file round trip on a Windows box, so I would frame the mapping as expected from the API behavior and confirmed in the unit logic rather than observed end to end. I can set up a MinGW run that creates "admin" and requests "/admin./" to confirm the round trip if you want it nailed down.

Fair point on "the bypass", that wording was sloppy. To restate it in the right terms: lighttpd serves the virtual, percent-decoded r->url.path, and that is legitimate since not everything maps to the filesystem. The narrow case is only when an admin writes a $HTTP["url"] match and a later module (mod_staticfile and friends) maps that same path onto a Windows filesystem, where the trailing "." / " " spellings collapse. %2F and %5C can absolutely be intentional in a virtual path and only become equivalent once mapped to a Win32 path, so this normalization only makes sense as filesystem-facing, not as a blanket URL rewrite.

That is the case for the separate flag, agreed. I will move it off DOTSEG_REMOVE/REJECT onto its own _WIN32/Cygwin parseopt that is a no-op elsewhere and runs alongside DOTSEG_REMOVE, kept before buffer_path_simplify() so a revealed ".." (e.g. "/a/..%20/b") is still collapsed by simplify together with the prior segment, with a comment spelling out that ordering dependency.

%5C->%2F under BACKSLASH_TRANS, the other buffer_path_simplify() callers, and preserving leading "//" / "\" all read as wider than this patch and can corrupt virtual URLs that never touch the filesystem, so I would keep those as separate follow-ups unless you would rather fold them in. No rush on my end.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

2 participants