fix(core): normalize trailing dots and spaces in Win32 URL paths#164
fix(core): normalize trailing dots and spaces in Win32 URL paths#164kali834x wants to merge 1 commit into
Conversation
|
Please see code which already performs some WIN32/Cygwin path normalization in Yes, I fully understand your point that if someone has written URL matching rules, then alternatives with trailing ' Aside: Generally speaking, when writing matching rules it is important for the admin to understand security best practices. The admin ought to write rules which explicitly allow recognized paths, and then deny everything else. Unfortunately, many people do not follow this discipline of explicit allow, default deny. I do see that this PR adds code to run only if I do not see where your patch extends Note: this was a quick look; I have not looked closely at this patch. |
|
Thank you for the review and feedback. I have updated the branch to address the points raised: reworked the Win32 normalization flow to avoid extra URL/query string copies, integrated the dot/space handling into burl_normalize_path(), removed the additional strlen() by returning the new length directly, added support for HTTP_PARSEOPT_URL_NORMALIZE_PATH_DOTSEG_REJECT, and added corresponding unit tests. Please let me know if you have any further suggestions. |
gstrauss
left a comment
There was a problem hiding this comment.
Quick review; not complete. I was slowed down by seemingly random changes (removing a function attribute without explanation) and code inconsistencies (sloppily adding two blank lines for no apparent reason), and code style clashes (in this case, cuddled 'else's do not match any other code in the file).
While the content of the code is more important, the code smells -- whether accidental or not -- indicate that more time needs to be sunk into code reviews, questioning everything and lowering the assessment of code quality.
bb781f1 to
d548ebb
Compare
|
Pushed a reworked version, squashed to a single commit. Main changes from the review: dropped the redundant %2e/%2E handling (already decoded by burl_normalize_basic_*() before this runs), reverted the unrelated edits (function attribute, stray blank lines), matched the file's coding style, and moved the win32 detection onto a path_simplify bit so the no-op case stays on the existing fast path and the rewrite is a single in-place pass. Builds clean and the burl unit tests pass, including the win32 cases. |
|
If Can Inside |
| if (end > beg && (s[end-1] == '.' || s[end-1] == ' ')) | ||
| return 1; |
There was a problem hiding this comment.
If s[end-1] == '.', then check if "." or ".." (and skip) instead of always checking for "." or ".." in burl_normalize_win32_needs().
There was a problem hiding this comment.
Done. burl_normalize_win32_needs() now reads the trailing run first and only does the "."/".." check when that run is a single trailing '.', since that's the only shape a real dot-segment can take. Same ordering in the rewrite pass.
|
Your patch adds code to remove/reject trailing Should that be a separate option? It is not a dot or dot-dot path segment. I have not made up my mind yet, but I am leaning towards a separate option (for _WIN32 and Cygwin, and a no-op on other platforms) because URLs are not solely mapped to the filesystem, and these characters may legitimately be present at the ends of path segments in URLs which are not intended to be mapped to the filesystem. |
Windows and Cygwin strip trailing '.' and ' ' from each path component when resolving a path on the filesystem, so "/admin./", "/admin%20/", and "/admin/" all map to the same directory. http_response_prepare_physical() already strips these, but only from the final component of the physical path. Mirror that stripping for every URL path component in burl_normalize_path() so that URL-based access controls are not bypassed by these alternate spellings. The scan walks the path once and only flags the path for action when a component (other than "." or "..") has a trailing '.' or "%20"; this sets a bit in path_simplify so the common case stays on the existing fast path. When action is needed, the path is rewritten in a single in-place pass before buffer_path_simplify(). "." and ".." are left intact for dot-segment removal, and a component that strips down to nothing becomes "." so buffer_path_simplify() collapses it. "%2e"/"%2E" and a literal space need no handling here since burl_normalize_basic_*() has already decoded "%2e" to '.' and encoded space to "%20". HTTP_PARSEOPT_URL_NORMALIZE_PATH_DOTSEG_REJECT now also rejects such paths.
d548ebb to
6955f59
Compare
|
Pushed an update. burl_normalize_win32_dots_spaces() now takes the buffer and does its own buffer_truncate(), and the rewrite copies char by char and backs up over the trailing run instead of memmove'ing the segment. On moving it after buffer_path_simplify(): I looked at that. The strip is greedy over both '.' and "%20", so a component can never reduce down to a bare "." or ".." that would still need collapsing. It either keeps a non-dot/space tail or empties out entirely, so the order doesn't change the result. I left it before simplify because an emptied component can then just become "." and the existing simplify pass folds it away, which avoids the "remove the component and a '/'" handling (and the "/.../" -> "/" style edge cases) that running after would need. Happy to flip it if you'd rather drop the dot handling instead. On a separate option: agreed, that's cleaner. Trailing '.'/space only matter once the path maps to the filesystem and can be legitimate on URLs that never do, so folding it into dotseg-remove/reject overreaches. I can put it behind its own _WIN32/Cygwin parseopt that's a no-op elsewhere. The open call is the default: on by default keeps the bypass closed out of the box, opt-in avoids touching non-filesystem URLs. Which would you prefer and I'll wire it up. |
'..' needs to be reserved so that
That reads as somewhat "action at a distance" and suggests that a comment should have explained that dependency. In any case, I think you should reconsider. buffer_path_simplify() removes '.' and '..' segments. If the _WIN32 additional normalization steps for trailing '.' and ' ' occurred after, then the _WIN32 additional normalization steps would then not have to check for and special-case '..', and the code would be better isolated and slightly simpler. |
|
I won't have time to return to these PRs for at least a week. However, I'll checkpoint with a note that this PR is incomplete. In
Just some checkpoint notes of incomplete thoughts. |
|
You're right, I had that backwards. Stripping can reveal a ".." that still needs collapsing: "/a/..%20/b" is "/a/.. /b" on Windows, the trailing space drops, and it resolves to "/b", not "/a/b". So the current code (and that test) is wrong, it folds "..%20" down to "." instead of reserving the "..". That's actually why I think the strip needs to stay before buffer_path_simplify() rather than after. simplify only acts on literal "." and "..", so it has to see the revealed ".." to drop it and the prior segment. If the strip runs after simplify, "..%20" is still "..%20" when simplify passes, and the ".." we reveal afterwards gets left in the path unless we simplify a second time. Running before lets the single existing simplify pass handle it. Downside is the ordering dependency you flagged, which I'll spell out in a comment so it isn't action-at-a-distance. If you'd rather have it after simplify, the correct way is a second simplify pass after the strip. Happy to go that way instead, just say which you prefer and I'll fix the reveal logic + tests to match. On the separate option: agreed, I'll gate it behind its own _WIN32/Cygwin parseopt that's a no-op elsewhere. The only open call is the default (on by default keeps the bypass closed out of the box vs opt-in to avoid touching non-filesystem URLs). I'll wire it up once you pick. For %5C -> %2F under BACKSLASH_TRANS and the other buffer_path_simplify() callers, those read as wider than this PR, and %5C can corrupt URLs that never map to the filesystem like you said. I'd keep this one to the URL-path strip and do those separately unless you'd rather fold them in. No rush, I know you're tied up for a bit. |
Good point. It might make sense to do this before I think these Windows inanities should be a separate flag, but should be done along with |
|
You keep mentioning the vague phrase "the bypass". While I understand to what you are referring, the average reader will not. Also, I think you have misunderstand the potential scope. In the lighttpd configuration, mod_rewrite ( In the early phases of request processing, lighttpd can no a priori know whether or not the URL is intended as a virtual path or is to be mapped into a filesystem path. YOU might assume everything goes to the filesystem and therefore be concerned about a malicious URL bypassing YOUR mod_rewrite or mod_redirect rules. This is not a bypass in lighttpd. lighttpd is a web server and handles the virtual URL path. lighttpd mod_staticfile and a few other lighttpd modules might ultimately then map the URL path to the filesystem. Other lighttpd modules might not. Windows behavior remains obnoxious and necessitates multiple passes to normalize the URL and filesystem paths at multiple places during request processing. |
|
Aside: there may be places in lighttpd which have to take precautions to preserve leading |
|
Windows filesystems (at least some) support trailing '.' or ' '. It appears that Windows command line tools are the ones which remove trailing '.' and ' '. => Have you tested and verified on Windows that lighttpd is returning the same file if a URL contains extra trailing '.' or ' '. Is lighttpd mapping it to the same file? |
|
On the same-file question: the Win32 fs layer in src/fs_win32.c opens and stats through _wopen/_wstati64/CreateFileW on plain paths with no "\?" prefix, so those calls go through Win32 path canonicalization, which strips trailing "." and " " from the final component. That is why "/admin./index.html" and "/admin/index.html" land on the same file, and intermediate components get the same treatment as the path is walked. What I have actually run is the burl unit tests, which exercise the normalization logic; I have not done a live file round trip on a Windows box, so I would frame the mapping as expected from the API behavior and confirmed in the unit logic rather than observed end to end. I can set up a MinGW run that creates "admin" and requests "/admin./" to confirm the round trip if you want it nailed down. Fair point on "the bypass", that wording was sloppy. To restate it in the right terms: lighttpd serves the virtual, percent-decoded r->url.path, and that is legitimate since not everything maps to the filesystem. The narrow case is only when an admin writes a $HTTP["url"] match and a later module (mod_staticfile and friends) maps that same path onto a Windows filesystem, where the trailing "." / " " spellings collapse. %2F and %5C can absolutely be intentional in a virtual path and only become equivalent once mapped to a Win32 path, so this normalization only makes sense as filesystem-facing, not as a blanket URL rewrite. That is the case for the separate flag, agreed. I will move it off DOTSEG_REMOVE/REJECT onto its own _WIN32/Cygwin parseopt that is a no-op elsewhere and runs alongside DOTSEG_REMOVE, kept before buffer_path_simplify() so a revealed ".." (e.g. "/a/..%20/b") is still collapsed by simplify together with the prior segment, with a comment spelling out that ordering dependency. %5C->%2F under BACKSLASH_TRANS, the other buffer_path_simplify() callers, and preserving leading "//" / "\" all read as wider than this patch and can corrupt virtual URLs that never touch the filesystem, so I would keep those as separate follow-ups unless you would rather fold them in. No rush on my end. |
On Windows/Cygwin, NTFS path resolution normalizes trailing dots and spaces in path components. Intermediate components such as
admin.are resolved asadmin, and final components with trailing spaces or dots (e.g.admin) are similarly normalized. In addition, components consisting only of dots and spaces (e.g...or...) may resolve to the current directory.This difference between URL routing logic and filesystem resolution can allow attackers to craft URLs such as
/admin./index.htmlthat bypass URL-based access controls (e.g.$HTTP["url"] =~ "^/admin/") while still resolving to valid filesystem paths on Windows.This PR introduces Win32/Cygwin-specific URL path component normalization inside
burl_normalize()to align URL parsing with platform path resolution behavior:Strips trailing dots and spaces (including
%20) from components containing other characters.Normalizes dot/space-only components (e.g.
..,...,%2e%2e%20) to.(current directory equivalent).Preserves standard traversal components (
.and.., including%2eand%2e%2e) without modification...,...).