src/news/article_fetcher.py implements defense-in-depth against Server-Side Request Forgery on the article-fetching network path:
- DNS resolution check: Resolves hostname at fetch time, rejects private/reserved IPs
- Redirect validation: Custom `_SafeHTTPRedirectHandler` validates every redirect hop (max 3 hops)
- Blocked targets:
  - Private IP ranges (127.0.0.0/8, 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16)
  - Link-local (169.254.0.0/16) and loopback (::1)
  - Internal hostnames (localhost, localhost.localdomain)
- Protocol restriction: HTTP(S) only
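
The checks listed above can be illustrated with a minimal sketch. It is not the code in `src/news/article_fetcher.py`: the names `check_fetch_target`, `is_blocked_ip`, `BlockedTargetError`, and `SketchRedirectHandler` are hypothetical, and the injectable `resolver` parameter is an assumption added so the logic can be exercised without network access.

```python
import ipaddress
import socket
import urllib.request
from urllib.parse import urlsplit

# Hostnames rejected before any DNS lookup (taken from the list above).
BLOCKED_HOSTNAMES = {"localhost", "localhost.localdomain"}


class BlockedTargetError(ValueError):
    """Hypothetical error type for targets the fetcher must not reach."""


def is_blocked_ip(ip_text: str) -> bool:
    # Drop any IPv6 zone id ("fe80::1%eth0") before parsing.
    ip = ipaddress.ip_address(ip_text.split("%", 1)[0])
    return (ip.is_private or ip.is_loopback or ip.is_link_local
            or ip.is_multicast or ip.is_reserved or ip.is_unspecified)


def check_fetch_target(url: str, resolver=socket.getaddrinfo) -> None:
    """Raise BlockedTargetError unless the URL is HTTP(S) and resolves publicly."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise BlockedTargetError(f"scheme not allowed: {parts.scheme!r}")
    host = (parts.hostname or "").rstrip(".")
    if not host or host in BLOCKED_HOSTNAMES:
        raise BlockedTargetError(f"host not allowed: {host!r}")
    port = parts.port or (443 if parts.scheme == "https" else 80)
    # Resolve at fetch time and reject if ANY returned address is non-public.
    for *_, sockaddr in resolver(host, port, type=socket.SOCK_STREAM):
        if is_blocked_ip(sockaddr[0]):
            raise BlockedTargetError(f"{host} resolves to {sockaddr[0]}")


class SketchRedirectHandler(urllib.request.HTTPRedirectHandler):
    # urllib raises HTTPError once this many redirects have been followed.
    max_redirections = 3

    def __init__(self, resolver=socket.getaddrinfo):
        self._resolver = resolver

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Validate every hop before urllib builds the follow-up request.
        check_fetch_target(newurl, self._resolver)
        return super().redirect_request(req, fp, code, msg, headers, newurl)
```

Such a handler would typically be installed with `urllib.request.build_opener(SketchRedirectHandler())`, so that the initial URL and each redirect hop pass through the same check.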
src/news/link_resolver.py performs lightweight public-URL preflight checks and records blocked redirect targets. It intentionally skips DNS resolution so that URL normalization stays network-free; DNS/IP validation happens before real article fetches.
src/news/url_normalizer.py rejects parser-ambiguous or unsafe URL forms before metadata is attached:
- non-HTTP(S) schemes
- credentialed URLs
- localhost names and `.localhost`
- private, loopback, link-local, multicast, reserved, and unspecified IP literals
- short / integer / octal-looking / hex-looking IPv4 forms
- percent-encoded hostnames
- backslashes, control characters, spaces, and invalid ports
- repeated-encoded redirect targets that decode to unsafe URLs
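
A network-free check covering several of the forms listed above might look like the following sketch. The function name `unsafe_url_reason` and its reason strings are hypothetical, not the API of `src/news/url_normalizer.py`, and the sketch leaves out the repeated-encoding check for redirect targets.

```python
import ipaddress
import re
from typing import Optional
from urllib.parse import urlsplit

# Labels that look like decimal, octal, or hex integers ("127", "0177", "0x7f").
_NUMERIC_LABEL = re.compile(r"^(0x[0-9a-f]*|[0-9]+)$")


def unsafe_url_reason(url: str) -> Optional[str]:
    """Return a short reason if the URL should be rejected, else None."""
    # Inspect the raw string first: urlsplit silently strips some control characters.
    if "\\" in url or any(ch.isspace() or ord(ch) < 0x20 or ord(ch) == 0x7F for ch in url):
        return "backslash, space, or control character"
    try:
        parts = urlsplit(url)
        parts.port  # raises ValueError for non-numeric or out-of-range ports
    except ValueError:
        return "invalid port or malformed authority"
    if parts.scheme not in ("http", "https"):
        return "non-HTTP(S) scheme"
    if parts.username is not None or parts.password is not None:
        return "credentials in URL"
    if "%" in parts.netloc:
        return "percent-encoded hostname"
    host = (parts.hostname or "").rstrip(".")
    if not host:
        return "missing host"
    if host in ("localhost", "localhost.localdomain") or host.endswith(".localhost"):
        return "localhost name"
    labels = host.split(".")
    if all(_NUMERIC_LABEL.match(label) for label in labels):
        # Accept only a canonical dotted quad; "127.1", "2130706433", "0x7f.0.0.1" are ambiguous.
        canonical = len(labels) == 4 and all(
            label.isdigit() and str(int(label)) == label and int(label) <= 255
            for label in labels
        )
        if not canonical:
            return "ambiguous numeric IPv4 form"
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return None  # an ordinary hostname; DNS validation is left to the fetcher
    if (ip.is_private or ip.is_loopback or ip.is_link_local
            or ip.is_multicast or ip.is_reserved or ip.is_unspecified):
        return "non-public IP literal"
    return None
```

Returning a reason string rather than a boolean is a design choice of this sketch; it makes it easy to record why a URL was dropped.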
The CI pipeline runs `detect-secrets` and fails if the scan JSON contains any unallowlisted finding under `results`. The workflow disables `KeywordDetector` to reduce noisy generic key-name matches while retaining detector plugins such as:
- GitHub personal access tokens (classic and fine-grained)
- OpenAI-style API keys
- Cloud, package-registry, chat-service, and payment-provider tokens
- Private key markers
The intentional Basic Auth-shaped URL fixture in tests/test_url_normalizer.py is marked with an inline allowlist comment. Local tracked-file verification on 2026-06-03 produced empty scan results.
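
The pass/fail rule described above (any finding under `results` fails the build) could be expressed as the sketch below. It is an illustration rather than the project's actual workflow step: the helper name `count_findings` and the command-line usage are hypothetical, and it assumes that allowlisted lines are already absent from the report, so every remaining entry counts as a failure.

```python
import json
import sys


def count_findings(scan_json: str) -> int:
    """Count findings in a detect-secrets style report (a mapping of file -> list of findings)."""
    report = json.loads(scan_json)
    results = report.get("results", {})
    return sum(len(findings) for findings in results.values())


def main(argv: list) -> int:
    # Hypothetical usage: python check_scan.py scan.json
    if len(argv) != 2:
        print("usage: check_scan.py <scan.json>", file=sys.stderr)
        return 2
    with open(argv[1], encoding="utf-8") as handle:
        found = count_findings(handle.read())
    if found:
        print(f"{found} unallowlisted finding(s) in scan results", file=sys.stderr)
        return 1
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```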
- `.env` files excluded from git via `.gitignore`
- `config/runtime_state.yaml` excluded from git (contains runtime paths)
- `memory/` excluded from git (contains learner data)
- `.env.example` serves as the single canonical config template with placeholder values only
src/safe_writer.py ensures file write safety:
| Mechanism | Detail |
|---|---|
| Atomic writes | Write to .tmp → replace target with retry |
| Backup | Automatic timestamped backup before overwrite |
| File locking | Retry on PermissionError (up to 8 attempts) |
| Cleanup | try/finally guarantees temp file cleanup |
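
The mechanisms in the table can be combined roughly as in the sketch below. It is an illustration under stated assumptions, not the contents of `src/safe_writer.py`: the function name `safe_write_text`, the backup naming scheme, and the retry delay are all hypothetical; only the `.tmp` suffix and the limit of 8 attempts come from the table.

```python
import os
import shutil
import time
from datetime import datetime
from pathlib import Path


def safe_write_text(target, text: str, attempts: int = 8, delay: float = 0.05) -> None:
    """Write text via a temp file, back up any existing target, then replace it."""
    target = Path(target)
    tmp = target.with_name(target.name + ".tmp")
    try:
        # 1. Write the new content next to the target so the replace stays on one filesystem.
        tmp.write_text(text, encoding="utf-8")
        # 2. Keep a timestamped copy of the current file before overwriting it.
        if target.exists():
            stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
            shutil.copy2(target, target.with_name(f"{target.name}.{stamp}.bak"))
        # 3. Replace the target, retrying when another process briefly holds the file.
        for attempt in range(1, attempts + 1):
            try:
                os.replace(tmp, target)
                break
            except PermissionError:
                if attempt == attempts:
                    raise
                time.sleep(delay)
    finally:
        # 4. Never leave the temp file behind, even when the write or replace failed.
        if tmp.exists():
            tmp.unlink()
```

`os.replace` is used because it overwrites an existing destination on both POSIX and Windows, which is what makes the final step a single replace rather than a delete followed by a rename.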