RAT-558: Add security threat model (THREAT_MODEL.md + SECURITY.md + AGENTS.md)#677
potiuk wants to merge 4 commits into
Rebased onto current master, which already added AGENTS.md and SECURITY.md. Keeps both maintainer files and adds the detailed THREAT_MODEL.md plus the AGENTS.md -> SECURITY.md -> THREAT_MODEL.md pointers. Generated-by: Claude Opus 4.8 (1M context)
35879b0 to d4f0fdd
|
This PR looks like it needs answers from developers before submitting. |
|
Yes, absolutely. It's enough to comment in the PR answering the questions, and I will update the PR accordingly. |
| or a **Maven plugin** — always **in the developer's or CI's own process**, | ||
| never as a network service. Whisker generates license documentation; Tentacles | ||
| inspects staged release bundles. None is a server. | ||
|
|
Would it make sense to add a note here that RAT can also be used to change your own sources to include license headers? In that mode user input can be altered. Or is this not relevant from a security standpoint?
| when RAT is deliberately run on your own (trusted) code — the dominant, | ||
| intended case. Findings whose only impact requires running RAT on input you | ||
| already trust are `OUT-OF-MODEL: trusted-input`. | ||
| - **Test resources** (the deliberately-odd license fixtures under |
Would it make sense to document that the Maven/Ant/CLI options are generated? That would mean security implications are automatically carried over to all of RAT's UIs.
|
|
||
| ## §15 Appendix — existing-policy back-map | ||
|
|
||
| No in-repo `SECURITY.md` exists today; this PR adds one (ASF security-process |
A very basic security file was introduced via #671.
|
In answer to the first question, there is a PR to ensure we have this covered. |
Incorporates the Creadur PMC's PR apache#677 review:
- archive walker confirmed unbounded (in-memory extraction) -> §9 gap + §10
- XML/DOCTYPE hardening noted as in-flight PMC PR (§14 Q3, link pending)
- documents RAT write mode (--addLicense) as trusted-input / out-of-model
- notes CLI/Ant/Maven front-ends are generated from a common core
- §15 corrected: SECURITY.md already exists (added via apache#671)

Generated-by: Claude Opus 4.8
|
Thanks
Still open if you have a moment (one line each is plenty): Q1 (confirm the untrusted-input case is the one to model), Q2 (RAT makes no network connections), Q5 (Whisker/Tentacles share the profile), and Q6 (want us to add the same pointer files to creadur-whisker/-tentacles, or will you?). One note on CI: the failing "Build and analyze" (CodeQL) check is unrelated to this PR — it's a docs-only change (three .md files), so it isn't introducing or affected by that build job; looks pre-existing/flaky on the branch. |
|
@potiuk There is one more point that has not been discussed. RAT allows developers to extend the matching algorithms. See https://creadur.apache.org/rat/license_def.html#Matchers The upshot is that 3rd parties can create new matchers and use them in license checks. Matchers differ from license checks in that license checks use matchers. For example, the Apache 2.0 license check uses matchers, and those matchers scan the contents of the file (as a String) looking for matches. This means that a custom matcher would have access to all text from all files that are selected for scanning. But the matcher set is defined in the configuration and is under the control of the developer using RAT. |
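To make that exposure concrete, here is a minimal sketch in Java. The `TextMatcher` interface and `KeywordMatcher` class are hypothetical stand-ins invented for illustration; they are not RAT's actual matcher API. The only point is that whichever matcher the operator configures is handed the full text of every scanned file.

```java
import java.util.ArrayList;
import java.util.List;

class MatcherSketch {
    /** Hypothetical matcher contract; NOT RAT's real interface. */
    public interface TextMatcher {
        boolean matches(String fileText);
    }

    /** Operator-supplied matcher: it receives the whole document text. */
    public static final class KeywordMatcher implements TextMatcher {
        private final String keyword;
        /** Records how much text was handed over, to show what the matcher can observe. */
        public final List<Integer> seenLengths = new ArrayList<>();

        public KeywordMatcher(String keyword) {
            this.keyword = keyword;
        }

        @Override
        public boolean matches(String fileText) {
            seenLengths.add(fileText.length());
            return fileText.contains(keyword);
        }
    }

    public static void main(String[] args) {
        KeywordMatcher matcher = new KeywordMatcher("Apache License, Version 2.0");
        System.out.println(matcher.matches("Licensed under the Apache License, Version 2.0"));
        System.out.println(matcher.matches("no header here"));
        System.out.println("documents seen: " + matcher.seenLengths.size());
    }
}
```

Because the matcher class is chosen in operator-controlled configuration, this access is a property of trusted configuration rather than something scanned input can trigger.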
|
@potiuk the Sonar build only runs on specific branches and specific PRs, because the credentials are not shared across all PRs/builds due to ASF restrictions. |
#679 is the PR that does the XXE hardening. |
Per Claudenw (PR apache#677): RAT lets operators define custom matcher classes that see all scanned file text, but the matcher set is operator-defined config (not attacker-supplied), so it's OUT-OF-MODEL: trusted-input — same posture as the write mode. Generated-by: Claude Opus 4.8 (1M context)
|
Thanks
Still one open item: the XXE-hardening PR number (§14 Q3) — I've left §8 #2 tentative pending it. Whenever you drop the number I'll cite it and flip XXE from "hardening in flight" to a provided property. No rush. The remaining §14 questions (Q1 untrusted-input posture, Q2 no-network, Q5 Whisker/Tentacles profile, Q6 sibling pointer files) are still open whenever convenient — one line each is plenty. |
|
@potiuk - thanks again:
|
| sometimes pointed at **untrusted input**: a CI job auditing an untrusted | ||
| contribution/PR, or auditing a downloaded third-party artifact. That is the | ||
| case the model cares about. *(inferred — Q1.)* | ||
|
|
This statement is correct
| caller invokes RAT (CLI/Ant/Maven) on a directory + a config | ||
| │ trusted invocation | ||
| ▼ | ||
| read configuration (XMLConfigurationReader) ── XXE surface if config is untrusted |
#679 is the PR that does the XXE hardening. I don't know if that impacts here.
|
|
||
| - A JRE; RAT reads the filesystem it is pointed at and writes a report. It opens | ||
| **no network connections** and runs no services. *(inferred — Q2, the | ||
| no-network claim is high-value to confirm.)* |
True, no network connections are opened by RAT. RAT only opens files. One potential hole is XSLT transforms, where the operator could add an xsl:include statement that opens a connection to a remote system. This is out of scope because the XSLT stylesheets are in the trusted space, under the control of the operator.
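For operators who nevertheless want to close the xsl:include path, the standard JAXP properties can do it. The sketch below is not taken from RAT; `XsltIncludeSketch` and its method names are made up for illustration, and it assumes the JDK's built-in XSLT implementation (which supports the JAXP 1.5 access properties).

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamSource;

class XsltIncludeSketch {
    /** Builds a factory that may not fetch external stylesheets or DTDs over any protocol. */
    public static TransformerFactory restrictedFactory() throws TransformerConfigurationException {
        TransformerFactory factory = TransformerFactory.newInstance();
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        // An empty protocol list denies xsl:include / xsl:import of external resources.
        factory.setAttribute(XMLConstants.ACCESS_EXTERNAL_STYLESHEET, "");
        factory.setAttribute(XMLConstants.ACCESS_EXTERNAL_DTD, "");
        return factory;
    }

    /** Returns true if the stylesheet text compiles under the restricted factory. */
    public static boolean compiles(String xslt) {
        try {
            restrictedFactory().newTemplates(new StreamSource(new StringReader(xslt)));
            return true;
        } catch (TransformerConfigurationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String header = "<xsl:stylesheet version=\"1.0\" "
                + "xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">";
        String local = header + "<xsl:template match=\"/\"><ok/></xsl:template></xsl:stylesheet>";
        String remote = header + "<xsl:include href=\"http://example.invalid/x.xsl\"/></xsl:stylesheet>";
        System.out.println("local stylesheet compiles: " + compiles(local));
        System.out.println("remote include compiles: " + compiles(remote));
    }
}
```

This only illustrates an operator-side option; the thread's position that operator-supplied XSLT is trusted input is unchanged by it.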
| **no network connections** and runs no services. *(inferred — Q2, the | ||
| no-network claim is high-value to confirm.)* | ||
| - The XML parser behaviour depends on the platform JAXP unless RAT configures it | ||
| (§5a/§8). *(inferred — Q3.)* |
It depends on JAXP and can be configured through the JAXP properties (settable as system properties), as documented:
https://docs.oracle.com/javase/8/docs/technotes/guides/security/jaxp/jaxp.html#setting-jaxp-properties-as-system-properties
| (§8/§9, maintainer-confirmed). XML-parser DOCTYPE handling is being hardened via | ||
| a PMC PR (§14 Q3). There is no "insecure default toggle". *(maintainer / Q3 | ||
| pending PR link.)* | ||
|
|
| | Input | Attacker-controllable? (untrusted-run) | Concern | | ||
| | --- | --- | --- | | ||
| | scanned file content | **yes** | parsed/read; resource use | | ||
| | scanned file paths / archive entry names | **yes** | path handling on archive extraction | |
What does "path handling on archive extraction" mean? We do not extract the data into a directory. We read the files from the archive and extract them from there. The file paths are documented as relative to the archive so something like "/bar/baz.zip#/junk.txt" is reported for a file junk.txt in the archive baz.zip found in the /some/dir/bar/ directory on a unix/mac system where RAT was pointed to /some/dir as the tree to scan.
But the contents of junk.txt were only ever extracted to memory.
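A small sketch of that behaviour, under stated assumptions: it uses the JDK's `java.util.zip` rather than the Commons Compress classes RAT uses, the class and method names are hypothetical, and the `archive#/entry` label format is inferred from the example above. The entry name is only ever used as a report label, never as a filesystem path, so a name such as `../evil.txt` has nowhere to traverse to.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class InMemoryArchiveSketch {
    /** Reads every file entry into memory, keyed by a report label; nothing is written to disk. */
    public static Map<String, byte[]> readEntries(String archiveName, InputStream in) {
        Map<String, byte[]> result = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            byte[] chunk = new byte[8192];
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                int n;
                while ((n = zip.read(chunk)) != -1) {
                    buffer.write(chunk, 0, n);
                }
                // The entry name is used purely as a label (assumed format: archive#/entry).
                result.put(archiveName + "#/" + entry.getName(), buffer.toByteArray());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    /** Test helper: builds a one-entry zip in memory. */
    public static byte[] zipOf(String entryName, String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(out)) {
            zos.putNextEntry(new ZipEntry(entryName));
            zos.write(text.getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] zip = zipOf("../evil.txt", "hello");
        readEntries("/bar/baz.zip", new ByteArrayInputStream(zip))
                .forEach((label, bytes) -> System.out.println(label + " (" + bytes.length + " bytes)"));
    }
}
```

Note that this sketch buffers each entry without any size limit, which is the same unbounded-memory shape discussed for the archive walker elsewhere in this review.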
| trusted caller; inputs are normally trusted, but the security-relevant case is | ||
| RAT auditing **untrusted** input (CI on untrusted PRs, third-party artifacts). | ||
| Is that the case you want modelled, or do you consider all RAT input trusted | ||
| (which would move XXE/archive items to `OUT-OF-MODEL: trusted-input`)? (§2/§7.) |
All RAT configuration items (XSLT stylesheets, configuration files, license definitions, matcher implementations) are trusted and under control of the operator.
The files that are read may be untrusted, as you point out, in the case of verification of PRs from 3rd parties.
The attack surface is anything that can break out of the scanning stream when the system is run with the default settings. I am certain that there are settings that could open the system up for attack, for example the JAXP system properties.
| - **Q3.** *(Partially answered — PMC, PR #677: a hardening PR is in flight | ||
| ensuring DOCTYPE / external-entity handling is covered. **Pending the PR link | ||
| to cite**; once landed §8 #2 becomes a provided property.)* Does | ||
| `XMLConfigurationReader` disable DOCTYPE / external entities (XXE-safe)? |
External entities are disabled.
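For readers unfamiliar with what such hardening typically looks like, here is a generic JAXP sketch. It is not copied from `XMLConfigurationReader` or from #679; the class and method names are hypothetical, and the feature URIs are the ones recognised by the JDK's built-in (Xerces-derived) parser.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class XxeSketch {
    /** Builds a DOM factory that rejects DOCTYPE declarations and external entities. */
    public static DocumentBuilderFactory hardenedFactory() throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        // Rejecting DOCTYPE outright removes the entity-declaration surface entirely.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        // Defence in depth in case DOCTYPE ever has to be re-enabled.
        factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
        factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        factory.setXIncludeAware(false);
        factory.setExpandEntityReferences(false);
        return factory;
    }

    /** Returns true if the document parses under the hardened factory. */
    public static boolean parses(String xml) {
        try {
            hardenedFactory().newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException | ParserConfigurationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String plain = "<config/>";
        String withDoctype = "<!DOCTYPE config [<!ENTITY x SYSTEM \"file:///etc/passwd\">]>"
                + "<config>&x;</config>";
        System.out.println("plain parses: " + parses(plain));
        System.out.println("DOCTYPE parses: " + parses(withDoctype));
    }
}
```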
| in-memory buffer (Commons Compress `ArchiveStreamFactory`) held until the | ||
| document is processed, so a crafted archive can OOM. Resolved as a §9 gap + | ||
| §10 responsibility; §8 #1 is **not** a provided property.)* Does | ||
| `ArchiveWalker` bound decompression (size/depth/entry-count)? |
We do not guard against OOM. We probably should add a limit, but at this time we do not. This is out of scope and not provided.
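If a limit were added, one possible shape is a size-capped read of each entry. The sketch below is an illustration only: `BoundedReadSketch`, `readAtMost`, and the choice of a per-entry byte cap are assumptions, not a description of any planned RAT change.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class BoundedReadSketch {
    /**
     * Reads the stream fully into memory, but fails once more than maxBytes
     * have been produced, so a decompression bomb cannot exhaust the heap.
     */
    public static byte[] readAtMost(InputStream in, long maxBytes) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        long total = 0;
        try {
            int n;
            while ((n = in.read(chunk)) != -1) {
                total += n;
                if (total > maxBytes) {
                    throw new IllegalStateException("entry exceeds limit of " + maxBytes + " bytes");
                }
                buffer.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = new byte[1024];
        System.out.println("read " + readAtMost(new ByteArrayInputStream(data), 2048).length + " bytes");
        try {
            readAtMost(new ByteArrayInputStream(data), 512);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The cap counts decompressed bytes as they are produced, so it does not rely on the size declared in the archive header, which an attacker controls. Depth and entry-count limits would need separate counters.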
| **Wave 2 — surface.** | ||
|
|
||
| - **Q2.** Confirm RAT makes no network connections and has no side effects beyond | ||
| reading the scanned tree and writing the report. (§5/§8.) |
…d (external entities disabled, apache#679 hardens DOCTYPE), no-network confirmed (XSLT xsl:include caveat), correct archive path-handling (read to memory, no extract-to-disk → no path traversal), Whisker/Tentacles deferred
|
Thanks @ottlinger and @Claudenw — that's everything we needed. I've folded all your answers into
With every §14 question answered, the model is ready to ratify whenever the PMC's happy with it. (The red check is the CodeQL "Build and analyze" job, which is unrelated to these doc-only changes — all 13 build/test matrix jobs pass.) |
What
Adds a threat model for Apache Creadur (RAT) at the Creadur PMC's request (GLASSWING / Mythos scan pre-flight):
- `THREAT_MODEL.md`: the model (rubric).
- `SECURITY.md` + `AGENTS.md`: disclosure pointer + the `AGENTS.md -> SECURITY.md -> THREAT_MODEL.md` chain.

The model in brief
RAT is modelled as an in-process build/CLI license-audit tool — not a network service, and explicitly not a security/vulnerability scanner. Its security-relevant case is auditing untrusted input: the XML configuration (XXE surface) and archive descent (decompression-bomb surface). Findings that require RAT to process input the operator already trusts (the normal case — your own source tree) are out of model.
DRAFT — you own it; two quick technical confirmations
Because RAT is small, the §8-vs-§9 split hinges on two facts I've left as section 14 questions:
- Does `XMLConfigurationReader` disable DOCTYPE/external entities (XXE-safe)?
- Does `ArchiveWalker` bound decompression (size/depth/entry-count)?

Your answers turn those from "open question" into either a provided property (§8) or a documented gap + downstream note (§9). Also Q6: want me to add the same chain to `creadur-whisker` and `creadur-tentacles` so all three are discoverable?

Generated by the ASF Security team's threat-model tooling (Claude Opus); reviewed before opening.