fix(plugins/k8saudit-aks): skip non-JSON records before parsing#1337
fix(plugins/k8saudit-aks): skip non-JSON records before parsing#1337raajheshkannaa wants to merge 3 commits intofalcosecurity:mainfrom
Conversation
When AKS diagnostic settings route multiple categories
(kube-apiserver, kube-controller-manager, kube-scheduler,
cluster-autoscaler, kube-audit) to the same EventHub, only the
kube-audit category emits JSON. The other categories carry klog
text in properties.log, which the consumer was feeding straight
into the JSON parser. This produced thousands of "cannot parse
JSON: cannot parse number" errors per minute under load.
Add a small looksLikeJSON guard that checks for { or [ after
leading whitespace and skip the record when the prefix does not
match. Audit events from the kube-audit feed are always JSON
objects or arrays, so the guard does not drop legitimate events.
Includes a table-driven test covering JSON, klog info/trace
lines, plain text, numbers, string literals, and a UTF-8 BOM
case that documents preserved parser behavior.
Closes falcosecurity#1145
Signed-off-by: Raajhesh Kannaa Chidambaram <495042+raajheshkannaa@users.noreply.github.com>
| // properties.log instead of a JSON audit event. Skip them silently | ||
| // rather than spamming "cannot parse JSON" errors. See issue #1145. | ||
| logBytes := []byte(i.Properties.Log) | ||
| if !looksLikeJSON(logBytes) { |
There was a problem hiding this comment.
I guess using fastjson.ValidateBytes() is a more robust approach. WDYT?
There was a problem hiding this comment.
Good call, switched to fastjson.ValidateBytes in 7b23a83. Promoted fastjson from indirect to a direct dependency, renamed the helper to isValidJSON, and updated the table-driven test (the number/string-literal cases are now valid JSON per spec, replaced with a truncated-object case to keep coverage on malformed inputs). Tests pass locally.
Per ekoops review feedback, replace the looksLikeJSON heuristic with fastjson.ValidateBytes which is already a transitive dependency. This is more robust against edge cases the prefix check might miss (truncated JSON, malformed structure) without changing behavior for the klog text that motivated the guard. Signed-off-by: Raajhesh Kannaa Chidambaram <495042+raajheshkannaa@users.noreply.github.com>
ekoops
left a comment
There was a problem hiding this comment.
To be honest, I would not create a separate function. I would just call it in place, with a comment describing the fact that we are doing pre-validation to ensure that errors related to non-JSON payloads pollute the logs. Moreover, tests are not needed here, because we are basically testing the fastjson.ValidateBytes() API, which should be a job of fastjson maintainers
Drop the isValidJSON helper and call fastjson.ValidateBytes in place with the reasoning in the surrounding comment. Remove the helper test since it covered the upstream fastjson API rather than plugin behavior. Signed-off-by: Raajhesh Kannaa Chidambaram <495042+raajheshkannaa@users.noreply.github.com>
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: raajheshkannaa The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
raajheshkannaa
left a comment
There was a problem hiding this comment.
Done in a5cb0cf. Inlined fastjson.ValidateBytes at the call site with the rationale in the surrounding comment, and dropped the helper test since it covered upstream fastjson behavior rather than this plugin's logic.
What type of PR is this?
/kind bug
Any specific area of the project related to this PR?
/area plugins
What this PR does / why we need it:
When AKS diagnostic settings route multiple categories (
kube-apiserver,kube-controller-manager,kube-scheduler,cluster-autoscaler,kube-audit) to the same EventHub, only thekube-auditcategory emits JSON. The other categories carry klog text inproperties.log(e.g.I0109 21:24:54.475483 ...orTrace[7816759]: ...), which the consumer was feeding straight into the JSON parser. The reporter saw ~9100cannot parse JSON: cannot parse numbererrors in 20 minutes from a single pod.The plugin now checks the first non-whitespace byte of
properties.log. If it isn't{or[, the record is skipped silently. Kubernetes audit events are always JSON objects or arrays per the API server schema, so the guard cannot drop legitimate events.Includes a table-driven test for the helper covering JSON objects, JSON arrays, klog info and trace lines, plain text, numbers, string literals, and a UTF-8 BOM case that documents the preserved parser behavior.
The other
k8saudit-*plugins are not affected: EKS filters CloudWatch to thekube-apiserver-auditlog group before parsing, GKE rejects non-audit Pub/Sub entries, and OVH JSON-decodes its envelope before reaching the audit parser. Keeping the helper local avoids a shared abstraction with no second user.Which issue(s) this PR fixes:
Fixes #1145
Special notes for your reviewer:
go vetreports two pre-existing warnings aboutcancel/context paths inOpen(). Confirmed on upstreammainat85f6d9e; not introduced by this PR.Does this PR introduce a user-facing change?: