[awsfirehose] Fix VPC flow log detection in firehose routing pipeline#18395
[awsfirehose] Fix VPC flow log detection in firehose routing pipeline#18395
Conversation
573b050 to
9848b2b
Compare
🚀 Benchmarks reportTo see the full report comment with |
The condition for detecting VPC flow logs had two bugs:
1. Operator precedence: missing parentheses around the `accept || reject`
check caused any message containing "reject" to be misrouted to
`aws.vpcflow` regardless of other conditions.
2. Broken string matching: the checks for `"eni-` and `"vpc-` (with
leading double-quote) never matched because inside the JSON wrapper
`{"message":"..."}`, ENI and VPC IDs are preceded by spaces, not
quotes. The old test only passed as a side effect of bug #1.
The new condition checks for `eni-` (without quote prefix) as the primary
marker, combined with secondary markers that cover all VPC flow log
variants:
- `accept`/`reject` for standard flow logs with traffic data
- `nodata`/`skipdata` for records with no captured data
- `tgw-` for transit gateway flow logs (which lack ACCEPT/REJECT)
The test suite is expanded from 1 to 12 cases covering all formats
supported by the downstream vpcflow pipeline (14, 17, 21, 29, 36, 39,
40 fields), using examples from the official AWS documentation.
Closes #18110
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Regional NAT gateway VPC flow logs (v9) can have interface-id = "-" with only nat-xxxxx as a resource identifier. Adding nat- as an alternative primary marker ensures these records are correctly routed to aws.vpcflow. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds test fixtures for the two missing VPC flow log versions: - v4 (18 tokens): v2 fields + region, az-id, sublocation-type, sublocation-id - v10 (30 tokens): v5 fields + encryption-status All versions v2-v10 now have explicit test coverage (15 cases total). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Transit gateway flow logs can have both tgw-src-eni and tgw-dst-eni set to "-", leaving no eni- in the record. Since tgw- is specific enough on its own (CloudTrail is matched before this check), promote it to a standalone primary path: tgw- || ((eni- || nat-) && (accept || reject || nodata || skipdata)) Adds test case #16: transit gateway with OK status and no ENI fields, detected solely via tgw-. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…flow
Adds a firewall log event whose alert action is "reject" to the
firewall test suite. This is the exact scenario the original bug
caused: the old condition `... && contains('accept') ||
contains('reject')` would misroute any message containing "reject"
to aws.vpcflow due to operator precedence.
Since the VPC flow check runs before the firewall check in the
pipeline, this test will fail if the precedence bug is reintroduced
— the event would be routed to aws.vpcflow instead of the expected
aws.firewall_logs.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
9848b2b to
25fef9e
Compare
tetianakravchenko
left a comment
There was a problem hiding this comment.
please add changelog record and bump version of the package
|
|
||
| // AWS VPC Flow Logs | ||
| else if (message_lower.contains('"eni-') && message_lower.contains('"vpc-') && message_lower.contains('accept') || message_lower.contains('reject')) { | ||
| // AWS VPC Flow Logs (standard, transit gateway, and regional NAT gateway) |
There was a problem hiding this comment.
zonal NAT gateway is also covered, isn't it?
can be added some clarification comment, like:
| // AWS VPC Flow Logs (standard, transit gateway, and regional NAT gateway) | |
| // AWS VPC Flow Logs: | |
| // eni- → standard ENI-based records AND zonal NAT gateway (interface-id = eni-xxxxx) | |
| // nat- → regional NAT gateway v9+ (interface-id = "-", resource-id = nat-xxxxx) | |
| // tgw- → transit gateway (no action field; log-status may be OK with no ACCEPT/REJECT) |
?
(if it is correct)
There was a problem hiding this comment.
Yeah, it seems so. Thanks for checking, taking a closer look at this use case.
There was a problem hiding this comment.
For the eni- part, we can only support the standard format (which includes accept/reject). We probably need to note this in the docs, if it's not there yet.
And yes, we should update the comment to reflect what we support now.
| // AWS VPC Flow Logs | ||
| else if (message_lower.contains('"eni-') && message_lower.contains('"vpc-') && message_lower.contains('accept') || message_lower.contains('reject')) { | ||
| // AWS VPC Flow Logs (standard, transit gateway, and regional NAT gateway) | ||
| else if (message_lower.contains('tgw-') || ((message_lower.contains('eni-') || message_lower.contains('nat-')) && (message_lower.contains('accept') || message_lower.contains('reject') || message_lower.contains('nodata') || message_lower.contains('skipdata')))) { |
There was a problem hiding this comment.
wondering if tgw- as a standalone condition has a risk of matching non-vpc logs (that follow after this check in ingest pipeline)?
in https://docs.aws.amazon.com/vpc/latest/userguide/flow-logs-records-examples.html#flow-log-example-tgw is mentioned that action should be available
There was a problem hiding this comment.
Good point, checking this as well!
There was a problem hiding this comment.
Tightened the transit gateway detection rule adding the resource-type.
There was a problem hiding this comment.
btw, where transit gateway log samples are coming from? I see that test samples 1-8 are from https://docs.aws.amazon.com/vpc/latest/userguide/flow-logs-records-examples.html, but the rest? are these real or reconstructed from spec?
There was a problem hiding this comment.
I used examples from the docs as much as possible, but some test cases have been reconstructed from the spec.
Let me see if I can find more examples, including outside the AWS docs. We can also set up a couple of VPCs with a transit gateway to collect some samples.
There was a problem hiding this comment.
lets start with what we have now and iterate over it 👍
Using tgw- as a standalone primary marker risks matching non-VPC logs that happen to reference transit gateway resources (e.g., a firewall log with "tgw-0abc" in a signature field). Tighten the transit gateway path to require both tgw- and transitgateway (the resource-type field value in the default TGW flow log format). This eliminates false positives while still covering all default-format transit gateway flow logs. Known limitation: VPC flow logs using custom formats that exclude both the action and log-status fields (e.g., the 6-field zonal NAT gateway format from the AWS docs) cannot be detected through content matching alone. The default format always includes action and log-status, so this only affects deliberately minimal custom configurations. Users with such formats should use stream-name-based routing instead. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds a "How log routing works" section to the docs explaining how each log type is detected and what format requirements users must follow for auto-detection to work. Includes a note about using the es_datastream_name parameter as a fallback for custom formats. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Vale Linting ResultsSummary: 4 warnings found
|
| File | Line | Rule | Message |
|---|---|---|---|
| packages/awsfirehose/_dev/build/docs/README.md | 32 | Elastic.Latinisms | Latin terms and abbreviations are a common source of confusion. Use 'using' instead of 'via'. |
| packages/awsfirehose/_dev/build/docs/README.md | 36 | Elastic.Latinisms | Latin terms and abbreviations are a common source of confusion. Use 'using' instead of 'via'. |
| packages/awsfirehose/docs/README.md | 32 | Elastic.Latinisms | Latin terms and abbreviations are a common source of confusion. Use 'using' instead of 'via'. |
| packages/awsfirehose/docs/README.md | 36 | Elastic.Latinisms | Latin terms and abbreviations are a common source of confusion. Use 'using' instead of 'via'. |
The Vale linter checks documentation changes against the Elastic Docs style guide.
To use Vale locally or report issues, refer to Elastic style guide for Vale.
|
I added a new section in the docs to suggest how to successfully route logs using this integration. |
tetianakravchenko
left a comment
There was a problem hiding this comment.
@zmoog build is failing:
Error: checking package failed: checking readme files are up-to-date failed: files do not match
I believe elastic-package build should fix it
| // AWS WAF Logs | ||
|
|
||
| // Initialize lower case version of the message to |
There was a problem hiding this comment.
this sentence seems to be not finished, or?
| each record to the appropriate integration dataset. The detection relies on known patterns in the | ||
| log message, such as field names, resource identifiers, or structural characteristics. | ||
|
|
||
| The following table describes how each log type is detected and what is required for routing |
There was a problem hiding this comment.
this table can get quickly outdated, not sure if we should ducplicate in readme filtering logic
I think mentioning about the detecting, maybe 1 specific example and leaving a note about auto-detection not working should be good for the doc
There was a problem hiding this comment.
Yeah, after a second look I thought rh same: I'll only leave the recommendations on the formats.
Also rebuilds the docs and completes truncated comments.
💚 Build Succeeded
History
cc @zmoog |
Summary
&&vs||) and incorrect string matching ("eni-with leading quote never matched inside JSON-wrapped messages)tgw-+transitgateway— transit gateway flow logs (default format includesresource-type: TransitGateway)eni-— standard VPC flow logs, requires a secondary marker (accept,reject,nodata,skipdata)nat-— v9 regional NAT gateway flow logs (whereinterface-idis-), requires a secondary marker"action":"reject"that must route toaws.firewall_logs, notaws.vpcflow— this is the exact scenario the original operator precedence bug causedKnown limitation
VPC flow logs using custom formats that exclude both the
actionandlog-statusfields (e.g., the 6-field zonal NAT gateway format) cannot be detected through content matching alone. The default format always includesactionandlog-status, so this only affects deliberately minimal custom configurations. Users with such formats should use thees_datastream_namedelivery stream parameter for routing instead.Test plan
elastic-package test pipeline -v -d logspasses all 22 tests (16 vpcflow + 2 firewall + 8 other log types unchanged)aws.vpcflowpipeline dissect patternsCloses #18110
🤖 Generated with Claude Code