What happened?
There appears to be a bug in:
sdks/python/apache_beam/transforms/enrichment_handlers/bigquery.py
In batch mode, requests are collected into a map keyed by the enrichment key:
requests_map[self.create_row_key(req)] = req
If more than one request in the same batch has the same enrichment key, the newer request overwrites the earlier one in requests_map.
As a result, only one of the duplicate-key requests in a batch is preserved and matched to a response.
What did you expect to happen?
When multiple requests in the same batch share the same enrichment key, all of them should be retained and handled correctly.
What is the actual behavior?
Only the latest request for a given key is stored in requests_map, so earlier requests with the same key are lost.
Example
Given batched requests like:
Row(id=1, ...)
Row(id=1, ...)
both requests produce the same enrichment key. The second request overwrites the first in requests_map, so the first request can be dropped or incorrectly treated as unmatched.
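The overwrite can be reproduced outside Beam with a plain dict. This is a minimal sketch, not the handler's actual code: Row is a namedtuple stand-in for beam.Row, the payload field is made up for illustration, and create_row_key is a hypothetical simplification that builds the key from id only.

```python
from collections import namedtuple

# Stand-in for beam.Row so the sketch runs without Apache Beam installed.
# The `payload` field is hypothetical and only used to tell the rows apart.
Row = namedtuple("Row", ["id", "payload"])


def create_row_key(row):
    # Hypothetical simplification: the enrichment key is just the `id` field.
    return (row.id,)


batch = [Row(id=1, payload="first"), Row(id=1, payload="second")]

requests_map = {}
for req in batch:
    # Same assignment pattern as in the batch path: one slot per key.
    requests_map[create_row_key(req)] = req

# Two requests went in, but the map holds a single entry for key (1,).
surviving = list(requests_map.values())
print(len(batch), len(requests_map), surviving[0].payload)
```

After the loop, requests_map has one entry and it refers to the second request; nothing in the map records that the first request existed.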
Proposed fix
Store a collection of requests per key instead of a single request, for example a list or queue, and then consume one request per matching response key during result processing.
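One possible shape for this, sketched under assumptions: Row is again a namedtuple stand-in for beam.Row, create_row_key keys on id only, and match_batch is a hypothetical helper name, not an existing function in the handler. It queues requests per key and pops one request for each response that carries the same key.

```python
from collections import defaultdict, deque, namedtuple

# Stand-in for beam.Row; `payload` is a hypothetical field for illustration.
Row = namedtuple("Row", ["id", "payload"])


def create_row_key(row):
    # Hypothetical simplification: the enrichment key is just the `id` field.
    return (row.id,)


def match_batch(requests, responses):
    """Pair each response with one pending request that has the same key."""
    # Keep every request per key, in arrival order, instead of a single slot.
    pending = defaultdict(deque)
    for req in requests:
        pending[create_row_key(req)].append(req)

    matched = []
    for resp in responses:
        queue = pending.get(create_row_key(resp))
        if queue:
            # Consume exactly one request per matching response.
            matched.append((queue.popleft(), resp))

    # Anything still queued never received a response.
    unmatched = [req for queue in pending.values() for req in queue]
    return matched, unmatched


requests = [Row(1, "a"), Row(1, "b"), Row(2, "c")]
responses = [Row(1, "x"), Row(1, "y")]
matched, unmatched = match_batch(requests, responses)
```

This pairing assumes the backend returns one response row per request. If a key shared by several requests yields only a single response row, the response would instead need to be fanned out to every queued request for that key.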
Additional context
This seems to affect the batch path in BigQueryEnrichmentHandler.__call__, not the single-request path.
Issue Priority
Priority: 2 (default / most bugs should be filed as P2)
Issue Components