feat(workstream-c): add cheatsheet categorizer and grouping by shreeshtripurwarcomp23-coder · Pull Request #934 · OWASP/OpenCRE

shreeshtripurwarcomp23-coder · 2026-06-15T04:45:06Z

- Implement categorize_cheatsheet() with 29-label controlled taxonomy
- Implement group_cheatsheets() with stable sha256-based group IDs
- Deterministic keyword/rule baseline, no LLM dependency
- LLM-optional path with safe fallback on failure
- Validate all CheatsheetRecord fields in __post_init__
- 50 tests covering all acceptance criteria from RFC Issue C

CheatsheetRecord uses local stub pending Workstream B merge.

coderabbitai · 2026-06-15T04:45:18Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 3362e735-cf57-4a82-8a68-efed34dd28da

📥 Commits

Reviewing files that changed from the base of the PR and between f91b995 and f8472d4.

📒 Files selected for processing (2)

application/tests/test_cheatsheet_categorizer.py
application/utils/external_project_parsers/parsers/cheatsheet_categorizer.py

🚧 Files skipped from review as they are similar to previous changes (2)

application/utils/external_project_parsers/parsers/cheatsheet_categorizer.py
application/tests/test_cheatsheet_categorizer.py

Summary by CodeRabbit

Release Notes

New Features
- Added an automated cheatsheet categorization system that assigns categories using keyword rules, with optional AI-powered labeling.
- Added grouping for categorized cheatsheets, producing deterministic groupings for consistent results.
Tests
- Added a comprehensive test suite covering taxonomy rules, deterministic categorization, uncategorized fallback behavior, optional AI labeling flows (including safe fallbacks), and stable group ID generation.

Walkthrough

Adds a new cheatsheet_categorizer.py module defining a controlled TAXONOMY list, CheatsheetRecord and CheatsheetGroup dataclasses, deterministic keyword-based categorization, optional LLM-based categorization with fallback, and a grouping function. A comprehensive unittest module covering taxonomy integrity, all categorization paths, grouping semantics, and internal helpers is added alongside.
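The LLM-first flow with a deterministic fallback described in the walkthrough can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the four-label taxonomy, the `KEYWORD_RULES` table, and the function names `deterministic_categorize`, `validate_labels`, and `categorize` are hypothetical stand-ins for the module's `TAXONOMY`, `_deterministic_categorize`, `_validate_labels`, and `categorize_cheatsheet`, and the `llm` callable is an assumed interface.

```python
from typing import Callable, Iterable, List, Optional

# Hypothetical miniature taxonomy; the PR describes a 29-label controlled list.
UNCATEGORIZED = "uncategorized"
TAXONOMY = ["authentication", "cryptography", "injection", UNCATEGORIZED]

# Hypothetical keyword rules, invented for this sketch.
KEYWORD_RULES = {
    "authentication": ("password", "login", "mfa"),
    "cryptography": ("encrypt", "hash", "tls"),
    "injection": ("sql injection", "xss", "command injection"),
}


def deterministic_categorize(text: str) -> List[str]:
    """Return sorted labels whose keywords occur in the text, else the sentinel."""
    lowered = text.lower()
    labels = sorted(
        label
        for label, keywords in KEYWORD_RULES.items()
        if any(keyword in lowered for keyword in keywords)
    )
    return labels or [UNCATEGORIZED]


def validate_labels(labels: Iterable[object]) -> List[str]:
    """Keep only taxonomy labels, dropping duplicates while preserving order."""
    kept: List[str] = []
    for label in labels:
        if isinstance(label, str) and label in TAXONOMY and label not in kept:
            kept.append(label)
    return kept


def categorize(
    text: str,
    llm: Optional[Callable[[str], Iterable[object]]] = None,
    use_llm: bool = False,
) -> List[str]:
    """Try the LLM first when enabled; fall back to keyword rules otherwise."""
    if use_llm and llm is not None:
        try:
            labels = validate_labels(llm(text))
            if labels:  # empty or fully invalid output falls through
                return labels
        except Exception:
            pass  # any LLM failure falls back to the deterministic path
    return deterministic_categorize(text)
```

With `use_llm=False` the `llm` callable is never invoked, so the deterministic path stays free of any LLM dependency.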

Changes

Cheatsheet Categorization System

| Layer / File(s) | Summary |
| --- | --- |
| Taxonomy constants and data contracts: `application/utils/external_project_parsers/parsers/cheatsheet_categorizer.py`, `application/tests/test_cheatsheet_categorizer.py` | `TAXONOMY`, `UNCATEGORIZED`, `CheatsheetRecord` with `__post_init__` field validation, and `CheatsheetGroup` with `sha256`-based `make_group_id` are defined; tests assert taxonomy integrity (uniqueness, lowercase, minimum size) and `make_group_id` determinism/formatting (order-independence, 12-char lowercase hex). |
| Deterministic + LLM categorization and helpers: `application/utils/external_project_parsers/parsers/cheatsheet_categorizer.py`, `application/tests/test_cheatsheet_categorizer.py` | `_build_searchable_text`, `_deterministic_categorize` (sorted/deduped keyword matcher with `UNCATEGORIZED` fallback), `_validate_labels` (taxonomy filter + dedup), and `categorize_cheatsheet` (LLM-first with exception/invalid/empty fallback to deterministic) are implemented; tests cover all code paths including `use_llm=False` guard and spot-check deterministic mappings. |
| Grouping logic and tests: `application/utils/external_project_parsers/parsers/cheatsheet_categorizer.py`, `application/tests/test_cheatsheet_categorizer.py` | `group_cheatsheets` buckets records by label-set `group_id` and returns groups sorted by `group_id`; tests verify co-grouping by identical label sets, separation for different labels, uncategorized placement, full record coverage, sorted output, and empty-input behavior. |
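The grouping behavior summarized above (order-independent, 12-character lowercase hex IDs derived from `sha256`, groups returned sorted by ID) could look something like the sketch below. The `|` separator, the deduplication step, and the `(title, labels)` tuple shape are assumptions made for illustration; the PR's `CheatsheetGroup.make_group_id` and `group_cheatsheets` may canonicalize labels differently.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def make_group_id(labels: Iterable[str]) -> str:
    """Hash the sorted, deduplicated label set to a 12-char lowercase hex ID."""
    # Sorting makes the ID independent of the order labels were supplied in.
    canonical = "|".join(sorted(set(labels)))  # separator is an assumption
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def group_by_labels(
    records: Iterable[Tuple[str, List[str]]],
) -> List[Tuple[str, List[str]]]:
    """Bucket (title, labels) pairs by group ID and return buckets sorted by ID."""
    buckets: Dict[str, List[str]] = defaultdict(list)
    for title, labels in records:
        buckets[make_group_id(labels)].append(title)
    return sorted(buckets.items())
```

Because the ID depends only on the label set, two runs over the same records produce the same group IDs, which is what makes the grouping stable across invocations.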

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 43.28% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately summarizes the main changes: adding cheatsheet categorizer and grouping functionality with clear feature attribution to workstream-c. |
| Description check | ✅ Passed | The description provides detailed context about the implementation, including key features like the 29-label taxonomy, LLM-optional path, validation mechanisms, and test coverage. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@application/tests/test_cheatsheet_categorizer.py`:
- Around line 42-48: The _make_record helper function creates CheatsheetRecord
instances with an empty summary string, but CheatsheetRecord.__post_init__
enforces that summary must be non-empty, causing construction to fail before
test assertions run. Replace the empty string assignment `summary=""` in the
_make_record function with a minimal non-empty placeholder value (such as a
single space, period, or descriptive placeholder text like "Test summary") to
satisfy the validation requirement.

In
`@application/utils/external_project_parsers/parsers/cheatsheet_categorizer.py`:
- Around line 194-210: The __post_init__ method in CheatsheetRecord currently
validates only string-typed fields but does not validate the element types of
list-typed fields (headings and category_hints). This causes runtime crashes
when non-string items reach code expecting to call " ".join() on these fields.
Add validation in __post_init__ to ensure that headings and category_hints are
lists containing only string elements, raising a ValueError with a clear message
if any element is not a string, so that parser input validation fails fast at
construction time rather than later during string joining operations.
- Around line 366-381: The `_validate_labels` function currently allows
`uncategorized` to be returned alongside other valid labels, which violates the
sentinel semantics where `uncategorized` should only be returned when it is the
sole valid label. Modify the function to add logic after building the deduped
list: if `uncategorized` (or the appropriate constant reference from TAXONOMY)
is present in the result AND there are other valid labels alongside it, remove
the `uncategorized` entry from the returned list. This ensures that
`uncategorized` is only returned when no other categories match, preserving its
role as a fallback indicator and preventing inconsistent downstream grouping and
UX.
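The two findings against `cheatsheet_categorizer.py` can be illustrated with a small sketch. It assumes a simplified `CheatsheetRecord` with only the fields named in the review (`summary`, `headings`, `category_hints`) plus a hypothetical `title`, and a three-label stand-in for `TAXONOMY`; the real dataclass and `_validate_labels` in the PR may differ in fields, messages, and ordering.

```python
from dataclasses import dataclass, field
from typing import Iterable, List

UNCATEGORIZED = "uncategorized"
# Hypothetical subset of the taxonomy, for illustration only.
TAXONOMY = ["authentication", "cryptography", UNCATEGORIZED]


@dataclass
class CheatsheetRecord:
    title: str  # hypothetical field; not named in the review
    summary: str
    headings: List[str] = field(default_factory=list)
    category_hints: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # String fields must be non-empty strings.
        for name in ("title", "summary"):
            value = getattr(self, name)
            if not isinstance(value, str) or not value:
                raise ValueError(f"{name} must be a non-empty string")
        # List fields must be lists containing only strings, so that a later
        # " ".join(...) cannot fail on a non-string element.
        for name in ("headings", "category_hints"):
            value = getattr(self, name)
            if not isinstance(value, list) or not all(
                isinstance(item, str) for item in value
            ):
                raise ValueError(f"{name} must be a list of strings")


def validate_labels(labels: Iterable[object]) -> List[str]:
    """Filter to taxonomy labels, dedupe, and keep the sentinel only when alone."""
    deduped: List[str] = []
    for label in labels:
        if isinstance(label, str) and label in TAXONOMY and label not in deduped:
            deduped.append(label)
    # The sentinel is a fallback marker, so drop it when real labels are present.
    if UNCATEGORIZED in deduped and len(deduped) > 1:
        deduped.remove(UNCATEGORIZED)
    return deduped
```

Failing in `__post_init__` surfaces bad parser input at construction time, and stripping the sentinel keeps a record from landing in both an "uncategorized" group and a real category group.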

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: ca9bd7d2-9738-4eb1-9bf6-5c1b32faf0db

📥 Commits

Reviewing files that changed from the base of the PR and between e853cd3 and f91b995.

📒 Files selected for processing (2)

application/tests/test_cheatsheet_categorizer.py
application/utils/external_project_parsers/parsers/cheatsheet_categorizer.py

coderabbitai Bot reviewed Jun 15, 2026

fix(workstream-c): address CodeRabbit review - list validation and label sentinel fix

f8472d4

shreeshtripurwarcomp23-coder marked this pull request as draft June 15, 2026 05:06

shreeshtripurwarcomp23-coder marked this pull request as ready for review June 15, 2026 05:10

shreeshtripurwarcomp23-coder marked this pull request as draft June 15, 2026 05:11

shreeshtripurwarcomp23-coder wants to merge 2 commits into OWASP:main from shreeshtripurwarcomp23-coder:workstream-c-clean
