Is your feature request related to a problem?
The TopicRelevanceOpenAI validator has hardcoded scoring instructions that lead to false positives for forbidden-topic configurations. The scoring scale treats a high score as "clearly in scope", which conflicts with system prompts that list excluded topics.
Describe the solution you'd like
- Update the scoring semantics so that, for exclusion-based prompts, a high score means "clearly NOT forbidden".
- Revise the TopicRelevanceOpenAI validator so its scoring instructions no longer conflict with the configured system prompt.
- Add tests to confirm that the new scoring matches user intent for forbidden-topic configurations.
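The requested behavior could be sketched as follows. This is only an illustration under stated assumptions, not the validator's actual code: `TopicMode`, `RUBRICS`, `build_scoring_instructions`, and `passes` are hypothetical names, the 1-to-3 scale is inferred from the "score 3" wording in the original issue, and the pass threshold is assumed.

```python
from enum import Enum


class TopicMode(Enum):
    """Hypothetical switch describing how the system prompt is phrased."""
    ALLOWED = "allowed"      # prompt lists topics that are in scope
    FORBIDDEN = "forbidden"  # prompt lists topics that must be refused


# Hypothetical rubric wording; the validator's real instructions are not
# shown in this issue. Only the score-3 meanings come from the issue text.
RUBRICS = {
    TopicMode.ALLOWED: {
        1: "clearly out of scope",
        2: "ambiguous",
        3: "clearly in scope",
    },
    TopicMode.FORBIDDEN: {
        1: "clearly about a forbidden topic",
        2: "ambiguous",
        3: "clearly NOT forbidden",
    },
}


def build_scoring_instructions(mode: TopicMode) -> str:
    """Return rubric text whose high score always means 'safe to answer'."""
    rubric = RUBRICS[mode]
    return "\n".join(f"Score {score}: {rubric[score]}" for score in sorted(rubric))


def passes(score: int, threshold: int = 3) -> bool:
    """Assumed pass rule: the query is accepted at or above the threshold."""
    return score >= threshold


if __name__ == "__main__":
    print(build_scoring_instructions(TopicMode.FORBIDDEN))
```

Selecting the rubric from the configuration mode keeps the meaning of a high score the same in both cases (the query may be answered), so the pass rule does not need to change between inclusion-based and exclusion-based prompts.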
Original issue
Which feature or component needs enhancement?
The TopicRelevanceOpenAI validator had hardcoded scoring instructions that caused false positives for forbidden-topic configurations. When a user configured the system prompt with forbidden topics (e.g. "do not answer queries about gender detection"), the model was misled by the scoring semantics: score 3 meant "clearly in scope", which conflicted with the intent of an exclusion-based prompt, where a high score should mean "clearly NOT forbidden".