Fix repeated callback iterations (#635)#662
ugbotueferhire wants to merge 1 commit into interpretml:main
Conversation
Signed-off-by: ugbotueferhire <ugbotueferhire@gmail.com>
@paulbkoch Hello again. I've opened this PR to fix the repeated callback iteration issue (#635). The callback now fires strictly inside the `make_progress` block, which eliminates the duplicate `n_steps` spam. With API compatibility in mind, the `has_progressed` parameter remains in the signature and will always evaluate to `True`. I've also added 5 regression tests in `test_callback.py` to ensure monotonic `n_steps` behavior and to verify that early termination still works. Let me know if you'd like any tweaks to the tests or if anything needs adjusting. Thanks!
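For illustration, here is a minimal sketch of the pattern the PR describes. The real loop in interpret's `_boost.py` is more involved; the names `make_progress`, `step_idx`, and the callback signature below are simplified stand-ins, not the library's actual internals.

```python
# Hypothetical sketch of the fix: the callback is invoked only inside the
# make_progress branch, so it never sees the same n_steps value twice.

def boost_loop(max_steps, callback=None):
    step_idx = 0
    cycle = 0
    while step_idx < max_steps:
        cycle += 1
        make_progress = cycle % 3 != 0  # pretend every 3rd cycle stalls
        if make_progress:
            step_idx += 1
            # Callback fires only when a real boosting step was taken,
            # so n_steps is strictly increasing.
            if callback is not None:
                callback(n_steps=step_idx, has_progressed=True)
        # Before the fix, the callback was invoked here unconditionally,
        # repeating the same n_steps on every non-progressing cycle.
    return step_idx

seen = []
boost_loop(5, callback=lambda n_steps, has_progressed: seen.append(n_steps))
print(seen)  # each step reported exactly once: [1, 2, 3, 4, 5]
```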
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@ Coverage Diff @@
## main #662 +/- ##
==========================================
+ Coverage 66.39% 66.52% +0.13%
==========================================
Files 75 75
Lines 11589 11589
==========================================
+ Hits 7694 7710 +16
+ Misses 3895 3879 -16
Hi @ugbotueferhire -- Thanks for this PR. I think I'd like to take this API in a slightly different direction though. The issue is that n_steps encodes valuable information -- the number of boosting steps that have been taken -- and that's useful in terms of evaluating overfitting. Changing it to a monotonically increasing value eliminates that signal.

The real problem, I've come to realize, is that we actually have two types of callbacks that we're invoking here: a callback to indicate boosting progress, and a callback to indicate that a feature was examined to calculate its gain and the potential update associated with that gain. But we only have 1 callback parameter, and I don't want to pollute the primary API with tons of different callback parameters.

I think the better API change will be to change the callback parameter such that it is required to use named arguments in the signature. We can then introspect the function passed to us and determine what kind of callback we were given. Then, after that change is made, we can break the callbacks into two separate kinds with different parameter names, and call them independently. The different callbacks can then be passed in as tuples via the callback parameter.
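As a rough illustration of the signature-introspection idea, one could inspect the callback's keyword parameter names to decide which kind of callback was supplied. The parameter names used here (`n_steps`, `gain`, `best_score`, `feature`) are illustrative assumptions, not interpret's actual API.

```python
import inspect

def classify_callback(cb):
    """Decide the callback kind from its named parameters (sketch only)."""
    params = set(inspect.signature(cb).parameters)
    if "n_steps" in params:
        return "progress"  # boosting-progress callback
    if "gain" in params:
        return "gain"  # feature-gain callback
    raise TypeError(f"unrecognized callback parameters: {sorted(params)}")

# Two hypothetical user callbacks with keyword-only signatures:
def on_progress(*, n_steps, best_score):
    print(f"step {n_steps}: best_score={best_score}")

def on_gain(*, feature, gain):
    print(f"feature {feature}: gain={gain}")

# Both kinds could then be passed together (e.g. as a tuple) and
# dispatched independently based on their detected kind:
for cb in (on_progress, on_gain):
    print(classify_callback(cb))
```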
Alright, understood.
@paulbkoch
Fixes #635 Repeated Callback Iterations

Description

This PR resolves an issue during EBM training where the `callback` parameter is invoked repeatedly with the exact same `n_steps` value whenever the boosting loop fails to make progress.

The Bug

In the boosting loop inside `_boost.py`, the callback was invoked unconditionally at the very end of the loop, regardless of whether `make_progress` was `True` or `False`. Because `step_idx` is only incremented when `make_progress` is `True`, non-progressing cycle iterations repeatedly invoked the callback with the same `n_steps` value, producing redundant, repeated log output for the user.

The Fix

I moved the callback invocation inside the `if make_progress:` block, right after the early-stopping tolerance evaluation. This guarantees that the callback only ever receives monotonically increasing `n_steps` values.

Design Decisions

The `has_progressed` parameter has been kept in the signature for API backward compatibility. Because the callback now fires only after progress is made, it always evaluates to `True`. This ensures that any pre-existing user callbacks which check `if has_progressed:` do not unexpectedly break.

Testing

Added 5 regression tests in a new `test_callback.py` file to maintain the invariant:

- `test_callback_no_repeated_steps_classifier`: verifies that `n_steps` is monotonically increasing across distinct boosting phases (main terms vs. interaction pairs) with no duplicate step values.
- `test_callback_no_repeated_steps_regressor`: the equivalent check for the regressor.
- `test_callback_has_progressed_always_true`: validates that `has_progressed` is always `True`.
- `test_callback_early_termination`: asserts that returning `is_done = True` continues to bypass the early-stopping settings and terminate training immediately.
- `test_callback_receives_valid_metrics`: ensures the `best_score` metric received by the callback is never NaN or infinite.

All existing and newly added tests pass.
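To illustrate the invariant the monotonicity tests check (this is not the actual test code, which trains a real EBM; a stub trainer stands in for the fitting loop here):

```python
# Illustrative version of the check in test_callback_no_repeated_steps_*:
# record every n_steps the callback sees, then assert strict monotonicity.

def run_training(callback):
    # Stand-in for EBM fitting after the fix: the callback only ever
    # sees strictly increasing step counts.
    for step in range(1, 6):
        callback(n_steps=step, has_progressed=True)

def test_callback_no_repeated_steps():
    seen = []
    run_training(lambda *, n_steps, has_progressed: seen.append(n_steps))
    assert all(b > a for a, b in zip(seen, seen[1:])), "n_steps must strictly increase"
    assert len(seen) == len(set(seen)), "duplicate n_steps observed"
```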