Fix repeated callback iterations (#635) #662

Open
ugbotueferhire wants to merge 1 commit into interpretml:main from ugbotueferhire:fix/callback-repeated-iterations
Conversation

@ugbotueferhire
Contributor

Fixes #635 Repeated Callback Iterations

Description

This PR resolves an issue during EBM training where the callback parameter is invoked repeatedly with the exact same n_steps value whenever the boosting loop fails to make progress.

The Bug

In the boosting loop inside _boost.py, the callback was invoked unconditionally at the very end of the loop, regardless of whether make_progress was True or False. Because step_idx is only incremented when make_progress is True, non-progressing iterations repeatedly invoked the callback with the same n_steps value, producing redundant log output for the user.
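A minimal, self-contained sketch of the problematic control flow (names and structure are illustrative; the real loop in _boost.py carries much more state):

```python
def boost(n_rounds, try_step, callback=None):
    # Simplified model of the loop in _boost.py: step_idx only advances
    # when a round makes progress, but the old code invoked the callback
    # unconditionally at the end of every round.
    step_idx = 0
    for _ in range(n_rounds):
        make_progress = try_step(step_idx)
        if make_progress:
            step_idx += 1
        if callback is not None:  # old placement: fires even when stalled
            callback(step_idx, make_progress)
    return step_idx

seen = []
boost(5, lambda i: i < 2, callback=lambda n, p: seen.append(n))
print(seen)  # once progress stalls, every round reports the same n_steps
```

With progress stalling after two steps, the three stalled rounds all hand the callback n_steps == 2, which is exactly the duplicate-output behavior reported in #635.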

The Fix

I moved the callback invocation inside the if make_progress: block, right after the early-stopping tolerance evaluation.

-                        if callback is not None:
-                            is_done = callback(
-                                bag_idx, step_idx, make_progress, cur_metric
-                            )
-                            if is_done:
-                                if stop_flag is not None:
-                                    stop_flag[0] = True
-                                break
+                            if callback is not None:
+                                is_done = callback(
+                                    bag_idx, step_idx, make_progress, cur_metric
+                                )
+                                if is_done:
+                                    if stop_flag is not None:
+                                        stop_flag[0] = True
+                                    break

This guarantees that the callback only ever receives monotonically increasing n_steps values.

Design Decisions

  • The has_progressed parameter is retained for backward API compatibility. Because the callback now only fires after progress is made, it always evaluates to True, so pre-existing user callbacks that gate on if has_progressed: do not break.
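For illustration, a user callback written against the signature shown in the diff above (parameter names taken from that diff) behaves identically before and after this change:

```python
def my_callback(bag_idx, step_idx, make_progress, cur_metric):
    # After this fix, make_progress is always True by the time the
    # callback fires, so callbacks that gate their logging on it
    # keep working unchanged.
    if make_progress:
        print(f"bag {bag_idx}: step {step_idx}, metric {cur_metric:.4f}")
    return False  # returning True would request early termination
```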

Testing

Added 5 regression tests inside a new test_callback.py file to maintain the invariant:

  1. test_callback_no_repeated_steps_classifier: Verifies n_steps is strictly increasing across the distinct boosting phases (main terms vs. interaction pairs), with no duplicate step values.
  2. test_callback_no_repeated_steps_regressor: The same verification for the regressor.
  3. test_callback_has_progressed_always_true: Confirms has_progressed is always True.
  4. test_callback_early_termination: Asserts that returning is_done = True still halts training immediately, regardless of the early-stopping settings.
  5. test_callback_receives_valid_metrics: Ensures the best_score metric passed to the callback is never NaN or infinite.

All existing tests and the newly added regression tests pass, along with formatting validation.
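The invariant behind the first two tests can be sketched without the library; here `steps` stands in for the sequence of n_steps values a test would capture from a fitted model (a hypothetical helper, not the actual test code):

```python
def assert_strictly_increasing(steps):
    # Core invariant of the regression tests: the n_steps values handed
    # to the callback must strictly increase, so no duplicates and no
    # decreases are allowed.
    for prev, cur in zip(steps, steps[1:]):
        assert cur > prev, f"n_steps {cur} repeated or decreased after {prev}"

assert_strictly_increasing([1, 2, 3, 5, 8])  # ok

try:
    assert_strictly_increasing([1, 2, 2, 3])  # duplicate step -> fails
except AssertionError as exc:
    print("caught:", exc)
```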

Signed-off-by: ugbotueferhire <ugbotueferhire@gmail.com>
@ugbotueferhire
Contributor Author

@paulbkoch Hello again I’ve opened this PR to fix the repeated callback iteration issue (#635).

The callback now strictly fires inside the make_progress block, which completely eliminates the duplicate n_steps spam. Keeping API compatibility in mind, the has_progressed parameter remains in the signature and will always evaluate to True. I've also added 5 regression tests inside test_callback.py to ensure monotonic n_steps behavior and verify that early termination continues to work flawlessly.

Let me know if you’d like any tweaks to the tests or other adjustments. Thanks!

@codecov

codecov bot commented Apr 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 66.52%. Comparing base (ec085cc) to head (c3e6612).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #662      +/-   ##
==========================================
+ Coverage   66.39%   66.52%   +0.13%     
==========================================
  Files          75       75              
  Lines       11589    11589              
==========================================
+ Hits         7694     7710      +16     
+ Misses       3895     3879      -16     
Flag Coverage Δ
bdist_linux_311_python 66.27% <100.00%> (+0.13%) ⬆️
bdist_linux_312_python 66.26% <100.00%> (+0.12%) ⬆️
bdist_linux_313_python 66.27% <100.00%> (+0.15%) ⬆️
bdist_linux_314_python 66.16% <100.00%> (+0.10%) ⬆️
bdist_linuxarm_311_python 66.26% <100.00%> (+0.12%) ⬆️
bdist_linuxarm_312_python 66.26% <100.00%> (+0.11%) ⬆️
bdist_linuxarm_313_python 66.29% <100.00%> (+0.13%) ⬆️
bdist_linuxarm_314_python 66.19% <100.00%> (+0.12%) ⬆️
bdist_mac_311_python 66.42% <100.00%> (+0.12%) ⬆️
bdist_mac_312_python 66.42% <100.00%> (+0.12%) ⬆️
bdist_mac_313_python 66.44% <100.00%> (+0.13%) ⬆️
bdist_mac_314_python 66.36% <100.00%> (+0.13%) ⬆️
bdist_win_311_python 66.42% <100.00%> (+0.10%) ⬆️
bdist_win_312_python 66.44% <100.00%> (+0.13%) ⬆️
bdist_win_313_python 66.45% <100.00%> (+0.13%) ⬆️
bdist_win_314_python 66.38% <100.00%> (+0.13%) ⬆️
sdist_linux_311_python 66.22% <100.00%> (+0.13%) ⬆️
sdist_linux_312_python 66.22% <100.00%> (+0.13%) ⬆️
sdist_linux_313_python 66.20% <100.00%> (+0.12%) ⬆️
sdist_linux_314_python 66.13% <100.00%> (+0.12%) ⬆️
sdist_linuxarm_311_python 66.21% <100.00%> (+0.13%) ⬆️
sdist_linuxarm_312_python 66.21% <100.00%> (+0.12%) ⬆️
sdist_linuxarm_313_python 66.22% <100.00%> (+0.12%) ⬆️
sdist_linuxarm_314_python 66.15% <100.00%> (+0.15%) ⬆️
sdist_mac_311_python 66.35% <100.00%> (+0.15%) ⬆️
sdist_mac_312_python 66.35% <100.00%> (+0.13%) ⬆️
sdist_mac_313_python 66.33% <100.00%> (+0.12%) ⬆️
sdist_mac_314_python 66.26% <100.00%> (+0.12%) ⬆️
sdist_win_311_python 66.45% <100.00%> (+0.13%) ⬆️
sdist_win_312_python 66.44% <100.00%> (+0.13%) ⬆️
sdist_win_313_python 66.44% <100.00%> (+0.12%) ⬆️
sdist_win_314_python 66.38% <100.00%> (+0.13%) ⬆️

Flags with carried forward coverage won't be shown.

@paulbkoch
Collaborator

Hi @ugbotueferhire -- Thanks for this PR. I think I'd like to take this API in a slightly different direction though. The issue is that n_steps encodes valuable information: the number of boosting steps that have been taken, which is useful for evaluating overfitting. Changing it to a monotonically increasing value eliminates that signal.

The real problem I've come to realize is that we actually have two types of callbacks being invoked here: a callback to indicate boosting progress, and a callback to indicate that a feature was examined to calculate its gain and the potential update associated with that gain. But we only have one callback parameter, and I don't want to pollute the primary API with tons of different callback parameters.

I think the better API change is to require the callback passed in to use named arguments in its signature. We can then introspect the function passed to us and determine what kind of callback we were given. After that change is made, we can break the callbacks into two separate kinds with different parameter names and call them independently. The different callbacks can then be passed in as a tuple via the callback parameter.
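A minimal sketch of the introspection approach described above, using the standard-library inspect module; every parameter name here is hypothetical, since the eventual names are still up for discussion:

```python
import inspect

def classify_callback(fn):
    # Introspect the callback's parameter names to decide which kind of
    # callback the caller passed. The distinguishing names ("avg_gain",
    # "n_steps") are purely illustrative.
    params = set(inspect.signature(fn).parameters)
    if "avg_gain" in params:
        return "examination"
    if "n_steps" in params:
        return "progress"
    raise TypeError("unrecognized callback signature")

def on_progress(*, bag_idx, n_steps, best_score):
    return False

def on_exam(*, bag_idx, term_idx, avg_gain):
    return False

print(classify_callback(on_progress))  # progress
print(classify_callback(on_exam))      # examination
```

Requiring keyword-only parameters (the `*` in the signatures) is what makes this dispatch reliable: the parameter names become part of the contract rather than an implementation detail.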

@ugbotueferhire
Contributor Author

Alright, understood.

@ugbotueferhire
Contributor Author

@paulbkoch
Before I start on this, I have a few questions to make sure I get the design exactly right:

  1. What exact parameter names do you want for each callback type?
  • For example, for the progress callback: def progress_cb(*, bag_idx, n_steps, best_score):?
  • And for the examination callback: def exam_cb(*, bag_idx, n_steps, term_idx, avg_gain):? Or different names?
  2. The current callback uses positional args (bag_idx, n_steps, has_progressed, best_score). Should we keep supporting the old positional-arg style during a deprecation period, or is a clean break acceptable?
  3. You mentioned eventually passing tuples like callback=(progress_cb, exam_cb). Should I implement tuple support in this PR, or just the keyword-arg introspection for now and leave tuples for a follow-up?

Successfully merging this pull request may close these issues.

Repeated iterations in callback

2 participants