You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Thanks for putting this up @EgorKraevTransferwise , I would like to help unblock it.
I am am working in the FLAML codebase concurrently and was looking at how its
search space API gets consumed downstream, which led me here. A few things
worth raising before any further code work, since the PR is blocked at a
direction question that never got answered in 2022.
In #557 you asked for an
optimizer-agnostic spec so users could plug in hyperopt as easily as FLAML.
The PR as filed is FLAML-specific. Before Egor invests more in this branch,
is your guidance still optimizer-agnostic? Two viable shapes:
Keep it FLAML shaped now, accept the tight coupling, document that the
API is provisional (e.g., _search_space rather than search_space).
Optimizer agnostic spec — return a plain dict like {"n_estimators": {"type": "randint", "low": 2, "high": 200}, ...} and
ship a small econml.tuning.flaml_adapter translator. More work but
matches your 2022 ask and avoids hard-pinning FLAML.
If (2), the PR should move the API onto a mixin / base class so it lands
once and gets inherited by LinearDML, CausalForestDML, DRLearner,
etc., rather than being defined per-estimator.
Must fix correctness (regardless of direction)
tune is not imported in the diff.econml/dr/_drlearner.py uses tune.loguniform, tune.randint, etc., but the patch contains no from flaml import tune. The classmethods would raise NameError on
first call. (Whatever direction you take above, the import must land
somewhere.)
max_samples: tune.uniform(0, 0.5). sklearn's max_samples requires
the open interval (0, 1]; sampling exactly 0 will raise. Suggest tune.uniform(0.05, 0.95) (or 0.05, 0.5 if there's a reason to cap at
half — but justify the cap, sklearn's own default is None ≈ 1.0).
Patch targets setup.cfg. EconML migrated to pyproject.toml (see #1538-era cleanup); setup.cfg no longer drives dependency resolution. Move the FLAML add
into pyproject.toml's dependencies (or [project.optional-dependencies]
under a tuning extra — see point 4).
Hard FLAML dependency. Your own June 2022
reply
promised wrapping the import via econml.utilities.MissingModule to keep
FLAML optional. The PR currently makes it a hard install_requires. With
FLAML's transitive footprint (xgboost, lightgbm, sklearn pins), making it
mandatory will degrade install ergonomics for users who don't tune.
Either MissingModule + lazy import, or move FLAML under a tuning
extra.
Stale base + merge conflict. PR opens against ac12f54f (June 2022);
GitHub reports mergeable_state: dirty. _drlearner.py has had
substantial churn since (notably the model_selection-based 'auto'
nuisance refactor). Needs rebase before further review.
No tests. A new public classmethod exposed on ForestDRLearner with
no coverage. Suggest minimal: instantiate, call cls.search_space(),
sample one config from each domain, pass it through cls(**search_start)
to confirm every key is a valid __init__ arg.
Design feedback
min_impurity_decrease: tune.uniform(0, 10). Upper bound of 10 is far
outside the practical range — sklearn's default is 0.0, and values
above ~0.1 typically prune trees to stumps. Suggest tune.loguniform(1e-4, 1e-1)
or tune.uniform(0, 0.1). Worth comparing against ranges used in
established AutoML libs (auto-sklearn, optuna's sklearn integration).
honest: tune.choice([0, 1]) vs search_start: honest: True.
Type mismatch — sampler returns int, start returns bool. EconML
passes the value through to a constructor expecting bool; mostly OK
via Python's truthiness, but cleaner as tune.choice([False, True]).
subforest_size: tune.randint(2, 10) searched independently of n_estimators. ForestDRLearner expects n_estimators % subforest_size == 0
(see RegressionForest); independent sampling will hit invalid configs.
Either constrain via a conditional, or document the invariant and let
the constructor raise.
Commented-out lines.mc_iters and max_depth carry your "is this
worth searching?" / "is this the right syntax?" notes. Decide before
merge:
mc_iters: typically a fidelity dial, not a tuning knob — leave out.
max_depth: FLAML's tune.choice([None, tune.randint(2, 1000)]) is
not valid (tune.choice expects concrete values). Use a single tune.randint(2, 50) and accept that the unbounded None case isn't
searched (or use a sentinel like 0 and translate).
Smaller things
search_start: n_estimators: 100 vs. constructor default 1000. Reasonable
for a search starting point (faster), but worth a one-line docstring
saying "intentionally lower than the model default to keep search budgets
bounded."
tune.choice(["auto", "sqrt", "log2", None]) — "auto" is sklearn-1.3-deprecated
for RandomForestRegressor, but EconML's RegressionForest may still
accept it (it routes to n_features per the docstring). Worth a quick
test that all four values still work end-to-end under the project's
current sklearn pin (>=1.0,<1.9).
Offer
If direction (1) or (2) gets confirmed, happy to take the rebase + the
items above as a follow up PR (with credit to the original work) or
leave it with you, whichever you prefer.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Adressing #557
Just did ForestDRLearner to see if you agree with the way I'm doing it, will add the others if so.
Is there a reason you don't support dowhy versions from 0.8?