
[R-package] ensure use of interaction_constraints does not lead to features being ignored #6377

Open
wants to merge 18 commits into base: master

Conversation

@mayer79 (Contributor) commented Mar 20, 2024

This enhances the R API for interaction constraints by adding a feature group containing the features that do not appear in any of the specified interaction groups. Currently, these features are simply dropped from training, which seems undesirable.

Additionally, it reorganizes the code of the corresponding helper function .check_interaction_constraints().

It solves the R-package part of #6376. I will attempt a separate PR for the Python-package.

Example

.check_interaction_constraints(list("a", 2:3), letters[1:5])

# Output
[[1]]
[1] "[0]"

[[2]]
[1] "[1,2]"

[[3]]
[1] "[3,4]"

Without the PR, the result is

[[1]]
[1] "[0]"

[[2]]
[1] "[1,2]"

i.e., the last two features are silently dropped from training.
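
For illustration, here is a minimal standalone sketch of the intended behavior. This is a hypothetical helper, not the package's actual .check_interaction_constraints() (which lives in R-package/R/utils.R):

# Hypothetical sketch of the behavior described above; not the real implementation.
add_remaining_group <- function(constraints, feature_names) {
  # 1. map feature names to 1-based positions; numeric entries are taken as positions
  to_pos <- function(x) if (is.character(x)) match(x, feature_names) else as.integer(x)
  groups <- lapply(constraints, to_pos)
  # 2. append one extra group holding every feature not covered by the user's groups
  remaining <- setdiff(seq_along(feature_names), unlist(groups))
  if (length(remaining) > 0L) {
    groups <- c(groups, list(remaining))
  }
  # 3. shift to 0-based positions and 4. collapse each group into a string like "[0,1]"
  lapply(groups, function(g) paste0("[", paste(g - 1L, collapse = ","), "]"))
}

add_remaining_group(list("a", 2:3), letters[1:5])
# "[0]"  "[1,2]"  "[3,4]"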

@mayer79 marked this pull request as ready for review March 21, 2024 08:39
@mayer79 (Contributor, Author) commented Mar 21, 2024

Getting a failing unit test:

  -- 1. Failure (test_lgb.Booster.R:209:5): Loading a Booster from a text file wor

  bst2$params[names(params)] not equal to `params`.

  Component "interaction_constraints": Length mismatch: comparison on first 2 components

I will have a look at it next week (afk).

@mayer79 marked this pull request as draft March 21, 2024 10:28
The test used incomplete interaction constraints. Since the new functionality will add missing features to the list of interaction constraint vectors, the test failed. Now, the test uses completely specified constraints.
@mayer79 marked this pull request as ready for review March 25, 2024 16:14
@jameslamb (Collaborator) left a comment

Thanks! Left a few minor comments.

@ me if you need help with the failing tests. Let's also please get confirmation that excluding omitted features from training wasn't intentional (#6376 (comment)).

R-package/tests/testthat/test_basic.R (review comment, outdated/resolved)
R-package/R/utils.R (review comment, outdated/resolved)
@jameslamb changed the title from "Improve .check_interaction_constraints()" to "[R-package] ensure use of interaction_constraints does not lead to features being ignored" on Mar 27, 2024
@jameslamb (Collaborator) commented:

Hey @mayer79 , across your recent PRs I've seen multiple "fix linting" types of commits. Totally fine to keep using Continuous Integration to get that feedback (we don't have a lot of activity going on in the repo right now), but you'd probably find it faster to run the linting locally. It only requires R and the {lintr} package.

Rscript .ci/lint_r_code.R $(pwd)/R-package

@mayer79 (Contributor, Author) commented Mar 27, 2024

> Hey @mayer79 , across your recent PRs I've seen multiple "fix linting" types of commits. Totally fine to keep using Continuous Integration to get that feedback (we don't have a lot of activity going on in the repo right now), but you'd probably find it faster to run the linting locally. It only requires R and the {lintr} package.
>
> Rscript .ci/lint_r_code.R $(pwd)/R-package

Thanks, this is the stuff that I should have asked about a long time ago, but never did :-).

@mayer79 (Contributor, Author) commented May 14, 2024

Pipeline seems happy @jameslamb - but really no pressure :-)

@jameslamb (Collaborator) left a comment

Thanks very much for this! And I really appreciate you keeping it up to date with master while waiting for reviews.

I just left one suggestion about covering this a bit more thoroughly with tests. If you don't have time in the next week to get to that, let me know if you're open to me writing that test and pushing it here... I'm sorry for the time pressure (especially after not reviewing this for so long 😬 ), but I'm going to try to push for v4.4.0 to get out in the next week, ahead of the numpy 2.0 release (#6439 (comment)), and I'd love to get this change into it if we can.

@@ -174,7 +174,7 @@ test_that("Loading a Booster from a text file works", {
, bagging_freq = 1L
, boost_from_average = FALSE
, categorical_feature = c(1L, 2L)
- , interaction_constraints = list(c(1L, 2L), 1L)
+ , interaction_constraints = list(1L:2L, 3L, 4L:ncol(train$data))
Collaborator

This PR doesn't currently have a test confirming that the new automatically-added group is actually added to the model.

The change here on this line is passing that group through params from the beginning, and the new test added in test_utils.R only checks that .check_interaction_constraints() adds that group to whatever's passed into it... not necessarily that the output of that function actually makes it all the way to LightGBM's C++ code and therefore affects training.

Could you please add a test that works roughly like the following?

  • passes something like interaction_constraints = list(3L, 1:2L) through params to lgb.train()
    • passing them out of order like that also might help catch some issues
  • checks that the interaction_constraints in $get_params() of the resulting Booster looks correct (i.e. includes the automatically-added group)
  • checks that interaction_constraints parameter looks correct in the model string produced by $save_model_to_string()

That'd help improve our confidence that this is working end-to-end.
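
A rough sketch of what such a test might look like. This is hypothetical: it assumes the agaricus data used elsewhere in the test suite, reads the stored parameters via bst$params, and only checks the number of constraint groups rather than their exact stored form:

library(lightgbm)
library(testthat)

test_that("interaction_constraints adds a group for unspecified features", {
  data(agaricus.train, package = "lightgbm")
  dtrain <- lgb.Dataset(agaricus.train$data, label = agaricus.train$label)
  params <- list(
      objective = "binary"
    , interaction_constraints = list(3L, 1L:2L)  # deliberately out of order
    , verbose = -1L
  )
  bst <- lgb.train(params = params, data = dtrain, nrounds = 5L)

  # two groups were passed in, so a third automatically-added group holding the
  # remaining features is expected in the Booster's stored parameters
  expect_length(bst$params[["interaction_constraints"]], 3L)

  # the parameter should also appear in the model string
  model_str <- bst$save_model_to_string()
  expect_true(grepl("interaction_constraints", model_str, fixed = TRUE))
})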

# 1. turns feature *names* into 1-based integer positions, then
# 2. adds an extra list element with skipped features, then
# 3. turns 1-based integer positions into 0-based positions, and finally
# 4. collapses the values of each list element into a string like "[0, 1]".
Collaborator

This comment is FANTASTIC, made it very easy for me to understand the code. Thank you!
