Hc/fix impossible schemes by HenryCrosswell · Pull Request #43 · neuroinformatics-unit/OSCaR

HenryCrosswell · 2026-06-08T14:18:54Z

Before submitting a pull request (PR), please read the contributing guide.

Please fill out as much of this template as you can, but if you have any problems or questions, just leave a comment and we will help out :)

Description

What is this PR

Bug fix
Addition of a new feature
Other

Why is this PR needed?
Since the PyRAT db is manually populated, there is a possibility that someone could have incorrectly entered genotypes for the offspring.

What does this PR do?
This adds a function to standardise.py that checks whether the offspring's genotype is a possible combination of the two parents' genotypes under Mendelian inheritance.
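
As a rough illustration of the kind of check described above (not the code in this PR), the sketch below derives the possible offspring genotypes for a single locus from the parents' alleles. The labels "wt", "het" and "hom", the ALLELES table, and the function names are all assumptions made for the example; the labels used in the PyRAT data may differ.

```python
from itertools import product

# Assumed single-locus labels mapped to allele pairs ("+" wild type, "-" mutant).
# These labels are hypothetical and may not match the project's genotype values.
ALLELES = {"wt": ("+", "+"), "het": ("+", "-"), "hom": ("-", "-")}

# Reverse lookup from a sorted allele pair back to its label.
LABELS = {alleles: label for label, alleles in ALLELES.items()}


def possible_offspring(father, mother):
    """Return the set of offspring genotypes the two parents can produce."""
    return {
        LABELS[tuple(sorted(pair))]
        for pair in product(ALLELES[father], ALLELES[mother])
    }


def is_possible_scheme(father, mother, offspring):
    """Return True if the offspring genotype can arise from these parents."""
    return offspring in possible_offspring(father, mother)
```

Under these assumptions, two wild-type parents can only produce wild-type offspring, so a row recording a "hom" offspring from that cross would be flagged as impossible.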

References

closes issue #10

How has this PR been tested?

Added a new test to test_standardise, along with new test data: a raw file containing impossible schemes and a standardised file used to assert that those schemes have been removed.

Is this a breaking change?

No

Does this PR require an update to the documentation?

Not yet.

Checklist:

The code has been tested locally
Tests have been added to cover all new functionality
The documentation has been updated to reflect any changes
The code has been formatted with pre-commit

K-Meech

Thanks @HenryCrosswell - this is looking great! Your additions are filtering the impossible schemes really nicely, which will stop any incorrect data making it to the later processing stages.

I've put a few comments below - most are minor wording suggestions, but the main one is we need to make sure un-genotyped individuals can pass through the standardise process without erroring.

K-Meech · 2026-06-12T10:23:21Z

        pooch_data_path("standardised-data-forbidden-genotypes.csv")
    )

+    with pytest.raises(TypeError):


By adding with pytest.raises(TypeError), this test is allowed to pass even though it throws an error during standardise_pyrat_csv and never reaches pd.testing.assert_frame_equal. This means it is no longer testing that genotypes are standardised correctly.

You'll need to remove the with pytest.raises(TypeError) and add a fix in _remove_impossible_breeding_schemes instead. Otherwise, any un-genotyped individuals that enter standardise_pyrat_csv will cause the processing to stop early (we need these un-genotyped individuals later in the historical stats part - see this issue: #8)
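
To illustrate the masking effect described in this comment, here is a minimal sketch. The names standardise_stub and masked_test are hypothetical stand-ins, not functions from this repository; the only real API used is pytest.raises.

```python
import pytest


def standardise_stub(value):
    # Hypothetical stand-in for a standardise step that fails on
    # un-genotyped input.
    raise TypeError("cannot process a missing genotype")


reached_assertion = False


def masked_test():
    global reached_assertion
    with pytest.raises(TypeError):
        standardise_stub(None)
        # Nothing below runs: the TypeError leaves the block immediately,
        # and pytest.raises then treats it as the expected outcome.
        reached_assertion = True
        assert 1 == 2  # would fail if it were ever reached


masked_test()  # completes without error even though the assert is wrong
```

Because the exception is exactly what pytest.raises expects, the test passes and the comparison inside the block is silently skipped.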

I have addressed this issue; however, it has removed an impossible scheme from the impossible genotype dataset, which causes problems in this test: test_handling_ungenotyped_individuals_in_stats. I attempted to fix it by removing and then fixing the standardised offspring, but that caused a tangle of issues. So I have left the test_data as is, leaving only the dataframe mismatch:
[left]: (8, 11)
[right]: (9, 11)

Thanks @HenryCrosswell - I've updated ID_011 and the corresponding tests on this PR: #50 and on GIN. Once that is merged, you should be able to merge main into this branch and GIN's master into your GIN branch and hopefully the tests will pass.

@HenryCrosswell #50 is merged now, so could you update your branches as mentioned above? If you open a PR on GIN I can take a quick look over that and merge it 👍

K-Meech · 2026-06-12T13:59:51Z

+    genotype_father = standardised_df_row["genotype_father"]
+    genotype_mother = standardised_df_row["genotype_mother"]
+    genotype_offspring = standardised_df_row["genotype_offspring"]
+


To keep rows with un-genotyped offspring, you could add something like this here:

    # If the offspring are un-genotyped, we keep the row as there is no way
    # of checking if the breeding scheme was valid
    if pd.isna(genotype_offspring):
        return pop

Hopefully this should make the test_standardise_genotypes test pass, although I see that there is one remaining impossible breeding scheme in the test file I missed. You can change: pyrat-data-forbidden-genotypes.csv / standardised-data-forbidden-genotypes.csv ID-011 genotype_mother to wt_het on your GIN branch.
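
A self-contained sketch of how such a guard could behave when filtering a dataframe is given below. It is an assumption-laden illustration, not the implementation in standardise.py: the POSSIBLE lookup table, the "wt"/"het"/"hom" labels and the keep_row helper are hypothetical, and only the three genotype column names come from the diff above.

```python
import pandas as pd

# Hypothetical lookup of possible offspring per unordered parent pair,
# using assumed single-locus labels.
POSSIBLE = {
    frozenset(["wt"]): {"wt"},
    frozenset(["wt", "het"]): {"wt", "het"},
    frozenset(["wt", "hom"]): {"het"},
    frozenset(["het"]): {"wt", "het", "hom"},
    frozenset(["het", "hom"]): {"het", "hom"},
    frozenset(["hom"]): {"hom"},
}


def keep_row(row):
    offspring = row["genotype_offspring"]
    # Un-genotyped offspring cannot be checked, so the row is kept.
    if pd.isna(offspring):
        return True
    parents = frozenset([row["genotype_father"], row["genotype_mother"]])
    return offspring in POSSIBLE[parents]


df = pd.DataFrame(
    {
        "genotype_father": ["wt", "wt", "het"],
        "genotype_mother": ["wt", "wt", "hom"],
        "genotype_offspring": ["hom", None, "het"],
    }
)

# Keep only rows whose breeding scheme is possible or cannot be checked.
filtered = df[df.apply(keep_row, axis=1)].reset_index(drop=True)
```

In this toy data the first row (wt x wt giving hom) is dropped, while the un-genotyped row and the valid het x hom row are kept. The sketch does not handle missing parent genotypes, which a real implementation would also need to consider.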

HenryCrosswell · 2026-06-22T10:58:52Z

Tackled all your comments, although I did it within VS Code, so there might be some small differences - let me know if you want me to change anything else.

Henry Crosswell and others added 5 commits June 3, 2026 14:53

Added a function for removing impossible breeding schemes

e37fba9

Updating branch to match main

4426557

added typeerror catch within class to fix standardised genotypes test

a856977

added pooch hashes for forbidden schemes

71ff347

added test function for row deletion of impossible scheme

2cfe8c8

HenryCrosswell requested a review from a team June 8, 2026 14:18

HenryCrosswell self-assigned this Jun 8, 2026

updated pooch path to remove master

ddef457

K-Meech requested changes Jun 12, 2026

Henry Crosswell added 4 commits June 22, 2026 10:23

merge with main to include name change to oscar colony

06c12cf

removed type error as it would skip ungenotyped

0ed5ca1

Added comment to impossible breeding scheme to help understanding

3ff8c71

implemented naming convention and docstring feedback

e0a46ad

K-Meech mentioned this pull request Jul 1, 2026

Remove remaining impossible scheme in test data #50

Merged

7 tasks

updated pooch_registry and stats to fix tests

759d4d1

HenryCrosswell wants to merge 11 commits into main from hc/fix_impossible_schemes

HenryCrosswell Jun 22, 2026

K-Meech Jun 29, 2026

K-Meech Jul 1, 2026

2 participants
