Skip to content

Hc/fix impossible schemes#43

Draft
HenryCrosswell wants to merge 11 commits into
mainfrom
hc/fix_impossible_schemes
Draft

Hc/fix impossible schemes#43
HenryCrosswell wants to merge 11 commits into
mainfrom
hc/fix_impossible_schemes

Conversation

@HenryCrosswell

Copy link
Copy Markdown

Before submitting a pull request (PR), please read the contributing guide.

Please fill out as much of this template as you can, but if you have any problems or questions, just leave a comment and we will help out :)

Description

What is this PR

  • Bug fix
  • Addition of a new feature
  • Other

Why is this PR needed?
Since the PyRAT db is manually populated, there is a possibility that someone could have incorrectly entered genotypes for the offspring.

What does this PR do?
This adds a function to standardise.py that checks whether the offsprings genotype is a possible combination of the two parents, using the Mendelian ratios.

References

closes issue #10

How has this PR been tested?

Added new test to test_standardise, along with new test data which includes a raw file with impossible schemes, and a standardised file that asserts that those schemes have been removed.

Is this a breaking change?

No

Does this PR require an update to the documentation?

Not yet.

Checklist:

  • The code has been tested locally
  • Tests have been added to cover all new functionality
  • The documentation has been updated to reflect any changes
  • The code has been formatted with pre-commit

@HenryCrosswell HenryCrosswell requested a review from a team June 8, 2026 14:18
@HenryCrosswell HenryCrosswell self-assigned this Jun 8, 2026

@K-Meech K-Meech left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @HenryCrosswell - this is looking great! Your additions are filtering the impossible schemes really nicely, which will stop any incorrect data making it to the later processing stages.

I've put a few comments below - most are minor wording suggestions, but the main one is we need to make sure un-genotyped individuals can pass through the standardise process without erroring.

pooch_data_path("standardised-data-forbidden-genotypes.csv")
)

with pytest.raises(TypeError):

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By adding with pytest.raises(TypeError), this test is allowed to pass even though it throws an error during standardise_pyrat_csv and never reaches pd.testing.assert_frame_equal. This means it is no longer testing that genotypes are standardised correctly.

You'll need to remove the with pytest.raises(TypeError) and rather add a fix in _remove_impossible_breeding_schemes instead. Otherwise, at the moment, any ungenotyped individuals that enter standardise_pyrat_csv will cause the processing to stop early (we need these ungenotyped individuals later in the historical stats part - see this issue: #8)

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have addressed this issues, however it has removed an impossible scheme from the impossible genotype dataset. Causing issues in this tests - test_handling_ungenotyped_individuals_in_stats. I attempted to fix by removing and then fixing the standardised offspring. However it caused a tangle of issues. So I have left the test_data as is, leaving only the dataframe missmatch.
[left]: (8, 11)
[right]: (9, 11)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @HenryCrosswell - I've updated ID_011 and the corresponding tests on this PR: #50 and on GIN. Once that is merged, you should be able to merge main into this branch and GIN's master into your GIN branch and hopefully the tests will pass.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@HenryCrosswell #50 is merged now, so could you update your branches as mentioned above? If you open a PR on GIN I can take a quick look over that and merge it 👍

genotype_father = standardised_df_row["genotype_father"]
genotype_mother = standardised_df_row["genotype_mother"]
genotype_offspring = standardised_df_row["genotype_offspring"]

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To keep rows with un-genotyped offspring, you could add something like this here:

# If the offspring are un-genotyped, we keep the row as there is no way
# of checking if the breeding scheme was valid
if pd.isna(genotype_offspring):
    return pop

Hopefully this should make the test_standardise_genotypes test pass, although I see that there is one remaining impossible breeding scheme in the test file I missed. You can change: pyrat-data-forbidden-genotypes.csv / standardised-data-forbidden-genotypes.csv ID-011 genotype_mother to wt_het on your GIN branch.

Comment thread oscar_colony/colony_management/pyrat/standardise.py Outdated
Comment thread oscar_colony/colony_management/pyrat/standardise.py Outdated
Comment thread oscar_colony/colony_management/pyrat/standardise.py Outdated
Comment thread oscar_colony/colony_management/pyrat/standardise.py Outdated
@HenryCrosswell

Copy link
Copy Markdown
Author

Tackled all your comments, although i did it within vscode, so there might be some small differences - let me know if you want me to change anything else.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants