Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Combining populations drops EBVs if one population does not have them #191

Open
gregorgorjanc opened this issue Jul 16, 2024 · 2 comments
Open

Comments

@gregorgorjanc
Copy link
Contributor

Describe the bug

When we combine populations we loose EBVs if one of the population does not have them.

Steps To Reproduce

Here is a setup of an example:

library(AlphaSimR)
founderGenomes = quickHaplo(nInd = 3, nChr = 1, segSites = 10)
SP = SimParam$new(founderGenomes)
SP$addTraitA(nQtlPerChr = 10)

pop = newPop(founderGenomes)
pop = setPheno(pop, varE = 1)
pop@ebv = pop@pheno

pop2 = newPop(founderGenomes)

First, let's look at genetic values:

> c(pop, pop2)@gv
         Trait1
[1,]  0.7139849
[2,]  0.7002063
[3,] -1.4141912
[4,]  0.7139849
[5,]  0.7002063
[6,] -1.4141912

OK, everyone has genetic value.

Now phenotypes, in the example these are present in one but not the other population:

> c(pop, pop2)@pheno
         Trait1
[1,]  0.9140245
[2,]  1.6257168
[3,] -2.1727727
[4,]         NA
[5,]         NA
[6,]         NA

OK, NAs as one might expect.

Finally, EBVs:

> c(pop, pop2)@ebv
    
[1,]
[2,]
[3,]
[4,]
[5,]
[6,]

Expected behavior

While working with NA's is a pain on it's own, I expected that c(pop, pop2)@pheno and c(pop, pop2)@ebv would behave the same.

Additional context

Happy to match behaviour of c(pop, pop2)@ebv to c(pop, pop2)@pheno if we agree that this is indeed what we want;)

@gaynorr
Copy link
Owner

gaynorr commented Jul 16, 2024

The current behavior is to use rbind if the number of columns match and to empty out the slot if they don't. This is designed to account for the fact that there are no controls on the number of columns in the ebv slot.

I'm not currently looking at the names of the columns (they used to not have names). I should include these column names in the merge to make sure that they are consistent between populations.

I'm not so sure that it will be good idea to keep columns if they don't match across populations. It would be possible to add in NA for missing columns, but this might be a source of bugs in user scripts. For example, a lot of AlphaSimR is based on specifying numbered columns and the numbers for specific columns wouldn't necessarily be consistent if I were to run a merge like this.

@gregorgorjanc
Copy link
Contributor Author

In my view, I would throw an error if number of columns does not match across gv, pheno, and ebv with a provision that ebv could be empty (which is how pheno is handled). In other words, I think it would make sense for behaviour around ebv slot to match exactly what happens with the pheno slot. Best to have consistency.

I agree that with column names it now becomes easier to handle this situation, but as long as the code is mostly/only working with column numbers and not column names, we can not rely on column names just now (some future majors update could) and hence have to have consistent column number AND column order and I would vote to fill in missing parts with NA.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants