Sequence encodings and beta heatmaps #77

Open
wsdewitt opened this issue Jul 27, 2020 · 0 comments
Assignees
Labels
invalid This doesn't seem right

Comments

@wsdewitt
Contributor

The sequence encoding we are using is a full one-hot at each site, taken from dms_variants.binarymap.BinaryMap with expand=True. It is overparameterized because it includes indicator variables for the WT states as well as the mutant states.
As a result, the WT sequence is not represented by a vector of zeros (it contains one 1 per site), and a single-mutant variant is not a vector of zeros with a single 1, as numpy_single_mutant_predictions assumes.
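
To make the problem concrete, here is a minimal sketch in plain NumPy. It does not call dms_variants; the helper `one_hot_full`, the `ALPHABET` ordering, and the toy sequences are all assumptions made up for illustration.

```python
import numpy as np

# Assumed 20-letter amino-acid alphabet; the ordering is arbitrary.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def one_hot_full(seq, alphabet=ALPHABET):
    """Full one-hot: one indicator per (site, state), WT states included."""
    x = np.zeros((len(seq), len(alphabet)), dtype=int)
    for site, aa in enumerate(seq):
        x[site, alphabet.index(aa)] = 1
    return x.ravel()


wt = "MKT"       # toy WT sequence (hypothetical)
mutant = "MRT"   # toy single mutant, K -> R at the second site

x_wt = one_hot_full(wt)
x_mut = one_hot_full(mutant)

# The WT vector has one 1 per site, so it is not all zeros.
n_ones_wt = int(x_wt.sum())

# A single mutant differs from WT in two entries: the WT indicator at the
# mutated site is switched off and the mutant indicator is switched on.
n_changed = int(np.abs(x_mut - x_wt).sum())

print(n_ones_wt, n_changed)
```

With this encoding the effect of a single mutation is a difference of two beta coefficients (mutant state minus WT state), not a single one.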

Currently the beta coefficients are not interpretable as single-mutant effects, because the WT states have beta coefficients too. My suggestion is to use an encoding that omits the redundant WT indicators and to model the WT latent score with a single bias parameter (as in previous methods) rather than with L weights (where L is the sequence length). This would make a few tasks more straightforward:

  • plotting single-mutant effects, each associated with a single beta parameter
  • modeling the WT intercept for data reported relative to WT
  • modeling interactions of variants (e.g. pairwise).
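
A rough sketch of the proposed encoding, again in plain NumPy rather than through dms_variants. The names `encode_without_wt`, `latent`, and `beta_heatmap`, the random `beta`, and the `bias` value are hypothetical and only illustrate the idea.

```python
import numpy as np

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # assumed alphabet ordering
N_ALT = len(ALPHABET) - 1           # non-WT states per site


def encode_without_wt(seq, wt, alphabet=ALPHABET):
    """Indicators for non-WT states only, so WT encodes as all zeros."""
    x = np.zeros(len(wt) * N_ALT, dtype=int)
    for site, (aa, wt_aa) in enumerate(zip(seq, wt)):
        if aa != wt_aa:
            alt_states = [a for a in alphabet if a != wt_aa]
            x[site * N_ALT + alt_states.index(aa)] = 1
    return x


wt = "MKT"                                  # toy WT sequence (hypothetical)
rng = np.random.default_rng(0)
beta = rng.normal(size=len(wt) * N_ALT)     # made-up mutation effects
bias = 0.5                                  # made-up WT latent score


def latent(seq):
    """Latent score: single bias for WT plus additive mutation effects."""
    return bias + encode_without_wt(seq, wt) @ beta


# A single mutant now has exactly one nonzero entry, and its effect
# relative to WT is exactly one beta coefficient.
x_single = encode_without_wt("MRT", wt)
idx = int(np.flatnonzero(x_single)[0])
effect = latent("MRT") - latent(wt)


def beta_heatmap(beta, wt, alphabet=ALPHABET):
    """Arrange beta as a (site, state) grid with NaN at the WT state."""
    grid = np.full((len(wt), len(alphabet)), np.nan)
    for site, wt_aa in enumerate(wt):
        alt_states = [a for a in alphabet if a != wt_aa]
        for k, aa in enumerate(alt_states):
            grid[site, alphabet.index(aa)] = beta[site * N_ALT + k]
    return grid


grid = beta_heatmap(beta, wt)
```

Here `latent(wt)` equals `bias`, each heatmap cell maps to one beta, and a pairwise interaction feature could be formed as the product of two of these indicators.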
@wsdewitt wsdewitt added the invalid This doesn't seem right label Jul 27, 2020
@wsdewitt wsdewitt self-assigned this Jul 27, 2020