
soften monotonicity constraint #70

Open
wsdewitt opened this issue Jul 8, 2020 · 4 comments

@wsdewitt (Contributor) commented Jul 8, 2020

In 1D GE models, a monotonic I-spline is used to map from the latent space to the output space. In a GGE model, the moral equivalent is to stipulate that each output dimension of y is monotonic in its corresponding latent-space dimension of z.

The current monotonicity implementation projects all the weights in g(z) to the non-negative orthant after each gradient step*, which is sufficient for monotonicity but not necessary, and probably a stronger condition than we want. The downside is that it cannot accommodate trade-offs: every directional derivative is positive, so there are no directions in feature space that increase binding at the expense of folding (or vice versa).

A weaker condition is to stipulate that the diagonal elements of the Jacobian are non-negative: ∂y_i/∂z_i ≥ 0. This keeps the biophysical interpretation of the latent space dimensions intact, but allows phenotype trade-offs.

It's not immediately obvious to me how to implement this; it's not a simple box constraint. Maybe it could be done as a soft penalty: relu(-∂y_1/∂z_1) + relu(-∂y_2/∂z_2).
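
A minimal sketch of that soft penalty, assuming g is a torch.nn.Module mapping a (batch, d) latent tensor z to a (batch, d) output (the function name and signature here are mine, not existing code); the returned scalar would be added to the training loss with some weight:

```python
import torch

def diag_jacobian_penalty(g, z):
    """Soft penalty sum_i relu(-dy_i/dz_i): penalize negative diagonal
    Jacobian entries of y = g(z), averaged over the batch."""
    z = z.detach().requires_grad_(True)
    y = g(z)
    penalty = 0.0
    for i in range(y.shape[1]):
        # gradient of y_i w.r.t. z for each example; keep the diagonal entry dy_i/dz_i
        grad_i = torch.autograd.grad(y[:, i].sum(), z, create_graph=True)[0]
        penalty = penalty + torch.relu(-grad_i[:, i]).mean()
    return penalty
```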

*As a side issue, Erick noticed that the projection happens before the gradient step, which is probably a bug:

param.data.clamp_(0)  # projection onto the non-negative orthant
optimizer.step()      # gradient step happens after the projection
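
Presumably the intended order is to take the gradient step first and then project back onto the non-negative orthant. A minimal sketch of that order (function and variable names are placeholders, not the repo's API):

```python
import torch

def train_step(model, optimizer, loss_fn, x, y, constrained_params):
    """One projected-gradient update: gradient step first, then clamp the
    constrained weights back onto the non-negative orthant."""
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()                      # gradient step first ...
    with torch.no_grad():
        for param in constrained_params:  # ... then project
            param.clamp_(0)
    return loss.item()
```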

@matsen (Contributor) commented Jul 8, 2020

My brain's full right now, but I note that we do have access to gradients during training, so we could muck with them before doing a gradient step.
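
(To illustrate the kind of thing that's possible, purely hypothetically and not existing code: between backward() and step() one could edit param.grad in place, e.g. zero out gradient components whose SGD step would push an already-clamped weight negative.)

```python
import torch

def zero_infeasible_grads(constrained_params):
    """Toy 'gradient surgery' pass to run between loss.backward() and
    optimizer.step(): for plain SGD a positive gradient decreases the weight,
    so zero it wherever the weight already sits at the non-negative boundary."""
    with torch.no_grad():
        for param in constrained_params:
            if param.grad is not None:
                param.grad[(param <= 0) & (param.grad > 0)] = 0.0
```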

@wsdewitt (Contributor, Author) commented Jul 8, 2020

It might be easy to implement this in an architecture where there are distinct g_bind and g_fold networks, as proposed in #53. The intra-g weights could be clamped >= 0, but the sparse inter-g weights could be unconstrained.
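
To make that concrete, here is one possible shape of the idea (an illustrative sketch only, not the architecture actually proposed in #53; all class and attribute names are made up): route z_bind → y_bind through a weight-clamped subnetwork so the diagonal stays monotone, and let the cross-terms be plain unconstrained linear maps.

```python
import torch
import torch.nn as nn

class TwoTrackGGE(nn.Module):
    """Sketch: separate g_bind and g_fold subnetworks whose internal ("intra-g")
    weights are clamped >= 0, plus unconstrained linear cross-terms ("inter-g")."""

    def __init__(self, hidden=10):
        super().__init__()
        self.g_bind = nn.Sequential(nn.Linear(1, hidden), nn.Hardtanh(), nn.Linear(hidden, 1))
        self.g_fold = nn.Sequential(nn.Linear(1, hidden), nn.Hardtanh(), nn.Linear(hidden, 1))
        self.cross_bind = nn.Linear(1, 1, bias=False)  # z_fold -> y_bind, unconstrained
        self.cross_fold = nn.Linear(1, 1, bias=False)  # z_bind -> y_fold, unconstrained

    def forward(self, z):
        # z has shape (batch, 2): columns are z_bind and z_fold
        z_bind, z_fold = z[:, :1], z[:, 1:]
        y_bind = self.g_bind(z_bind) + self.cross_bind(z_fold)
        y_fold = self.g_fold(z_fold) + self.cross_fold(z_bind)
        return torch.cat([y_bind, y_fold], dim=1)

    def clamp_intra(self):
        # call after each optimizer step: project only the intra-g weights to >= 0,
        # which (with monotone activations) keeps dy_i/dz_i >= 0 while leaving the
        # cross-terms free to encode trade-offs
        for net in (self.g_bind, self.g_fold):
            for name, param in net.named_parameters():
                if name.endswith("weight"):
                    param.data.clamp_(0)
```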

Another thing to note is that, even for 1D monotonicity, clamping to non-negative weights is sufficient but not necessary, and limits expressiveness even within the space of monotonic functions. This paper proposes modeling the derivative of a monotonic function with a non-negative neural network, and using quadrature to evaluate the monotonic function itself (defined as the network's antiderivative). This seems quite analogous to how monotonic I-splines are defined as integrals of a non-negative M-spline basis.

[figure from the paper]

(As a cute technical detail, you get to use Feynman's trick for backprop.)
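
For concreteness, a toy version of the integral construction (my own sketch, not the paper's implementation: the derivative is a softplus-output network, and the integral is evaluated with naive fixed-grid trapezoidal quadrature, backpropagating through the quadrature rather than using the Leibniz-rule trick):

```python
import torch
import torch.nn as nn

class IntegratedMonotonic(nn.Module):
    """Monotonic map F(x) = ∫_0^x f(t) dt + b, with f a strictly positive network;
    the integral is approximated by trapezoidal quadrature (illustration only)."""

    def __init__(self, hidden=16, n_quad=32):
        super().__init__()
        # softplus output keeps the modeled derivative strictly positive
        self.f = nn.Sequential(nn.Linear(1, hidden), nn.Tanh(), nn.Linear(hidden, 1), nn.Softplus())
        self.n_quad = n_quad
        self.bias = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        # change of variables: ∫_0^x f(t) dt = x ∫_0^1 f(s x) ds, with x of shape (batch, 1)
        s = torch.linspace(0.0, 1.0, self.n_quad, device=x.device).view(1, -1, 1)
        t = x.unsqueeze(1) * s                    # quadrature nodes, shape (batch, n_quad, 1)
        fvals = self.f(t).squeeze(-1)             # (batch, n_quad)
        w = torch.full((self.n_quad,), 1.0 / (self.n_quad - 1), device=x.device)
        w[0] = w[-1] = 0.5 / (self.n_quad - 1)    # trapezoid-rule weights on [0, 1]
        integral = (fvals * w).sum(dim=1, keepdim=True)
        return x * integral + self.bias
```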

@matsen (Contributor) commented Jul 9, 2020

Yes, you're totally right. That paper looked cool, but more than we need. Remember that we're getting quite nice performance in one dimension with 25 monotonic hardtanh's!

But I like the idea of combining this with #53 the best.

@wsdewitt (Contributor, Author) commented Jul 30, 2020

Another approach for monotonicity is to simply penalize violations of it, as in nearly isotonic regression. From Hastie, Tibs, and Wainwright:

[equations for nearly isotonic regression from Hastie, Tibshirani & Wainwright]

Note: the subscript "+" denotes the positive part (essentially a relu).
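
For reference, my transcription of the nearly-isotonic objective (the standard form, which I believe matches the equations above):

$$
\underset{\theta}{\mathrm{minimize}} \;\; \frac{1}{2}\sum_{i=1}^{n}(y_i - \theta_i)^2 \;+\; \lambda \sum_{i=1}^{n-1}(\theta_i - \theta_{i+1})_+
$$

As λ → ∞ this recovers exact isotonic (monotone) regression; finite λ merely penalizes monotonicity violations.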
