
Add "a" parameter to softplus() #83 #85

Open
wants to merge 9 commits into `master`

Conversation

DominiqueMakowski

Following up on the issues related to an exp link function (TuringLang/Turing.jl#2310), it reinforced the idea that a softplus link could actually be a good alternative. However, I feel that implementing its generalized version (#83) would be key (useful when modelling small parameters), so here is my shot at it.
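For reference, the generalized form proposed in #83 is softplus(x; a) = log(1 + exp(a·x)) / a, which reduces to the standard softplus at a = 1. A minimal Python sketch (illustrative only; the function signature and the overflow cutoff are my assumptions, not the package's implementation):

```python
import math

def softplus(x, a=1.0):
    """Generalized softplus: log(1 + exp(a*x)) / a.

    Sketch of the form proposed in #83; the signature and cutoff below are
    illustrative choices, not the LogExpFunctions.jl implementation.
    Reduces to the standard softplus log(1 + exp(x)) at a == 1.
    """
    ax = a * x
    if ax > 33.3:
        # exp(ax) swamps the "+ 1" in double precision: log(1 + exp(ax)) == ax
        return ax / a
    return math.log1p(math.exp(ax)) / a
```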


@devmotion (Member) left a comment:

I'm not sure how widely used this variant is (and whether there are other commonly used alternatives; the issue also mentions Liu and Furber 2016?). If it's added, we should make sure to test it and to also add support for it in the ChainRules, InverseFunctions, and ChangesOfVariables extensions.

src/basicfuns.jl Outdated
@@ -165,9 +165,14 @@ Return `log(1+exp(x))` evaluated carefully for largish `x`.
This is also called the ["softplus"](https://en.wikipedia.org/wiki/Rectifier_(neural_networks))
transformation, being a smooth approximation to `max(0,x)`. Its inverse is [`logexpm1`](@ref).

The generalized `softplus` function (Wiemann et al., 2024) takes an additional optional parameter `a` that controls
Member:

I assume there exist earlier references for this function?

@DominiqueMakowski (Author), Sep 4, 2024:

I went through Liu and Furber to double-check.

From my understanding (ML is not my field), they validate "noisy softplus" as an improvement over other activation functions for neurons in NNs.


However, it seems they named the `a` parameter `sigma`. Their plot looks similar but differs in terms of values (?)


> I'm not sure how widely used this variant is

I share your concern here; I'm also wary of adding niche features to such a base package and increasing its maintenance burden.
I can't say how commonly the generalized version is already used; its development seems fairly recent.
However, I can see its usefulness in quite a lot of cases: the default softplus only becomes close to the identity for x > 2, and from experience we often model parameters smaller than that (typical sigmas in neuroscience/psychology are between 0 and 1), so using adjusted softplus links would make sense in these contexts. I suppose it's a tradeoff between the complexity of the feature and its (potential) usage.
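To make that tradeoff concrete, here is a quick numeric check of the claim above (a Python sketch assuming the log(1 + exp(a·x))/a form from #83, not the package API):

```python
import math

def softplus(x, a=1.0):
    # Generalized softplus from #83 (illustrative): log(1 + exp(a*x)) / a
    return math.log1p(math.exp(a * x)) / a

# With the default a = 1, a parameter around sigma = 0.5 is badly distorted:
print(softplus(0.5))          # ~0.974, almost double the intended 0.5
# A larger `a` pulls the link much closer to the identity on (0, 1):
print(softplus(0.5, a=10.0))  # ~0.5007
```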

@devmotion (Member) left a comment:

I think the remaining items here are:

  • Include the new docstrings in the documentation
  • Add tests for softplus and invsoftplus
  • Add support for InverseFunctions for softplus and invsoftplus and test it
  • Add support for ChangesOfVariables for softplus and invsoftplus and test it

I think ChainRules support should not be needed since log1pexp and log1mexp are already supported, and we can expect AD to differentiate through the remaining parts of the functions.
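For the InverseFunctions / ChangesOfVariables items, the relevant inverse pair is softplus(x; a) = log(1 + exp(a·x))/a and invsoftplus(y; a) = log(exp(a·y) − 1)/a, i.e. `logexpm1` rescaled by `a`. A round-trip sketch in Python (the function names follow the PR; this rendition is illustrative, not the Julia extension code):

```python
import math

def softplus(x, a=1.0):
    # Generalized softplus (illustrative): log(1 + exp(a*x)) / a
    return math.log1p(math.exp(a * x)) / a

def invsoftplus(y, a=1.0):
    # Its inverse on y > 0: log(exp(a*y) - 1) / a, i.e. logexpm1 rescaled by a
    return math.log(math.expm1(a * y)) / a

# Round trips: the property an InverseFunctions rule would have to encode.
x = 0.7
assert abs(invsoftplus(softplus(x, 3.0), 3.0) - x) < 1e-10
assert abs(softplus(invsoftplus(0.4, 2.0), 2.0) - 0.4) < 1e-10
```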

@DominiqueMakowski (Author):

> Add support for InverseFunctions / ChangesOfVariables

Can you clarify?
