Add "a" parameter to softplus() #83 #85

Open
wants to merge 9 commits into master
25 changes: 21 additions & 4 deletions src/basicfuns.jl
@@ -165,9 +165,14 @@
Return `log(1+exp(x))` evaluated carefully for largish `x`.
This is also called the ["softplus"](https://en.wikipedia.org/wiki/Rectifier_(neural_networks))
transformation, being a smooth approximation to `max(0,x)`. Its inverse is [`logexpm1`](@ref).

+The generalized `softplus` function (Wiemann et al., 2024) takes an additional optional parameter `a` that controls
Member:
I assume there exist earlier references for this function?

@DominiqueMakowski (Author) commented on Sep 4, 2024:
I went through Liu and Farber to double-check

From my understanding (ML is not my field), they validate "noisy softplus" as an improvement over other activation functions for neurons in NNs.

[screenshot from the paper]

However, it seems like they named the `a` parameter sigma. Their plot looks similar but differs in terms of values (?)

[screenshot: plot from the paper]

I'm not sure how widely used this variant is.

I share your concern here; I'm also careful not to add niche features to such a base package and increase the maintenance burden.
I can't say how commonly the generalized version is already used; its development seems fairly recent.
However, I can see its usefulness in quite a lot of cases: the default softplus only becomes close to the identity after x > 2, and from experience we often model parameters smaller than that (typical sigmas in neuroscience/psychology are between 0 and 1), so using adjusted softplus links would make sense in these contexts (see the short numeric sketch after this diff hunk). I suppose it's a tradeoff between the complexity of the feature and its (potential) usage.

+the approximation error with respect to the linear spline. It defaults to `a=1.0`, in which case the softplus is
+equivalent to `log1pexp`.

See:
* Martin Maechler (2012) [“Accurately Computing log(1 − exp(− |a|))”](http://cran.r-project.org/web/packages/Rmpfr/vignettes/log1mexp-note.pdf)
+* Wiemann, P. F., Kneib, T., & Hambuckers, J. (2024). Using the softplus function to construct alternative link functions in generalized linear models and beyond. Statistical Papers, 65(5), 3155-3180.
"""
log1pexp(x::Real) = _log1pexp(float(x)) # ensures that BigInt/BigFloat, Int/Float64 etc. dispatch to the same algorithm

# Approximations based on Maechler (2012)
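
To make the scale argument from the review discussion above concrete, here is a minimal numeric sketch (not part of the diff). It assumes the package is LogExpFunctions.jl and writes the generalized form `log1pexp(a*x)/a` out via a hypothetical local helper `softplus_a`, rather than using the keyword API proposed in this PR:

```julia
# Minimal sketch, not part of the PR: compare the generalized form
# log1pexp(a*x)/a with the default (a = 1) at small x, where the default
# softplus is still far from max(0, x).  `softplus_a` is a hypothetical
# local helper, not an existing API.
using LogExpFunctions: log1pexp

softplus_a(x, a) = log1pexp(a * x) / a

for x in (0.5, 1.0, 2.0)
    println("x = $x:  a=1 -> ", round(softplus_a(x, 1.0); digits = 4),
            ",  a=10 -> ", round(softplus_a(x, 10.0); digits = 4),
            ",  max(0, x) = ", x)
end
```

With `a = 10` the values at `x = 0.5` and `x = 1.0` are already within a fraction of a percent of `max(0, x)`, whereas the default `a = 1` is still noticeably above it.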
@@ -255,10 +260,22 @@
Return `log(exp(x) - 1)` or the “invsoftplus” function. It is the inverse of
[`log1pexp`](@ref) (aka “softplus”).
"""
logexpm1(x::Real) = x <= 18.0 ? log(_expm1(x)) : x <= 33.3 ? x - exp(-x) : oftype(exp(-x), x)
-logexpm1(x::Float32) = x <= 9f0 ? log(expm1(x)) : x <= 16f0 ? x - exp(-x) : oftype(exp(-x), x)
+logexpm1(x::Float32) = x <= 9.0f0 ? log(expm1(x)) : x <= 16.0f0 ? x - exp(-x) : oftype(exp(-x), x)

+function softplus(x; a::Real=1.0)
+    if a == 1.0
+        return log1pexp(x)
+    end
+    return log1pexp(a * x) / a
+end
+
+function invsoftplus(y; a::Real=1.0)
+    if a == 1.0
+        return logexpm1(y)
+    end
+    return logexpm1(a * y) / a
+end

-const softplus = log1pexp
-const invsoftplus = logexpm1

"""
$(SIGNATURES)
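
Finally, a small sanity-check sketch of the proposed pair (again not part of the diff). The definitions are copied here as hypothetical local helpers `softplus_pr`/`invsoftplus_pr` so they do not clash with the names currently exported, and they should round-trip for any fixed `a`:

```julia
# Sanity-check sketch, not part of the PR: the proposed definitions, copied as
# hypothetical local helpers to avoid clashing with the exported names, should
# round-trip for any fixed `a`.
using LogExpFunctions: log1pexp, logexpm1

function softplus_pr(x; a::Real = 1.0)
    a == 1.0 && return log1pexp(x)
    return log1pexp(a * x) / a
end

function invsoftplus_pr(y; a::Real = 1.0)
    a == 1.0 && return logexpm1(y)
    return logexpm1(a * y) / a
end

x = 0.7
for a in (1.0, 2.5, 10.0)
    y = softplus_pr(x; a = a)
    @assert invsoftplus_pr(y; a = a) ≈ x
end
```

Only the last two values of `a` exercise the generalized branch; `a == 1.0` falls back to the existing `log1pexp`/`logexpm1` pair, which keeps the default behaviour identical to the current constant aliases.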