
Why is there no activation function applied to the 1x1 conv that produces the dense output? #404

Open
chasep255 opened this issue May 24, 2021 · 0 comments

@chasep255

I have been trying to understand why no activation function is applied to the 1x1 conv that sits between the residual connections. From what I understand, a linear layer with no activation function does not really add to the expressive power of the model. The skip connections eventually have a ReLU applied, so those make sense to me. However, as far as I can tell, the linear output of the residual path has no activation applied; it is just added to the residual bus and fed into the next layer. What is the point of the 1x1 convolution in this case? Why not skip the 1x1 convolution and add the filter * gate output directly to the layer's input to create the dense output?
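
For reference, here is a minimal sketch of the block structure being described, assuming the standard WaveNet layout (tanh/sigmoid gated activation unit, a purely linear 1x1 conv onto the residual bus, and a separate 1x1 conv onto the skip path). The function name, layer arrangement, and tf.keras usage below are illustrative assumptions, not this repository's actual code:

```python
import tensorflow as tf

def residual_block(x, residual_channels, skip_channels, dilation):
    """One WaveNet-style residual block (illustrative sketch)."""
    # Gated activation unit: tanh(filter) * sigmoid(gate).
    filt = tf.keras.layers.Conv1D(residual_channels, 2, dilation_rate=dilation,
                                  padding="causal", activation="tanh")(x)
    gate = tf.keras.layers.Conv1D(residual_channels, 2, dilation_rate=dilation,
                                  padding="causal", activation="sigmoid")(x)
    out = filt * gate

    # The 1x1 "dense" conv in question: purely linear, its output is added
    # straight onto the residual bus (x must already have residual_channels).
    dense = tf.keras.layers.Conv1D(residual_channels, 1)(out)
    residual = x + dense

    # Separate 1x1 conv feeding the skip path; a ReLU is applied later,
    # only after all the skip outputs have been summed.
    skip = tf.keras.layers.Conv1D(skip_channels, 1)(out)
    return residual, skip

# Example usage on a dummy batch of shape (batch, time, residual_channels).
x = tf.random.normal([1, 16, 32])
residual, skip = residual_block(x, residual_channels=32, skip_channels=64, dilation=2)
```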
