
Add ScoreELBO objective #72

Open · arnauqb wants to merge 15 commits into master
Conversation

@arnauqb commented Jul 19, 2024

This PR implements the ScoreELBO objective, which makes it possible to perform VI for non-differentiable target densities.

I have followed the idea described here, where we substitute the traditional ELBO objective with a surrogate one

$$ f_\phi(z) \rightarrow \log q_\phi(z)\,\overline{f_\phi(z)} $$

where the bar indicates that the term is treated as a constant (i.e., no gradient flows through it). In expectation, the gradient of this surrogate objective is the same as that of the reparameterization objective.
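For illustration, here is a minimal sketch of this surrogate in Julia (hypothetical helpers logq and f, not the PR's code; ignore_derivatives from ChainRulesCore plays the role of the bar):

using ChainRulesCore: ignore_derivatives

# Surrogate loss: its value is log q times a constant, and differentiating it
# yields the score-gradient term f̄(z) * ∇ log q(z).
function surrogate(logq, f, ϕ, z)
    f_const = ignore_derivatives(f(ϕ, z))  # the "bar": no gradient flows here
    return logq(ϕ, z) * f_const
end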

I have made the following modifications:

  • Extended the stop_gradient function defined in AdvancedVI.jl to the ForwardDiff and Zygote backends. I am not sure how to implement it for Enzyme or ReverseDiff, so those throw a not-implemented error for now (a rough sketch of what such extensions could look like appears below the figure).
  • I also modified the value_and_gradient method for Zygote so that we can properly test that the gradient does not flow: instead of returning nothing, it returns 0. I am not sure whether this is necessary or adequate.
  • I have added a new file, objectives/elbo/scoreelbo.jl, that implements the new objective. I made sure to modify only the backward pass and to keep the ELBO loss at the same value.
  • I added tests, copying the structure of the RepGradELBO tests. I had to relax some of the tolerances a bit, given that the score estimator typically has much higher variance.
  • I have added a score example in the benchmarks folder. This is the comparison with the pathwise version:

[Figure: score estimator vs. reparameterization (pathwise) estimator]
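As referenced in the first bullet above, a rough sketch of what the backend extensions to stop_gradient might look like (illustrative only, not the PR's exact code):

using ForwardDiff, Zygote

stop_gradient(x) = x

# ForwardDiff propagates dual numbers; stripping the partials stops the gradient.
stop_gradient(x::ForwardDiff.Dual) = ForwardDiff.value(x)
stop_gradient(x::AbstractArray{<:ForwardDiff.Dual}) = map(ForwardDiff.value, x)

# Zygote respects this declaration and returns no gradient for the call.
Zygote.@nograd stop_gradient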

Questions / things to consider:

  • Right now, we take the score gradient for the whole model, but ideally we would only use it for the individual densities that we can't differentiate. One way to do this would be to create custom differentiation rules for the non-differentiable densities and apply the score estimator inside them. However, for this we need access to log_prob(samples) inside the rule, and I am not sure how to get it easily.
  • The score estimator can be greatly improved with proper variance-reduction techniques. This PR does not implement any, so the estimator is relatively noisy.

@sunxd3 requested a review from @Red-Portal on Jul 19, 2024
@Red-Portal (Member) commented Jul 20, 2024

Thank you so much for the PR. It looks pretty good at a glance, but I am currently on a trip to ICML, so it will take some time for me to do a more detailed review. A few first comments:

  • I think it would be better to change the name of ScoreELBO to ScoreGradELBO to be consistent with RepGradELBO.
  • I don't think applying STL would help in the case of the score gradient. (I also have not seen people do this.) But maybe it does? Have you run some experiments on this?
  • You can avoid relying on stop_gradient by passing the log target evaluation as an auxiliary input. This is, in fact, how STL is implemented (see q_stop here). It's not ideal, but I think that's the best we can do. In this case it's acceptable, since it wouldn't add any cost compared to using stop_gradient. (A sketch of this pattern follows this list.)
  • Additional variance reduction methods like control variates should be easy to add, but Rao-Blackwellization is probably going to be very hard to implement since DynamicPPL doesn't have a mechanism similar to Pyro plates. (I am all ears if anybody has better ideas for this.)
  • If we were to support the score gradient, we should probably add the VarGrad estimator too. (I guess this will be work for me in the future unless somebody would like to take a shot first.)
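As mentioned in the third bullet, the auxiliary-input pattern looks roughly like this (a hedged sketch with hypothetical names; the real code lives in AdvancedVI's gradient-estimation path):

using Random, Distributions

# Everything that must not receive gradients is built OUTSIDE the AD call
# and threaded through `aux`; only `params` is differentiated.
function estimate_gradient_sketch(rng, adtype, params, restructure)
    q_stop = restructure(params)  # detached copy: not part of the AD path
    aux = (rng=rng, q_stop=q_stop, restructure=restructure)
    return value_and_gradient(objective_sketch, adtype, params, aux)
end

function objective_sketch(params, aux)
    q = aux.restructure(params)    # gradients flow through this q
    z = rand(aux.rng, aux.q_stop)  # samples drawn from the detached copy
    return logpdf(q, z)            # combined downstream with the stopped target term
end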

As for mixing the reparameterization gradient with the score gradient, I think I have an idea of how to do it: we can split the variables into those that need the RP gradient and those that need the score gradient, and compute the RP objective and the score objective separately. But it will require handling NamedTuple-variate random variables, which will require changes on the DynamicPPL side. (Without NamedTuple-variate RVs, it's not going to be pretty.) Also, unfortunately, we probably can't handle things automatically as in Pyro, since DynamicPPL doesn't track RV dependencies as far as I understand. (Is that correct, @sunxd3?)

@sunxd3 commented Jul 20, 2024

> I don't think applying STL would help in the case of the score gradient.

May I ask what "STL" is?

> Also, unfortunately, we probably can't handle things automatically as in Pyro since DynamicPPL doesn't track RV dependencies as far as I understand.

Yep, DynamicPPL doesn't track dependencies. There were some efforts, but we just can't do it reliably. JuliaBUGS.jl does, though; extracting dependencies was part of the motivation for writing JuliaBUGS -- it actually constructs a directed PGM. Maybe this is a good time to explore a design and implementation that exploits the graph structure.

@arnauqb (Author) commented Jul 20, 2024

Thanks @Red-Portal for the first comments. I agree with the name change and with avoiding the use of stop_gradient; I will push changes soon.

This is a somewhat unrelated question, but do you know the key differences between DynamicPPL.jl and GraphPPL.jl? The reason I ask is that GraphPPL.jl has a very nice syntax for specifying metadata for a specific sampling statement (see here):

@model function some_model(a, b)
    x ~ Beta(a, b) where { meta = "Hello, world!" }
end

This would be very convenient for specifying the estimation method.

@Red-Portal (Member) commented Jul 21, 2024

Sorry for the delay!

@arnauqb I don't have too much insight into the PPL part so that's something that perhaps @sunxd3 could comment on.

@sunxd3 STL stands for sticking-the-landing, a very clever control variate method by Roeder, Wu, and Duvenaud (2017).

@sunxd3 commented Jul 22, 2024

DynamicPPL aims to let users write generic Julia code for model definitions, while graph-oriented DSLs like GraphPPL (and JuliaBUGS) are bound to impose some restrictions on the syntax, although such restrictions can be a good thing. These syntax restrictions usually manifest as limited mutability, limitations on loop bounds, etc.

Beyond syntax differences, Turing/DynamicPPL is also a "general purpose" PPL (probabilistic programming language). A somewhat reductionist way to distinguish a "general purpose" PPL from a graphical-model PPL is that a "general purpose" PPL allows a stochastic number of random variables in the model. A basic case where this happens is when the size of some array of random variables is itself stochastic, e.g., drawn from a Categorical distribution (see the toy sketch below).
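To illustrate, a toy sketch of such a model:

using Turing

# The length of x depends on the sampled value of K, so the number of
# random variables in the model is itself stochastic.
@model function varying_size(K_max)
    K ~ Categorical(K_max)  # uniform over 1:K_max
    x = Vector{Float64}(undef, K)
    for k in 1:K
        x[k] ~ Normal(0, 1)
    end
end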

The ability to specify metadata in GraphPPL is a good design, but it might be difficult for DynamicPPL to implement well (it would take work to get the design right).

Now, I think it's very interesting how Pyro implemented TraceELBO, but I would need to investigate further to determine how it works and whether it applies to DynamicPPL.

All that being said, we should probably think about how to use dependency information to make VI better here, both in terms of algorithm implementations and interface design. We can use these designs to inform future DynamicPPL design and also to build interfaces with DSLs like GraphPPL.

@arnauqb (Author) commented Jul 23, 2024

So I'm not sure how to avoid using stop_gradient, since I need the trick of (score_grad - ignore_gradient(score_grad)) so that the ELBO keeps the same value as in the repgrad case. The only way I can think of to do it with the auxiliary variable would be to compute logpdf twice: once outside the gradient calculation, passed through aux, and once inside. But wouldn't that add a bit of computational overhead? (A sketch of what I mean is below.)
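A hedged sketch of the value-preserving trick (hypothetical names; logq_stopped and f_const would be computed outside the AD call and passed through aux):

# Returns a quantity numerically equal to `elbo`, but whose gradient also
# carries the score term f_const * ∇logq.
function value_preserving_surrogate(logq, elbo, aux)
    score_term = logq * aux.f_const
    # aux.logq_stopped == logq in value, but it is a constant to the AD
    # system, so the two score terms cancel in value and not in gradient.
    return elbo + score_term - aux.logq_stopped * aux.f_const
end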

@Red-Portal (Member) commented Jul 29, 2024

Sorry for the delay! I just got back from ICML and was recovering from a cold I caught on the way back. Yes, it seems we'll have to call logpdf(q) twice, both in and out of the AD path. I think that should be acceptable in general. I'll do a proper review once your updates are committed.

@arnauqb (Author) commented Sep 16, 2024

Apologies for the long delay in updating this. I got sidetracked with other projects after the summer break...

I have now implemented the score estimator without needing the stop_gradient trick, by using q_stop as you suggested. I have also implemented a baseline control variate, which computes the running mean over the last n iterations and uses it as a baseline for variance reduction. It has a very noticeable effect:

[Figure: effect of the baseline control variate on convergence]
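For reference, a baseline $b$ enters the standard score estimator as

$$ \nabla_\phi \, \mathbb{E}_{q_\phi}\left[f(z)\right] \approx \left(f(z) - b\right) \nabla_\phi \log q_\phi(z), \qquad z \sim q_\phi, $$

which leaves the estimator unbiased, since $\mathbb{E}_{q_\phi}\left[\nabla_\phi \log q_\phi(z)\right] = 0$.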

@Red-Portal (Member) left a comment

Hi @arnauqb, thanks for the amazing work! It is nice to see that everything works out without the need for gradient-stopping operations. I have some rather minor questions and requested changes, which should be handled pretty quickly. One more comment: could you also try to add the test with Bijectors, as done here?

[Resolved review thread on src/objectives/elbo/scoregradelbo.jl]
Comment on lines +67 to +73
function compute_control_variate_baseline(history, window_size)
    if length(history) == 0
        return 1.0  # default baseline before any history is recorded
    end
    # Average over a trailing window of the recorded history.
    min_index = max(1, length(history) - window_size)
    return mean(history[min_index:end])
end
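For instance (hypothetical values):

using Statistics: mean

history = [-3.2, -2.9, -2.7, -2.6, -2.5]
compute_control_variate_baseline(history, 3)  # mean(history[2:end]) == -2.675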
@Red-Portal (Member): I personally think we should build a whole new interface for control variates, so that people can easily mix and match them. But for now, I think this can be included in the PR.

@arnauqb (Author): agreed!

[Additional resolved review threads on src/objectives/elbo/scoregradelbo.jl, ext/AdvancedVIZygoteExt.jl, src/objectives/elbo/entropy.jl, and test/Project.toml]
@arnauqb (Author) commented Sep 19, 2024

Thank you for the comments @Red-Portal! I'm in the process of addressing them and adding the additional tests.

I'm currently experiencing a strange bug when computing the closed form of the entropy. In particular,

using LinearAlgebra: Diagonal
using FillArrays: Fill
using Distributions: entropy
using Bijectors, LogDensityProblems
# NormalLogNormal and variational_standard_mvnormal are helpers from the
# AdvancedVI test suite.

n_dims = 10
μ_x    = 5.0
σ_x    = 0.3
μ_y    = Fill(5.0, n_dims)
σ_y    = Fill(0.3, n_dims)
model  = NormalLogNormal(μ_x, σ_x, μ_y, Diagonal(σ_y.^2))

d = LogDensityProblems.dimension(model)
q = variational_standard_mvnormal(Float64, d, :meanfield)
entropy(q) # works

b             = Bijectors.bijector(model)
binv          = inverse(b)
q_transformed = Bijectors.TransformedDistribution(q, binv)
entropy(q_transformed) # does not work

The call entropy(q_transformed) errors with: MethodError: no method matching iterate(::MultivariateTransformed{MvLocationScale{…}, Stacked{…}})

This should also make the RepGradELBO objective fail when computing the closed-form entropy, but somehow it doesn't. Any help?

@Red-Portal (Member) commented Sep 20, 2024

@arnauqb The entropy under nonlinear transformations is generally intractable, so it's not implemented. For the RepGradELBO case, for example, the Jacobian adjustment part of the entropy is handled inside the Monte Carlo estimator. Now that I think of it, to make things as clean as possible, it would be best to call reparam_with_entropy, since it deals with the Jacobian adjustment when computing the entropy (see the AdvancedVIBijectorsExt extension).
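For context, the call looks roughly like this (treating the signature as illustrative of the extension's interface):

# Draws n_samples from q along the reparameterization path and returns them
# together with an entropy estimate; the Bijectors extension adds the
# Jacobian adjustment for transformed distributions.
samples, entropy_est = reparam_with_entropy(rng, q, q_stop, n_samples, entropy_estimator)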

@arnauqb (Author) commented Sep 20, 2024

@Red-Portal Ok, so I had to rework this a bit more; let me know what you think.

For the score estimator, we need to sample x ~ q and then evaluate logq(x). The problem is that the sampling needs to be decoupled from the computation graph. In the current implementation, where the sampling and the entropy are done together, this was not possible, so I introduced a new function, sample_from_q, that dispatches on the objective: RepGradELBO samples from q and ScoreGradELBO samples from q_stop (see the sketch below).
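A minimal sketch of that dispatch (hypothetical signature, not the PR's exact code):

# RepGradELBO: gradients flow through the samples (reparameterization path).
sample_from_q(rng, ::RepGradELBO, q, q_stop) = rand(rng, q)

# ScoreGradELBO: sample from the detached copy, so the computation graph
# treats the samples as constants.
sample_from_q(rng, ::ScoreGradELBO, q, q_stop) = rand(rng, q_stop)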

Locally, I get segmentation faults in the Enzyme tests, even for RepGradELBO. Not sure what's wrong there...

@Red-Portal (Member) commented

@arnauqb Can't you just shove q_stop into reparam_with_entropy?
