
MaskedLARK & MACAW, FLEDGE (sharing the feature vector with helpers) #28

Open
jonasz opened this issue Aug 4, 2021 · 5 comments

@jonasz

jonasz commented Aug 4, 2021

Hi,

Thank you Charles for the presentation during the recent W3C meeting!

In the Masked Gradient Model Training use case one of the assumptions seems to be that the feature vector can be shared with the helpers.

In my understanding, this solution would not be compatible with MACAW and FLEDGE, where the feature vector is constructed from context and user features ((c, s) in MACAW's notation), and it is important not to share that combined information outside the device.

I was wondering, is my understanding correct? If so, do you see any ways to adapt MaskedLARK to MACAW and FLEDGE?

Best regards,
Jonasz

@jpfeiffe

So, PARAKEET should be fairly easy of course, although it would operate entirely on C', S'. I spoke offline with the MACAW folks, and although the Explainer does state that C, S should not leave the browser service, returning it to the browser itself should be okay. FLEDGE is currently a bit trickier, I think, although I'm not quite sure. It seems that we should get C as part of reportWin/reportResult, but only for the single interest group.
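For concreteness, here is a hedged sketch of the FLEDGE side of that: the reportWin() entry point and sendReportTo() are from the explainer, but the browserSignals fields used below are just illustrative.

```ts
// Hedged sketch of a FLEDGE buyer reporting worklet. The key point is that
// only the single winning interest group comes back alongside the contextual
// signals, not the user's full set of signals.
declare function sendReportTo(url: string): void;

function reportWin(
  auctionSignals: unknown,   // contextual / publisher-side signals (C)
  perBuyerSignals: unknown,
  sellerSignals: unknown,
  browserSignals: { interestGroupName: string; bid: number },  // illustrative fields
): void {
  // The only user-side information available here is the one interest group that won.
  sendReportTo(
    `https://buyer.example/report?ig=${encodeURIComponent(browserSignals.interestGroupName)}` +
    `&bid=${browserSignals.bid}`,
  );
}
```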

@KeldaAnders
Contributor

@jonasz We have added this issue to the discussion for tomorrow's PARAKEET biweekly meeting.

@michaelkleber

Hi folks, to follow up on the brief discussion of this from today's call.

@jonasz is quite correct that in the FLEDGE model, we would not want to send the helper servers all the signals in the clear. The signals available for ML model evaluation can include data from two different sites: X = userBiddingSignals from the advertiser site where the user was added to the interest group, and Y = auctionSignals from the publisher site where the ad will appear. We don't want the browser to reveal to the helper servers information from X and Y joined together.

From my understanding of MaskedLARK, the solution to this problem is hiding in the line "Browsers are encouraged to use masking to generate both fake keys and values". Here the true key is <X,Y>, but the browser could generate fake keys of the form <X, Y'> or <X', Y>, where X' and Y' are the comparable user or auction signals from other users. These fake keys, of course, would be masked out as described in the proposal.
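To make that concrete, here is a minimal sketch of the idea, assuming a simple two-helper additive-masking scheme; the names and the masking arithmetic are illustrative, not MaskedLARK's actual wire format.

```ts
// Illustrative only: the browser hides its real <X, Y> key among fake keys
// built from other users' signals, and splits each record's value into two
// masked shares so that neither helper can tell real keys from fakes.
type MaskedRecord = { key: string; maskedValue: number };

function maskRecords(
  trueX: string, trueY: string, trueValue: number,
  otherXs: string[], otherYs: string[],  // comparable signals from other users
): { toHelperA: MaskedRecord[]; toHelperB: MaskedRecord[] } {
  const records = [
    { key: `${trueX}|${trueY}`, value: trueValue },             // the real <X, Y>
    ...otherYs.map(y => ({ key: `${trueX}|${y}`, value: 0 })),  // fake <X, Y'>
    ...otherXs.map(x => ({ key: `${x}|${trueY}`, value: 0 })),  // fake <X', Y>
  ];
  const toHelperA: MaskedRecord[] = [];
  const toHelperB: MaskedRecord[] = [];
  for (const { key, value } of records) {
    const mask = Math.random() * 2 - 1;                  // fresh random mask per record
    toHelperA.push({ key, maskedValue: value + mask });  // helper A sees value + mask
    toHelperB.push({ key, maskedValue: -mask });         // helper B sees -mask
  }
  // Adding the two helpers' per-key aggregates recovers trueValue for the real
  // key and exactly 0 for every fake key, without either helper learning which
  // <X, Y> combination was real.
  return { toHelperA, toHelperB };
}
```

In the MaskedLARK setting the "value" would be a per-key gradient or label contribution rather than a single scalar, but the cancellation works the same way.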

This does require some sort of "shuffling" infrastructure, so that one browser can learn another browser's signals in order to send them as noise. That's not my favorite thing in the world, but it seems plausible.

I'm not sure how much fake (and therefore masked) data this would require, though. Surely it would be more than the "ideally, no more than two evaluations per event" that you can get if you're only masking the convert-or-not bit. So perhaps this is more expensive than is palatable.

@mehulparsana
Contributor

@michaelkleber The data required to generate X' and/or Y' on the browser would be aggregated histograms (or a compressed data structure such as a Count-Min Sketch). The alternative is secret-sharing <X,Y> to the helpers; this is plausible, but computationally and network-expensive for gradient computation. Any thoughts?
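A rough sketch of how that sampling step might look, assuming the browser has somehow been given an aggregated histogram of signal values (where that histogram comes from, and what noise it carries, is exactly the open question):

```ts
// Illustrative only: sample fake signal values X' / Y' from an aggregated
// histogram instead of learning any specific other browser's signals.
function sampleFromHistogram(histogram: Map<string, number>): string {
  const total = [...histogram.values()].reduce((a, b) => a + b, 0);
  let r = Math.random() * total;
  for (const [signal, count] of histogram) {
    r -= count;
    if (r <= 0) return signal;            // drawn with probability count / total
  }
  return [...histogram.keys()][0] ?? "";  // floating-point fallback
}

// Draw k fake values to pair with the browser's real X or Y when building
// the fake <X, Y'> / <X', Y> keys described above.
function sampleFakeSignals(histogram: Map<string, number>, k: number): string[] {
  return Array.from({ length: k }, () => sampleFromHistogram(histogram));
}
```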

@michaelkleber

Secret-sharing <X,Y> to the helpers of course would solve everything! But I didn't think that was the MaskedLARK proposal's approach to things. Sorry if I misunderstood the scope of this idea.

How would you use aggregate data to create the fake keys in which to bury the true <X,Y>? Not opposed to the idea at all, I just don't see how to do it.
