Offer access to non-partitioned storage somehow #102

Closed
johnwilander opened this issue Mar 22, 2022 · 29 comments
Labels
future Will consider for a future revision

Comments

@johnwilander
Collaborator

Tess wrote in #62 (comment):

> Is the right solution here to never switch out the storage mechanism from out under the bare storage API calls, and to instead offer access via some kind of explicit storage bucket api?

Opening this issue to track that discussion since we wanted to close #62.

@johnwilander added the "future" label (Will consider for a future revision) on Mar 22, 2022
@lghall

lghall commented Nov 2, 2022

For Google Workspace, we have several use cases of embedded frames in a 3p context that currently rely on requests being served by a same-origin Service Worker. We would appreciate having an API that grants unpartitioned service worker access for embeds in a 3p context.

@annevk
Collaborator

annevk commented Nov 2, 2022

We cannot offer service worker access as-is. There would have to be some kind of ceremony due to the security implications.

@johannhof
Member

That makes sense, but I think a security ceremony could be as simple as having a separate parameter for opting into SW? (As opposed to all storage). It's a bit weird that SW would still be partitioned but not the rest of storage...

@annevk
Collaborator

annevk commented Nov 3, 2022

Maybe, depends on how the buckets API ends up looking and how they integrate with the default bucket.

@johannhof
Member

On the Chrome side we're thinking about what this could look like in practice and discussed it with the folks developing the Storage Buckets API (@ayuishii, @evanstade). Two main takeaways:

  • Simply returning a regular Storage Bucket is likely not what we want from SAA. Buckets are primarily driven by the motivation to give developers more control over storage eviction, which results in a few undesirable properties for us, such as the non-support of LocalStorage and expiration semantics that seem unnecessary if we just want to expose the default bucket.
  • However, we don't have to use specifically buckets for this. It's not that hard to create a similar concept, a thin wrapper API that provides a handle to storage with the default (unpartitioned) storage / partition key.

So, instead of using buckets directly, we were thinking of just using the underlying infrastructure and creating a new "CrossSiteStorageHandle" (name TBD) that is similar to a bucket and can be accessed through SAA:

```js
// Request a new storage handle via rSA (this should prompt the user)
let handle = await document.requestStorageAccess({ storageHandle: true });
// Write some cross-site localstorage
handle.localStorage.setItem("userid", "1234");
// Open or create an indexedDB that is shared with the 1P context
let messageDB = handle.indexedDB.open("messages");
```

Other notes/open questions:

  • Service Workers are excluded from this API for now, as there are known cache-based attacks on them. Are there other APIs this affects? Cache API? We should still think about how we can safely enable developers to opt into SWs through this API.

  • For Quota Storage, should this return a Storage Bucket or directly interface with the relevant APIs, e.g. handle.indexedDB.open vs. handle.bucket.indexedDB.open?

  • What do the arguments to requestStorageAccess look like? Do we even need them or do we always return a handle by default (probably not)? Should developers opt into specific APIs by passing e.g. { localStorage: true }?
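
The per-API opt-in raised in the last question could be sketched roughly as below. This is only an illustration: the option names (`localStorage`, `indexedDB`) and the handle shape follow the proposal in this comment and are not a finalized API, and `requestSharedStorage` is a helper name invented for the example. The document object is passed in as a parameter so the fallback to partitioned storage stays explicit.

```js
// Hypothetical sketch: the option names and the returned handle are
// assumptions taken from the proposal above, not a confirmed API.
async function requestSharedStorage(doc, types) {
  // Build an options bag such as { localStorage: true, indexedDB: true }.
  const options = {};
  for (const type of types) {
    options[type] = true;
  }
  try {
    const handle = await doc.requestStorageAccess(options);
    // A cookie-only implementation would resolve with undefined.
    if (handle) {
      return { unpartitioned: true, storage: handle };
    }
  } catch (err) {
    // The request was rejected or the method is missing entirely.
  }
  // The caller keeps using its regular (partitioned) storage.
  return { unpartitioned: false, storage: null };
}
```

A caller would check `unpartitioned` before touching `storage`, which keeps the partitioned data reachable through the bare globals at all times.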

@annevk @bvandersloot-mozilla I'd love to get your early thoughts on this

cc @cfredric @wanderview @miketaylr

@johannhof added the "agenda+" label (Request to add this issue to the agenda of our next telcon or F2F) on May 17, 2023
@asutherland

asutherland commented May 23, 2023

Storage Buckets API and the Storage Standard

I agree this use case doesn't necessarily make sense for the Storage Buckets API proposal's specific goals, but since the Storage Buckets API proposal explicitly "proposes changes to the Storage Standard", I do think it makes sense to define this exposure via the Storage Standard. And since it's likely the storage bucket changes will be incorporated into the Storage Standard, I think it makes sense to think in terms of the implications.

In particular, this would simplify specification of things like the interaction with Clear-Site-Data. Currently I think there's some technical debt in that spec as clear-site-data still defines its own clearing logic, but it's missing clearing Cache API storages. In w3c/webappsec-clear-site-data#61 I raised the issue of Clear-Site-Data being able to clear specific Buckets and @annevk proposed having the Storage Standard own the header, which I think would be consistent with also moving clearing into the Storage Standard.

To that end, it could potentially make sense to create a DefaultStorageBucket interface that extends StorageBucket but that is complicated by my next point and leads me to a different proposal...

The Perf implications of LocalStorage mean it should be opt-in

Because LocalStorage is a synchronous API, browsers end up having to make trade-offs between having to preload and keep in memory up to 10 MiB of data[1][2] versus potentially blocking their main thread to wait on I/O when LocalStorage is accessed. For page loads it's frequently possible to race the I/O required for this with the underlying page load, which potentially hides some of the costs of this. And the status quo is basically that this is an inescapable cost.

But exposing LocalStorage via this API would potentially introduce a new cost as the unpartitioned LocalStorage might not already be in memory. It would be nice to structure the API so that the cost doesn't have to be paid unless the caller explicitly requests LocalStorage. This would have the benefit of making it more obvious to developers what the costs are and help them consider options like IndexedDB as alternatives. In particular, IndexedDB can be more competitive if it isn't always having to compete with LocalStorage which needs to be prioritized to avoid jank.

To this end, it could make sense for the API to look something like requestStorageAccess({ defaultBucket: true, localStorage: true, sessionStorage: true }) and have that return the StorageBucket for the default bucket as well as LocalStorage and SessionStorage Storage instances. This avoids cluttering up the bucket semantics with the Web Storage API, but the underlying request and result types and algorithm to process the request into the response could still be defined in the Storage Standard, with the Storage Access API still being able to do its own checks before invoking the algorithm.

1: At least Gecko's quota enforcement is 5 * 1024 * 1024 code units across all keys and values. Because JS uses 16-bit code units, this ends up being capable of storing 10 MiB of data.

2: Most sites probably don't intend to store massive amounts of data in LocalStorage, but we've emergently seen it happen in a number of cases where sites end up storing massive amounts of data due to logging or JS caching (from the past) and even hit the limits. For example, we saw people experiencing NYTimes breakage on wordle because of full LocalStorage.
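
The arithmetic in footnote 1 can be checked with a short sketch. It assumes only what the footnote states (a limit of 5 * 1024 * 1024 UTF-16 code units across keys and values); `usedCodeUnits` and `toMiB` are names made up for the example, and real quota accounting in any given browser may differ.

```js
// Each UTF-16 code unit occupies 2 bytes.
const BYTES_PER_CODE_UNIT = 2;
// Limit described in footnote 1 (Gecko), counted in code units.
const QUOTA_CODE_UNITS = 5 * 1024 * 1024;

// Count code units across all keys and values of a Storage-like object,
// i.e. anything exposing length, key(i) and getItem(key).
function usedCodeUnits(storage) {
  let total = 0;
  for (let i = 0; i < storage.length; i++) {
    const key = storage.key(i);
    const value = storage.getItem(key) ?? "";
    // String.length is measured in UTF-16 code units.
    total += key.length + value.length;
  }
  return total;
}

// Convert a code-unit count to MiB.
function toMiB(codeUnits) {
  return (codeUnits * BYTES_PER_CODE_UNIT) / (1024 * 1024);
}

// toMiB(QUOTA_CODE_UNITS) evaluates to 10, matching the 10 MiB figure.
```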

@bvandersloot-mozilla
Collaborator

Nit on the name: perhaps Origin is better than Site- this would be the way to access Storage across origins within a Site as well, no?

@johannhof
Member

@asutherland this is a really great comment, thank you. I really like your suggestion, this seems like a good way to design things, although I will check back with @wanderview et al. to make sure it also aligns with how Chrome's storage works.

> Nit on the name: perhaps Origin is better than Site- this would be the way to access Storage across origins within a Site as well, no?

Not sure I follow, hah. Isn't this referring to across top-level sites? With that said I don't feel strongly about the name :)

@wanderview

> To this end, it could make sense for the API to look something like requestStorageAccess({ defaultBucket: true, localStorage: true, sessionStorage: true })

It seems there are a couple scenarios here:

  1. Site has localstorage data stored and wants to use it. (Worth paying cost to load it)
  2. Site does not have localstorage data stored and does not want to use it (Cost to load is minimal)
  3. Site has localstorage data stored, but does not plan to use it (We would prefer not to load)

Scenario 1 and 2 seem a lot more likely to me than 3.

Given we already have an async step for requestStorageAccess() that can incorporate user response latency it's unclear we really need to worry about localstorage load times.

I don't really feel strongly, just trying to understand the performance concern.

@annevk changed the title from "Offer access via explicit storage bucket API" to "Offer access to non-partitioned storage somehow" on May 26, 2023
@annevk
Collaborator

annevk commented May 26, 2023

We need to think through how this interacts with #154. It seems like the initial takes above would force folks to invoke requestStorageAccess() again, which does not seem great.

I think in principle you could allow access to the Cache API, though it's not clear this will be useful as you can only make use of it in conjunction with Service Workers (at this point, anyway). So it doesn't strike me as worth the effort.

I'd personally rather not offer soft-deprecated APIs such as localStorage and sessionStorage at all.

@johannhof
Member

> We need to think through how this interacts with #154. It seems like the initial takes above would force folks to invoke requestStorageAccess() again, which does not seem great.

Without speaking for @bvandersloot-mozilla, but I think we have both expressed in the thread that we don't think that the implicit storage access after navigation model is something we want to design rSA for. Independent of that, I'm not sure why that doesn't seem great to you? I think it's a good idea to preserve access to the partitioned storage by default and expose the default bucket via a separate function. This way there's no hidden or lost storage at any time.

> I think in principle you could allow access to the Cache API, though it's not clear this will be useful as you can only make use of it in conjunction with Service Workers (at this point, anyway). So it doesn't strike me as worth the effort.

Right, I also don't know of any use case for it so that seems fine.

> I'd personally rather not offer soft-deprecated APIs such as localStorage and sessionStorage at all.

I'm not sure I want to discuss the merits and deprecation state of localStorage / sessionStorage in SAA. Whenever they get truly deprecated/removed (which given current usage and lack of alternatives seems like ~never) we can follow that.

@asutherland

In reply to @wanderview #102 (comment)

> Given we already have an async step for requestStorageAccess() that can incorporate user response latency it's unclear we really need to worry about localstorage load times.

Although the latency is a concern, my greater concern is the opportunity cost of potentially wasted I/O bandwidth for multiple megabytes of data[1] and memory consumed until the iframe is destroyed, especially for users with spinning disks[2]. I agree the iframe caller of the API may care less about the costs than the browser user.

Additionally, if there's no way for a site to opt-out of the cost of loading LocalStorage, it will likely be harder to convince sites to move to IndexedDB incrementally because they would need to stop using LocalStorage entirely across their entire site in order to avoid paying the cost in a requestStorageAccess that provides no choice about whether to load LocalStorage.

1: To expand on my previous post slightly, I don't think that most sites intentionally try and store multiple megabytes of data; in my own profile the heavier origins are in the 0.5-1.5MiB range, although there are a number of origins at the 5MiB limit. But I think it's easy to use a third-party lib or introduce some temporary code that uses localStorage where even if the code is removed the storage use may remain until the bucket is cleared.

2: My impression is that Firefox still has a number of users with spinning disks, so the cost matters especially for these users. Although the Firefox Hardware Report unfortunately does not have data specifically on that, our telemetry data with time-based units generally conveys the impression that there are a significant number of users without SSDs.

@asutherland

One additional spec complication if LocalStorage and/or SessionStorage are exposed is the "storage" event. Since StorageEvent explicitly exposes a Storage? storageArea, the event, if dispatched on the global, would not technically be ambiguous, but it would really require that requestStorageAccess always return the same object (~SameObject) each time it resolves. That would have to happen in spec language rather than in the WebIDL, since SameObject is only for attributes and there's indirection via a promise here. I don't think firing the event anywhere else would be an improvement.
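
To make the disambiguation concrete, here is a minimal sketch that assumes the event is dispatched on the global and that `storageArea` points at the Storage object obtained from the handle. Both are assumptions from this discussion rather than specified behavior, and `watchStorageArea` is a name invented for the example.

```js
// Route "storage" events by comparing event.storageArea against one area.
// This only works if the same Storage object is handed out every time,
// which is the identity requirement described above.
function watchStorageArea(target, area, onChange) {
  const listener = (event) => {
    if (event.storageArea === area) {
      onChange(event.key, event.newValue);
    }
  };
  target.addEventListener("storage", listener);
  // Return a function that stops listening.
  return () => target.removeEventListener("storage", listener);
}
```

In a page, `target` would be `window` and `area` would be something like `handle.localStorage`; if a fresh object were returned on each resolution, the identity comparison would silently stop matching.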

@johannhof
Member

We've interestingly gotten two separate requests for Shared Workers support in the last two days, though I don't think they're connected? (see issues above)

It seems to me like a SharedWorker constructor seeded with the right Storage Key / Principal is a thing that could just be exposed on the (TBD) CrossSiteStorageHandle, but I'm curious if there are concerns from this group.

@wanderview

I think we should expose everything we partition with the exception of ServiceWorker (since the FetchEvent handling is just too weird). So I would also add BroadcastChannel.

@jcubic

jcubic commented Nov 17, 2023

Are there any updates on this? Any future plans or proposals?

This is something that users of my library sysend.js need. There is no longer any way to communicate between unrelated domains using my library.

My library uses BroadcastChannel, with a localStorage (storage event) fallback, to send a message to an iframe in order to reach a different domain. This is perfectly legitimate, since the domain that accepts the event needs to host a proxy iframe.
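
The general pattern described here (BroadcastChannel first, storage events as a fallback) can be sketched as follows. This is not sysend.js's actual code; `createSender` and the `env` parameter are hypothetical, with `env` standing in for the iframe's global so that the choice of transport is visible.

```js
// Pick a same-origin messaging transport: BroadcastChannel when available,
// otherwise a localStorage write that other contexts observe via "storage".
function createSender(env, channelName) {
  if (typeof env.BroadcastChannel === "function") {
    const channel = new env.BroadcastChannel(channelName);
    return {
      transport: "broadcast-channel",
      send: (message) => channel.postMessage(message),
    };
  }
  return {
    transport: "storage-event",
    send: (message) => {
      // A nonce makes each write distinct so repeated messages still
      // change the stored value.
      const payload = JSON.stringify({ message, nonce: Math.random() });
      env.localStorage.setItem(channelName, payload);
      env.localStorage.removeItem(channelName);
    },
  };
}
```

Both transports are scoped by the same storage key, so when the proxy iframe's storage is partitioned by top-level site, neither one reaches the same origin embedded under a different site.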

@arichiv

arichiv commented Dec 4, 2023

An update on the Origin Trial for localStorage access is here: https://developer.chrome.com/blog/saa-non-cookie-storage/

BroadcastChannel is available in the same Origin Trial with the Chrome 121 beta that should be promoted later this week https://chromiumdash.appspot.com/schedule.

@egor-limenko

Hello, are there any updates on this topic? Some of our communication mechanisms are broken, and as storage partitioning rolls out to more and more customers, this is becoming a bigger problem.
Could we have some mechanism that lets the user decide whether to consent to the use of unpartitioned storage?
This was provided for third-party cookies via "requestStorageAccess", and it seems @johannhof was proposing to extend it to localStorage too.

@arichiv

arichiv commented Feb 20, 2024

Yes, there's a blog post about an origin trial for exposing other storage mechanisms via requestStorageAccess here: https://developers.google.com/privacy-sandbox/blog/saa-non-cookie-storage

Only DOM Storage (session and local storage), Indexed DB, and Web Locks are available in Chrome 120.

Cache Storage, Origin Private File System, Quota, Blob Storage, and Broadcast Channel were added in Chrome 121.

Shared Workers are coming in M123.

@egor-limenko

@arichiv Thank you! We are going to try that approach

@egor-limenko

@arichiv Could you say when this is going to become available in the browser by default? As I understand it, we need to integrate the deprecation trial token in the hosting site (unlike the 3PC deprecation trial, where it was integrated in the embedded iframe), and if we do not own that site, we would have to ask top-level site owners to opt in to the trial themselves?

@arichiv

arichiv commented Feb 20, 2024

You need to include the token for the SAA extension in the iframe that wants to use it.
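
One common way to supply an origin trial token is a meta tag in the document that uses the feature, which here is the embedded iframe. A minimal sketch, assuming the token is injected from script: `TOKEN_GOES_HERE` is a placeholder and `addOriginTrialToken` is a name made up for this example.

```js
// Append <meta http-equiv="origin-trial" content="..."> to the given
// document's head. Run this inside the embedded iframe, not the top-level page.
function addOriginTrialToken(doc, token) {
  const meta = doc.createElement("meta");
  meta.httpEquiv = "origin-trial";
  meta.content = token;
  doc.head.append(meta);
  return meta;
}

// In the iframe (placeholder token, not a real one):
// addOriginTrialToken(document, "TOKEN_GOES_HERE");
```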

@johannhof
Member

Going to close this one since there's a separate proposal now.
