Fetch: CORB #81

annevk · 2018-04-10T12:06:15Z

CORB (cross-origin resource blocking) is a proposal by Google to disallow more responses to cross-origin "no-cors" requests: whatwg/fetch#681.

Given Spectre and issues with "no-cors" in general this seems worth doing to me, but I'd like to double check.

If we want to mark these smaller proposals, I'd suggest worth-prototyping.

cc @ckerschb

The text was updated successfully, but these errors were encountered:

martinthomson · 2018-04-11T01:19:52Z

I think that sounds right, but damn, that explainer is amazingly obtuse. It fails to explain spectacularly. It says why, then the benefits of it, but you have to read a lot of a very large document to understand a) what principles it applies and b) how it applies them.

ekr · 2018-04-11T05:09:49Z

I'm trying to put my finger on what makes me uncomfortable about this, so forgive the fact that this is going to ramble a bit.

As I understand it, the idea with this proposal is to reject responses which potentially contain sensitive data in the parent process (where the content type is resolved) rather than in the child process, where it would be exposed to spectre-type attacks. Assuming that's correct, it seems like we are defining a Web platform feature that is really tied to specific process architecture, which in turn is designed to match a specific processor architecture. So, I think we ought to ask whether that's really a good thing to bake into the Web architecture,

annevk · 2018-04-11T07:23:28Z

Being stricter on Content-Type, even if just for "no-cors", seems like a net positive. And in general being stricter with "no-cors" seems like a positive, since that's often a source of vulnerabilities that apply regardless of architecture.

I also wonder if different architectures might not benefit as well from being able to terminate the connection/response early on.

The other thing I would take into consideration here is that we're considering expanding Origin or adding From-Origin (see #80) for similar reasons and CORB would obviate the need for specifying those to some extent, thereby protecting more by default.

hsivonen · 2018-04-11T09:07:14Z

Being stricter on Content-Type, even if just for "no-cors", seems like a net positive. And in general being stricter with "no-cors" seems like a positive, since that's often a source of vulnerabilities that apply regardless of architecture.

This seems right, so my preliminary view of the general CORB concept is positive.

However, I'm worried about the details.

In particular, I'm worried about using mimesniff for a purpose that it wasn't designed for. It was not designed to confirm types all the formats that it can second guess resources as. Worse, the explainer suggests unfamiliarity with these details. In particular, the explainer says: "Confirming an XML content-type via sniffing is more straightforward than JSON or HTML: XML is signified by the pattern <?xml, possibly preceded by whitespace." However, this is incorrect. It is both too broad and too narrow. Any resource where "<?xml" is preceded by whitespace fails to parse as XML. More importantly, XML doesn't require <?xml, so CORB would fail to protect XML that doesn't have an XML declaration. (It's unclear to me if the letting <?xml be preceded by whitespace in mimesniff makes sense in the light of the purpose that mimesniff was designed for or if it's a plain spec bug in mimesniff.)

It's a bit unclear to me why the whole thing isn't reversed. Why is CORB focused on finding e.g. markup traits in loads initiated for images instead of finding image format magic numbers?

Having removed previous confusing packet boundary-dependent behavior from Gecko, I'm especially not a fan of "CORB MAY limit sniffing to the first few network packets."

So, I'd like to see a proper spec for this and I'd this not to depend on packet boundaries or timing, i.e. I'd want all browsers to look at a deterministic number of bytes, which is easier to accept when looking for the magic numbers of the formats typically mislabeled than when trying to confirm whether text-based formats are correctly labeled)

csreis · 2018-04-14T21:11:31Z

As I understand it, the idea with this proposal is to reject responses which potentially contain sensitive data in the parent process (where the content type is resolved) rather than in the child process, where it would be exposed to spectre-type attacks. Assuming that's correct, it seems like we are defining a Web platform feature that is really tied to specific process architecture, which in turn is designed to match a specific processor architecture. So, I think we ought to ask whether that's really a good thing to bake into the Web architecture,

The Spectre defense is one part of it, but CORB has some benefits regardless of process architecture, as @annevk mentions. For example, it can also help prevent things like XSSI (and similar) attacks where a page tries to poke at a sensitive response to find out what's in it. @arturjanc gave some examples in whatwg/fetch#687 (comment).

In particular, I'm worried about using mimesniff for a purpose that it wasn't designed for. It was not designed to confirm types all the formats that it can second guess resources as.

To clarify, CORB doesn't use traditional mimesniff logic for its confirmation sniffing, since you're right that it's not designed for that purpose. For example, things like <!-- can be at the start of JavaScript files, so we can't use it to confirm a text/html response is actually HTML.

Worse, the explainer suggests unfamiliarity with these details. In particular, the explainer says: "Confirming an XML content-type via sniffing is more straightforward than JSON or HTML: XML is signified by the pattern <?xml, possibly preceded by whitespace." However, this is incorrect. It is both too broad and too narrow. Any resource where "<?xml" is preceded by whitespace fails to parse as XML. More importantly, XML doesn't require <?xml, so CORB would fail to protect XML that doesn't have an XML declaration. (It's unclear to me if the letting <?xml be preceded by whitespace in mimesniff makes sense in the light of the purpose that mimesniff was designed for or if it's a plain spec bug in mimesniff.)

We're happy to refine the details here to fix issues.

It's a bit unclear to me why the whole thing isn't reversed. Why is CORB focused on finding e.g. markup traits in loads initiated for images instead of finding image format magic numbers?

Unfortunately, there isn't really an equivalent to magic numbers for JavaScript responses. JavaScript files can be requested from any origin, and it's quite hard to confirm whether a response contains JavaScript by sniffing it. Instead, we went with blocking confirmable content types that aren't allowed cross-origin (and which were more likely to contain sensitive info): HTML, XML, JSON.

Having removed previous confusing packet boundary-dependent behavior from Gecko, I'm especially not a fan of "CORB MAY limit sniffing to the first few network packets."

So, I'd like to see a proper spec for this and I'd this not to depend on packet boundaries or timing, i.e. I'd want all browsers to look at a deterministic number of bytes, which is easier to accept when looking for the magic numbers of the formats typically mislabeled than when trying to confirm whether text-based formats are correctly labeled)

I think specifying confirmation sniffing probably makes sense, and we're happy to discuss the tradeoffs of how much to sniff. I think our implementation does use a deterministic kMaxBytesToSniff=1024 regardless of packet boundaries, if that's what you mean. @nick-chromium knows the details.

(Just for context, it's theoretically possible to do CORB without confirmation sniffing and only protect responses that are labeled as nosniff or From-Origin, but our goal was to protect as many current resources as possible. We did find that confirmation sniffing was necessary for protecting responses without nosniff: almost 20% of responses would be blocked without it, thanks largely to JavaScript responses labeled as text/html.)

annevk · 2018-05-07T12:23:37Z

Thanks @csreis!

We've had some more discussion about this internally and arrived on worth prototyping as Mozilla's position.

Any implementation will be tracked by https://bugzilla.mozilla.org/show_bug.cgi?id=1459357.

FWIW, the initial set of changes to Fetch does not inspect the response body and is therefore a bit more straightforward: whatwg/fetch#686. Any future change that does need to inspect the response body will be done in such a way that it's fully deterministic.

annevk · 2019-06-24T18:04:05Z

These days I think "CORB++" is a favorable alternative if we can proof it out:

annevk · 2020-04-15T09:47:08Z

As an update, I still stand by my prior comment. https://github.com/annevk/orb is what I think Firefox should pursue prototyping as it's safelist- rather than blocklist-based (i.e., what hsivonen suggested above).

So at this point I think CORB is non-harmful, but I do want to be clear that it's been an important stepping stone to hopefully getting to something better (and there's much overlap in the algorithms). And if for some reason ORB (or CORB++) does not work out it would be a good fallback.

Does that seem reasonable?

Note that I have added this as Proposal as it ought to be removed from Fetch due to its single implementer status. Closes #81.

csreis · 2020-05-20T17:48:34Z

Just curious how you plan to do step 14 of https://github.com/annevk/orb:

If response's body parses as JavaScript and does not parse as JSON, then return true.

I mentioned that confirming something is JavaScript is a bit tricky in #81 (comment), and that's one of the main reasons we went the other way.

Would this require doing JavaScript parsing in the privileged process rather than the sandboxed process? That seems like potentially concerning attack surface.
Would it just parse the first 1024 bytes (as in steps 11-12 for images/media), or does it queue up the entire response body and parse it before making a decision?
JavaScript has a pretty lenient parser, and this approach will probably also have to accept JavaScript files that begin with an HTML comment (possibly longer than 1024 bytes). Will these issues mean that many HTML files will be considered valid JavaScript files, and thus be left unprotected?

If there's a good way to solve these, then ORB / CORB++ seems useful for protecting more types by default.

annevk · 2020-05-20T17:59:48Z

@csreis the idea is to accept the performance hit for responses that do not specify a MIME type (and are opaque) and parse them, but in an oracle process rather than a privileged process.

annevk mentioned this issue May 2, 2018

performance.memory #85

Closed

dbaron added the venue: WHATWG Specifications in a WHATWG Workstream label Aug 9, 2018

annevk added the proposal appears unstable label Nov 19, 2019

annevk added a commit that referenced this issue Apr 28, 2020

Add CORB (non-harmful)

4c7ca74

Note that I have added this as Proposal as it ought to be removed from Fetch due to its single implementer status. Closes #81.

annevk mentioned this issue Apr 28, 2020

Add CORB (non-harmful) #334

Merged

dbaron closed this as completed in #334 May 20, 2020

dbaron pushed a commit that referenced this issue May 20, 2020

Add CORB (non-harmful), and properly escape some HTML (#334)

3e43a19

Note that I have added this as Proposal as it ought to be removed from Fetch due to its single implementer status. Closes #81.

hsinyi mentioned this issue Jul 31, 2023

Opaque Response Blocking #860

Closed

zcorpan added the position: neutral label Sep 13, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fetch: CORB #81

Fetch: CORB #81

annevk commented Apr 10, 2018

martinthomson commented Apr 11, 2018

ekr commented Apr 11, 2018

annevk commented Apr 11, 2018

hsivonen commented Apr 11, 2018 •

edited

Loading

csreis commented Apr 14, 2018

annevk commented May 7, 2018 •

edited

Loading

annevk commented Jun 24, 2019

annevk commented Apr 15, 2020

csreis commented May 20, 2020

annevk commented May 20, 2020

Fetch: CORB #81

Fetch: CORB #81

Comments

annevk commented Apr 10, 2018

martinthomson commented Apr 11, 2018

ekr commented Apr 11, 2018

annevk commented Apr 11, 2018

hsivonen commented Apr 11, 2018 • edited Loading

csreis commented Apr 14, 2018

annevk commented May 7, 2018 • edited Loading

annevk commented Jun 24, 2019

annevk commented Apr 15, 2020

csreis commented May 20, 2020

annevk commented May 20, 2020

hsivonen commented Apr 11, 2018 •

edited

Loading

annevk commented May 7, 2018 •

edited

Loading