Clarification: security properties of exclusion-instead-of-inclusion byte-range lists #40

Open
jayaddison opened this issue Oct 1, 2023 · 3 comments

@jayaddison

Hi - I have a question about heading 4.1.2 of the C2PA v1.3 specification:

The simplest type of hard binding that can be used to detect tampering is a cryptographic hashing algorithm over some or all of the bytes of an asset as described in the core specification. Traditionally, this type of binding is done over an inclusive list of byte ranges of the asset. However, a number of attacks on an inclusion list-based approach were identified and it was determined that they are prevented by the use of exclusions lists. These vulnerabilities would have allowed content to be added to an asset that altered the digital content without altering the hard bindings.
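The thread does not say which attacks the specification authors identified. As one plausible illustration only, here is a minimal Python sketch of how an inclusion list can miss added content that an exclusion list catches. It assumes a toy asset laid out as header, embedded manifest, then pixel data, and a file format whose parser would act on bytes appended after the last hashed range. The function names and the (start, length) ranges are hypothetical, and this is not the hashing procedure defined by the specification.

```python
import hashlib

# Hypothetical (start, length) byte ranges, used for illustration only.
Range = tuple[int, int]


def hash_inclusions(data: bytes, inclusions: list[Range]) -> str:
    """Hash only the bytes inside the listed ranges; everything else is ignored."""
    h = hashlib.sha256()
    for start, length in inclusions:
        h.update(data[start:start + length])
    return h.hexdigest()


def hash_exclusions(data: bytes, exclusions: list[Range]) -> str:
    """Hash every byte of the asset except the listed ranges."""
    h = hashlib.sha256()
    pos = 0
    for start, length in sorted(exclusions):
        h.update(data[pos:start])  # bytes before this exclusion
        pos = start + length       # skip the excluded range
    h.update(data[pos:])           # everything after the last exclusion, to end of file
    return h.hexdigest()


if __name__ == "__main__":
    # Toy asset: 6-byte header, 8-byte manifest, 9 bytes of pixel data.
    original = b"HEADER" + b"MANIFEST" + b"PIXELDATA"
    inclusions = [(0, 6), (14, 9)]  # header + pixel data
    exclusions = [(6, 8)]           # the manifest only
    tampered = original + b"INJECTED"
    print(hash_inclusions(original, inclusions) == hash_inclusions(tampered, inclusions))  # True
    print(hash_exclusions(original, exclusions) == hash_exclusions(tampered, exclusions))  # False
```

Because the exclusion-list hash runs to the end of the file, bytes appended or inserted outside the excluded manifest range change the digest, while the manifest itself can still be rewritten in place without invalidating the binding.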

Would it be possible to share additional information about the types of attack that inclusion lists were found to be vulnerable to, and how exclusion lists defend against these?

Thank you,
James

@lrosenthol
Contributor

@jayaddison I don't see why not - will add to our internal issues tracker to address!

@jayaddison
Author

Thank you, @lrosenthol!

@jayaddison
Author

jayaddison commented Aug 13, 2024

Before responding further, here is a recap of the versions published since then:

v1.3 guidance:

The simplest type of hard binding that can be used to detect tampering is a cryptographic hashing algorithm over some or all of the bytes of an asset as described in the core specification. Traditionally, this type of binding is done over an inclusive list of byte ranges of the asset. However, a number of attacks on an inclusion list-based approach were identified and it was determined that they are prevented by the use of exclusions lists. These vulnerabilities would have allowed content to be added to an asset that altered the digital content without altering the hard bindings.

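v1.4 guidance (unchanged, apart from the hyperlink to the core specification, which now points to ../../2.0/specs/C2PA_Specification.html#_hashing rather than ../specs/C2PA_Specification.html#_hashing):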

The simplest type of hard binding that can be used to detect tampering is a cryptographic hashing algorithm over some or all of the bytes of an asset as described in the core specification. Traditionally, this type of binding is done over an inclusive list of byte ranges of the asset. However, a number of attacks on an inclusion list-based approach were identified and it was determined that they are prevented by the use of exclusions lists. These vulnerabilities would have allowed content to be added to an asset that altered the digital content without altering the hard bindings.

v2.0 guidance (change in terminology):

The simplest type of hard binding that can be used to detect tampering is a cryptographic hashing algorithm, as described in Section 11.3.4.2, “Hashing”, over some or all of the bytes of an asset. This approach can be used on any type of asset, but should only be considered for formats that don’t support one of the forms of box-based hashing.

v2.1 guidance:

The simplest type of hard binding that can be used to detect tampering is a cryptographic hashing algorithm, as described in Section 13.1, “Hashing”, over some or all of the bytes of an asset. This approach can be used on any type of asset, but should only be considered for formats that don’t support one of the forms of box-based hashing.
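Box-based hashing is not described further in this thread. As a rough illustration only, here is a minimal Python sketch of the general idea, assuming an ISOBMFF-style layout of top-level boxes (4-byte big-endian size, then a 4-character type). Each box is hashed separately and the box carrying the manifest is excluded by type rather than by byte offset. The helpers `make_box`, `iter_boxes`, and `hash_boxes`, and the `c2pa` box type used for the manifest, are hypothetical; this is not the exact box-hashing algorithm defined in the C2PA specification.

```python
import hashlib
import struct


def make_box(box_type: str, payload: bytes) -> bytes:
    """Build a simple box: 4-byte big-endian size, 4-char type, then payload."""
    return struct.pack(">I", 8 + len(payload)) + box_type.encode("ascii") + payload


def iter_boxes(data: bytes):
    """Yield (box_type, start, size) for each top-level box.

    Simplification: only the 32-bit size form is handled; the 64-bit
    'largesize' (size == 1) and 'to end of file' (size == 0) forms are rejected.
    """
    pos = 0
    while pos + 8 <= len(data):
        (size,) = struct.unpack(">I", data[pos:pos + 4])
        if size < 8 or pos + size > len(data):
            raise ValueError(f"malformed box at offset {pos}")
        yield data[pos + 4:pos + 8].decode("ascii"), pos, size
        pos += size
    if pos != len(data):
        raise ValueError("unparsed trailing bytes after the last box")


def hash_boxes(data: bytes, excluded: set[str]) -> list[tuple[str, str]]:
    """Ordered (box_type, SHA-256 hex) for every box whose type is not excluded."""
    return [
        (box_type, hashlib.sha256(data[start:start + size]).hexdigest())
        for box_type, start, size in iter_boxes(data)
        if box_type not in excluded
    ]


if __name__ == "__main__":
    asset = make_box("ftyp", b"isom") + make_box("c2pa", b"manifest v1") + make_box("mdat", b"pixels")
    resigned = make_box("ftyp", b"isom") + make_box("c2pa", b"manifest v2, longer") + make_box("mdat", b"pixels")
    # The manifest box can change size and content without affecting the other box hashes.
    print(hash_boxes(asset, {"c2pa"}) == hash_boxes(resigned, {"c2pa"}))  # True
    # An appended box shows up as an additional entry in the list.
    print(len(hash_boxes(asset + make_box("mdat", b"injected"), {"c2pa"})))  # 3
```

The design point this sketch tries to show: because hashes are tied to boxes rather than byte offsets, the manifest can grow or shrink freely, while any added, removed, or reordered box, or stray bytes the parser cannot account for, produces a different result.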

As an aside: version control seems to be used in an unexpected way in this repository. Generally, the authored source materials used to produce documents (and, if necessary, the output built from those sources) would be committed to source control, with the revision history recording the diff applied in each revision. Here, subsequent versions appear to be added to source control as separate directories, which means common git workflows cannot easily be used to compare changes between versions. However, I can understand that there may be organizational or process-driven reasons for those choices. What matters more is scrutiny of the content.

I hadn't come across the term "box hashing" in the context of information security before, so I will spend some time learning more about it.

In my experience, hashing the entire content of a file (without include or exclude ranges) tends to be the preferred approach when hashing is used to identify or de-duplicate content that may be bit-for-bit identical (with the caveat that, given sufficient compute resources, collisions can be found for any lossy hash).
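A minimal Python sketch of the whole-file approach, for comparison with the range-based schemes. It streams each file through SHA-256, groups files by digest, and then confirms candidate duplicates byte-for-byte with `filecmp.cmp(..., shallow=False)` to cover the collision caveat. The function names `file_digest` and `find_duplicates` are hypothetical.

```python
import filecmp
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 16) -> str:
    """SHA-256 over the entire file, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(paths: list[Path]) -> list[list[Path]]:
    """Group files whose whole-file digests match, confirmed byte-for-byte."""
    by_digest: dict[str, list[Path]] = defaultdict(list)
    for p in paths:
        by_digest[file_digest(p)].append(p)
    groups = []
    for members in by_digest.values():
        if len(members) < 2:
            continue
        first = members[0]
        # A digest match almost certainly means identical bytes; the full
        # comparison guards against the theoretical collision case.
        confirmed = [first] + [p for p in members[1:] if filecmp.cmp(first, p, shallow=False)]
        if len(confirmed) > 1:
            groups.append(confirmed)
    return groups


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "a.bin").write_bytes(b"same bytes")
        (root / "b.bin").write_bytes(b"same bytes")
        (root / "c.bin").write_bytes(b"other bytes")
        for group in find_duplicates(sorted(root.iterdir())):
            print([p.name for p in group])  # prints ['a.bin', 'b.bin']
```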

Edit: use permalinks for all documentation references
Edit 2: fixup for v1.3 documentation reference
