[PROPOSAL] Transient Execution Weaknesses #5

Open
wants to merge 10 commits into
base: main

scottconstable
Collaborator

Please don't merge this PR right away! We can use the PR itself to collect feedback and address issues, without creating lots of commits and other traffic on the main repo.

Fixed formatting issues that emerged after the docx->md translation
@scottconstable
Collaborator Author

scottconstable commented Feb 10, 2023

Some instructions to interact with this PR:

  • I think it is fine to use the "Conversation" tab to discuss minor issues and enhancements. If there are big-picture issues that may require a lot of discussion, it would be better to open a new issue and tag this PR.
  • To view the "pretty" formatted document, click on the PR's "Files changed" tab, then find and click the three horizontal dots to the top right of the raw file, then click "View File."
  • To add comments to the document, click on the PR's "Files changed" tab, then click a plus ("+") sign to add a comment on a line. When you're done, click "Finish your review" at the top right and add a message that summarizes your feedback.

@scottconstable scottconstable left a comment

I copied Gananand's and Steve's comments into this GitHub PR.

New CWE Proposals
--------------------------------

### CWE-A: Processor Event Causes Transient Execution

Gananand: Should CWE-A be moved to the bottom?

example, the attacker may be able to infer program data that was
accessed or used by those operations.

### CWE-B: Transient Data Forwarding from an Operation that Triggers a Processor Event

Gananand: Based on the description, perhaps this should be “Incorrect forwarding of transient data”? Or “Incorrect forwarding of transient data that is observable after an architectural state commit”?

attacker to infer program data, such as the incorrect data forwarded by
the operation that triggered the assist.

### CWE-C: Transient Execution Influenced by Shared Microarchitectural Predictor State

Gananand: This one seems very close to the issue already. Perhaps this could be “Improper Isolation of Hardware Domains through Shared Micro-architectural State and Transient Execution”. How would this differ from CWE-1189: Improper Isolation of Shared Resources on a System-On-Chip?

sharing across domain transitions, these features may be always-on, on
by default, or may require opt-in from software.

### CWE-D: Microarchitectural Predictor Causes Transient Execution

Gananand: This sounds like “Incorrectly implemented microarchitectural predictors lead to incorrect transient execution after misprediction.” What is the weakness here? Is it that after misprediction, the transient state is not cleaned up, or is it that the transient state is not shut down/released, etc.? How is this different from CWE-C? Is it related to inferrability/observability of the transient state? What would be the mitigation here? Steven M Christey suggested perhaps this is “improper isolation of code/data of multiple users to separate hardware domains”

Steven: The key issue here seems to be that the operations "affect observable microarchitectural state in a manner that could allow an attacker to infer program data" (said in the ext desc). If that's the key point, then it should be emphasized in the main desc

New CWE Proposals
--------------------------------

### CWE-A: Processor Event Causes Transient Execution

Gananand: Based on the description, perhaps this is “Enabling of optimizations during sensitive operations that can lead to observable side effects from transient execution due to processor events”? Potentially multiple weaknesses here, still not clear what the weakness is, attack focused, focused on optimization of out-of-order processor. Seems to be observable discrepancies related. What is the perspective of the weakness? Hardware designer, software user or hardware implementor? What would be the mitigation to these weaknesses? Could it be to turn off these optimizations or maybe it is a parameter for an implementor or user to use?

Steven: I'm confused about why this is a concern. Isn't transient execution a normal, expected behavior? So, being able to cause it doesn't seem like an issue. Is the key issue about "allowing transient execution with microarchitectural side effects that can be observed by an adversary"?

“attacker” to align with the [CWE
glossary](https://cwe.mitre.org/documents/glossary/).

- “Transient,” “transient execution,” “transient operations,” etc.

Gananand: We can do this.

Steven: We can include a definition in the glossary but we might also want to explain it very briefly in a single sentence in the extended desc - something like the first sentence of the last paragraph of CWE-A.

commit to architectural state” many times. Perhaps MITRE should
consider adding “transient” to its CWE glossary.

- I make liberal use of the term “processor event,” which is

Gananand: Should we perhaps make a list of all processor events?

Steven: If "all" events is too large, it would probably be good to list a few common/popular ones.

@scottconstable
Collaborator Author

scottconstable commented Feb 11, 2023

Additional feedback from David:

  • To me, CWE-C/D both involve predictors but have a key difference in whether the state is shared across SW or HW domains. That’s an important distinction (perhaps worth calling out in their description) and I also agree with you that the typical mitigations for the two are different.
  • CWE-B does not involve a predictor, but does involve leakage across HW domains. I wonder if CWE-A could be reworked to refer to issues that do not involve a predictor but also only leak values across SW domains. With all of those covered, I’m not sure we’d need a catch-all.

@g-kini
Collaborator

g-kini commented Feb 13, 2023

Thanks Scott for adding all of our feedback and for the instructions!

…iented language. Specifically:

- CWE-B describes the condition where transient operations are allowed to access and operate on data in a shared microarchitectural structure
- CWE-C describes the condition where a hardware exception causes incorrect/stale data to be forwarded to dependent transient operations
- CWE-D is only a renaming of CWE-C in the previous proposal. CWE-D describes the condition of sharing microarchitectural predictor state
- CWE-E is only a renaming of CWE-D in the previous proposal. CWE-E describes the condition of a microarchitectural predictor causing transient execution
- CWE-A is a catch-all for transient execution, and would be a parent of CWE-[B-E]. Since CWE-B and CWE-C have been refined into specific conditions, I saw no way to avoid introducing a catch-all.
@scottconstable
Collaborator Author

Here is a summary of the updated PR:

  • CWE-A: Transient Execution. Description: A processor event or prediction may allow incorrect operations (or correct operations with incorrect data) to execute transiently, potentially exposing data over a microarchitectural covert channel.
  • CWE-B: Transient Execution Allows Access to Data in a Shared Microarchitectural Structure. Description: A processor event (for example, a hardware exception) may allow transient operations to access another user's data in a shared microarchitectural structure (for example, a CPU cache), potentially exposing the data.
  • CWE-C: Processor Event Causes Incorrect Data to be Forwarded to Operations that Execute Transiently. Description: A processor event (for example, a hardware exception) may allow transient operations to forward incorrect or stale data to dependent operations, potentially exposing the data.
  • CWE-D: Transient Execution Influenced by Shared Microarchitectural Predictor State. Description: Shared microarchitectural predictor state may allow code to influence transient execution across a hardware boundary, potentially exposing data that is accessible beyond the boundary.
  • CWE-E: Transient Execution Caused by Microarchitectural Predictor. Description: Microarchitectural predictors may allow operations to execute transiently after a misprediction, potentially exposing data.

@BobH-MITRE
Collaborator

BobH-MITRE commented Aug 25, 2023

All:
Is this description true for all known transient execution vulnerabilities? If so, I think this is a great start to create a Class CWE that we can organize the other CWEs under.

I have these suggestions for titles:
CWE-A: Exposure of Sensitive Information after Transient Execution
CWE-B: Exposure of Sensitive Information Through Microarchitectural Structures After Transient Execution
CWE-C: Exposure of Sensitive Information Through Incorrect Data Forwarding During Transient Execution
I could see these being organized under CWE-200 somehow (TBD).

Scott:
I spent some time looking at CWE-D and CWE-E. A through C are about information exposure, while D and E seem to be about an adversary manipulating or influencing the micro-arch predictors to cause transient execution. This is not weakness-focused but attacker-focused. The attacker is trying to get a certain type of transient execution to occur. Are these two CWEs the ones where you are trying to handle the cases for cross-domain boundaries and same address space? This perspective is one of technical impact or consequence. Can you describe the mechanism that enables data to be inferable across HW-defined boundaries (cross-domain) and within the same address space?

@scottconstable
Collaborator Author

scottconstable commented Aug 25, 2023

Hi @BobH-MITRE ,

Thank you for the feedback! I have made some tweaks to the prepositions in your proposed titles:
CWE-A: Exposure of Sensitive Information during Transient Execution
CWE-B: Exposure of Sensitive Information in Shared Microarchitectural Structures during Transient Execution
CWE-C: Exposure of Sensitive Information caused by Incorrect Data Forwarding during Transient Execution

In all cases, I believe that the exposure happens during transient execution when microarchitectural state is altered in a manner that corresponds to sensitive data. The sensitive data may later be recovered (or inferred) via a covert channel analysis technique. I also added the critical word "Shared" to CWE-B. I changed "through" to "caused by" in CWE-C because the incorrect data forwarding may not directly expose sensitive information--it could also be the case that the incorrect forwarded data is malicious data injected by the attacker, for example, to inject a pointer value that will be used to access sensitive information. I think that the "caused by" language generalizes the title to cover both of these scenarios.
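
The two phases described here (a transient access alters microarchitectural state in a way that corresponds to the data, and a later analysis recovers it) can be sketched with a toy model. This is only an illustration under stated assumptions: `ToyCache`, `transient_access`, `recover`, and the 64-byte line size are hypothetical names and parameters that do not appear in the proposal, and a Python set stands in for the timing measurements a real covert channel would rely on.

```python
LINE = 64  # assumed cache-line size in bytes (illustrative only)


class ToyCache:
    """Hypothetical stand-in for a cache: remembers which lines were touched."""

    def __init__(self):
        self.lines = set()

    def flush(self):
        self.lines.clear()

    def touch(self, offset):
        self.lines.add(offset // LINE)

    def is_cached(self, offset):
        return offset // LINE in self.lines


def transient_access(cache, secret_byte):
    # Models an operation that never commits to architectural state but
    # still leaves a footprint whose location depends on the secret value.
    cache.touch(secret_byte * LINE)


def recover(cache):
    # Models the later analysis: probe one line per candidate byte value
    # and report the value only if exactly one line is present.
    hits = [v for v in range(256) if cache.is_cached(v * LINE)]
    return hits[0] if len(hits) == 1 else None


cache = ToyCache()
cache.flush()
transient_access(cache, 0x2A)
recovered = recover(cache)
```

In this model the secret value is never returned by `transient_access`; it can only be inferred afterward from which line is present, which is the sense in which the exposure happens "during" transient execution.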

I agree that CWE-200 seems like a good parent candidate! But I also admit that I am not nearly familiar enough with the CWE landscape to ascertain that CWE-200 would be the best choice.

I do not understand the critique about CWE-D and CWE-E. In CWE-D the weakness is shared microarchitectural predictor state, which, in accordance with the CWE definition, "could contribute to the introduction of vulnerabilities." In CWE-E the weakness is having a microarchitectural predictor that can cause transient execution, which can also contribute to the introduction of vulnerabilities.

You asked, "Are these two CWEs the ones where you are trying to handle the cases for cross-domain boundaries and same address space?" CWE-D and CWE-E delineate between predictor-based vulnerabilities that arise from predictor state shared across domain boundaries, versus vulnerabilities that arise from abuse of a predictor within a domain boundary. CWE-B and CWE-C are intended to delineate between non-predictor-based vulnerabilities that expose data across a domain boundary, versus those that expose data within a domain boundary (though perhaps that exposed data can later be recovered by another domain). CWE-A can cover other idiosyncratic vulnerabilities such as Speculative Code Store Bypass (CVE-2021-0089) that share little in common with other transient execution vulnerabilities.
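
The delineation in the paragraph above amounts to a two-axis decision with a catch-all, which can be written out as a small sketch. The function name `classify` and its parameters are hypothetical, and the letters follow the lettering used in this comment only, which may change as the proposal is revised.

```python
def classify(predictor_based: bool,
             crosses_domain_boundary: bool,
             fits_other_categories: bool = True) -> str:
    """Map the two axes discussed in this comment to a proposed CWE letter.

    Assumption: this mirrors only the informal delineation given in the
    comment, not any official CWE mapping guidance.
    """
    if not fits_other_categories:
        # Idiosyncratic cases that share little with the others fall
        # under the catch-all parent.
        return "CWE-A"
    if predictor_based:
        # Predictor state shared across a boundary vs. abuse of a
        # predictor within a boundary.
        return "CWE-D" if crosses_domain_boundary else "CWE-E"
    # Non-predictor-based: data exposed across a boundary vs. within one.
    return "CWE-B" if crosses_domain_boundary else "CWE-C"
```

For example, `classify(True, False)` yields `"CWE-E"`, which corresponds to the case of a predictor abused within a single domain boundary.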

@BobH-MITRE
Collaborator

I can go along with the title tweaks for CWE-A through CWE-C. I'll leave it to the group to decide whether it is during, before, or after transient execution.

For CWE-D and CWE-E, I think we are on the right track with these after your explanation, but maybe we need to tweak the lens a bit. You wrote, "In CWE-E the weakness is having a microarchitectural predictor that can cause transient execution, which can also contribute to the introduction of vulnerabilities." Isn't that normal behavior? I thought the issue was that the attacker has the ability to influence the predictor so they can cause transient execution when convenient. If my understanding here is correct, then my follow up question would be, "what is the mechanism that is in place that allows an attacker to influence predictor state?"

@scottconstable
Collaborator Author

> For CWE-D and CWE-E, I think we are on the right track with these after your explanation, but maybe we need to tweak the lens a bit. You wrote, "In CWE-E the weakness is having a microarchitectural predictor that can cause transient execution, which can also contribute to the introduction of vulnerabilities." Isn't that normal behavior?

Having a microarchitectural predictor is a normal condition that leads to normal behavior. My understanding of the definition of "weakness" is that even a normal, acceptable condition can also contribute to the introduction of vulnerabilities. In the real world we have to live with these conditions while acknowledging and understanding their implications.

> I thought the issue was that the attacker has the ability to influence the predictor so they can cause transient execution when convenient. If my understanding here is correct, then my follow up question would be, "what is the mechanism that is in place that allows an attacker to influence predictor state?"

This comment certainly applies to CWE-D, where the weakness is that a hardware condition (shared predictor state) allows malicious software to influence the transient execution behavior of other software on the same system. With CWE-E the point is more subtle: if predictor state is not shared (or if there isn't any predictor state and a predictor is "static") then the predictor can contribute to vulnerabilities (and thus is a weakness), but software must also be a co-participant. This can happen in many different ways, but here are a couple of practical examples:

  • Managed runtime software uses sandboxing techniques to attempt to isolate mutually distrusting sub-processes within a single address space. A malicious sub-process trains a branch predictor to predict "in-bounds" and then makes an out-of-bounds access that transiently reads a victim sub-process's sensitive data.
  • Server software processes untrusted inbound data from a network. A malicious adversary on the network crafts network packets to train predictor(s) on the server (e.g., to predict that packet contents are valid), then launches an attack with an invalid packet that transiently causes the server to access and expose sensitive data.
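
The first example (train the predictor to predict "in-bounds", then issue an out-of-bounds access) can be sketched as a toy simulation. Everything here is an assumption made for illustration: `TwoBitPredictor` models a 2-bit saturating counter, which is one common textbook predictor design and not necessarily what any given processor uses, and `sandboxed_read`, `memory`, `BOUND`, and `footprint` are hypothetical names. The `footprint` set stands in for observable microarchitectural state.

```python
class TwoBitPredictor:
    """Hypothetical 2-bit saturating counter; a value >= 2 predicts in-bounds."""

    def __init__(self):
        self.counter = 0

    def predict(self):
        return self.counter >= 2

    def update(self, in_bounds):
        if in_bounds:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)


def sandboxed_read(predictor, memory, bound, index, footprint):
    """Return the architectural result; record any transient access in footprint."""
    predicted_in_bounds = predictor.predict()
    actually_in_bounds = index < bound
    if predicted_in_bounds and not actually_in_bounds:
        # Misprediction: the guarded load runs transiently. Its result is
        # discarded, but the value it touched leaves a footprint.
        footprint.add(memory[index])
    predictor.update(actually_in_bounds)
    return memory[index] if actually_in_bounds else None


BOUND = 4
memory = [1, 2, 3, 4, 0x53]  # index 4 holds the victim's data in this model

predictor = TwoBitPredictor()
footprint = set()
for i in range(BOUND):  # training phase: only in-bounds accesses
    sandboxed_read(predictor, memory, BOUND, i, footprint)

result = sandboxed_read(predictor, memory, BOUND, 4, footprint)  # attack
```

After the training loop, the out-of-bounds call returns `None` architecturally, yet `footprint` contains the victim value. With an untrained predictor the same call leaves `footprint` empty, which illustrates why software has to be a co-participant in this scenario.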

@BobH-MITRE
Collaborator

Can you try to incorporate some of the nuances here into the titles for CWE-D and CWE-E? As the titles stand now, they seem to describe normal behavior. I spent some time trying to come up with a suggestion, but I don't quite grasp the nuances of the space.

- We removed a CWE that applied exclusively to predictor-based transient execution not involving shared predictor state. We believe that CWE-A suffices to cover these cases.
- Some of that CWE's extended description has been updated and merged into CWE-A.
- There is a placeholder CWE-E that will cover "speculation oracle" weaknesses such as Pacman.