disagree between preprocessor and decompiler/pretokenizer #16

Open
kotee4ko opened this issue Dec 23, 2023 · 1 comment

Comments

kotee4ko commented Dec 23, 2023

Hello.

Can somebody please explain the expected behaviour of the following steps:

In the debug argument, the decompiler code passes only (loc: var) pairs for the symbolized version of the function, not the pseudocode.

Then, in the preprocessor, we filter by the pseudocode output for both the stripped and debug versions, which results in a critically large loss of values and an unpredictable renaming model.

So, should both pseudocode versions be involved or not?
If not, would it be correct to filter by comparing keys()?

Thanks.

@huzecong @pcyin @clegoues @bvasiles @sophieball @qibinc

Collaborator

qibinc commented Jan 15, 2024

DIRTY only handles variable renaming and retyping; recovering the code tokens for the target side (debug version) is not part of the task.

Therefore, the only information we need from the target side to construct the learning task is its variables. The rest of the information in the debug version, including the other code tokens, can't be used since it is not available during inference.

In preprocessing, for each variable on the source side (stripped version), we try to match its target (the corresponding variable in the debug version).
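A minimal sketch of that matching step, assuming each side is reduced to a dict keyed by a variable location (the (loc: var) pairs mentioned in the question). The location strings, the `Example` type, and `match_by_location` are hypothetical names for illustration and are not taken from DIRTY's preprocessing code.

```python
from typing import Dict, List, NamedTuple, Optional


class Example(NamedTuple):
    location: str               # hypothetical location key, e.g. "stack:8"
    source_name: str            # decompiler-assigned name in the stripped version
    target_name: Optional[str]  # name from the debug version, None if unmatched


def match_by_location(
    source_vars: Dict[str, str],
    target_vars: Dict[str, str],
) -> List[Example]:
    """Pair each stripped-side variable with the debug-side variable at the same location."""
    return [
        Example(loc, name, target_vars.get(loc))
        for loc, name in source_vars.items()
    ]


# Toy data: only the variables of the debug version are used, not its code tokens.
source_vars = {"stack:8": "v1", "stack:16": "v2", "reg:rax": "result"}
target_vars = {"stack:8": "buf_len", "reg:rax": "ret", "stack:24": "unused"}

examples = match_by_location(source_vars, target_vars)
matched = [e for e in examples if e.target_name is not None]
print(f"matched {len(matched)} of {len(source_vars)} source variables")
```

Iterating over the source side keeps every stripped-side variable in the result, so a variable with no counterpart in the debug version shows up as unmatched rather than being dropped silently, and debug-only variables are ignored.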
