disagree between preprocessor and decompiler/pretokenizer #16

kotee4ko · 2023-12-23T18:45:58Z

Hello.

Can somebody please explain the expected behaviour of the following steps:

The decompiler code passes only (loc: var) pairs in the debug argument for the symbolized version of the function, not its pseudocode.

Then, in the preprocessor, we filter by the pseudocode output of both the stripped and debug versions, which discards a critically large number of values and makes the renaming model unpredictable.

So, should both pseudocode versions be involved or not?
If not, would it be correct to filter by comparing keys()?
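To make the question concrete, here is a minimal sketch of the two filtering strategies. It assumes each version's variables are a dict mapping a location key to a (name, type) record and each pseudocode is a list of tokens. The names `Var`, `filter_by_pseudocode`, and `filter_by_keys`, the location strings such as `"stack:-0x8"`, and the sample data are all hypothetical and are not taken from the DIRTY code base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """Hypothetical variable record: a name and a type string."""
    name: str
    type: str


def filter_by_pseudocode(stripped_vars, debug_vars, stripped_code, debug_code):
    """Keep a shared location only if its variable name occurs in BOTH token streams."""
    stripped_tokens = set(stripped_code)
    debug_tokens = set(debug_code)
    return {
        loc
        for loc in stripped_vars.keys() & debug_vars.keys()
        if stripped_vars[loc].name in stripped_tokens
        and debug_vars[loc].name in debug_tokens
    }


def filter_by_keys(stripped_vars, debug_vars):
    """Keep every location present in both variable maps, ignoring the pseudocode."""
    return set(stripped_vars.keys() & debug_vars.keys())


# Hypothetical sample data: location key -> variable.
stripped = {
    "stack:-0x8": Var("v1", "int"),
    "stack:-0x10": Var("v2", "char *"),
    "reg:rdi": Var("a1", "int"),
}
debug = {
    "stack:-0x8": Var("count", "int"),
    "stack:-0x10": Var("buf", "char *"),
    "reg:rdi": Var("argc", "int"),
}
stripped_code = ["v1", "=", "a1", ";", "v2", "=", "0", ";"]
debug_code = ["count", "=", "argc", ";"]  # incomplete debug pseudocode

print(sorted(filter_by_keys(stripped, debug)))
print(sorted(filter_by_pseudocode(stripped, debug, stripped_code, debug_code)))
```

With this sample data the key comparison keeps all three locations, while the pseudocode filter drops `stack:-0x10` because `buf` never appears in the debug token list. If the debug pseudocode were missing entirely, the pseudocode filter would keep nothing.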

Thanks.

@huzecong @pcyin @clegoues @bvasiles @sophieball @qibinc

qibinc · 2024-01-15T05:33:37Z

DIRTY only handles variable renaming and retyping; recovering the code tokens for the target side (the debug version) is not part of the task.

Therefore, the only information we need from the target side to construct the learning task is its variables. The rest of the information in the debug version, including other code tokens, can't be used since it's not available during inference.

In preprocessing, for each variable on the source side (the stripped version), we try to match its target (the corresponding variable in the debug version).
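The matching step described above can be sketched as follows, assuming variables on both sides are dicts keyed by a location string and that only names and types are read from the debug side. `Var`, `Example`, and `match_variables` are hypothetical names, and skipping source variables that have no debug counterpart is an assumption of this sketch, not a statement about what DIRTY's preprocessor actually does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """Hypothetical variable record: a name and a type string."""
    name: str
    type: str


@dataclass(frozen=True)
class Example:
    """One hypothetical training pair: a stripped variable and its debug target."""
    location: str
    source_name: str  # decompiler-generated name, e.g. "v1"
    target_name: str  # name recovered from the debug build
    target_type: str  # type recovered from the debug build


def match_variables(source_vars, target_vars):
    """For each source (stripped) variable, look up the debug variable at the same location."""
    examples = []
    for loc, src in sorted(source_vars.items()):
        tgt = target_vars.get(loc)
        if tgt is None:
            # Assumption: unmatched source variables are skipped in this sketch.
            continue
        # Only the variable itself is taken from the target side; no code tokens.
        examples.append(Example(loc, src.name, tgt.name, tgt.type))
    return examples


# Hypothetical sample data: location key -> variable.
source = {
    "stack:-0x8": Var("v1", "int"),
    "reg:rdi": Var("a1", "int"),
    "stack:-0x18": Var("v3", "long"),
}
target = {
    "stack:-0x8": Var("count", "int"),
    "reg:rdi": Var("argc", "int"),
}

for ex in match_variables(source, target):
    print(ex)
```

Because the lookup is keyed by location, the debug pseudocode is never consulted, which matches the point that only the target-side variables are needed to build the task.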
