Inquiry about evaluation process #1

Open
Alex-HaochenLi opened this issue Mar 22, 2024 · 3 comments

Comments

@Alex-HaochenLi

Hello, I am very excited to read CodeChain. However, I have a question about the evaluation process in this repo.

It seems that during evaluation you test the code on the test cases from test_example_tests.pkl, but not the private test cases from APPS.

Are the pass@1 results reported in the paper based on the private test cases? Thank you for your clarification.

@huanhuan6666

@Alex-HaochenLi

Same question. Have you received clarification?

@Alex-HaochenLi

Not yet :)

@huanhuan6666

huanhuan6666 commented Mar 28, 2024

@Alex-HaochenLi
Thank you. In fact, when I was inspecting the evaluation_codechain.sh file, I noticed the following:

# Test by hidden test cases 
python src/evaluate.py --save_gen_path $output_path --eval_split $split

In src/evaluate.py, when example_test_path is not specified, the script does not load the {split}_example_tests.pkl file. Instead, it ends up in utils_evaluate.py and uses example['input_output'] as the test cases. In the test split of codeparrot/apps, input_output holds the private test cases.
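
To make the branching described above concrete, here is a minimal sketch of how that selection could look. It is not the repository's actual code: the function name select_test_cases, the problem_id argument, and the layout of the pickle file (a dict keyed by problem id) are assumptions made for illustration. The only parts taken from the discussion are the example_test_path option and the example['input_output'] field, which as far as I know is stored as a JSON string in codeparrot/apps.

```python
import json
import pickle
from typing import Any, Optional


def select_test_cases(
    example: dict,
    example_test_path: Optional[str] = None,
    problem_id: Any = None,
) -> dict:
    """Hypothetical helper: pick which test cases an evaluation run uses."""
    if example_test_path is None:
        # No example-test file given: fall back to the dataset's own
        # input_output field (the private tests in the APPS test split).
        raw = example["input_output"]
        # Assumption: the field may arrive as a JSON string or as a dict.
        return json.loads(raw) if isinstance(raw, str) else raw

    # An example-test file was given: load the public example tests instead.
    # Assumption: the pickle holds a dict keyed by problem id.
    with open(example_test_path, "rb") as f:
        example_tests = pickle.load(f)
    return example_tests[problem_id]
```

Under these assumptions, calling select_test_cases(example) with no path returns the private tests, which matches the "Test by hidden test cases" command quoted above.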
