How to add new models to the leaderboard? #25

Open
chujiezheng opened this issue Jun 1, 2024 · 2 comments

@chujiezheng

Thanks for your great work. Can I request an evaluation of new models so they can be added to the leaderboard?

@CodingWithTim
Collaborator

Hi @chujiezheng, I am a fan of your work! We would love to add new models. Could you give us more information on the models you want to add? Currently we only maintain a very lightweight leaderboard in the README.

@chujiezheng
Author

chujiezheng commented Jun 2, 2024

@CodingWithTim Thanks for your kind words! I have some HF models that I want to add:

They are ordered by my educated guess of their performance. These models were obtained with our recently proposed ExPO (model extrapolation) method. You can find more ExPO-enhanced models in this 🤗 HuggingFace collection and see their performance on the AlpacaEval 2.0 leaderboard.

Due to API and GPU limits, I have so far only run the evaluation for Starling-LM-7B-beta-ExPO, which obtains a score of 24.9 with a 95% CI of (-2.2, 1.8). I attach the evaluation output files here. I would appreciate it if you could add Starling-LM-7B-beta-ExPO to the leaderboard, and I would be very grateful if you could also help evaluate the other models listed above and add them to the leaderboard.

BTW, since many research works now build their evaluations on Arena-Hard, do you have plans to build a leaderboard website like AlpacaEval's?
