Releases: open-compass/opencompass

0.3.2.post1

06 Sep 10:48
b5f8afb

What's Changed

Full Changelog: 0.3.2...0.3.2.post1

0.3.2

06 Sep 08:21
ff18545

The OpenCompass team is thrilled to announce the release of OpenCompass v0.3.2!

🚀 New Features

  • 🛠 Added extra_body support for OpenAISDK and introduced proxy URL support when connecting to OpenAI's API.
  • 🗂 Included auto-download functionality for MMLU-Pro, NeedleBench, LongBench and other datasets.
  • 🤝 Integrated support for the Rendu API.
  • 🧪 Added a model postprocess function.

📖 Documentation

  • 📜 Updated the README file for better clarity and guidance.

🐛 Bug Fixes

  • 🛠 Fixed CLI evaluation for multiple models.
  • 🛠 Updated requirements to resolve dependency issues.
  • 🛠 Corrected configurations for the Llama model series.
  • 🛠 Addressed bad cases and added environment information to improve testing.

⚙ Enhancements and Refactors

  • 🛠 Made OPENAI_API_BASE compatible with OpenAI's default environment settings.
  • 🛠 Optimized SciCode for improved performance.
  • 🛠 Added an api_key attribute to TurboMindAPIModel.
  • 🛠 Implemented fixes and improvements to the CI test environment, including baselines for vllm.
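
As a rough illustration of the OPENAI_API_BASE item above, the sketch below shows one way a client could derive its chat endpoint from that environment variable. The helper name `resolve_chat_url`, the fallback URL, and the `/chat/completions` suffix are assumptions made for this example; they are not taken from the OpenCompass source.

```python
import os

# Assumed fallback: OpenAI's public API base URL.
DEFAULT_API_BASE = "https://api.openai.com/v1"


def resolve_chat_url(env=None):
    """Build a chat-completions URL from OPENAI_API_BASE (hypothetical helper)."""
    # Allow a plain dict to be passed in so the function is easy to test.
    env = os.environ if env is None else env
    base = env.get("OPENAI_API_BASE", DEFAULT_API_BASE)
    # Strip a trailing slash so the joined URL never contains "//".
    return base.rstrip("/") + "/chat/completions"


if __name__ == "__main__":
    print(resolve_chat_url())
```

With this shape, a user who already exports OPENAI_API_BASE for other OpenAI tooling would not need a separate, OpenCompass-specific value.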

🎉 Welcome New Contributors

  • 👋 @cpa2001 added icl_sliding_k_retriever.py and updated __init__.py.
  • 👋 @gyin94 made OPENAI_API_BASE compatible with OpenAI's default environment settings.
  • 👋 @chengyingshe added an api_key attribute to TurboMindAPIModel.
  • 👋 @yanzeyu integrated support for the Rendu API.

Full Changelog: 0.3.1...0.3.2

OpenCompass v0.3.1

23 Aug 03:00
5485207

The OpenCompass team is thrilled to announce the release of OpenCompass v0.3.1!


🌟 Highlights

  • 🚀 Added support for pip installation and updated the README and evaluation demo.
  • 🐛 Fixed various dataset loading issues.
  • ⚙️ Enhanced auto-download features for datasets.

🚀 New Features

  • 🆕 Introduced support for Ruler datasets.
  • 🆕 Enhanced model compatibility.
  • 🆕 Improved dataset handling, with auto-download support for various datasets.

📖 Documentation

  • 📚 Updated README to reflect the latest changes.
  • 📚 Improved documentation for dataset loading procedures.

🐛 Bug Fixes

  • 🐞 Resolved modelscope dataset load issues.
  • 🐞 Corrected evaluation scores for the Lawbench dataset.
  • 🐞 Fixed dataset bugs for CommonsenseQA and Longbench.

⚙ Enhancements and Refactors

  • 🔧 Truncated over-long prompts by keeping their first and last halves, avoiding max_seq_len issues.
  • 🔧 Updated Compassbench to v1.3.
  • 🔧 Switched to Python runner for single GPU operations.
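
The prompt-handling item above keeps the head and tail of a prompt and drops the middle. A minimal sketch of that idea follows, assuming the prompt is already a list of tokens and that the budget is split evenly between head and tail. The function name `truncate_middle` and the even split are illustrative assumptions, not the OpenCompass implementation.

```python
def truncate_middle(tokens, max_len):
    """Keep the first and last parts of `tokens`, dropping the middle.

    Hypothetical helper: returns at most `max_len` tokens.
    """
    if max_len < 0:
        raise ValueError("max_len must be non-negative")
    if len(tokens) <= max_len:
        return list(tokens)  # already short enough; nothing to drop
    head = (max_len + 1) // 2  # the head gets the extra token when max_len is odd
    tail = max_len - head
    # Guard tail == 0: tokens[-0:] would return the whole list.
    kept_tail = list(tokens[-tail:]) if tail > 0 else []
    return list(tokens[:head]) + kept_tail


if __name__ == "__main__":
    print(truncate_middle(list(range(10)), 4))
```

Keeping both ends tends to preserve the instructions at the start of a prompt and the question at the end, which a plain right-truncation would lose.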

🎉 Welcome New Contributors

  • 🙌 @Yunnglin for fixing the ModelScope dataset loading problem.
  • 🙌 @changyeyu for addressing max_seq_len issues with prompt handling.
  • 🙌 @seetimee for updates to openai_api.py.
  • 🙌 @HariSeldon0 for adding the scicode dataset.

What's Changed

Full Changelog: 0.3.0...0.3.1


Thank you for your continued support and contributions to OpenCompass!

OpenCompass v0.3.0

06 Aug 17:34
264fd23

The OpenCompass team is thrilled to announce the release of OpenCompass v0.3.0! This release brings a variety of new features, enhancements, and bug fixes to improve your experience.

🌟 Highlights

  1. Support for OpenAI ChatCompletion
  2. Updated Model Support List
  3. Support for Automatic Dataset Download
  4. Support pip install opencompass

🚀 New Features

  1. Support for CompassBench Checklist Evaluation
  2. Support for Doubao API
  3. Support for ModelScope Datasets

📖 Documentation

  1. Update NeedleBench Docs
  2. Update Documentation

🐛 Bug Fixes

  1. Fix Typing and Typo
  2. Fix Lint Issues
  3. Fix Summary Error in subjective.py

⚙ Enhancements and Refactors

  1. Upgrade Default Math pred_postprocessor
  2. Fix Path and Folder Updates
  3. Update Get Data Path for LCBench and HumanEval

🔗 Full Change Logs

🎉 Welcome New Contributors

Full Changelog: 0.2.6...0.3.0

OpenCompass v0.2.6

05 Jul 16:36
a62c613

The OpenCompass team is thrilled to announce the release of OpenCompass v0.2.6!

🌟 Highlights

  • No noteworthy highlights.

🚀 New Features

  1. #1215 #1224 #1266 Add Datasets MT-Bench-101, Fofo, wildbench
  2. #1286 Add Models InternLM2.5-7B

📖 Documentation

  1. #1252 Add doc for accelerator function
  2. #1263 Update quick start guide

🐛 Bug Fixes

  1. #1221 Resolve release version installation and import issues
  2. #1228 Fix pip version issues
  3. #1282 Update MathBench summarizer & fix cot setting

⚙ Enhancements and Refactors

  1. #1284 Reorganize subjective eval

🎉 Welcome New Contributors

🔗 Full Change Logs

Full Changelog: 0.2.5...0.2.6

OpenCompass v0.2.5

29 May 16:35
a77b8a5

The OpenCompass team is thrilled to announce the release of OpenCompass v0.2.5!

🌟 Highlights

  • Simplified the huggingface / vllm / lmdeploy model wrappers; meta_template no longer needs to be hand-crafted in model configs.
  • Introduced evaluation-results READMEs in ~20 dataset config folders.

🚀 New Features

  1. #1065 Add LLaMA-3 Series Configs
  2. #1048 Add TheoremQA with 5-shot
  3. #1094 Support Math evaluation via judgemodel
  4. #1080 Add gpqa prompt from simple_evals, openai
  5. #1074 Add mmlu prompt from simple_evals, openai
  6. #1123 Add Qwen1.5 MoE 7b and Mixtral 8x22b model configs

📖 Documentation

  1. #1053 Update readme
  2. #1102 Update NeedleInAHaystack Docs
  3. #1110 Update README.md
  4. #1205 Remove --no-batch-padding and Use --hf-num-gpus

🐛 Bug Fixes

  1. #1036 Update setup.py install_requires
  2. #1051 Fixed the issue caused
  3. #1043 fix multiround
  4. #1070 Fix sequential runner
  5. #1079 Fix Llama-3 meta template

⚙ Enhancements and Refactors

  1. #1163 enable HuggingFacewithChatTemplate with --accelerator via cli
  2. #1104 fix prompt template
  3. #1109 Update performance of common benchmarks

🎉 Welcome New Contributors

🔗 Full Change Logs

OpenCompass v0.2.5.rc1

23 Apr 09:21
81d0e4d
Pre-release
[Feature] Add lmdeploy tis python backend model (#1014)

* add lmdeploy tis python backend model

* fix pr check

* update

OpenCompass v0.2.4

09 Apr 10:06
b39f501

The OpenCompass team is thrilled to announce the release of OpenCompass v0.2.4!

🌟 Highlights

  • Enhanced support for multiple datasets including QuALITY, APPS and TACO.
  • Introduced multi-model judging for subjective tests.
  • Bug fixes and improvements in configurations and documentation.

🚀 New Features

🌐 General

  1. Feat #963 - Support for APPS dataset.
  2. Feature #976 - Add the implementation of QuALITY datasets.
  3. Feature #984 - Add support for setting prediction paths.
  4. Feature #1006 - Support alpacaeval_v2.
  5. Feature #1016 - Add multi-model judge.
  6. Feature #1019 - Add ATC Choice Version.

📖 Documentation

  1. Updates docs #1015 - General documentation updates and improvements.

🐛 Bug Fixes

  1. Fix #964 - Fix the config's name of deepseek-coder.
  2. Fix #890 - Update links and link checkers.
  3. Fix #977 - Fix a bug in internlm2 series configs.
  4. Fix #975 - Fix documentation issues.
  5. Fix #992 - Fix running issues in turbomind_tis.
  6. Fix #994 - Change status to list in base.py.
  7. Fix #995, Fix #1020 - Quick fixes and refactors for configs.

⚙ Enhancements and Refactors

  1. Modify requirements/runtime.txt #983 - Update numpy version requirement.
  2. Update Needlebench and configs #986 - Enhancements in Needlebench configurations.
  3. Simplify needlebench summarizer #1024 - Streamline Needlebench summarizer for better efficiency.

🎉 Welcome New Contributors

🔗 Full Change Logs

[Fix] fix the config's name of deepseek-coder by @jingmingzhuo in #964
[Fix] Update links and link checkers by @Leymore in #890
[Feat] support apps by @Connor-Shen in #963
fix doc problem by @seanzhang-zhichen in #975
[Fix] fix a bug in internlm2 series configs by @jingmingzhuo in #977
[Feature] Add the implement of QuALITY datasets by @jingmingzhuo in #976
modify the requirements/runtime.txt: numpy==1.23.4 --> numpy>=1.23.4 by @kleinzcy in #983
[Feature] add support for set prediction path by @bittersweet1999 in #984
[Feat] Support TACO by @Connor-Shen in #966
[Feature] update apps by @Connor-Shen in #985
[Fix] update apps/taco by @Connor-Shen in #988
[Feature] add one script for subjective by @bittersweet1999 in #993
Fix running issues in turbomind_tis by @ispobock in #992
[Fix] base.py change status into list by @Chaseldot in #994
[Fix] quick fix for configs by @bittersweet1999 in #995
[Feature] update needlebench and configs by @DseidLi in #986
[Feature] support alpacaeval_v2 by @bittersweet1999 in #1006
updates docs by @Y0oMu in #1015
[Feature] Add multi-model judge and fix some problems by @bittersweet1999 in #1016
[Fix] Refactor Needlebench Configs for CLI Testing Support by @DseidLi in #1020
[Feature] Add ATC Choice Version by @DseidLi in #1019
[Fix] Simplify needlebench summarizer by @DseidLi in #1024

For a detailed overview of all changes, check out our Full Changelog.

OpenCompass v0.2.4.rc1

25 Mar 10:15
0a6a03f
Pre-release

This pre-release provides more parsed datasets:

OpenCompassData-complete-20240325.zip

Important updates compared to the previous version are as follows:

Subjective: Add MTBench
LongText: Support Needle-In-Haystack Test Dataset
Code: Update generation version of CIBench

OpenCompass v0.2.3

12 Mar 03:53
ab6cdb2

The OpenCompass team is thrilled to announce the release of OpenCompass v0.2.3! This version is packed with new features, crucial fixes, and documentation updates to improve your experience. We're continuously working to enhance OpenCompass, making it more robust and versatile for all users.

🌟 Highlights:

  • Enhanced Model Support: Introduction of new models and configurations, including support for the LightllmApi, lmdeploy pytorch engine, and more.
  • New Datasets and Benchmarks: Expanding our dataset repository with additions like OpenFinData, lveval benchmark, and an upgrade to Needlebench.
  • Documentation and Sync Improvements: Updated dataset pack URLs, fixed documentation errors, and synchronized with internal code for consistency.

Explore the key updates in this release:

🌟 New Features:

  • 📦 Dataset and Benchmark Expansion:

    • Support for new datasets like OpenFinData and an upgrade to Needlebench, offering broader evaluation capabilities (#896, #913).
    • Introduction of the lveval benchmark to enrich the evaluation landscape (#914).
  • 🛠 Model and API Integrations:

    • Enhanced functionality with support for LightllmApi input_format and prompt templates, alongside the introduction of get_ppl for TurbomindModel (#888, #878).
    • New model configurations added, including support for gemini and deepseek-coder, further broadening the tools available for users (#931, #943).
  • 📖 Documentation and Sync Updates:

    • Updated dataset pack URLs and rank link in README to ensure users have access to the latest resources (#922, #911).
    • Several syncs with internal code and a GitHub blacklist update to maintain consistency and integrity (#929, #953).

🐛 Bug Fixes:

  • Addressed various configuration and template issues to ensure smoother operation across different models and benchmarks (#894, #893).
  • Fixed issues related to IFEval, including type hints and config bugs, enhancing evaluation accuracy and functionality (#906, #915).

🎉 Welcome New Contributors:

🔗 Full Changelog

For a detailed overview of all changes, check out our Full Changelog.