
Revert "kernel: use tensor cores for flashinfer gqa kernels" #1511

Merged (1 commit) on Sep 25, 2024

@Ying1123 (Member) commented Sep 25, 2024

Reverts #1403

This commit causes a significant accuracy degradation, detected when evaluating Llama 3.1 70B Instruct on HumanEval.

# reproduce
python3 -m sglang.launch_server --model meta-llama/Meta-Llama-3.1-70B-Instruct --tp 4 --enable-p2p-check
python3 run_eval.py --host 127.0.0.1 --port 30000 --model meta-llama/Meta-Llama-3.1-70B-Instruct --eval-name humaneval

@Ying1123 enabled auto-merge (squash) September 25, 2024 05:50
@Ying1123 merged commit f39a019 into main Sep 25, 2024 (1 of 10 checks passed)
@Ying1123 deleted the revert-1403-main branch September 25, 2024 05:50
@zhyncs (Member) commented Sep 25, 2024

cc @yzh119
