OPT with quantizable MatMuls #85

natuan · 2023-07-27T20:25:18Z

Re-applies the changes from this PR, which were probably dropped during a rebase: https://github.com/neuralmagic/transformers/pull/78/files

(previous commits)

* Add recipe_name to default file names
* Upgrade to transformers release V4.30.2 (#62)
* Update trainer and model flows to accommodate sparseml

Disable FP16 on QAT start (#12)
* Override LRScheduler when using LRModifiers
* Disable FP16 on QAT start
* keep wrapped scaler object for training after disabling

Using QATMatMul in DistilBERT model class (#41)

Removed double quantization of output of context layer. (#45)

Fix DataParallel validation forward signatures (#47)
* Fix: DataParallel validation forward signatures
* Update: generalize forward_fn selection

Best model after epoch (#46)

fix sclaer check for non fp16 mode in trainer (#38)

Mobilebert QAT (#55)
* Remove duplicate quantization of vocabulary.

enable a QATWrapper for non-parameterized matmuls in BERT self attention (#9)
* Utils and auxillary changes

update Zoo stub loading for SparseZoo 1.1 refactor (#54)

add flag to signal NM integration is active (#32)

Add recipe_name to file names
* Fix errors introduced in manual cherry-pick upgrade

Co-authored-by: Benjamin Fineran <[email protected]>

* update build versions for NM fork pypi push (#74)
* fix nightly package name (#75)
* add make build command (#76)
* add GHA workflow files to build nightly and release packages (#77)
* add GHA workflow files to build nightly and release packages
* fix name

---------

Co-authored-by: dhuang <[email protected]>

* bump up version to 1.6.0 (#79)

Co-authored-by: dhuang <[email protected]>

---------

Co-authored-by: Konstantin <[email protected]>
Co-authored-by: Konstantin Gulin <[email protected]>
Co-authored-by: dhuangnm <[email protected]>
Co-authored-by: dhuang <[email protected]>

minor improvements for build workflow files (#83)

Co-authored-by: dhuang <[email protected]>

fix minor issue (#84)

Co-authored-by: dhuang <[email protected]>

OPT with quantizable MatMuls (#85)

fix a minor issue for release build (#86)

Co-authored-by: dhuang <[email protected]>

update version in version.py

Testmo (#91)
* improve GHA workflow files to build nightly and release, and report status to testmo
* clean up
* report exit code
* Assign value to exit_code

---------

Co-authored-by: dhuang <[email protected]>

Update trainer.py - fix DistributedSampler import (#93)

DistributedSampler is used but not imported in `trainer.py`

Research/llama/bmm quantization (#94)
* Quantize attention matmuls
* Quantize attention matmuls

bump base transformers version

OPT with quantizable MatMuls

0864e7f

natuan requested review from bfineran, anmarques, rahul-tuli and dbogunowicz July 27, 2023 20:25

anmarques approved these changes Jul 27, 2023

natuan requested a review from shubhra July 27, 2023 21:06

shubhra approved these changes Jul 27, 2023

natuan merged commit 38ae788 into main Jul 27, 2023
1 of 2 checks passed

dsikka pushed a commit that referenced this pull request Aug 17, 2023

OPT with quantizable MatMuls (#85)

b31d679

dsikka pushed a commit that referenced this pull request Aug 17, 2023

OPT with quantizable MatMuls (#85)

0a90ee3

bfineran pushed a commit that referenced this pull request Oct 26, 2023

OPT with quantizable MatMuls (#85)

b8ab0a1
