NVIDIA / TensorRT-LLM Public

Fork 931
Star 8.3k

Pull requests: NVIDIA/TensorRT-LLM

71 Open 294 Closed

Fix errors when quantizing Llama model

#2264 opened Sep 28, 2024 by dleunji

fix: none prompt to string

#2259 opened Sep 26, 2024 by dongs0104

README.md: Add 3rd Party Inference Speed Dashboard
Labels: documentation

#2244 opened Sep 22, 2024 by matichon-vultureprime

fix: add support for passing calib sequence length, and num samples + fixing use of custom calibration dataset for smoothquant in llama

#2243 opened Sep 19, 2024 by Bhuvanesh09

Modify small-batched weight only quantization
Labels: quantization, triaged

#2213 opened Sep 10, 2024 by dasistwo

Fix extra-index-url for torch installation
Labels: Merged, Windows

#2188 opened Sep 3, 2024 by pamelap-nvidia

[examples/bert/build.py]: Load weights for BertModel and RobertaModel if --model_dir is provided
Labels: triaged

#2187 opened Sep 3, 2024 by tkhanipov

Create sync.yml

#2154 opened Aug 27, 2024 by inkimikoko

Add workaround instruction for a known issue of v0.11 on Windows
Labels: Merged

#2146 opened Aug 23, 2024 by pamelap-nvidia

fix wrong buffer for oneShotAllReduceKernel under PUSH_MODE

#2099 opened Aug 8, 2024 by YconquestY

Fix the workspace size calculation for quantization plugins
Labels: Merged

#2097 opened Aug 7, 2024 by ZhangGe6

decoder MMHA kernel support INT8 SCALE_Q_INSTEAD_OF_K and SCALE_P_INS…

#2085 opened Aug 5, 2024 by lishicheng1996

Include use_fused_mlp when constructing BuildConfig from dict

#2081 opened Aug 2, 2024 by ethnzhng

typo fix quick-start-guide.md

#2075 opened Aug 1, 2024 by sweetning0809

fix GemmFpAIntB MMa::IteratorB::Layout

#2070 opened Jul 31, 2024 by luliyucoordinate

fix wrong arg in Engine Building Command in docs/source/performance/perf-overview.md
Labels: documentation, Merged

#2057 opened Jul 30, 2024 by RuibaiXu

Correct the version

#1936 opened Jul 12, 2024 by Shixiaowei02

Fix default min length
Labels: triaged

#1935 opened Jul 11, 2024 by akhoroshev

Add support for custom tokenizer and batch size

#1927 opened Jul 9, 2024 by uppalutkarsh

Add support for falcon2
Labels: triaged

#1926 opened Jul 9, 2024 by puneeshkhanna

Dev sm87 trt101

#1880 opened Jul 3, 2024 by sunnyqgg

Bump transformers from 4.36.2 to 4.38.0 in /examples/multimodal
Labels: bug, dependencies, triaged, waiting for feedback

#1689 opened May 28, 2024 by dependabot bot

add cached generation buffer
Labels: triaged, waiting for feedback

#1685 opened May 28, 2024 by michael200892458

Fix CUDA OOM when creating Mixtral checkpoint
Labels: triaged, waiting for feedback

#1629 opened May 19, 2024 by VivekBits2210

Add support for non-power-of-two heads with Alibi
Labels: triaged

#1611 opened May 15, 2024 by vmarkovtsev
