Neural Magic
Neural Magic helps developers accelerate machine learning performance using automated model sparsification techniques and inference technologies.
Repositories
- transformers (Public, forked from huggingface/transformers)
  🤗 Transformers: State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0.
- nm-vllm-certs (Public)
  General information, model certifications, and benchmarks for nm-vllm enterprise distributions
- quant_kernel_benchmarks (Public)
  Benchmarking code for running quantized kernels from vLLM and other libraries
- compressed-tensors (Public)
  A safetensors extension to efficiently store sparse quantized tensors on disk
- OmniQuant (Public, forked from OpenGVLab/OmniQuant)
  [ICLR 2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
- flash-attention (Public, forked from vllm-project/flash-attention)
  Fast and memory-efficient exact attention
- lm-evaluation-harness (Public, forked from EleutherAI/lm-evaluation-harness)
  A framework for few-shot evaluation of language models.
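To give a sense of the idea behind compressed-tensors (storing sparse quantized tensors compactly), here is a minimal, self-contained Python sketch. It is purely illustrative and does not use the compressed-tensors or safetensors APIs; the `compress`/`decompress` helpers and the (scale, indices, values) layout are assumptions for this example, not the library's actual on-disk format.

```python
# Illustrative sketch (NOT the compressed-tensors API): a sparse, int8-quantized
# tensor stored as a compact (scale, indices, values) triple.

def compress(dense, num_bits=8):
    """Quantize a dense list of floats to signed ints and keep only nonzeros."""
    qmax = 2 ** (num_bits - 1) - 1                      # 127 for int8
    scale = max((abs(x) for x in dense), default=0.0) / qmax or 1.0
    indices, values = [], []
    for i, x in enumerate(dense):
        q = round(x / scale)
        if q != 0:                                      # skip zeros: sparsity saves space
            indices.append(i)
            values.append(q)
    return {"scale": scale, "shape": len(dense), "indices": indices, "values": values}

def decompress(packed):
    """Reconstruct the dense float list from the compact representation."""
    dense = [0.0] * packed["shape"]
    for i, q in zip(packed["indices"], packed["values"]):
        dense[i] = q * packed["scale"]
    return dense

packed = compress([0.0, 1.27, 0.0, 0.0, -0.635, 0.0])
restored = decompress(packed)
print(packed["indices"], packed["values"])              # only the two nonzero entries survive
```

A real format additionally handles multi-dimensional shapes, per-channel or per-group scales, and serialization to disk; this sketch only shows why combining sparsity (store nonzeros) with quantization (store small ints plus a scale) shrinks storage.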