model-quantization

Here are 26 public repositories matching this topic...

horseee / Awesome-Efficient-LLM

A curated list for Efficient Large Language Models

compression language-model knowledge-distillation model-quantization pruning-algorithms llm llm-compression efficient-llm

Updated Jun 17, 2025
Python

Efficient-ML / BiBench

[ICML 2023] This project is the official implementation of our accepted ICML 2023 paper BiBench: Benchmarking and Analyzing Network Binarization.

benchmark binarization model-compression binary-neural-networks binarized-neural-networks model-quantization icml-2023

Updated Mar 4, 2024
Python
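
Network binarization, the subject of BiBench, reduces each weight to one bit. As a rough illustration only, the sketch below uses the common sign-plus-scale scheme (one per-tensor scale equal to the mean absolute weight, as popularized by XNOR-Net). It is an assumption for illustration, not BiBench's code, and the function names `binarize` and `debinarize` are hypothetical.

```python
from typing import List, Tuple


def binarize(weights: List[float]) -> Tuple[float, List[int]]:
    """Binarize weights to {-1, +1} codes with one per-tensor scale.

    For codes b = sign(w), the scale alpha = mean(|w|) minimizes the
    squared error ||w - alpha * b||^2. Zero is mapped to +1 here.
    """
    if not weights:
        raise ValueError("weights must be non-empty")
    alpha = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]
    return alpha, signs


def debinarize(alpha: float, signs: List[int]) -> List[float]:
    """Reconstruct approximate full-precision weights from the 1-bit codes."""
    return [alpha * s for s in signs]


if __name__ == "__main__":
    alpha, codes = binarize([0.5, -0.25, 0.75, -1.0])
    print(alpha, codes)               # 0.625 [1, -1, 1, -1]
    print(debinarize(alpha, codes))   # [0.625, -0.625, 0.625, -0.625]
```

Storing one float per tensor plus one bit per weight is what gives binarized networks their memory savings; benchmarks such as BiBench then measure how much accuracy this approximation costs.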

htqin / QuantSR

[NeurIPS 2023 Spotlight] This project is the official implementation of our accepted NeurIPS 2023 (spotlight) paper QuantSR: Accurate Low-bit Quantization for Efficient Image Super-Resolution.

super-resolution quantized-neural-networks model-quantization

Updated May 13, 2024
Python

nbasyl / OFQ

The official implementation of the ICML 2023 paper OFQ-ViT

icml model-compression model-compression-papers model-quantization vision-transformer vision-transformers icml2023 quantization-awar

Updated Oct 3, 2023
Python

ModelTC / QVGen

[ICLR 2026] This is the official PyTorch implementation of "QVGen: Pushing the Limit of Quantized Video Generative Models".

wan iclr qat video-generation diffusion-models videogen model-quantization quantization-aware-training generative-ai text-to-video-generation cogvideox wan21 iclr2026

Updated Feb 11, 2026
Python

seonglae / llama2gptq

Chat with Llama 2 and get responses backed by reference documents retrieved from a vector database. The model runs locally using GPTQ 4-bit quantization.

chatbot cuda transformers question-answering gpt quantization rye model-quantization chatai streamlit-chat chatgpt langchain llama2 llama-2

Updated Nov 25, 2023
Python

HaoranREN / TensorFlow_Model_Quantization

A tutorial on model quantization using TensorFlow

machine-learning tensorflow tensorflow-lite tflite model-quantization inference-efficiency quantization-aware-training

Updated Aug 2, 2021
Python
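
Quantization-aware training, one of this tutorial's topics, usually inserts "fake quantization" into the forward pass: values are quantized and immediately dequantized so the network trains against rounding error, while the backward pass treats rounding as the identity (the straight-through estimator). The pure-Python sketch below shows the idea for a symmetric 8-bit grid. It is not TensorFlow's API (TensorFlow's own QAT tooling lives in the TensorFlow Model Optimization Toolkit), and `fake_quant` and `fake_quant_grad` are hypothetical names.

```python
def fake_quant(x: float, scale: float, num_bits: int = 8) -> float:
    """Quantize x to a symmetric signed integer grid, then dequantize it.

    The result is still a float, but it only takes values scale * q with
    q in [-qmax, qmax], so the forward pass "sees" the rounding error.
    Python's round() rounds halves to even.
    """
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    q = max(-qmax, min(qmax, round(x / scale)))
    return q * scale


def fake_quant_grad(x: float, scale: float, upstream: float, num_bits: int = 8) -> float:
    """Straight-through estimator for d(fake_quant)/dx.

    Rounding is treated as the identity inside the representable range,
    so the upstream gradient passes through unchanged; outside the range
    the clamp is flat, so the gradient is zero.
    """
    qmax = 2 ** (num_bits - 1) - 1
    limit = qmax * scale
    return upstream if -limit <= x <= limit else 0.0


if __name__ == "__main__":
    # Toy example: fit one scalar weight to a target under fake quantization.
    w, target, lr, scale = 0.9, 0.42, 0.1, 0.05
    for _ in range(50):
        y = fake_quant(w, scale)
        loss_grad = 2.0 * (y - target)  # d/dy of (y - target)^2
        w -= lr * fake_quant_grad(w, scale, loss_grad)
    print(w, fake_quant(w, scale))
```

Because the target lies between two grid points, the weight settles near the boundary between them, which is exactly the kind of quantization-aware behavior plain post-training rounding cannot learn.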

frickyinn / BiDense

PyTorch implementation of "BiDense: Binarization for Dense Prediction," a binary neural network for dense prediction tasks.

model-compression model-quantization

Updated Nov 21, 2024
Python

first-coding / VIT

This project distills a ViT model into a compact CNN, reducing its size to 1.24 MB with minimal accuracy loss. ONNX Runtime with CUDA boosts inference speed, while FastAPI and Docker simplify deployment.

python docker image-classification knowledge-distillation onnx fastapi onnxruntime model-quantization vision-transformer

Updated May 17, 2025
Python

felixyustian / PerceiveReason_Apple_M5

On-device Perceive → Reason pipeline for Apple Silicon: Core ML + Vision for perception, a swappable LanguageModel (Apple Foundation Models or Claude) for reasoning. Python conversion/quantization toolkit plus a SwiftUI reference app.

swift ios claude core-ml coremltools vision-framework swiftui mobilenetv3 on-device-ml on-device-ai model-quantization neural-engine apple-silicon foundation-models antrophic wwdc2026

Updated Jun 10, 2026
Python

sebasmos / curious-qmoe

🔬 Curiosity-Driven Quantized Mixture of Experts

pytorch audio-classification mixture-of-experts model-quantization efficient-ai

Updated Mar 24, 2026
Python

tk-yasuno / deepseek-v3-quantization-analysis

Comprehensive performance analysis of DeepSeek V3 quantization levels (FP16, Q8_0, Q4_0) on 16GB GPU environments.

quantization model-evaluation fp16 gpu-performance latency-analysis model-quantization inference-acceleration model-optimization llm-inference llm-optimization deepseek-v3 throughput-analysis

Updated Sep 27, 2025
Python
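
The Q8_0 and Q4_0 levels compared here are block-quantization formats: values are split into fixed-size blocks that each share one scale. The sketch below is modeled loosely on the Q8_0 idea (32-value blocks, symmetric int8 codes, scale = largest magnitude / 127); it keeps the scale as a Python float rather than FP16 and is not the llama.cpp/GGML code. `quantize_q8_blocks` and `dequantize_q8_blocks` are hypothetical names.

```python
from typing import List, Tuple

BLOCK_SIZE = 32  # values sharing one scale, as in Q8_0's 32-value blocks


def quantize_q8_blocks(values: List[float]) -> List[Tuple[float, List[int]]]:
    """Symmetric 8-bit block quantization: one float scale per block."""
    blocks = []
    for start in range(0, len(values), BLOCK_SIZE):
        block = values[start:start + BLOCK_SIZE]
        amax = max(abs(v) for v in block)
        scale = amax / 127.0                      # largest magnitude maps to +/-127
        inv = 1.0 / scale if scale > 0 else 0.0   # all-zero block -> all-zero codes
        codes = [max(-127, min(127, round(v * inv))) for v in block]
        blocks.append((scale, codes))
    return blocks


def dequantize_q8_blocks(blocks: List[Tuple[float, List[int]]]) -> List[float]:
    """Expand (scale, codes) blocks back to approximate floats."""
    out: List[float] = []
    for scale, codes in blocks:
        out.extend(scale * q for q in codes)
    return out


if __name__ == "__main__":
    weights = [((i * 37) % 101 - 50) / 25.0 for i in range(64)]
    blocks = quantize_q8_blocks(weights)
    restored = dequantize_q8_blocks(blocks)
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print(len(blocks), worst)
```

A per-block scale limits the damage an outlier can do to one block instead of the whole tensor. In llama.cpp, a Q8_0 block stores an FP16 scale next to 32 int8 codes (34 bytes, about 8.5 bits per weight), and Q4_0 applies the same block idea with 4-bit codes (18 bytes per 32 weights, about 4.5 bits per weight), which is what makes the difference on a 16 GB GPU.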

IonDen / mlx-quant-fidelity

Measure MLX quantization quality loss — KL divergence, perplexity, top-token agreement for KV cache and weights

python machine-learning metal diagnostics quantization mlx kl-divergence perplexity model-quantization kv-cache apple-silicon llm llm-inference llm-eval kv-cache-quantization mlx-lm quantization-quality

Updated Jun 23, 2026
Python
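
The three fidelity metrics named above can be computed from the logits of a reference model and a quantized model at the same positions. The pure-Python sketch below shows one plausible definition of each: KL(P_ref || P_quant) in nats per position, the fraction of positions where both models pick the same top token, and perplexity from per-position negative log-likelihood. It is an assumption for illustration, not mlx-quant-fidelity's implementation (which works on MLX arrays), and all function names are hypothetical.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax via the log-sum-exp trick."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def kl_divergence(ref_logits: Sequence[float], quant_logits: Sequence[float]) -> float:
    """KL(P_ref || P_quant) in nats for one position."""
    lp = log_softmax(ref_logits)
    lq = log_softmax(quant_logits)
    return sum(math.exp(a) * (a - b) for a, b in zip(lp, lq))


def argmax(row: Sequence[float]) -> int:
    """Index of the largest logit (first one on ties)."""
    return max(range(len(row)), key=row.__getitem__)


def top_token_agreement(ref_rows: Sequence[Sequence[float]],
                        quant_rows: Sequence[Sequence[float]]) -> float:
    """Fraction of positions where both models' most likely token matches."""
    matches = sum(argmax(r) == argmax(q) for r, q in zip(ref_rows, quant_rows))
    return matches / len(ref_rows)


def perplexity(logit_rows: Sequence[Sequence[float]], targets: Sequence[int]) -> float:
    """exp of the mean negative log-likelihood of the target tokens."""
    nll = -sum(log_softmax(row)[t] for row, t in zip(logit_rows, targets))
    return math.exp(nll / len(targets))
```

KL divergence catches shifts in the whole distribution even when the top token is unchanged, which is why it is often reported alongside top-token agreement rather than instead of it.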

yifu-ding / MoE-Slimming

Official ICML 2026 Spotlight implementation for structural MoE compression, including attribution-guided channel scoring, coverage-maximized pruning, compact checkpoint construction, and fine-tuning support.

moe pruning sparsification model-quantization llm structural-pruning llm-compression

Updated May 23, 2026
Python

Shineii86 / ZImagePro

🚀 Next-gen FP8 diffusion pipeline with ComfyUI backend & smart caching. Professional image generation on free Colab — zero setup, modular src/ package, one-click notebook.

python flux turbo vae image-generation aria2 text-to-image diffusion colab-notebook huggingface model-quantization fp8 stable-diffusion comfyui qwen

Updated May 16, 2026
Python

SriyanRavuri / model-efficiency-benchmarking

PyTorch benchmark harness comparing full / fine-tuned / quantised NLP models on accuracy, latency, memory, and energy per 1,000 predictions. Produces accuracy-vs-emissions trade-off curves for stakeholder consumption.

python benchmarking inference-optimization green-ai model-quantization ai-sustainability trade-off-analysis

Updated May 11, 2026
Python

FlosMume / LLAMA-qLoRA-Unsloth-Starter

Fine-tuning Llama models with QLoRA using Unsloth for supervised instruction tasks

pytorch llama lora fine-tuning efficient-training model-quantization large-language-models llm low-rank-adaptation qlora bitsandbytes open-source-ai unsloth

Updated Oct 20, 2025
Python

dslisleedh / NCNet-flax

Unofficial implementation of NCNet using flax and jax

flax super-resolution jax model-quantization

Updated Jan 11, 2023
Python

MeghaaVerse / onnx-int8-quantization-pipeline

Automated INT8 quantization pipeline for ONNX models (segmentation, classification, and anomaly detection) using ONNX Runtime QDQ format. Supports efficient deployment on edge devices such as Raspberry Pi.

raspberry-pi computer-vision deep-learning raspberrypi vision onnx edge-ai onnxruntime model-quantization onnx-models model-optimization int8-quantization ai-inference qdq-quantization

Updated Mar 10, 2026
Python
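
In the ONNX QDQ format, quantization is expressed as QuantizeLinear/DequantizeLinear node pairs. Per the ONNX operator definitions, QuantizeLinear computes `saturate(round(x / scale) + zero_point)` with round-half-to-even, and DequantizeLinear computes `(q - zero_point) * scale`. The sketch below mirrors that arithmetic for uint8 with a plain min/max calibration; it is an illustration under that assumption, not this repository's pipeline (which would rely on ONNX Runtime's own tooling, such as `onnxruntime.quantization.quantize_static`), and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def minmax_params(values: Sequence[float], qmin: int = 0, qmax: int = 255) -> Tuple[float, int]:
    """Asymmetric scale and zero point from the observed min/max."""
    lo = min(min(values), 0.0)  # range must include 0 so 0.0 is exactly representable
    hi = max(max(values), 0.0)
    if hi == lo:                # all-zero input: any scale works
        return 1.0, 0
    scale = (hi - lo) / (qmax - qmin)
    zero_point = max(qmin, min(qmax, round(qmin - lo / scale)))
    return scale, zero_point


def quantize_linear(x: Sequence[float], scale: float, zero_point: int,
                    qmin: int = 0, qmax: int = 255) -> List[int]:
    """QuantizeLinear: round half to even (Python's round), add zero point, saturate."""
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in x]


def dequantize_linear(q: Sequence[int], scale: float, zero_point: int) -> List[float]:
    """DequantizeLinear: map integer codes back to real values."""
    return [(v - zero_point) * scale for v in q]


if __name__ == "__main__":
    acts = [-1.0, 0.0, 0.5, 2.0]
    s, zp = minmax_params(acts)
    codes = quantize_linear(acts, s, zp)
    print(s, zp, codes, dequantize_linear(codes, s, zp))
```

Keeping 0.0 inside the calibrated range guarantees that zero padding and ReLU outputs survive the round trip exactly, which matters for the segmentation and classification models this pipeline targets.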

Egzavyer / QuickCV

Two-stage confidence-gated YOLOv8 detector for autonomous driving, optimized for CPU with OpenVINO INT8 (0.946 mAP@0.5, ~2× faster). Built for uOttawa's SEG4180 (Applied ML, Dr. Daniel Shapiro).

computer-vision pytorch object-detection autonomous-driving openvino edge-ai model-quantization yolov8

Updated Jun 24, 2026
Python
