A collection of memory efficient attention operators implemented in the Triton language.
Updated Jun 24, 2026 - Python
Triton implementation of FlashAttention2 that adds custom masks.
Triton implementation of bidirectional (non-causal) linear attention
ViT inference in Triton, because why not?
A "standard library" of Triton kernels.
[ICML'26] Beyond Test-Time Memory: State-Space Optimal Control for LLM Reasoning
Educational resource demonstrating common GPU programming pitfalls and solutions using Triton kernels.
Experimental Rust DSL for writing GPU kernels that compile through the Triton compiler, with no Python required.
LAMB go brrr
🧠️🖥️2️⃣️0️⃣️0️⃣️1️⃣️💾️📜️ The sourceCode:Triton category for AI2001, containing Triton programming language datasets
FlashAttention implementations using CUDA and Triton
A collection of high-performance CUDA implementations, ranging from naive to highly optimized versions.
A container of various PyTorch neural network modules written in Triton.
🌳️🌐️#️⃣️ The Bliss Browser Triton (ClosedAI) language support module, allowing Triton (ClosedAI) programs to be written and run within the browser.
Writing TensorRT plugins using Triton and Python
Fast Golu Activation in Triton
LooLoLo is a command-line analyzer for tracking source-location metadata across MLIR transformation stages
Windows NVIDIA-only Triton 3.7.0 build pipeline for RTX 5090 / Blackwell sm_120a, with FP8 tl.dot validation and peak benchmark results.