[ICCV 2025] DONUT: A Decoder-Only Model for Trajectory Prediction
Updated Mar 23, 2026 - Python
Efficient encoder-decoder architecture for small language models (≤1B parameters) with cross-architecture knowledge distillation and vision-language capabilities
Time series prediction using a decoder-only Transformer, including SwiGLU and RoPE (Rotary Positional Embedding)
Code for paper "Modality Plug-and-Play: Elastic Modality Adaptation in Multimodal LLMs for Embodied AI"
[ACL 2025] 🔍 Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment
SAMPO: Scale-wise Autoregression with Motion Prompt for Generative World Models
Minimal decoder-only seq2seq pipeline with proper causal masking, teacher forcing, Ignite training loop, and checkpointed inference
ViAG: A Novel Framework for Fine-tuning Answer Generation Models Utilizing Encoder-Decoder and Decoder-only Transformer Architectures
Clean-room GPT-2/GPT-3 implementation: tokenizers, architecture blocks, training loop with AdamW + cosine decay, CLI scripts, inference tools, and pytest suite. Covers OpenWebText-10k & WikiText-103 workflows. Designed as an academic reference for understanding and scaling decoder-only transformers
A from-scratch implementation of a scaled-down GPT-2 model in PyTorch, trained on the Snappfood dataset for sentiment-controlled Persian text generation.
Open source implementation of AMALIA: A Fully Open Large Language Model for European Portuguese
Train a decoder-only GPT language model from scratch for code and math reasoning — custom 16k BPE tokenizer, streaming data pipeline, ablation studies, and hardware-aware training on Apple Silicon (MPS) and CUDA.
Implementation of the GPT-2 architecture using PyTorch, trained on the TinyStories dataset. Features custom training pipelines on Modal (cloud computing) and integration with the Hugging Face ecosystem.
This project is my PyTorch reproduction of PaliGemma, a compact 3B vision–language model that integrates SigLIP vision features with a Gemma decoder. I implemented the full multimodal pipeline from vision encoding to autoregressive text generation to study modern VLM architectures from a research perspective.
Implement a decoder-only Transformer in PyTorch to reverse character sequences using causal masking and cross-entropy loss with Ignite training support
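Building a Transformer model from scratch, with variations such as Multi-Head Attention and Grouped Query Attention, on books by Machado de Assis.
Chinese-to-English sequence transduction model. A strict from-scratch reproduction of the complete "Attention Is All You Need" pipeline on Chinese→English machine translation (Zh→En).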
Autoregressive text generation application using a decoder Transformer
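Several of the projects listed above mention causal masking as the defining ingredient of a decoder-only model. As a minimal sketch of that idea, and not taken from any of the listed repositories, the pure-Python snippet below builds a lower-triangular mask and applies it in a row-wise softmax, so that position i can only place weight on positions 0..i. The function names `causal_mask` and `masked_softmax` and the uniform scores are illustrative assumptions.

```python
import math


def causal_mask(n):
    """Return an n x n matrix; entry [i][j] is True if position i may attend to j."""
    return [[j <= i for j in range(n)] for i in range(n)]


def masked_softmax(scores, mask):
    """Row-wise softmax that gives zero weight to masked-out positions."""
    out = []
    for row, allowed in zip(scores, mask):
        # Only positions with allowed=True take part in the softmax.
        visible = [s for s, ok in zip(row, allowed) if ok]
        peak = max(visible)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) if ok else 0.0 for s, ok in zip(row, allowed)]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


# Hypothetical attention scores: all equal, so weights spread evenly
# over whatever positions the mask leaves visible.
scores = [[0.0] * 4 for _ in range(4)]
weights = masked_softmax(scores, causal_mask(4))
```

With equal scores, each row of `weights` spreads its mass evenly over the current and earlier positions and assigns exactly zero to later ones, which is what prevents a decoder from looking at future tokens during training.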