Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
Updated Jun 18, 2026 - Python
Incremental engine for long-horizon agents 🌟 Star if you like it!
Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
Easy data preparation with the latest LLM-based operators and pipelines.
A lightweight data processing framework built on DuckDB and 3FS.
A lightweight, flexible, and expressive statistical data testing library
High-performance AI pipeline engine with a C++ core and 50+ Python-extensible nodes. Build, debug, and scale LLM workflows with 13+ model providers, 8+ vector databases, and agent orchestration, all from your IDE. Includes VS Code extension, TypeScript/Python SDKs, and Docker deployment.
The Context Layer for unstructured data: typed, versioned datasets over S3, GCS, Azure
Large-scale pretraining for dialogue
Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/
Python Stream Processing
Scalable data preprocessing and curation toolkit for LLMs
Extract Transform Load for Python 3.5+
Concurrent Python made simple
Data and tools for generating and inspecting OLMo pre-training data.
Distribute and run AI workloads on Kubernetes magically in Python, like PyTorch for ML infra.
Large-scale pretrained models for goal-directed dialog
All-in-one text de-duplication
Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CASL project: http://casl-project.ai/
📈 PatternPy: A Python package revolutionizing trading analysis with high-speed pattern recognition, leveraging Pandas & Numpy. Effortlessly spot Head & Shoulders, Tops & Bottoms, Supports & Resistances. For experts & beginners. #TradingMadeEasy 🔥