model-reliability

Here are 8 public repositories matching this topic...

Khanz9664 / TrustLens

Open-source Python library for evaluating ML model reliability beyond accuracy — with calibration, failure, and fairness diagnostics for informed deployment decisions.

python data-science machine-learning opensource calibration fairness python-package model-evaluation ai-safety evaluation-framework explainable-ai bias-detection mlops fairness-ml model-monitoring trustworthy-ai model-reliability

Updated Jul 3, 2026
Python

DURGESH716 / Creating-Hard-Reasoning-Benchmark

Hard Reasoning Benchmark filtered with disagreement scores

benchmarks mathematical-modelling model-evaluation gsm8k arc-agi model-reliability

Updated Feb 14, 2026
Python

Tarunjit45 / PromptGuard

PromptGuard is a pragmatic, opinionated framework for establishing continuous integration for LLM behavior. It operates on a simple, verifiable principle: run the same prompts across multiple model configurations, compare outputs against defined expectations, and flag semantic regressions.

python nlp open-source developer-tools regression-testing ai-safety mlops llm prompt-engineering prompt-testing llm-evaluation ai-infrastructure model-reliability semantic-drift llm-systems

Updated Jan 17, 2026
Python

aaa-mvc / capability-schema-spec

Capability Schema Spec defines a shared semantic language for world model evaluation. Standardize capability definition, observation, and verification across models and benchmarks. Not a benchmark—a shared language. Define • Observe • Verify

Updated Jul 3, 2026
Python

aaa-mvc / capability-schema-reference

Reference implementation of the Capability Schema Specification. Proves that world model capabilities can be defined, observed, and verified in practice — with real checkpoints, real simulators, and real scores. Define • Observe • Verify • Deliver

Updated Jul 2, 2026
Python

GoparapukethaN / ai-reliability-lab

Enterprise-style RAG reliability platform for MLOps docs: cited answers, evals, traces, FastAPI, Next.js.

python docker retrieval sqlite nextjs evaluation rag mlops fastapi ai-engineering llmops model-reliability

Updated May 20, 2026
Python

ReviveCoding / visual-attribute-reliability

A reproducible visual-attribute verification framework combining group-disjoint evaluation, audited LoRA controls, calibration analysis, and CI-backed evidence contracts.

computer-vision model-calibration fashionpedia vision-language-model parameter-efficient-fine-tuning model-reliability siglip2 reproducible-ml visual-attribute-recognition

Updated Jul 2, 2026
Python

codedbyelif / els-judge

Multi-LLM consensus engine for automated code review, diff analysis, and risk scoring.

python docker tui multi-agent multi-agent-systems ai-safety diff-analysis ai-evaluation llms model-reliability llm-benchmark ai-safety-research

Updated Mar 17, 2026
Python
