AdaptRL: Adaptive VR Training System via Reinforcement Learning

A real-time, biometric-driven RL dashboard for personalized VR training difficulty adjustment.

Overview

AdaptRL is an adaptive reinforcement learning system that dynamically adjusts VR training difficulty in real time based on:

Performance metrics: accuracy, reaction time, error streak
Biometric signals: heart rate, EEG stress, gaze drift, fatigue
RL policy: PPO (Proximal Policy Optimization) trained to maximize learning efficiency while preventing cognitive overload

Problem Statement

Traditional VR training uses fixed difficulty levels, which causes:

Underutilization: Bored trainees, no learning progression
Cognitive overload: Stressed trainees, poor retention
No personalization: One-size-fits-all approach fails for diverse learners

AdaptRL addresses these problems by using RL to adapt the difficulty trajectory to each trainee in real time.

Key Features

✅ Real-time Biometric Integration — Heart rate, EEG stress, gaze tracking
✅ RL-Driven Adaptation — PPO policy adjusts difficulty every 1.5s
✅ Live Dashboard — WebSocket-powered reactive UI with Framer Motion animations
✅ Algorithm Benchmarking — Compare PPO vs A2C vs SAC
✅ MDP Explorer — Interactive reward function tuning
✅ Stress Event Logging — Track overload events and recovery

Quick Start

Prerequisites

Python 3.10+ (backend)
Node.js 18+ (frontend)
pip and npm

Backend Setup

```bash
cd backend
pip install -r requirements.txt
export CORS_ORIGINS=http://localhost:3000
python -m uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```

Backend runs on http://localhost:8000
WebSocket endpoint: ws://localhost:8000/ws/live

Frontend Setup

```bash
cd frontend
npm install
echo 'NEXT_PUBLIC_API_BASE_URL=http://localhost:8000' > .env.local
echo 'NEXT_PUBLIC_WS_URL=ws://localhost:8000/ws/live' >> .env.local
npm run dev
```

Frontend runs on http://localhost:3000

Verify Installation

1. Open http://localhost:3000 in a browser.
2. Confirm that the dashboard shows live KPI cards updating.
3. Check the browser console for WebSocket connection logs.
4. Navigate to the "RL Agent" page to see training logs.

Architecture

System Overview

```
┌─────────────────────────────────────────────────────────────┐
│                   VR Training Environment                   │
│   (Simulated: accuracy, HR, EEG, fatigue, reaction time)    │
└────────────────────┬────────────────────────────────────────┘
                     │
                     ▼
        ┌────────────────────────┐
        │  Simulation Engine     │
        │  (sim_engine.py)       │
        │  - tick() every 1.5s   │
        │  - next_action()       │
        │  - reward_fn()         │
        └────────────┬───────────┘
                     │
                     ▼
        ┌────────────────────────┐
        │  State Store           │
        │  (state_store.py)      │
        │  - In-memory singleton │
        │  - Biometric buffers   │
        │  - Overload counter    │
        └────────────┬───────────┘
                     │
                     ▼
        ┌────────────────────────┐
        │  WebSocket Broadcast   │
        │  (manager.py)          │
        │  - JSON payload        │
        │  - 1.5s cadence        │
        └────────────┬───────────┘
                     │
                     ▼
        ┌────────────────────────┐
        │  Frontend Dashboard    │
        │  (React + Next.js)     │
        │  - Live KPI cards      │
        │  - Animated charts     │
        │  - Status indicators   │
        └────────────────────────┘
```

Data Flow

1. Simulation Tick (every 1.5s)
   - sim_engine.tick() computes new state
   - Biometric signals updated via sine waves + noise
   - next_action() selects action based on state
   - reward_fn() computes reward signal
   - Overload events tracked
2. State Snapshot
   - store.snapshot() serializes current state
   - Includes: episode, reward, difficulty, biometrics, action, timestamp
3. WebSocket Broadcast
   - Payload sent to all connected clients
   - Clients update live state via setLive()
   - Charts and KPIs re-render with animations
4. Frontend Rendering
   - Framer Motion animates value changes
   - Charts interpolate smoothly
   - Status indicators pulse/breathe based on connection state
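
The tick, snapshot, and broadcast stages above can be sketched as one asyncio loop. This is a minimal illustration under stated assumptions, not the project's code: `fake_tick`, `RecordingClient`, and `run_loop` are hypothetical names, and the interval is shortened in the usage lines so the example finishes quickly.

```python
import asyncio
import json
import math
import random


def fake_tick(episode: int) -> dict:
    """Hypothetical stand-in for sim_engine.tick(): sine wave plus noise."""
    t = episode / 25.0
    return {
        "episode": episode,
        "heart_rate": 78 + 10 * math.sin(t) + random.uniform(-4, 4),
    }


class RecordingClient:
    """Hypothetical client that stores every message it is sent."""

    def __init__(self) -> None:
        self.received: list[str] = []

    async def send_text(self, message: str) -> None:
        self.received.append(message)


async def run_loop(clients: list, ticks: int, interval: float = 1.5) -> None:
    """Tick, snapshot, and broadcast `ticks` times at a fixed cadence."""
    for episode in range(1, ticks + 1):
        snapshot = fake_tick(episode)       # 1. simulation tick
        payload = json.dumps(snapshot)      # 2. state snapshot as JSON
        await asyncio.gather(               # 3. broadcast to every client
            *(client.send_text(payload) for client in clients)
        )
        await asyncio.sleep(interval)       # wait for the next tick


clients = [RecordingClient(), RecordingClient()]
asyncio.run(run_loop(clients, ticks=3, interval=0.01))
print(len(clients[0].received))  # one message per tick
```

Each client receives the same serialized snapshot once per tick, which is what lets the dashboard re-render on a fixed cadence.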

Reinforcement Learning Theory

What is Reinforcement Learning?

RL is a machine learning paradigm where an agent learns to make decisions by interacting with an environment:

```
Agent observes State (s_t)
        ↓
Agent selects Action (a_t) based on policy π
        ↓
Environment transitions to new State (s_t+1)
        ↓
Environment provides Reward (r_t)
        ↓
Agent learns to maximize cumulative reward
```

Key RL Concepts

State (s): Complete description of the environment at time t

In AdaptRL: accuracy, reaction_time, heart_rate, eeg_stress, error_streak, gaze_drift

Action (a): Decision the agent makes

In AdaptRL: enemy_count_delta, complexity, hints

Reward (r): Scalar feedback signal

In AdaptRL: weighted combination of performance gain and stress penalty

Policy (π): Mapping from states to actions

In AdaptRL: PPO neural network trained to maximize cumulative reward

Value Function (V): Expected cumulative future reward from state s

Used by PPO to estimate advantage of each action

Advantage (A): How much better an action is vs. the average

A(s,a) = Q(s,a) - V(s)
PPO uses this to update policy

Why PPO?

PPO (Proximal Policy Optimization) is chosen because:

Metric	PPO	A2C	SAC
Discrete Actions	✅ Best	OK	Continuous-leaning
Stability	✅ High	Medium	High
Sample Efficiency	Medium	Low	✅ High
Convergence Speed	✅ Fast	Slow	Medium
Implementation	✅ Simple	Complex	Complex
Unity ML-Agents	✅ Native	Good	OK

PPO Key Features:

Clipped objective: Prevents policy from changing too much per update
Entropy bonus: Encourages exploration
Generalized Advantage Estimation (GAE): Reduces variance in advantage estimates
Multiple epochs: Reuses data efficiently
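
To make the GAE item concrete, here is a minimal sketch of the backward recursion it refers to: δ_t = r_t + γ·V(s_t+1) - V(s_t), then Â_t = δ_t + γ·λ·Â_t+1. The function name `gae` and the default λ = 0.95 are assumptions for illustration; the document only fixes γ = 0.99.

```python
def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation for one trajectory.

    `values` must hold one more entry than `rewards`: the last entry is the
    bootstrap value of the state reached after the final step.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    # Walk the trajectory backwards so each step can reuse the next advantage.
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD error
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


# With gamma = lam = 1 the estimate reduces to (return - value).
adv = gae(rewards=[1.0, 0.0], values=[0.5, 0.4, 0.0], gamma=1.0, lam=1.0)
print(adv)
```

Lower values of λ lean more on the value function (less variance, more bias); λ = 1 recovers the plain Monte Carlo advantage.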

MDP Formulation

Markov Decision Process

AdaptRL is formulated as a finite-horizon MDP:

```
MDP = (S, A, P, R, γ, T)

S = State space (continuous)
A = Action space (discrete)
P = Transition dynamics (simulated: sine waves plus random noise)
R = Reward function
γ = Discount factor (0.99)
T = Episode horizon (500k steps)
```
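
As a small worked illustration of the discount factor, the sketch below computes a discounted return G = r_0 + γ·r_1 + γ²·r_2 + ... by backward accumulation. The function name `discounted_return` is hypothetical. With γ = 0.99, the rule-of-thumb effective horizon 1 / (1 - γ) is about 100 steps.

```python
def discounted_return(rewards, gamma=0.99):
    """Sum of rewards where the reward k steps ahead is weighted by gamma**k."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


example = discounted_return([1.0, 1.0, 1.0], gamma=0.5)  # 1 + 0.5 + 0.25
effective_horizon = round(1.0 / (1.0 - 0.99))
print(example, effective_horizon)
```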

State Space

Dimension: 6-dimensional continuous space

```python
State = {
    accuracy: float ∈ [0.60, 0.98],       # Task performance
    reaction_time: float ∈ [0.20, 0.60],  # Response speed (seconds)
    heart_rate: float ∈ [68, 110],        # BPM
    eeg_stress: float ∈ [0.15, 0.95],     # Normalized stress index
    error_streak: int ∈ [0, 3],           # Consecutive errors
    gaze_drift: float ∈ [0.05, 0.50],     # Attention deviation
}
```

Normalization: All continuous values normalized to [0, 1] for neural network input.
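
One plausible reading of this normalization is min-max scaling against the ranges listed above. The sketch below assumes that reading, clips out-of-range values, and also scales the integer `error_streak`; `STATE_BOUNDS` and `normalize_state` are hypothetical names, not taken from the codebase.

```python
# Bounds copied from the state space definition: name -> (low, high).
STATE_BOUNDS = {
    "accuracy": (0.60, 0.98),
    "reaction_time": (0.20, 0.60),
    "heart_rate": (68.0, 110.0),
    "eeg_stress": (0.15, 0.95),
    "error_streak": (0.0, 3.0),
    "gaze_drift": (0.05, 0.50),
}


def normalize_state(state: dict) -> list[float]:
    """Min-max scale each feature to [0, 1], clipping out-of-range values."""
    vector = []
    for name, (low, high) in STATE_BOUNDS.items():
        scaled = (state[name] - low) / (high - low)
        vector.append(min(1.0, max(0.0, scaled)))
    return vector


obs = normalize_state({
    "accuracy": 0.79,
    "reaction_time": 0.40,
    "heart_rate": 89.0,
    "eeg_stress": 0.55,
    "error_streak": 3,
    "gaze_drift": 0.05,
})
print(obs)  # first four features sit at mid-range, then 1.0 and 0.0
```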

Action Space

Dimension: 3 discrete factors (27 total actions)

```python
Action = {
    enemy_count_delta: {"-5", "0", "+5"},   # Difficulty adjustment
    complexity: {"LOW", "MED", "HIGH"},     # Task complexity
    hints: {"NONE", "AUDIO", "VISUAL"},     # Assistance type
}
```

Semantics:

enemy_count_delta: Adjust number of enemies (difficulty)
complexity: Change task complexity level
hints: Provide audio/visual assistance to reduce stress
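
The 27 joint actions are the Cartesian product of the three factors. A minimal sketch of how they could be enumerated and mapped to a flat index for a discrete policy head follows; the flat-index encoding and the names `ACTIONS` and `action_index` are assumptions for illustration.

```python
from itertools import product

ENEMY_COUNT_DELTA = ("-5", "0", "+5")
COMPLEXITY = ("LOW", "MED", "HIGH")
HINTS = ("NONE", "AUDIO", "VISUAL")

# product() varies the last factor fastest, so index = e*9 + c*3 + h.
ACTIONS = [
    {"enemy_count_delta": e, "complexity": c, "hints": h}
    for e, c, h in product(ENEMY_COUNT_DELTA, COMPLEXITY, HINTS)
]


def action_index(action: dict) -> int:
    """Map a factored action back to its position in ACTIONS."""
    return (
        ENEMY_COUNT_DELTA.index(action["enemy_count_delta"]) * 9
        + COMPLEXITY.index(action["complexity"]) * 3
        + HINTS.index(action["hints"])
    )


sample = {"enemy_count_delta": "+5", "complexity": "HIGH", "hints": "NONE"}
print(len(ACTIONS), action_index(sample))
```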

Reward Function

```python
R(s, a) = α·ΔAccuracy - β·StressPenalty + γ·EngagementBonus - δ·OverloadFlag

where:
  α = 0.5  (accuracy weight)
  β = 0.3  (stress weight)
  γ = 0.2  (engagement weight; not the discount factor γ of the MDP tuple)
  δ = 0.4  (overload weight)

ΔAccuracy       = max(0, accuracy - 0.75)
StressPenalty   = eeg_stress
EngagementBonus = 0.25 if difficulty ≥ 5 else 0.1
OverloadFlag    = 1 if (heart_rate > 95 OR eeg_stress > 0.75) else 0
```

Interpretation:

Maximize accuracy gain (α term): Encourage learning
Minimize stress (β term): Prevent cognitive overload
Reward engagement (γ term): Maintain challenge
Penalize overload (δ term): Detect and prevent burnout
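
The formula as documented translates directly into a short function. This is a sketch of the stated formula only; the function name `reward` and its signature are hypothetical, and the deployed engine may apply additional scaling that is not described here.

```python
def reward(accuracy, eeg_stress, heart_rate, difficulty,
           alpha=0.5, beta=0.3, gamma=0.2, delta=0.4):
    """Reward as documented: accuracy gain, stress, engagement, overload."""
    accuracy_gain = max(0.0, accuracy - 0.75)
    stress_penalty = eeg_stress
    engagement_bonus = 0.25 if difficulty >= 5 else 0.1
    overload_flag = 1 if (heart_rate > 95 or eeg_stress > 0.75) else 0
    return (alpha * accuracy_gain
            - beta * stress_penalty
            + gamma * engagement_bonus
            - delta * overload_flag)


# Same accuracy and difficulty, different physiological state.
calm = reward(accuracy=0.85, eeg_stress=0.30, heart_rate=80, difficulty=6)
overloaded = reward(accuracy=0.85, eeg_stress=0.80, heart_rate=100, difficulty=6)
print(calm, overloaded)
```

With these inputs the calm state scores roughly 0.01 and the overloaded state roughly -0.54, so the overload term dominates once either threshold is crossed.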

Policy Gradient Objective (PPO)

```
L^CLIP(θ) = Ê_t [ min(r_t(θ)·Â_t, clip(r_t(θ), 1-ε, 1+ε)·Â_t) ]

where:
  r_t(θ) = π_θ(a_t|s_t) / π_θ_old(a_t|s_t)   (probability ratio)
  Â_t    = generalized advantage estimate
  ε      = 0.2 (clipping range)
```

Intuition: PPO clips the probability ratio to prevent large policy updates that could destabilize training.
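
A per-sample version of the clipped term shows the effect numerically. The sketch below evaluates min(r·Â, clip(r, 1-ε, 1+ε)·Â) for a single ratio and advantage; `clipped_surrogate` is a hypothetical helper, and a real trainer would average this over a batch and differentiate through the ratio.

```python
def clipped_surrogate(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """PPO clipped objective for one (ratio, advantage) sample."""
    clipped_ratio = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped_ratio * advantage)


# Good action, ratio already too high: gain is capped at (1 + eps) * A.
capped_gain = clipped_surrogate(ratio=1.5, advantage=1.0)
# Bad action, ratio already low: the term is held at (1 - eps) * A.
capped_loss = clipped_surrogate(ratio=0.5, advantage=-1.0)
# Bad action made more likely: not clipped, the full penalty applies.
full_penalty = clipped_surrogate(ratio=1.5, advantage=-1.0)
print(capped_gain, capped_loss, full_penalty)
```

Taking the minimum makes the objective a pessimistic bound: the policy gets no extra credit for moving further than ε in a favorable direction, but is still fully penalized for moving in an unfavorable one.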

System Design

Backend Architecture

```
backend/app/
├── main.py                      # FastAPI app, lifespan, WebSocket
├── core/
│   └── config.py                # Pydantic settings, CORS config
├── services/
│   ├── sim_engine.py            # Simulation logic, RL policy
│   ├── state_store.py           # In-memory state singleton
│   └── websocket_manager.py     # Connection pool, broadcast
├── routers/
│   ├── dashboard.py             # GET /api/dashboard/summary
│   ├── algorithms.py            # GET /api/algorithms/compare
│   ├── biometrics.py            # GET /api/biometrics/history
│   ├── session.py               # GET /api/session/current
│   ├── training.py              # POST /api/training/{start,pause,reset}
│   ├── reward.py                # POST /api/reward/evaluate
│   ├── logs.py                  # GET /api/logs/events
│   └── health.py                # GET /api/health
└── schemas/
    ├── dashboard.py             # Pydantic models
    ├── reward.py
    └── training.py
```

Simulation Engine (sim_engine.py)

Core Loop (runs every 1.5s):

```python
# Simplified pseudocode: episode, difficulty, gaze_drift, error_streak,
# overload_count, and the history buffers are fields of the state store.
def tick():
    # 1. Update biometric signals (sine waves + noise)
    t = episode / 25.0
    heart_rate = 78 + 10 * sin(t) + noise(-4, 4)
    eeg_stress = 0.42 + 0.18 * sin(t / 2) + noise(-0.06, 0.06)
    accuracy = 0.82 + 0.08 * sin(t / 3) + noise(-0.03, 0.03)

    # 2. Compute fatigue from biometrics
    fatigue = 0.5 * eeg_stress + 0.3 * gaze_drift + 0.2 * error_streak

    # 3. Select action via policy
    action = next_action()  # Rule-based policy (can be replaced with NN)

    # 4. Compute reward
    reward = reward_fn(alpha=0.5, beta=0.3, gamma=0.2, delta=0.4)

    # 5. Track overload events
    if heart_rate > 95 or eeg_stress > 0.75:
        overload_count += 1

    # 6. Update difficulty based on action
    if action.enemy_delta == "+5":
        difficulty += 0.2
    elif action.enemy_delta == "-5":
        difficulty -= 0.2

    # 7. Append to history buffers (keep last 60)
    hr_history.append(heart_rate)
    eeg_history.append(eeg_stress)
    fatigue_history.append(fatigue)

    # 8. Log and return snapshot
    return store.snapshot()
```

State Store (state_store.py)

In-memory singleton holding current state:

```python
class StateStore:
    episode: int              # Training step counter
    reward: float             # Current reward
    difficulty: float         # Task difficulty level, in [0, 10]
    heart_rate: float         # BPM
    eeg_stress: float         # Normalized stress, in [0, 1]
    fatigue_score: float      # Computed fatigue, in [0, 1]
    accuracy: float           # Task accuracy, in [0, 1]
    reaction_time: float      # Response latency (seconds)
    error_streak: int         # Consecutive errors
    gaze_drift: float         # Attention deviation, in [0, 1]
    current_action: dict      # Last selected action
    training_running: bool    # Session active flag
    overload_count: int       # Total overload events
    last_tick: str            # ISO timestamp of last update

    # History buffers (last 60 ticks)
    hr_history: list[float]
    eeg_history: list[float]
    fatigue_history: list[float]
    logs: list[str]           # Event log (last 50)
```
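
One simple way to keep only the last 60 ticks (or the last 50 log lines) is a bounded `collections.deque`. This is a sketch of that technique, not a description of how `state_store.py` trims its lists; the constant names are hypothetical.

```python
from collections import deque

HISTORY_LEN = 60   # ticks kept per biometric buffer
LOG_LEN = 50       # event log lines kept

hr_history: deque = deque(maxlen=HISTORY_LEN)
logs: deque = deque(maxlen=LOG_LEN)

# Appending past maxlen silently drops the oldest entry.
for tick in range(100):
    hr_history.append(float(tick))
    logs.append(f"tick {tick}")

print(len(hr_history), hr_history[0], hr_history[-1])
print(len(logs), logs[0])
```

A deque with `maxlen` drops old entries in constant time, and `list(hr_history)` yields a JSON-serializable copy when a snapshot is built.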

WebSocket Protocol

Connection: ws://localhost:8000/ws/live

Payload (sent every 1.5s):

```json
{
  "episode": 1247,
  "reward": 0.847,
  "difficulty": 6.5,
  "heart_rate": 74.0,
  "eeg_stress": 0.42,
  "fatigue_score": 0.28,
  "accuracy": 0.82,
  "reaction_time": 0.34,
  "error_streak": 2,
  "gaze_drift": 0.18,
  "training_running": true,
  "overload_count": 23,
  "last_tick": "2026-04-27T18:30:45.123456Z",
  "current_action": {
    "enemy_count_delta": "+5",
    "complexity": "HIGH",
    "hints": "NONE"
  }
}
```
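
A consumer of this payload might parse it into a typed record and derive the overload condition from the documented thresholds (heart rate above 95 or EEG stress above 0.75). The sketch below is illustrative: `LiveSnapshot` and `parse_payload` are hypothetical names, and only a subset of the fields is modeled.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class LiveSnapshot:
    episode: int
    reward: float
    difficulty: float
    heart_rate: float
    eeg_stress: float
    overloaded: bool  # derived locally, not part of the payload


def parse_payload(raw: str) -> LiveSnapshot:
    """Parse one WebSocket message; raises KeyError if a field is missing."""
    data = json.loads(raw)
    return LiveSnapshot(
        episode=int(data["episode"]),
        reward=float(data["reward"]),
        difficulty=float(data["difficulty"]),
        heart_rate=float(data["heart_rate"]),
        eeg_stress=float(data["eeg_stress"]),
        overloaded=data["heart_rate"] > 95 or data["eeg_stress"] > 0.75,
    )


raw = ('{"episode": 1247, "reward": 0.847, "difficulty": 6.5, '
       '"heart_rate": 74.0, "eeg_stress": 0.42}')
snapshot = parse_payload(raw)
print(snapshot.episode, snapshot.overloaded)
```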

Frontend Architecture

Component Tree

```
Page (routing shell, 60 lines)
├── Sidebar (nav with sliding indicator)
├── Topbar (header + status badge)
└── RouteContent (page router)
    ├── DashboardPage
    │   ├── KpiCard (x4)
    │   ├── FluidChartWrapper
    │   │   ├── LineChart (reward curves)
    │   │   └── RadarChart (state coverage)
    │   └── LiveSession card
    ├── AgentPage
    │   ├── LineChart (reward over time)
    │   ├── PieCharts (action distribution)
    │   └── Training controls
    ├── BiometricsPage
    │   ├── KpiCard (HR, EEG, Fatigue)
    │   ├── LineChart (multivariate signals)
    │   └── Stress event log
    ├── ComparePage
    │   ├── BarChart (algorithm metrics)
    │   └── LineChart (convergence curves)
    ├── MdpPage
    │   ├── MDP diagram (SVG)
    │   ├── Action space grid
    │   └── Reward function tuner
    └── ReportPage
        └── Research summary
```

State Management

No Redux/Context API — Simple prop drilling:

```typescript
Page (holds all state)
├── live: LivePayload (from WebSocket)
├── logs: string[]
├── series: SeriesPoint[] (chart data)
├── seconds: number (session timer)
├── wsStatus: WsStatus
├── weights: RewardWeights (MDP tuner)
├── rewardPreview: number
└── selectedAction: ActionSelection

Passed down to page components as props
```

Animation System

Framer Motion used for:

KPI Cards: whileHover={{ y: -2 }} lift + glow
AnimatedNumber: Slide-in + flash on value change
GlowPulse: Pulse (live), breathing (reconnecting), shake (offline)
PageTransition: Fade + slide between routes
Sidebar: Sliding active indicator via layoutId
BiometricsPage: Color morph on stress label

Spring Config:

```typescript
transition={{
  duration: 0.35,
  ease: [0.34, 1.56, 0.64, 1] // cubic-bezier spring
}}
```

WebSocket Client

```typescript
connectLiveSocket(
  onMessage: (data: LivePayload) => void,
  onStatusChange: (status: WsStatus) => void
): LiveSocket

// Exponential backoff reconnect:
// Attempt 1: 1s delay
// Attempt 2: 2s delay
// Attempt 3: 4s delay
// Attempt 4: 8s delay
// Attempt 5: 16s delay
// Attempt 6: 30s delay (capped)
// After 6 attempts: offline
```
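
The reconnect schedule above is a doubling delay with a 30-second cap. The sketch below reproduces it in Python purely for illustration (the actual client is TypeScript); `reconnect_delays` and its parameters are hypothetical names.

```python
def reconnect_delays(max_attempts: int = 6, base: float = 1.0,
                     cap: float = 30.0) -> list[float]:
    """Delay before each reconnect attempt: base * 2**n, capped at `cap`."""
    return [min(base * 2 ** attempt, cap) for attempt in range(max_attempts)]


delays = reconnect_delays()
print(delays)        # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0]
print(sum(delays))   # 61.0 seconds of waiting before the client goes offline
```

Without the cap the sixth delay would be 32 seconds; capping keeps the worst-case wait per attempt bounded.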

API Reference

Dashboard

```
GET /api/dashboard/summary

Response:
{
  "training_episodes": 1247,
  "avg_reward": 0.847,
  "overload_events": 23,
  "policy_convergence": 94.2
}
```

Algorithms

```
GET /api/algorithms/compare

Response:
{
  "metrics": {
    "ppo": {
      "final_reward": 0.847,
      "convergence_step": 420000,
      "stability": "High",
      "overload_events": 23
    },
    "a2c": { ... },
    "sac": { ... }
  }
}
```

Biometrics

```
GET /api/biometrics/history

Response:
{
  "heart_rate": [72, 74, 73, ..., 75],
  "eeg_stress": [0.31, 0.35, 0.36, ..., 0.42],
  "fatigue_score": [0.18, 0.22, 0.20, ..., 0.28]
}
```

Training Control

```
POST /api/training/start
POST /api/training/pause
POST /api/training/reset

Response:
{
  "training_running": true/false,
  "message": "Training started/paused/reset"
}
```

Reward Evaluation

```
POST /api/reward/evaluate

Request:
{
  "alpha": 0.5,
  "beta": 0.3,
  "gamma": 0.2,
  "delta": 0.4,
  "accuracy_gain": 0.07,
  "stress_penalty": 0.42,
  "engagement_bonus": 0.25,
  "overload_flag": 0
}

Response:
{
  "reward": 0.3245
}
```
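
For scripting against this endpoint, the request can be built with the standard library alone. The sketch below only constructs the request object, using the field values from the example above; `build_reward_request` is a hypothetical helper, and sending it assumes a backend running at http://localhost:8000.

```python
import json
import urllib.request


def build_reward_request(base_url: str, **fields) -> urllib.request.Request:
    """Build (but do not send) a POST to /api/reward/evaluate."""
    body = json.dumps(fields).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/reward/evaluate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_reward_request(
    "http://localhost:8000",
    alpha=0.5, beta=0.3, gamma=0.2, delta=0.4,
    accuracy_gain=0.07, stress_penalty=0.42,
    engagement_bonus=0.25, overload_flag=0,
)
print(request.get_method(), request.full_url)

# To send it against a running backend:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```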

Deployment

Docker (Backend)

```dockerfile
# Build context is the backend/ directory (see build: ./backend in Docker Compose)
FROM python:3.10-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY app ./app
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
```

Docker Compose

```yaml
version: '3.8'
services:
  backend:
    build: ./backend
    ports:
      - "8000:8000"
    environment:
      - CORS_ORIGINS=http://localhost:3000,https://myapp.vercel.app
  frontend:
    build: ./frontend
    ports:
      - "3000:3000"
    environment:
      - NEXT_PUBLIC_API_BASE_URL=http://backend:8000
      - NEXT_PUBLIC_WS_URL=ws://backend:8000/ws/live
```

Environment Variables

Backend (.env):

```
APP_NAME=AdaptRL Backend
CORS_ORIGINS=http://localhost:3000
```

Frontend (.env.local):

```
NEXT_PUBLIC_API_BASE_URL=http://localhost:8000
NEXT_PUBLIC_WS_URL=ws://localhost:8000/ws/live
```

Contributing

Code Style

Python: PEP 8, type hints required
TypeScript: ESLint + Prettier, strict mode
CSS: Custom properties, no frameworks

Testing

```bash
# Backend unit tests
cd backend
pytest tests/ -v

# Frontend type check
cd frontend
npx tsc --noEmit
```

Adding New Features

New RL algorithm: Implement in sim_engine.py, add to /api/algorithms/compare
New biometric signal: Add to StateStore, append to history buffer in tick()
New page: Create component in frontend/components/pages/, add route in RouteContent.tsx
New API endpoint: Add router in backend/app/routers/, include in main.py

References

RL Papers

PPO: Proximal Policy Optimization Algorithms
A2C: Asynchronous Methods for Deep Reinforcement Learning
SAC: Soft Actor-Critic Algorithms and Applications

VR Training

Adaptive Difficulty: Dynamic Difficulty Adjustment in Games
Biometric Feedback: Heart Rate Variability in Stress Assessment

Tech Stack

Backend: FastAPI, Pydantic, Python 3.10+
Frontend: Next.js 14, React 18, TypeScript, Framer Motion, Recharts
Real-time: WebSocket (native)
Deployment: Docker, Docker Compose

License

MIT License — See LICENSE file

Authors

AdaptRL Development Team
RL Lab Mini Project | 2025–26

Last Updated: April 27, 2026
Version: 1.0.0 (Complete Audit & Motion Design Upgrade)
