A real-time, biometric-driven RL dashboard for personalized VR training difficulty adjustment.
- Overview
- Quick Start
- Architecture
- Reinforcement Learning Theory
- MDP Formulation
- System Design
- API Reference
- Frontend Architecture
- Deployment
- Contributing
AdaptRL is an adaptive reinforcement learning system that dynamically adjusts VR training difficulty in real time based on:
- Performance metrics: accuracy, reaction time, error streak
- Biometric signals: heart rate, EEG stress, gaze drift, fatigue
- RL policy: PPO (Proximal Policy Optimization) trained to maximize learning efficiency while preventing cognitive overload
Traditional VR training uses fixed difficulty levels, which causes:
- Underutilization: Bored trainees, no learning progression
- Cognitive overload: Stressed trainees, poor retention
- No personalization: One-size-fits-all approach fails for diverse learners
AdaptRL solves this by using RL to find the optimal difficulty trajectory for each trainee in real time.
✅ Real-time Biometric Integration — Heart rate, EEG stress, gaze tracking
✅ RL-Driven Adaptation — PPO policy adjusts difficulty every 1.5s
✅ Live Dashboard — WebSocket-powered reactive UI with Framer Motion animations
✅ Algorithm Benchmarking — Compare PPO vs A2C vs SAC
✅ MDP Explorer — Interactive reward function tuning
✅ Stress Event Logging — Track overload events and recovery
- Python 3.10+ (backend)
- Node.js 18+ (frontend)
- pip and npm
```bash
cd backend
pip install -r requirements.txt
export CORS_ORIGINS=http://localhost:3000
python -m uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```
Backend runs on http://localhost:8000
WebSocket endpoint: ws://localhost:8000/ws/live
```bash
cd frontend
npm install
echo 'NEXT_PUBLIC_API_BASE_URL=http://localhost:8000' > .env.local
echo 'NEXT_PUBLIC_WS_URL=ws://localhost:8000/ws/live' >> .env.local
npm run dev
```
Frontend runs on http://localhost:3000
- Open http://localhost:3000 in your browser
- You should see the dashboard with live KPI cards updating
- Check the browser console for WebSocket connection logs
- Navigate to the "RL Agent" page to see training logs
```
┌─────────────────────────────────────────────────────────────┐
│ VR Training Environment │
│ (Simulated: accuracy, HR, EEG, fatigue, reaction time) │
└────────────────────┬────────────────────────────────────────┘
│
▼
┌────────────────────────┐
│ Simulation Engine │
│ (sim_engine.py) │
│ - tick() every 1.5s │
│ - next_action() │
│ - reward_fn() │
└────────────┬───────────┘
│
▼
┌────────────────────────┐
│ State Store │
│ (state_store.py) │
│ - In-memory singleton │
│ - Biometric buffers │
│ - Overload counter │
└────────────┬───────────┘
│
▼
┌────────────────────────┐
│ WebSocket Broadcast │
│ (websocket_manager.py) │
│ - JSON payload │
│ - 1.5s cadence │
└────────────┬───────────┘
│
▼
┌────────────────────────┐
│ Frontend Dashboard │
│ (React + Next.js) │
│ - Live KPI cards │
│ - Animated charts │
│ - Status indicators │
└────────────────────────┘
```
- Simulation Tick (every 1.5s)
  - sim_engine.tick() computes new state
  - Biometric signals updated via sine waves + noise
  - reward_fn() computes reward signal
  - Overload events tracked
- State Snapshot
  - store.snapshot() serializes current state
  - Includes: episode, reward, difficulty, biometrics, action, timestamp
- WebSocket Broadcast
  - Payload sent to all connected clients
  - Clients update live state via setLive()
  - Charts and KPIs re-render with animations
- Frontend Rendering
  - Framer Motion animates value changes
  - Charts interpolate smoothly
  - Status indicators pulse/breathe based on connection state
RL is a machine learning paradigm where an agent learns to make decisions by interacting with an environment:
```
Agent observes State (s_t)
        ↓
Agent selects Action (a_t) based on policy π
        ↓
Environment transitions to new State (s_t+1)
        ↓
Environment provides Reward (r_t)
        ↓
Agent learns to maximize cumulative reward
```
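The loop above fits in a few lines of Python. This is a generic sketch, not AdaptRL's simulator: the toy environment (a single difficulty value with an assumed target of 5) and the names `run_episode`, `toy_reset`, `toy_step` and `toy_policy` are hypothetical, chosen only to make the state, action, reward cycle concrete.

```python
from typing import Callable


def run_episode(
    reset: Callable[[], float],
    step: Callable[[float, int], tuple[float, float]],
    policy: Callable[[float], int],
    horizon: int,
) -> float:
    """Generic agent-environment loop: observe s_t, act a_t, receive s_t+1 and r_t."""
    state = reset()
    total_reward = 0.0
    for _ in range(horizon):
        action = policy(state)               # a_t chosen by the policy π from s_t
        state, reward = step(state, action)  # environment returns s_t+1 and r_t
        total_reward += reward               # the quantity the agent learns to maximize
    return total_reward


# Hypothetical toy environment: state is a difficulty in [0, 10]; reward peaks at 5.
def toy_reset() -> float:
    return 0.0


def toy_step(state: float, action: int) -> tuple[float, float]:
    next_state = min(10.0, max(0.0, state + action))
    return next_state, -abs(next_state - 5.0)


def toy_policy(state: float) -> int:
    # Hand-written policy: step toward the assumed target difficulty of 5.
    return 1 if state < 5 else (-1 if state > 5 else 0)


total = run_episode(toy_reset, toy_step, toy_policy, horizon=10)
```

Here the policy is fixed; in AdaptRL the policy is what training improves, using the rewards collected by exactly this kind of loop.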
State (s): Complete description of the environment at time t
- In AdaptRL: accuracy, reaction_time, heart_rate, eeg_stress, error_streak, gaze_drift
Action (a): Decision the agent makes
- In AdaptRL: enemy_count_delta, complexity, hints
Reward (r): Scalar feedback signal
- In AdaptRL: weighted combination of performance gain and stress penalty
Policy (π): Mapping from states to actions
- In AdaptRL: PPO neural network trained to maximize cumulative reward
Value Function (V): Expected cumulative future reward from state s
- Used by PPO to estimate advantage of each action
Advantage (A): How much better an action is vs. the average
- A(s,a) = Q(s,a) - V(s)
- PPO uses this to update policy
PPO (Proximal Policy Optimization) is chosen because:
| Metric | PPO | A2C | SAC |
|---|---|---|---|
| Discrete Actions | ✅ Best | OK | Continuous-leaning |
| Stability | ✅ High | Medium | High |
| Sample Efficiency | Medium | Low | ✅ High |
| Convergence Speed | ✅ Fast | Slow | Medium |
| Implementation | ✅ Simple | Complex | Complex |
| Unity ML-Agents | ✅ Native | Not built in | Native |
PPO Key Features:
- Clipped objective: Prevents policy from changing too much per update
- Entropy bonus: Encourages exploration
- Generalized Advantage Estimation (GAE): Reduces variance in advantage estimates
- Multiple epochs: Reuses data efficiently
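As a rough illustration of the GAE bullet, the sketch below computes advantages for one trajectory with the standard backward recursion Â_t = δ_t + γλ·Â_t+1, where δ_t = r_t + γ·V(s_t+1) − V(s_t). It assumes λ = 0.95 (a common default; this README only fixes γ = 0.99) and a trajectory with no episode boundary inside it. The function name `gae` is hypothetical, and this is not necessarily how AdaptRL's training code computes advantages.

```python
def gae(
    rewards: list[float],
    values: list[float],
    last_value: float,
    gamma: float = 0.99,
    lam: float = 0.95,  # assumed; not specified in this README
) -> list[float]:
    """Generalized Advantage Estimation for a single trajectory.

    values[t] is V(s_t); last_value is V(s_T) for the state after the final step.
    Assumes no episode ends inside the trajectory (no done-mask handling).
    """
    advantages = [0.0] * len(rewards)
    next_value = last_value
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]  # TD error δ_t
        running = delta + gamma * lam * running              # Â_t = δ_t + γλ·Â_t+1
        advantages[t] = running
        next_value = values[t]
    return advantages
```

λ trades bias for variance: λ = 0 reduces to one-step TD errors, while λ = 1 gives the full discounted return minus V(s_t).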
AdaptRL is formulated as a finite-horizon MDP:
```
MDP = (S, A, P, R, γ, T)

S = State space (continuous)
A = Action space (discrete)
P = Transition dynamics (stochastic in simulation: sinusoidal signals plus random noise)
R = Reward function
γ = Discount factor (0.99)
T = Episode horizon (500k steps)
```
Dimension: 6-dimensional space (five continuous features plus the integer error_streak)
```python
State = {
    accuracy:      float ∈ [0.60, 0.98],  # Task performance
    reaction_time: float ∈ [0.20, 0.60],  # Response speed (seconds)
    heart_rate:    float ∈ [68, 110],     # BPM
    eeg_stress:    float ∈ [0.15, 0.95],  # Normalized stress index
    error_streak:  int   ∈ [0, 3],        # Consecutive errors
    gaze_drift:    float ∈ [0.05, 0.50],  # Attention deviation
}
```
Normalization: All continuous values normalized to [0, 1] for neural network input.
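One way to do this normalization is min-max scaling with the ranges from the state definition above. In this sketch, `STATE_RANGES`, `FEATURE_ORDER` and `normalize_state` are hypothetical names; out-of-range readings are clipped, and error_streak is scaled by its [0, 3] range as well, which is an assumption since it is an integer feature.

```python
# Ranges copied from the state definition; the dict order fixes the input vector order.
STATE_RANGES: dict[str, tuple[float, float]] = {
    "accuracy": (0.60, 0.98),
    "reaction_time": (0.20, 0.60),
    "heart_rate": (68.0, 110.0),
    "eeg_stress": (0.15, 0.95),
    "error_streak": (0.0, 3.0),  # assumption: integer feature scaled the same way
    "gaze_drift": (0.05, 0.50),
}
FEATURE_ORDER = list(STATE_RANGES)


def normalize_state(state: dict[str, float]) -> list[float]:
    """Min-max scale each feature into [0, 1], clipping readings outside the range."""
    vector = []
    for name in FEATURE_ORDER:
        low, high = STATE_RANGES[name]
        scaled = (state[name] - low) / (high - low)
        vector.append(min(1.0, max(0.0, scaled)))
    return vector
```

Clipping keeps a noisy sensor spike (for example a heart rate above 110 BPM) from pushing the network input outside the range it was trained on.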
Dimension: 3 discrete factors (27 total actions)
```python
Action = {
    enemy_count_delta: {"-5", "0", "+5"},          # Difficulty adjustment
    complexity:        {"LOW", "MED", "HIGH"},     # Task complexity
    hints:             {"NONE", "AUDIO", "VISUAL"},  # Assistance type
}
```
Semantics:
- enemy_count_delta: Adjust number of enemies (difficulty)
- complexity: Change task complexity level
- hints: Provide audio/visual assistance to reduce stress
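A policy network usually emits an index rather than a dict. The sketch below assumes a single flat 27-way index built with `itertools.product`; the README does not say whether the policy uses one 27-way head or three 3-way heads, and the ordering plus the names `ACTIONS`, `encode_action` and `decode_action` are hypothetical.

```python
from itertools import product

ENEMY_DELTAS = ("-5", "0", "+5")
COMPLEXITIES = ("LOW", "MED", "HIGH")
HINTS = ("NONE", "AUDIO", "VISUAL")

# All 3 x 3 x 3 = 27 combinations; product() varies the last factor fastest.
ACTIONS = [
    {"enemy_count_delta": d, "complexity": c, "hints": h}
    for d, c, h in product(ENEMY_DELTAS, COMPLEXITIES, HINTS)
]


def decode_action(index: int) -> dict[str, str]:
    """Map a discrete policy output in [0, 27) to an action dict."""
    return ACTIONS[index]


def encode_action(action: dict[str, str]) -> int:
    """Inverse of decode_action: mixed-radix index with base 3 per factor."""
    return (ENEMY_DELTAS.index(action["enemy_count_delta"]) * 9
            + COMPLEXITIES.index(action["complexity"]) * 3
            + HINTS.index(action["hints"]))
```

With this ordering, the action in the live payload example (+5, HIGH, NONE) is index 24.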
```
R(s, a) = α·ΔAccuracy - β·StressPenalty + γ·EngagementBonus - δ·OverloadFlag

where:
  α = 0.5  (accuracy weight)
  β = 0.3  (stress weight)
  γ = 0.2  (engagement weight; not the discount factor γ = 0.99)
  δ = 0.4  (overload weight)

  ΔAccuracy       = max(0, accuracy - 0.75)
  StressPenalty   = eeg_stress
  EngagementBonus = 0.25 if difficulty ≥ 5 else 0.1
  OverloadFlag    = 1 if (heart_rate > 95 OR eeg_stress > 0.75) else 0
```
Interpretation:
- Maximize accuracy gain (α term): Encourage learning
- Minimize stress (β term): Prevent cognitive overload
- Reward engagement (γ term): Maintain challenge
- Penalize overload (δ term): Detect and prevent burnout
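Transcribed into Python, the reward is a direct translation of the formula above. The signature (state values passed explicitly, documented weights as defaults) is an assumption for illustration; the call in the core loop passes only the weights, so the backend's reward_fn() presumably reads state from the store instead.

```python
def reward_fn(
    accuracy: float,
    eeg_stress: float,
    heart_rate: float,
    difficulty: float,
    alpha: float = 0.5,   # accuracy weight
    beta: float = 0.3,    # stress weight
    gamma: float = 0.2,   # engagement weight (not the discount factor)
    delta: float = 0.4,   # overload weight
) -> float:
    accuracy_gain = max(0.0, accuracy - 0.75)             # ΔAccuracy
    stress_penalty = eeg_stress                           # StressPenalty
    engagement_bonus = 0.25 if difficulty >= 5 else 0.1   # EngagementBonus
    overload_flag = 1.0 if (heart_rate > 95 or eeg_stress > 0.75) else 0.0
    return (alpha * accuracy_gain
            - beta * stress_penalty
            + gamma * engagement_bonus
            - delta * overload_flag)
```

For the live payload example (accuracy 0.82, eeg_stress 0.42, heart_rate 74, difficulty 6.5) this gives 0.5·0.07 - 0.3·0.42 + 0.2·0.25 - 0.4·0 = 0.035 - 0.126 + 0.05 = -0.041.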
```
L^CLIP(θ) = Ê_t [ min(r_t(θ)·Â_t, clip(r_t(θ), 1-ε, 1+ε)·Â_t) ]

where:
  r_t(θ) = π_θ(a_t|s_t) / π_θ_old(a_t|s_t)   (probability ratio)
  Â_t    = generalized advantage estimate
  ε      = 0.2                               (clipping range)
```
Intuition: PPO clips the probability ratio to prevent large policy updates that could destabilize training.
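To make the clipping concrete, the sketch below evaluates L^CLIP on a small batch of probability ratios and advantages in plain Python, using the sample mean in place of Ê_t and ε = 0.2 as above. `clipped_surrogate` is a hypothetical name; a real PPO implementation computes this on tensors and maximizes it by gradient ascent.

```python
def clipped_surrogate(
    ratios: list[float], advantages: list[float], epsilon: float = 0.2
) -> float:
    """Sample mean of min(r_t·Â_t, clip(r_t, 1-ε, 1+ε)·Â_t); PPO maximizes this."""
    terms = []
    for r, a in zip(ratios, advantages):
        clipped_r = min(max(r, 1.0 - epsilon), 1.0 + epsilon)  # clip(r_t, 1-ε, 1+ε)
        terms.append(min(r * a, clipped_r * a))                 # pessimistic choice
    return sum(terms) / len(terms)
```

The min makes the objective pessimistic: with Â_t > 0 the gain stops growing once r_t exceeds 1 + ε (a ratio of 1.5 contributes only 1.2), and with Â_t < 0 a ratio below 1 - ε is still charged the clipped, worse value (a ratio of 0.5 with Â_t = -1 contributes -0.8).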
```
backend/app/
├── main.py                    # FastAPI app, lifespan, WebSocket
├── core/
│   └── config.py              # Pydantic settings, CORS config
├── services/
│   ├── sim_engine.py          # Simulation logic, RL policy
│   ├── state_store.py         # In-memory state singleton
│   └── websocket_manager.py   # Connection pool, broadcast
├── routers/
│   ├── dashboard.py           # GET /api/dashboard/summary
│   ├── algorithms.py          # GET /api/algorithms/compare
│   ├── biometrics.py          # GET /api/biometrics/history
│   ├── session.py             # GET /api/session/current
│   ├── training.py            # POST /api/training/{start,pause,reset}
│   ├── reward.py              # POST /api/reward/evaluate
│   ├── logs.py                # GET /api/logs/events
│   └── health.py              # GET /api/health
└── schemas/
    ├── dashboard.py           # Pydantic models
    ├── reward.py
    └── training.py
```
Core Loop (runs every 1.5s):
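```python
import math
import random


# Sketch of sim_engine.tick(). `store` is the StateStore singleton from
# state_store.py; next_action() and reward_fn() are defined elsewhere in sim_engine.py.
def tick():
    # 1. Update biometric signals (sine waves + noise)
    t = store.episode / 25.0
    store.heart_rate = 78 + 10 * math.sin(t) + random.uniform(-4, 4)
    store.eeg_stress = 0.42 + 0.18 * math.sin(t / 2) + random.uniform(-0.06, 0.06)
    store.accuracy = 0.82 + 0.08 * math.sin(t / 3) + random.uniform(-0.03, 0.03)

    # 2. Compute fatigue from biometrics, clamped to the documented [0, 1] range
    #    (error_streak can reach 3, so the raw weighted sum can exceed 1)
    store.fatigue_score = min(
        1.0, 0.5 * store.eeg_stress + 0.3 * store.gaze_drift + 0.2 * store.error_streak
    )

    # 3. Select action via policy (rule-based; can be replaced with a NN)
    action = next_action()  # dict, same shape as current_action in the payload
    store.current_action = action

    # 4. Compute reward
    store.reward = reward_fn(alpha=0.5, beta=0.3, gamma=0.2, delta=0.4)

    # 5. Track overload events
    if store.heart_rate > 95 or store.eeg_stress > 0.75:
        store.overload_count += 1

    # 6. Update difficulty based on action, kept within [0, 10]
    if action["enemy_count_delta"] == "+5":
        store.difficulty = min(10.0, store.difficulty + 0.2)
    elif action["enemy_count_delta"] == "-5":
        store.difficulty = max(0.0, store.difficulty - 0.2)

    # 7. Append to history buffers, keeping only the last 60 values
    for buffer, value in (
        (store.hr_history, store.heart_rate),
        (store.eeg_history, store.eeg_stress),
        (store.fatigue_history, store.fatigue_score),
    ):
        buffer.append(value)
        del buffer[:-60]

    # 8. Log and return snapshot
    return store.snapshot()
```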
In-memory singleton holding current state:
```python
class StateStore:
    episode: int              # Training step counter
    reward: float             # Current reward
    difficulty: float         # Task difficulty level, in [0, 10]
    heart_rate: float         # BPM
    eeg_stress: float         # Normalized stress, in [0, 1]
    fatigue_score: float      # Computed fatigue, in [0, 1]
    accuracy: float           # Task accuracy, in [0, 1]
    reaction_time: float      # Response latency (seconds)
    error_streak: int         # Consecutive errors
    gaze_drift: float         # Attention deviation, in [0, 1]
    current_action: dict      # Last selected action
    training_running: bool    # Session active flag
    overload_count: int       # Total overload events
    last_tick: str            # ISO 8601 timestamp of last update

    # History buffers (last 60 ticks)
    hr_history: list[float]
    eeg_history: list[float]
    fatigue_history: list[float]
    logs: list[str]           # Event log (last 50)
```
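A trimmed-down sketch of such a store, showing bounded history buffers and a snapshot that leaves the buffers out of the live payload. `MiniStateStore`, `record` and `HISTORY_LEN` are hypothetical; the sketch uses `collections.deque(maxlen=60)` instead of a plain list plus trimming, which is a design choice for illustration and not necessarily what state_store.py does.

```python
from collections import deque
from dataclasses import dataclass, field
from datetime import datetime, timezone

HISTORY_LEN = 60  # ticks kept per biometric buffer


@dataclass
class MiniStateStore:
    """Hypothetical trimmed-down StateStore: a few scalars plus bounded histories."""
    episode: int = 0
    heart_rate: float = 0.0
    eeg_stress: float = 0.0
    last_tick: str = ""
    hr_history: deque = field(default_factory=lambda: deque(maxlen=HISTORY_LEN))
    eeg_history: deque = field(default_factory=lambda: deque(maxlen=HISTORY_LEN))

    def record(self, heart_rate: float, eeg_stress: float) -> None:
        """Store one tick's readings; deque(maxlen=...) drops the oldest value itself."""
        self.episode += 1
        self.heart_rate = heart_rate
        self.eeg_stress = eeg_stress
        self.hr_history.append(heart_rate)
        self.eeg_history.append(eeg_stress)
        # ISO 8601 in UTC with a trailing "Z", matching the WebSocket payload's last_tick
        self.last_tick = datetime.now(timezone.utc).isoformat().replace("+00:00", "Z")

    def snapshot(self) -> dict:
        """Scalar fields only; the history buffers are not part of the live payload."""
        return {
            "episode": self.episode,
            "heart_rate": self.heart_rate,
            "eeg_stress": self.eeg_stress,
            "last_tick": self.last_tick,
        }


# A single module-level instance, shared by every importer, acts as the singleton.
store = MiniStateStore()
```

Because Python caches imported modules, every router that does `from ... import store` sees the same object, which is what makes a module-level instance behave as a singleton.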
Connection: ws://localhost:8000/ws/live
Payload (sent every 1.5s):
```json
{
  "episode": 1247,
  "reward": 0.847,
  "difficulty": 6.5,
  "heart_rate": 74.0,
  "eeg_stress": 0.42,
  "fatigue_score": 0.28,
  "accuracy": 0.82,
  "reaction_time": 0.34,
  "error_streak": 2,
  "gaze_drift": 0.18,
  "training_running": true,
  "overload_count": 23,
  "last_tick": "2026-04-27T18:30:45.123456Z",
  "current_action": {
    "enemy_count_delta": "+5",
    "complexity": "HIGH",
    "hints": "NONE"
  }
}
```
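Broadcasting this payload at a fixed cadence might look like the asyncio sketch below. It assumes each connection has an async `send_text(str)` method, which FastAPI/Starlette `WebSocket` objects provide; `BroadcastManager`, `broadcast_loop` and `make_snapshot` are hypothetical names, not the contents of websocket_manager.py. In the app, `make_snapshot` would presumably be the simulation tick, which returns `store.snapshot()`.

```python
import asyncio
import json


class BroadcastManager:
    """Hypothetical connection pool; each connection needs an async send_text(str)."""

    def __init__(self) -> None:
        self.connections: list = []

    def add(self, ws) -> None:
        self.connections.append(ws)

    async def broadcast(self, payload: dict) -> None:
        message = json.dumps(payload)  # serialize once, send the same text to everyone
        for ws in list(self.connections):  # iterate over a copy so removal is safe
            try:
                await ws.send_text(message)
            except Exception:
                # Sending failed (e.g. the client disconnected): drop the connection.
                self.connections.remove(ws)


async def broadcast_loop(manager, make_snapshot, interval: float = 1.5, ticks=None):
    """Broadcast one snapshot every `interval` seconds; `ticks` bounds the loop (None = forever)."""
    sent = 0
    while ticks is None or sent < ticks:
        await manager.broadcast(make_snapshot())
        sent += 1
        await asyncio.sleep(interval)
```

Serializing once per tick rather than once per client keeps the per-client cost to a single send, and pruning failed sockets inside `broadcast` means a closed browser tab cannot stall later ticks.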
```
Page (routing shell, 60 lines)
├── Sidebar (nav with sliding indicator)
├── Topbar (header + status badge)
└── RouteContent (page router)
    ├── DashboardPage
    │   ├── KpiCard (x4)
    │   ├── FluidChartWrapper
    │   │   ├── LineChart (reward curves)
    │   │   └── RadarChart (state coverage)
    │   └── LiveSession card
    ├── AgentPage
    │   ├── LineChart (reward over time)
    │   ├── PieCharts (action distribution)
    │   └── Training controls
    ├── BiometricsPage
    │   ├── KpiCard (HR, EEG, Fatigue)
    │   ├── LineChart (multivariate signals)
    │   └── Stress event log
    ├── ComparePage
    │   ├── BarChart (algorithm metrics)
    │   └── LineChart (convergence curves)
    ├── MdpPage
    │   ├── MDP diagram (SVG)
    │   ├── Action space grid
    │   └── Reward function tuner
    └── ReportPage
        └── Research summary
```
No Redux/Context API — Simple prop drilling:
```
Page (holds all state)
├── live: LivePayload (from WebSocket)
├── logs: string[]
├── series: SeriesPoint[] (chart data)
├── seconds: number (session timer)
├── wsStatus: WsStatus
├── weights: RewardWeights (MDP tuner)
├── rewardPreview: number
└── selectedAction: ActionSelection
```

All of these are passed down to page components as props.
Framer Motion used for:
- KPI Cards: whileHover={{ y: -2 }} lift + glow
- AnimatedNumber: Slide-in + flash on value change
- GlowPulse: Pulse (live), breathing (reconnecting), shake (offline)
- PageTransition: Fade + slide between routes
- Sidebar: Sliding active indicator via layoutId
- BiometricsPage: Color morph on stress label
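Spring-like easing config (the 1.56 control point makes the cubic-bezier curve overshoot, which mimics a spring):
```typescript
transition={{
  duration: 0.35,
  ease: [0.34, 1.56, 0.64, 1], // cubic-bezier with overshoot (spring-like)
}}
```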
```typescript
connectLiveSocket(
  onMessage: (data: LivePayload) => void,
  onStatusChange: (status: WsStatus) => void
): LiveSocket

// Exponential backoff reconnect:
// Attempt 1: 1s delay
// Attempt 2: 2s delay
// Attempt 3: 4s delay
// Attempt 4: 8s delay
// Attempt 5: 16s delay
// Attempt 6: 30s delay (capped)
// After 6 attempts: offline
```
```
GET /api/dashboard/summary

Response:
{
  "training_episodes": 1247,
  "avg_reward": 0.847,
  "overload_events": 23,
  "policy_convergence": 94.2
}
```
```
GET /api/algorithms/compare

Response:
{
  "metrics": {
    "ppo": {
      "final_reward": 0.847,
      "convergence_step": 420000,
      "stability": "High",
      "overload_events": 23
    },
    "a2c": { ... },
    "sac": { ... }
  }
}
```
```
GET /api/biometrics/history

Response:
{
  "heart_rate": [72, 74, 73, ..., 75],
  "eeg_stress": [0.31, 0.35, 0.36, ..., 0.42],
  "fatigue_score": [0.18, 0.22, 0.20, ..., 0.28]
}
```
```
POST /api/training/start
POST /api/training/pause
POST /api/training/reset

Response:
{
  "training_running": true/false,
  "message": "Training started/paused/reset"
}
```
```
POST /api/reward/evaluate

Request:
{
  "alpha": 0.5,
  "beta": 0.3,
  "gamma": 0.2,
  "delta": 0.4,
  "accuracy_gain": 0.07,
  "stress_penalty": 0.42,
  "engagement_bonus": 0.25,
  "overload_flag": 0
}

Response:
{
  "reward": -0.041
}
```
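```dockerfile
# backend/Dockerfile (Compose builds it with context ./backend)
FROM python:3.10-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
# Paths are relative to the build context, so copy app/ rather than backend/app
COPY app ./app
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
```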
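```yaml
version: '3.8'
services:
  backend:
    build: ./backend
    ports:
      - "8000:8000"
    environment:
      - CORS_ORIGINS=http://localhost:3000,https://myapp.vercel.app
  frontend:
    build: ./frontend
    ports:
      - "3000:3000"
    environment:
      # These URLs are used by the browser, which cannot resolve the Compose
      # service name "backend", so they point at the published host port.
      # Next.js inlines NEXT_PUBLIC_* values into the client bundle at build time.
      - NEXT_PUBLIC_API_BASE_URL=http://localhost:8000
      - NEXT_PUBLIC_WS_URL=ws://localhost:8000/ws/live
```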
Backend (.env):
```
APP_NAME=AdaptRL Backend
CORS_ORIGINS=http://localhost:3000
```
Frontend (.env.local):
```
NEXT_PUBLIC_API_BASE_URL=http://localhost:8000
NEXT_PUBLIC_WS_URL=ws://localhost:8000/ws/live
```
- Python: PEP 8, type hints required
- TypeScript: ESLint + Prettier, strict mode
- CSS: Custom properties, no frameworks
```bash
# Backend tests
cd backend
pytest tests/ -v

# Frontend type check
cd ../frontend
npx tsc --noEmit
```
- New RL algorithm: Implement in sim_engine.py, add to /api/algorithms/compare
- New biometric signal: Add to StateStore, append to history buffer in tick()
- New page: Create component in frontend/components/pages/, add route in RouteContent.tsx
- New API endpoint: Add router in backend/app/routers/, include in main.py
- PPO: Proximal Policy Optimization Algorithms
- A2C: Asynchronous Methods for Deep Reinforcement Learning
- SAC: Soft Actor-Critic Algorithms and Applications
- Adaptive Difficulty: Dynamic Difficulty Adjustment in Games
- Biometric Feedback: Heart Rate Variability in Stress Assessment
- Backend: FastAPI, Pydantic, Python 3.10+
- Frontend: Next.js 14, React 18, TypeScript, Framer Motion, Recharts
- Real-time: WebSocket (native)
- Deployment: Docker, Docker Compose
MIT License — See LICENSE file
AdaptRL Development Team
RL Lab Mini Project | 2025–26
Last Updated: April 27, 2026
Version: 1.0.0 (Complete Audit & Motion Design Upgrade)