Arjun-Reddy-I/AdaptRL

AdaptRL: Adaptive VR Training System via Reinforcement Learning

A real-time, biometric-driven RL dashboard for personalized VR training difficulty adjustment.


Table of Contents

  1. Overview
  2. Quick Start
  3. Architecture
  4. Reinforcement Learning Theory
  5. MDP Formulation
  6. System Design
  7. Frontend Architecture
  8. API Reference
  9. Deployment
  10. Contributing

Overview

AdaptRL is an adaptive reinforcement learning system that adjusts VR training difficulty in real time based on:

  • Performance metrics: accuracy, reaction time, error streak
  • Biometric signals: heart rate, EEG stress, gaze drift, fatigue
  • RL policy: PPO (Proximal Policy Optimization) trained to maximize learning efficiency while preventing cognitive overload

Problem Statement

Traditional VR training uses fixed difficulty levels, which causes:

  • Underutilization: Bored trainees, no learning progression
  • Cognitive overload: Stressed trainees, poor retention
  • No personalization: One-size-fits-all approach fails for diverse learners

AdaptRL addresses this by using RL to search for a difficulty trajectory suited to each trainee in real time.

Key Features

  • Real-time Biometric Integration: Heart rate, EEG stress, gaze tracking
  • RL-Driven Adaptation: PPO policy adjusts difficulty every 1.5s
  • Live Dashboard: WebSocket-powered reactive UI with Framer Motion animations
  • Algorithm Benchmarking: Compare PPO vs A2C vs SAC
  • MDP Explorer: Interactive reward function tuning
  • Stress Event Logging: Track overload events and recovery


Quick Start

Prerequisites

  • Python 3.10+ (backend)
  • Node.js 18+ (frontend)
  • pip and npm

Backend Setup

```bash
cd backend
pip install -r requirements.txt
export CORS_ORIGINS=http://localhost:3000
python -m uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```

Backend runs on http://localhost:8000
WebSocket endpoint: ws://localhost:8000/ws/live

Frontend Setup

```bash
cd frontend
npm install
echo 'NEXT_PUBLIC_API_BASE_URL=http://localhost:8000' > .env.local
echo 'NEXT_PUBLIC_WS_URL=ws://localhost:8000/ws/live' >> .env.local
npm run dev
```

Frontend runs on http://localhost:3000

Verify Installation

  1. Open http://localhost:3000 in a browser
  2. You should see the dashboard with live KPI cards updating
  3. Check the browser console for WebSocket connection logs
  4. Navigate to the "RL Agent" page to see training logs

Architecture

System Overview

```
┌─────────────────────────────────────────────────────────────┐
│ VR Training Environment                                     │
│ (Simulated: accuracy, HR, EEG, fatigue, reaction time)      │
└────────────────────┬────────────────────────────────────────┘
                     │
                     ▼
        ┌────────────────────────┐
        │  Simulation Engine     │
        │  (sim_engine.py)       │
        │  - tick() every 1.5s   │
        │  - next_action()       │
        │  - reward_fn()         │
        └────────────┬───────────┘
                     │
                     ▼
        ┌────────────────────────┐
        │  State Store           │
        │  (state_store.py)      │
        │  - In-memory singleton │
        │  - Biometric buffers   │
        │  - Overload counter    │
        └────────────┬───────────┘
                     │
                     ▼
        ┌────────────────────────┐
        │  WebSocket Broadcast   │
        │  (manager.py)          │
        │  - JSON payload        │
        │  - 1.5s cadence        │
        └────────────┬───────────┘
                     │
                     ▼
        ┌────────────────────────┐
        │  Frontend Dashboard    │
        │  (React + Next.js)     │
        │  - Live KPI cards      │
        │  - Animated charts     │
        │  - Status indicators   │
        └────────────────────────┘
```

Data Flow

  1. Simulation Tick (every 1.5s)
    • sim_engine.tick() computes new state
    • Biometric signals updated via sine waves + noise
    • next_action() selects action based on state
    • reward_fn() computes reward signal
    • Overload events tracked
  2. State Snapshot
    • store.snapshot() serializes current state
    • Includes: episode, reward, difficulty, biometrics, action, timestamp
  3. WebSocket Broadcast
    • Payload sent to all connected clients
    • Clients update live state via setLive()
    • Charts and KPIs re-render with animations
  4. Frontend Rendering
    • Framer Motion animates value changes
    • Charts interpolate smoothly
    • Status indicators pulse/breathe based on connection state
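
The tick, snapshot, and broadcast cycle described in this section can be sketched with asyncio. This is a minimal illustration under stated assumptions, not the project's code: `fake_tick`, `broadcast`, `run_loop`, and the queue-based "clients" are hypothetical stand-ins for sim_engine.tick(), the WebSocket manager, and real connections.

```python
import asyncio
import json

TICK_SECONDS = 1.5  # cadence stated in this README


def fake_tick(episode: int) -> dict:
    """Hypothetical stand-in for sim_engine.tick(); returns a snapshot dict."""
    return {"episode": episode, "reward": 0.0}


async def broadcast(clients: list, payload: dict) -> None:
    """Serialize once, then deliver the same JSON string to every client."""
    message = json.dumps(payload)
    for queue in clients:
        await queue.put(message)


async def run_loop(clients: list, ticks: int, interval: float = TICK_SECONDS) -> None:
    """Run a fixed number of tick -> snapshot -> broadcast cycles."""
    for episode in range(1, ticks + 1):
        snapshot = fake_tick(episode)
        await broadcast(clients, snapshot)
        await asyncio.sleep(interval)


async def demo(ticks: int = 3) -> list:
    """Drive the loop with one in-memory client and no delay between ticks."""
    client = asyncio.Queue()
    await run_loop([client], ticks, interval=0.0)
    return [json.loads(client.get_nowait()) for _ in range(ticks)]
```

Calling `asyncio.run(demo(3))` returns three decoded payloads in episode order. Serializing the payload once per tick, rather than once per client, keeps the cost of a broadcast roughly independent of message size as the number of clients grows.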

Reinforcement Learning Theory

What is Reinforcement Learning?

RL is a machine learning paradigm where an agent learns to make decisions by interacting with an environment:

```
Agent observes State (s_t)
        ↓
Agent selects Action (a_t) based on policy π
        ↓
Environment transitions to new State (s_t+1)
        ↓
Environment provides Reward (r_t)
        ↓
Agent learns to maximize cumulative reward
```

Key RL Concepts

State (s): Complete description of the environment at time t

  • In AdaptRL: accuracy, reaction_time, heart_rate, eeg_stress, error_streak, gaze_drift

Action (a): Decision the agent makes

  • In AdaptRL: enemy_count_delta, complexity, hints

Reward (r): Scalar feedback signal

  • In AdaptRL: weighted combination of performance gain and stress penalty

Policy (π): Mapping from states to actions

  • In AdaptRL: PPO neural network trained to maximize cumulative reward

Value Function (V): Expected cumulative future reward from state s

  • Used by PPO to estimate advantage of each action

Advantage (A): How much better an action is than the policy's expected value in that state

  • A(s,a) = Q(s,a) - V(s)
  • PPO uses this to update policy

Why PPO?

PPO (Proximal Policy Optimization) is chosen because:

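| Metric            | PPO       | A2C     | SAC                |
|-------------------|-----------|---------|--------------------|
| Discrete Actions  | ✅ Best   | OK      | Continuous-leaning |
| Stability         | ✅ High   | Medium  | High               |
| Sample Efficiency | Medium    | Low     | ✅ High            |
| Convergence Speed | ✅ Fast   | Slow    | Medium             |
| Implementation    | ✅ Simple | Complex | Complex            |
| Unity ML-Agents   | ✅ Native | Good    | OK                 |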

PPO Key Features:

  • Clipped objective: Prevents policy from changing too much per update
  • Entropy bonus: Encourages exploration
  • Generalized Advantage Estimation (GAE): Reduces variance in advantage estimates
  • Multiple epochs: Reuses data efficiently
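
As an illustration of the GAE bullet, the sketch below computes generalized advantage estimates from a reward sequence and value predictions. It is a generic textbook form, not code from this repository; the function name `compute_gae` and the default `lam=0.95` are assumptions (only γ = 0.99 appears in this README).

```python
def compute_gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one trajectory.

    rewards: list of r_t for t = 0..T-1
    values:  list of V(s_t) for t = 0..T (the last entry is the bootstrap value)
    lam:     GAE smoothing parameter (assumed default, not from the README)
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have one more entry than rewards")

    advantages = [0.0] * len(rewards)
    gae = 0.0
    # Walk backwards so each step can reuse the accumulated future terms.
    for t in reversed(range(len(rewards))):
        # One-step TD error: how much better the outcome was than predicted.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages
```

With `lam=0` each advantage reduces to the one-step TD error (low variance, more bias); with `lam=1` it becomes the full discounted return minus the value baseline (high variance, less bias).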

MDP Formulation

Markov Decision Process

AdaptRL is formulated as a finite-horizon MDP:

```
MDP = (S, A, P, R, γ, T)

S = State space (continuous)
A = Action space (discrete)
P = Transition dynamics (simulated; the biometric signals include random noise)
R = Reward function
γ = Discount factor (0.99)
T = Episode horizon (500k steps)
```
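
To make the role of the discount factor concrete, here is a small sketch of the discounted return that the agent tries to maximize. The function name `discounted_return` is illustrative and not taken from the codebase.

```python
def discounted_return(rewards, gamma=0.99):
    """Return sum_t gamma^t * r_t for a finite list of rewards."""
    total = 0.0
    # Accumulate from the last reward backwards: G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        total = r + gamma * total
    return total
```

For example, `discounted_return([1, 1, 1], gamma=0.5)` is 1 + 0.5 + 0.25 = 1.75. With γ = 0.99, a reward 100 steps ahead still counts for roughly 37% of an immediate one, so the agent is pushed to care about stress that builds up over many ticks.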

State Space

Dimension: 6-dimensional continuous space

```python
# State features with their (low, high) ranges
STATE_BOUNDS = {
    "accuracy": (0.60, 0.98),       # Task performance
    "reaction_time": (0.20, 0.60),  # Response speed (seconds)
    "heart_rate": (68, 110),        # BPM
    "eeg_stress": (0.15, 0.95),     # Normalized stress index
    "error_streak": (0, 3),         # Consecutive errors (integer)
    "gaze_drift": (0.05, 0.50),     # Attention deviation
}
```

Normalization: All continuous values normalized to [0, 1] for neural network input.
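
One common way to do this is min-max scaling with the ranges listed above. The sketch below assumes that approach; the README does not say which normalization the project uses, and `normalize_state` is a hypothetical helper name.

```python
# Ranges copied from the State Space definition in this README.
STATE_BOUNDS = {
    "accuracy": (0.60, 0.98),
    "reaction_time": (0.20, 0.60),
    "heart_rate": (68.0, 110.0),
    "eeg_stress": (0.15, 0.95),
    "error_streak": (0.0, 3.0),
    "gaze_drift": (0.05, 0.50),
}


def normalize_state(state: dict) -> dict:
    """Min-max scale each feature to [0, 1], clipping out-of-range readings."""
    normalized = {}
    for name, (low, high) in STATE_BOUNDS.items():
        scaled = (state[name] - low) / (high - low)
        normalized[name] = min(1.0, max(0.0, scaled))
    return normalized
```

Clipping matters here because the simulated heart rate and EEG signals add noise on top of a sine wave, so a reading can briefly fall outside the nominal range.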

Action Space

Dimension: 3 discrete factors with 3 values each (3 × 3 × 3 = 27 joint actions)

```python
# Action: one value is chosen per factor
ACTION_SPACE = {
    "enemy_count_delta": ("-5", "0", "+5"),   # Difficulty adjustment
    "complexity": ("LOW", "MED", "HIGH"),     # Task complexity
    "hints": ("NONE", "AUDIO", "VISUAL"),     # Assistance type
}
```

Semantics:

  • enemy_count_delta: Adjust number of enemies (difficulty)
  • complexity: Change task complexity level
  • hints: Provide audio/visual assistance to reduce stress
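
The 27 joint actions can be enumerated as the Cartesian product of the three factors, which also gives a flat index that a discrete policy head could output. This is a sketch; `all_actions` is a hypothetical helper, and the ordering of the flat index is an arbitrary choice made here.

```python
from itertools import product

# Factor values as listed in the Action Space definition above.
ACTION_SPACE = {
    "enemy_count_delta": ("-5", "0", "+5"),
    "complexity": ("LOW", "MED", "HIGH"),
    "hints": ("NONE", "AUDIO", "VISUAL"),
}


def all_actions() -> list:
    """Enumerate every combination of factor values as a dict."""
    names = list(ACTION_SPACE)
    return [dict(zip(names, combo)) for combo in product(*ACTION_SPACE.values())]


ACTIONS = all_actions()  # index i in range(27) maps to ACTIONS[i]
```

`len(ACTIONS)` is 27, and `ACTIONS[0]` is `{"enemy_count_delta": "-5", "complexity": "LOW", "hints": "NONE"}` because `itertools.product` varies the last factor fastest.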

Reward Function

```
R(s, a) = α·ΔAccuracy - β·StressPenalty + γ·EngagementBonus - δ·OverloadFlag

where:
  α = 0.5  (accuracy weight)
  β = 0.3  (stress weight)
  γ = 0.2  (engagement weight; distinct from the discount factor γ = 0.99)
  δ = 0.4  (overload weight)

  ΔAccuracy       = max(0, accuracy - 0.75)
  StressPenalty   = eeg_stress
  EngagementBonus = 0.25 if difficulty ≥ 5 else 0.1
  OverloadFlag    = 1 if (heart_rate > 95 OR eeg_stress > 0.75) else 0
```

Interpretation:

  • Maximize accuracy gain (α term): Encourage learning
  • Minimize stress (β term): Prevent cognitive overload
  • Reward engagement (γ term): Maintain challenge
  • Penalize overload (δ term): Detect and prevent burnout
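
The reward definition above translates directly into a few lines of Python. This is a sketch written from the formula in this README, not the project's `reward_fn`; the function name `compute_reward` and its signature are assumptions.

```python
def compute_reward(accuracy, eeg_stress, heart_rate, difficulty,
                   alpha=0.5, beta=0.3, gamma=0.2, delta=0.4):
    """Weighted reward as defined in the Reward Function section."""
    accuracy_gain = max(0.0, accuracy - 0.75)
    stress_penalty = eeg_stress
    engagement_bonus = 0.25 if difficulty >= 5 else 0.1
    overload_flag = 1 if (heart_rate > 95 or eeg_stress > 0.75) else 0
    return (alpha * accuracy_gain
            - beta * stress_penalty
            + gamma * engagement_bonus
            - delta * overload_flag)
```

For accuracy 0.82, eeg_stress 0.42, heart_rate 74, and difficulty 6.5, this gives 0.5·0.07 - 0.3·0.42 + 0.2·0.25 - 0 ≈ -0.041. Raising heart_rate to 100 triggers the overload flag and lowers the result by a further 0.4. The sample payloads elsewhere in this README show different reward values, so the running backend may scale or aggregate the reward differently.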

Policy Gradient Objective (PPO)

```
L^CLIP(θ) = Ê_t [ min( r_t(θ)·Â_t, clip(r_t(θ), 1-ε, 1+ε)·Â_t ) ]

where:
  r_t(θ) = π_θ(a_t|s_t) / π_θ_old(a_t|s_t)   (probability ratio)
  Â_t    = generalized advantage estimate
  ε      = 0.2                                (clipping range)
```

Intuition: PPO clips the probability ratio to prevent large policy updates that could destabilize training.
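
The clipping behaviour is easier to see numerically. The sketch below evaluates the clipped surrogate objective for a batch of samples in plain Python; it illustrates the formula only and is not a training loop. The name `ppo_clip_objective` and the use of log-probabilities as inputs are choices made for this example.

```python
import math


def ppo_clip_objective(new_log_probs, old_log_probs, advantages, epsilon=0.2):
    """Mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A) over the batch."""
    terms = []
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio r_t = pi_new(a|s) / pi_old(a|s), from log-probs.
        ratio = math.exp(new_lp - old_lp)
        clipped_ratio = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
        terms.append(min(ratio * adv, clipped_ratio * adv))
    return sum(terms) / len(terms)
```

With a ratio of 1.5 and a positive advantage of 1, the objective is capped at 1.2, so there is no incentive to push the ratio beyond 1 + ε. With the same ratio and an advantage of -1, the unclipped value -1.5 is kept, so a harmful move away from the old policy is still fully penalized.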


System Design

Backend Architecture

```
backend/app/
├── main.py                    # FastAPI app, lifespan, WebSocket
├── core/
│   └── config.py              # Pydantic settings, CORS config
├── services/
│   ├── sim_engine.py          # Simulation logic, RL policy
│   ├── state_store.py         # In-memory state singleton
│   └── websocket_manager.py   # Connection pool, broadcast
├── routers/
│   ├── dashboard.py           # GET /api/dashboard/summary
│   ├── algorithms.py          # GET /api/algorithms/compare
│   ├── biometrics.py          # GET /api/biometrics/history
│   ├── session.py             # GET /api/session/current
│   ├── training.py            # POST /api/training/{start,pause,reset}
│   ├── reward.py              # POST /api/reward/evaluate
│   ├── logs.py                # GET /api/logs/events
│   └── health.py              # GET /api/health
└── schemas/
    ├── dashboard.py           # Pydantic models
    ├── reward.py
    └── training.py
```

Simulation Engine (sim_engine.py)

Core Loop (runs every 1.5s):

```python
import math

def tick():
    # Simplified outline: state variables (episode, gaze_drift, error_streak,
    # difficulty, ...) live in the state store. noise(lo, hi) is assumed to
    # return a random perturbation within [lo, hi].

    # 1. Update biometric signals (sine waves + noise)
    t = episode / 25.0
    heart_rate = 78 + 10 * math.sin(t) + noise(-4, 4)
    eeg_stress = 0.42 + 0.18 * math.sin(t / 2) + noise(-0.06, 0.06)
    accuracy = 0.82 + 0.08 * math.sin(t / 3) + noise(-0.03, 0.03)

    # 2. Compute fatigue from biometrics
    fatigue = 0.5 * eeg_stress + 0.3 * gaze_drift + 0.2 * error_streak

    # 3. Select action via policy
    action = next_action()  # Rule-based policy (can be replaced with NN)

    # 4. Compute reward
    reward = reward_fn(alpha=0.5, beta=0.3, gamma=0.2, delta=0.4)

    # 5. Track overload events
    if heart_rate > 95 or eeg_stress > 0.75:
        overload_count += 1

    # 6. Update difficulty based on action
    if action["enemy_count_delta"] == "+5":
        difficulty += 0.2
    elif action["enemy_count_delta"] == "-5":
        difficulty -= 0.2

    # 7. Append to history buffers (keep last 60)
    hr_history.append(heart_rate)
    eeg_history.append(eeg_stress)
    fatigue_history.append(fatigue)

    # 8. Log and return snapshot
    return store.snapshot()
```
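
Step 7 keeps only the most recent 60 samples per signal. One simple way to get that behaviour is a bounded `collections.deque`, sketched below; the README does not say how the project trims its buffers, so treat `make_history` as a hypothetical helper.

```python
from collections import deque

HISTORY_LEN = 60  # "keep last 60" from the tick() outline


def make_history(maxlen: int = HISTORY_LEN) -> deque:
    """Create a buffer that silently drops its oldest entry once full."""
    return deque(maxlen=maxlen)


hr_history = make_history()
for tick_index in range(100):
    # Hypothetical readings; real values would come from the simulation.
    hr_history.append(70.0 + tick_index)
```

After 100 appends the buffer holds 60 values, the oldest being the one from tick 40. Call `list(hr_history)` before JSON serialization, since a deque is not serializable by the `json` module directly.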

State Store (state_store.py)

In-memory singleton holding current state:

```python
class StateStore:
    episode: int              # Training step counter
    reward: float             # Current reward
    difficulty: float         # Task difficulty level, in [0, 10]
    heart_rate: float         # BPM
    eeg_stress: float         # Normalized stress, in [0, 1]
    fatigue_score: float      # Computed fatigue, in [0, 1]
    accuracy: float           # Task accuracy, in [0, 1]
    reaction_time: float      # Response latency (seconds)
    error_streak: int         # Consecutive errors
    gaze_drift: float         # Attention deviation, in [0, 1]
    current_action: dict      # Last selected action
    training_running: bool    # Session active flag
    overload_count: int       # Total overload events
    last_tick: str            # ISO timestamp of last update

    # History buffers (last 60 ticks)
    hr_history: list[float]
    eeg_history: list[float]
    fatigue_history: list[float]
    logs: list[str]           # Event log (last 50)
```

WebSocket Protocol

Connection: ws://localhost:8000/ws/live

Payload (sent every 1.5s):

```json
{
  "episode": 1247,
  "reward": 0.847,
  "difficulty": 6.5,
  "heart_rate": 74.0,
  "eeg_stress": 0.42,
  "fatigue_score": 0.28,
  "accuracy": 0.82,
  "reaction_time": 0.34,
  "error_streak": 2,
  "gaze_drift": 0.18,
  "training_running": true,
  "overload_count": 23,
  "last_tick": "2026-04-27T18:30:45.123456Z",
  "current_action": {
    "enemy_count_delta": "+5",
    "complexity": "HIGH",
    "hints": "NONE"
  }
}
```
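
A consumer of this payload will usually want to check that the fields it depends on are present before using them. The sketch below parses a message with the standard library and applies the overload thresholds from the reward definition; `parse_payload`, `is_overloaded`, and the choice of required keys are illustrative assumptions, not part of the project's client.

```python
import json

# Subset of payload keys this example relies on (an assumption for the sketch).
REQUIRED_KEYS = {"episode", "heart_rate", "eeg_stress", "current_action"}


def parse_payload(raw: str) -> dict:
    """Decode one WebSocket message and verify the expected keys exist."""
    data = json.loads(raw)
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"payload missing keys: {sorted(missing)}")
    return data


def is_overloaded(payload: dict) -> bool:
    """Same thresholds as OverloadFlag: heart_rate > 95 or eeg_stress > 0.75."""
    return payload["heart_rate"] > 95 or payload["eeg_stress"] > 0.75


SAMPLE = (
    '{"episode": 1247, "heart_rate": 74.0, "eeg_stress": 0.42, '
    '"current_action": {"enemy_count_delta": "+5", "complexity": "HIGH", "hints": "NONE"}}'
)
payload = parse_payload(SAMPLE)
```

For the sample values (heart rate 74, EEG stress 0.42), `is_overloaded(payload)` is False. Failing fast on a malformed message keeps a bad tick from propagating into charts or logs.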


Frontend Architecture

Component Tree

```
Page (routing shell, 60 lines)
├── Sidebar (nav with sliding indicator)
├── Topbar (header + status badge)
└── RouteContent (page router)
    ├── DashboardPage
    │   ├── KpiCard (x4)
    │   ├── FluidChartWrapper
    │   │   ├── LineChart (reward curves)
    │   │   └── RadarChart (state coverage)
    │   └── LiveSession card
    ├── AgentPage
    │   ├── LineChart (reward over time)
    │   ├── PieCharts (action distribution)
    │   └── Training controls
    ├── BiometricsPage
    │   ├── KpiCard (HR, EEG, Fatigue)
    │   ├── LineChart (multivariate signals)
    │   └── Stress event log
    ├── ComparePage
    │   ├── BarChart (algorithm metrics)
    │   └── LineChart (convergence curves)
    ├── MdpPage
    │   ├── MDP diagram (SVG)
    │   ├── Action space grid
    │   └── Reward function tuner
    └── ReportPage
        └── Research summary
```

State Management

No Redux or Context API is used. All state lives in Page and is passed down as props (prop drilling):

```
Page (holds all state)
├── live: LivePayload (from WebSocket)
├── logs: string[]
├── series: SeriesPoint[] (chart data)
├── seconds: number (session timer)
├── wsStatus: WsStatus
├── weights: RewardWeights (MDP tuner)
├── rewardPreview: number
└── selectedAction: ActionSelection

Passed down to page components as props
```

Animation System

Framer Motion used for:

  1. KPI Cards: whileHover={{ y: -2 }} lift + glow
  2. AnimatedNumber: Slide-in + flash on value change
  3. GlowPulse: Pulse (live), breathing (reconnecting), shake (offline)
  4. PageTransition: Fade + slide between routes
  5. Sidebar: Sliding active indicator via layoutId
  6. BiometricsPage: Color morph on stress label

Spring Config:

```typescript
transition={{
  duration: 0.35,
  ease: [0.34, 1.56, 0.64, 1] // cubic-bezier spring
}}
```

WebSocket Client

```typescript
connectLiveSocket(
  onMessage: (data: LivePayload) => void,
  onStatusChange: (status: WsStatus) => void
): LiveSocket

// Exponential backoff reconnect:
//   Attempt 1: 1s delay
//   Attempt 2: 2s delay
//   Attempt 3: 4s delay
//   Attempt 4: 8s delay
//   Attempt 5: 16s delay
//   Attempt 6: 30s delay (capped)
//   After 6 attempts: offline
```
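
The reconnect schedule follows a doubling rule with a 30-second cap. The arithmetic is sketched below in Python for brevity, although the actual client is TypeScript; `reconnect_delays` is a hypothetical helper and not the client's API.

```python
def reconnect_delays(max_attempts: int = 6, base: float = 1.0, cap: float = 30.0) -> list:
    """Delay before each attempt: base * 2^(attempt - 1), limited to cap seconds."""
    return [min(cap, base * 2 ** attempt) for attempt in range(max_attempts)]
```

`reconnect_delays()` returns `[1.0, 2.0, 4.0, 8.0, 16.0, 30.0]`: the sixth delay would be 32 seconds uncapped, so the cap applies only to the last attempt. The total wait before the client gives up and reports offline is 61 seconds.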


API Reference

Dashboard

```
GET /api/dashboard/summary

Response:
{
  "training_episodes": 1247,
  "avg_reward": 0.847,
  "overload_events": 23,
  "policy_convergence": 94.2
}
```

Algorithms

```
GET /api/algorithms/compare

Response:
{
  "metrics": {
    "ppo": {
      "final_reward": 0.847,
      "convergence_step": 420000,
      "stability": "High",
      "overload_events": 23
    },
    "a2c": { ... },
    "sac": { ... }
  }
}
```

Biometrics

```
GET /api/biometrics/history

Response:
{
  "heart_rate": [72, 74, 73, ..., 75],
  "eeg_stress": [0.31, 0.35, 0.36, ..., 0.42],
  "fatigue_score": [0.18, 0.22, 0.20, ..., 0.28]
}
```

Training Control

```
POST /api/training/start
POST /api/training/pause
POST /api/training/reset

Response:
{
  "training_running": true/false,
  "message": "Training started/paused/reset"
}
```

Reward Evaluation

```
POST /api/reward/evaluate

Request:
{
  "alpha": 0.5,
  "beta": 0.3,
  "gamma": 0.2,
  "delta": 0.4,
  "accuracy_gain": 0.07,
  "stress_penalty": 0.42,
  "engagement_bonus": 0.25,
  "overload_flag": 0
}

Response:
{
  "reward": 0.3245
}
```


Deployment

Docker (Backend)

```dockerfile
# Paths are relative to the backend/ build context used by Docker Compose
FROM python:3.10-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY app ./app
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
```

Docker Compose

```yaml
version: '3.8'
services:
  backend:
    build: ./backend
    ports:
      - "8000:8000"
    environment:
      - CORS_ORIGINS=http://localhost:3000,https://myapp.vercel.app
  frontend:
    build: ./frontend
    ports:
      - "3000:3000"
    environment:
      - NEXT_PUBLIC_API_BASE_URL=http://backend:8000
      - NEXT_PUBLIC_WS_URL=ws://backend:8000/ws/live
```

Environment Variables

Backend (.env):

```
APP_NAME=AdaptRL Backend
CORS_ORIGINS=http://localhost:3000
```

Frontend (.env.local):

```
NEXT_PUBLIC_API_BASE_URL=http://localhost:8000
NEXT_PUBLIC_WS_URL=ws://localhost:8000/ws/live
```


Contributing

Code Style

  • Python: PEP 8, type hints required
  • TypeScript: ESLint + Prettier, strict mode
  • CSS: Custom properties, no frameworks

Testing

```bash
# Backend unit tests
cd backend
pytest tests/ -v

# Frontend type check
cd frontend
npx tsc --noEmit
```

Adding New Features

  1. New RL algorithm: Implement in sim_engine.py, add to /api/algorithms/compare
  2. New biometric signal: Add to StateStore, append to history buffer in tick()
  3. New page: Create component in frontend/components/pages/, add route in RouteContent.tsx
  4. New API endpoint: Add router in backend/app/routers/, include in main.py

Tech Stack

  • Backend: FastAPI, Pydantic, Python 3.10+
  • Frontend: Next.js 14, React 18, TypeScript, Framer Motion, Recharts
  • Real-time: WebSocket (native)
  • Deployment: Docker, Docker Compose

License

MIT License — See LICENSE file

Authors

AdaptRL Development Team
RL Lab Mini Project | 2025–26


Last Updated: April 27, 2026
Version: 1.0.0 (Complete Audit & Motion Design Upgrade)
