Screencast MCP

An MCP server that lets an agent record the screen, watch the footage, and make minimal ffmpeg edits — over stdio, on Windows.

Capture the desktop, a monitor, a window, or a region; sample a recording into frames the agent can actually look at; trim, crop, scale, overlay, compress, redact, and convert. No cloud, no streaming — ffmpeg and ffprobe wrapped behind nineteen typed tools.

Note

Screen capture uses gdigrab and is Windows-only; the watch, edit, and produce tools work anywhere ffmpeg runs. The full pipeline is in place: capture, watch, the edit surface, redact_region safety redaction, system-audio capture, and the produce layer (transitions, title cards, music bed, aspect variants, platform export). Cross-platform capture backends are next — see ROADMAP.md.

Overview

Screencast MCP gives an agent a screen recorder it can drive and reason about. It speaks Model Context Protocol over stdio and exposes ffmpeg as a small, typed tool surface rather than a flag soup. The defining capability is the watch loop: an agent records footage, samples it into PNG frames, and views those frames to confirm what actually happened on screen.

The design choices are deliberate rather than incidental:

Capture is always explicit. A recording or screenshot happens only on a tool call — nothing auto-fires and there is no background or scheduled capture.
Footage is made viewable. sample_frames turns a video into images an agent can open, so "watch what happened" is a first-class operation, not an afterthought.
Presets over raw flags. Quality is draft / standard / high; the agent never reasons about codecs, CRF, or pixel formats.
Safe by default. Output lands under SCREENCAST_HOME, never inside a project checkout, and the public repo's .gitignore blocks captured media from being committed.
Crash-safe sessions. A recording interrupted by a crash is reconciled on the next start (orphan reaping), so no ffmpeg child silently outlives the server.

Tools

Twenty-five tools across four concerns. The manifest in mcp-tools.json is the canonical surface and is kept in sync with src/tools/.

Capture

Tool	Purpose
`start_recording`	Start a background recording. `target` = `full` \| `monitor:<index>` \| `window:<title>` \| `region:<x>,<y>,<w>,<h>`; optional `fps`, `quality`, and `audio` (set `audio.source` = `system` to also capture loopback audio). Returns a session id and output path.
`stop_recording`	Stop a session by id. Sends ffmpeg a graceful quit so the file is finalized, not truncated. Returns the final path and duration.
`list_sessions`	List active and finished sessions.
`get_session`	Inspect a single session by id.
`screenshot`	Capture a single PNG of a `target`.
`list_audio_devices`	List the DirectShow audio devices ffmpeg can see and flag a likely system-audio loopback device for `start_recording`.

Watch

Tool	Purpose
`sample_frames`	Extract frames from a video — at a fixed `fps` or at explicit `timestamps` — so the agent can view what happened. Returns the frame paths.
`get_media_info`	ffprobe wrapper: duration, resolution, fps, codecs, container format, and size.

Minimal edit

Tool	Purpose
`trim`	Cut a sub-clip by `start` + (`end` or `duration`). Stream-copy for speed.
`concat`	Join two or more videos into one.
`convert`	Convert between `mp4`, `gif`, and `webm`.

Edit surface

These tools re-encode (a filter rewrites pixels, so stream copy does not apply). They reuse the same draft/standard/high presets as capture.

Tool	Purpose
`crop`	Crop to a pixel rectangle (`x`, `y`, `width`, `height`). A rectangle that runs off the frame is rejected, not silently clamped.
`scale`	Resize to a `width` and/or `height`. One side keeps the aspect ratio; both set an exact size.
`speed`	Change playback speed by a `factor` (>1 faster, <1 slower). Audio is retempo'd when present.
`overlay`	Composite a logo, watermark, or picture-in-picture onto a base video at a position, optionally scaled and time-limited.
`compress`	Re-encode smaller with a `light`/`medium`/`heavy` CRF ladder and an optional `maxWidth` that only downscales.
`extract_audio`	Write the audio track to its own file (`mp3`, `aac`, `wav`, or `copy`).
`clip`	Extract one or more frame-accurate sub-segments to separate files. Unlike `trim`, it re-encodes so cuts land exactly on the given times.

Redact

Tool	Purpose
`redact_region`	Cover declared rectangles in a video. `style` is `box` (default, a solid irreversible fill), `blur`, or `pixelate`; each region may be limited to a `start`/`end` window and expanded with `pad`.

Important

redact_region covers the regions you declare. It is not automatic secret detection, so it cannot find a secret you did not point it at. The default box style is a solid fill, which is irreversible; blur and pixelate are softer but can be partially recovered, so prefer box for real secrets. A region that falls outside the frame is rejected rather than silently doing nothing.

Produce

The production layer turns raw clips into a finished piece. Tools that combine clips auto-normalize each input to a common resolution, fps, and audio rate first, so heterogeneous sources compose cleanly.

Tool	Purpose
`xfade_transition`	Crossfade two videos into one with an `xfade` transition (`fade`, `wipeleft`, `slideup`, ...). Audio is crossfaded when both clips have a track.
`assemble_highlights`	Stitch two or more clips into one, with hard cuts (`transition: "cut"`) or an xfade transition between each.
`title_card`	Generate a standalone title card (centered text on a solid background, with a silent track so it composes). Text uses a bundled font, so no system font is needed.
`music_bed`	Lay a music track under a video: looped/trimmed to length, faded in and out, leveled, and mixed with any existing audio (optionally ducked).
`reframe`	Re-aspect to `16:9`, `9:16`, `1:1`, or `4:5` with `pad` (letterbox, no content lost) or `crop` (fill, trims overflow).
`export_preset`	Encode a platform-ready file (`youtube`, `instagram_reel`, `tiktok`, `x`, `square`): reframes, caps fps, and encodes H.264 at the platform bitrate with faststart.

Targets

Every capture tool takes a single target string, so an agent never has to juggle quoting:

Target	Captures
`full`	The whole virtual desktop
`monitor:<index>`	One display; `0` is always primary
`window:<title>`	The on-screen rectangle a window occupies (case-insensitive exact title, else substring; topmost wins)
`region:<x>,<y>,<w>,<h>`	An absolute pixel rectangle

Output is written under SCREENCAST_HOME (default <homedir>/.screencast-mcp) into recordings/, frames/, screenshots/, and edits/. Any tool also accepts an explicit output path.

Prerequisites

ffmpeg and ffprobe are external dependencies and must be on PATH (or pointed at via the FFMPEG_PATH / FFPROBE_PATH environment variables). The server detects them per call and returns a clear error with an install hint if either is missing.

Platform	Install
Windows	`winget install Gyan.FFmpeg` or `choco install ffmpeg`
macOS	`brew install ffmpeg`
Linux	`apt install ffmpeg`

Installation

npm install -g @tmhs/screencast-mcp

Or run it from a clone:

git clone https://github.com/TMHSDigital/screencast-mcp.git
cd screencast-mcp
npm install
npm run build      # produces dist/index.js, the server entry point

MCP client configuration

{
  "mcpServers": {
    "screencast": {
      "command": "npx",
      "args": ["-y", "@tmhs/screencast-mcp"]
    }
  }
}

Tip

Running from a clone instead of the published package? Point the client straight at the build: "command": "node", "args": ["C:/path/to/screencast-mcp/dist/index.js"].

Usage

A typical watch loop — record, sample, look:

// 1. record a region for a few seconds
start_recording { "target": "region:0,0,1280,720", "quality": "draft" }
//    -> { "sessionId": "rec-…", "outputPath": "…/recordings/rec-….mp4" }

// 2. finalize the file
stop_recording { "sessionId": "rec-…" }
//    -> { "durationSec": 4.2, "finalizedGracefully": true }

// 3. turn it into frames the agent can view
sample_frames { "input": "…/recordings/rec-….mp4", "timestamps": [0.5, 2, 3.5] }
//    -> { "frames": ["…/frames/…/frame_000_0.5s.png", …] }

screenshot { "target": "window:My App" } is the one-shot equivalent for a still.

Windows notes

Multi-monitor offsets. gdigrab has no "capture monitor N" selector, so a monitor target captures the whole virtual desktop and crops to that display's pixel bounds (from System.Windows.Forms.Screen.AllScreens). monitor:1 grabs the second display at its real offset; monitor:0 is always primary.
Window capture resolves the window to the on-screen rectangle it occupies and captures that, rather than the window's own surface — gdigrab's native title= grab returns a blank frame for GPU- or DirectComposition-composited windows (Chrome, Electron, UWP). The window must be visible, on top, and not minimized (a minimized window is rejected with a clear error), the capture includes anything drawn over that rectangle, and for start_recording the rectangle is fixed once at start. True per-window background capture (Windows Graphics Capture API) is a future phase.
Fullscreen-exclusive apps often produce black frames under gdigrab; run the source in borderless-windowed mode for reliable capture.
System audio needs a loopback device. gdigrab is video-only, so start_recording with audio.source = system captures from a separate dshow input. Windows has no native loopback, so this needs a virtual-audio device (enable Stereo Mix, or install a driver such as screen-capture-recorder's virtual-audio-capturer or VB-CABLE). Run list_audio_devices to find it. Microphone capture is intentionally not supported.

Threat model

Warning

Screen capture can record anything on screen at the moment of capture — passwords, tokens, private messages, and other secrets. window: captures the screen rectangle a window occupies, so overlays, notifications, or another window drawn over it are captured too. Treat recordings, screenshots, and sampled frames as sensitive by default.

Capture is always explicit. Nothing auto-fires; capture is gated behind an explicit tool call.
Output stays local. Files are written to the local filesystem only — never uploaded, streamed, or transmitted anywhere.
Public repo, private captures. The .gitignore blocks recordings, frames, screenshots, and common video/image output so test media cannot be committed by accident.
Review before sharing. Sample frames or inspect a recording before handing a file to another tool or person, so you know what it contains.
Redaction is declared, not detected. redact_region covers only the rectangles you specify, so it depends on you (or the agent) having found the secret first. Use the default box style for a solid irreversible fill, and still review the output before sharing.
System audio is sensitive too. When audio.source = system, the recording captures everything playing on the machine (call audio, notifications, media). Treat audio-bearing recordings with the same care as the video.

Project structure

.
├── src/
│   ├── index.ts          # MCP server entry (stdio); registers every tool, reaps orphans
│   ├── context.ts        # shared session-store singleton
│   ├── tools/            # one file per tool (capture, watch, edit)
│   ├── utils/            # ffmpeg, monitors, windows, sessions, paths, targets
│   └── __tests__/        # vitest unit tests + guarded local-capture harness
├── docs/                 # GitHub Pages documentation site
├── mcp-tools.json        # canonical tool manifest (kept in sync with src/tools)
├── .github/workflows/    # CI, release, npm publish, Pages, ecosystem drift check
├── ROADMAP.md · CONTRIBUTING.md · SECURITY.md · LICENSE
└── package.json

Development

npm install
npm run build      # tsc -> dist/
npm test           # vitest (pure unit tests; no ffmpeg or display required)
npm run dev        # tsx watch

The capture path can't be exercised on CI's headless Linux runners, so an end-to-end harness lives behind a flag and is skipped by default:

RUN_LOCAL_CAPTURE_TESTS=1 npm test   # Windows + ffmpeg + a real display

Contributing

Issues and pull requests are welcome — see CONTRIBUTING.md and the Code of Conduct. Security reports go through SECURITY.md.

License

Released under the CC-BY-NC-ND-4.0 license.

Documentation · Roadmap · Report an issue · License

_{Built by TMHSDigital · Back to top ↑}

Name		Name	Last commit message	Last commit date
Latest commit History 30 Commits
.github		.github
assets		assets
docs		docs
src		src
tests		tests
.cursorrules		.cursorrules
.gitattributes		.gitattributes
.gitignore		.gitignore
AGENTS.md		AGENTS.md
CHANGELOG.md		CHANGELOG.md
CLAUDE.md		CLAUDE.md
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
ROADMAP.md		ROADMAP.md
SECURITY.md		SECURITY.md
eslint.config.js		eslint.config.js
mcp-tools.json		mcp-tools.json
package-lock.json		package-lock.json
package.json		package.json
site.json		site.json
tsconfig.json		tsconfig.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Screencast MCP

Overview

Tools

Capture

Watch

Minimal edit

Edit surface

Redact

Produce

Targets

Prerequisites

Installation

MCP client configuration

Usage

Windows notes

Threat model

Project structure

Development

Contributing

License

About

Uh oh!

Releases 22

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Screencast MCP

Overview

Tools

Capture

Watch

Minimal edit

Edit surface

Redact

Produce

Targets

Prerequisites

Installation

MCP client configuration

Usage

Windows notes

Threat model

Project structure

Development

Contributing

License

About

Resources

License

Code of conduct

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases 22

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages