Skip to content

feat(debug-files): scan inside .zip archives on upload#1141

Merged
BYK merged 4 commits into
mainfrom
byk/feat/debug-files-upload-zips
Jun 26, 2026
Merged

feat(debug-files): scan inside .zip archives on upload#1141
BYK merged 4 commits into
mainfrom
byk/feat/debug-files-upload-zips

Conversation

@BYK

@BYK BYK commented Jun 25, 2026

Copy link
Copy Markdown
Member

Summary

Adds ZIP archive scanning to sentry debug-files upload, matching the legacy sentry-cli behavior (try_open_zip / walk_difs_zip). This is Tier B of the debug-files upload roadmap (Tier A core upload landed in #1139, limits + --derived-data in #1140).

.zip files encountered during a scan are now expanded in memory and their entries run through the same filter pipeline (--type/--id/feature filters, size gate) as on-disk files. Pass --no-zips to skip them.

# .zip archives are scanned in place by default
sentry debug-files upload ./symbols.zip
sentry debug-files upload ./build            # finds DIFs in any *.zip under ./build

# opt out
sentry debug-files upload ./build --no-zips

Implementation

  • New src/lib/dif/zip.tsreadZipDifEntries():
    • Detects archives by .zip extension and PK local-header magic (a misnamed non-archive falls back to normal parsing; a real DIF named foo.zip still works).
    • Extracts entries with fflate.unzipSync.
    • Zip-bomb guard: the pre-decompression filter rejects entries whose declared uncompressed size exceeds the server's max_file_size (or omitted = no gate) before inflation, so oversized entries are never materialized. Directory/empty entries are dropped.
    • Malformed/unreadable archives are skipped (debug log), never thrown.
    • Nested archives are not recursed (a .zip inside a .zip is an opaque, non-object entry), matching the legacy tool.
  • prepareDifs() gains a scanZips option (default true). The on-disk per-file logic is extracted into prepareFileDif(), and the parse step into a shared difFromBuffer() so on-disk files and in-memory ZIP entries share the same parse + filter path. Entry display paths are synthetic "<zip>/<entry>" so logs and DIF names stay meaningful.
  • upload command — adds the --no-zips flag and threads scanZips: !flags["no-zips"] through.

Notes / scope

  • fflate added as a devDependency (bundled at build time; check:deps passes).
  • The whole archive is read into memory (fflate's unzipSync is non-streaming); typical symbol zips are tens/hundreds of MB. Per-entry decompression remains bounded by the size gate.
  • Still deferred (Tier C): --symbol-maps (BCSymbolMap) and --il2cpp-mapping line mappings — these need new symbolic WASM exports.

Testing

  • test/lib/dif/zip.test.ts — 12 unit tests (magic/extension detection, malformed input, size gate, no-recursion, scanZips toggle, --type filtering, container not double-counted, misnamed-archive fallback). Archives built in memory with fflate.zipSync — no binary fixtures.
  • test/commands/debug-files/upload.test.ts — 2 command-level tests (.zip scanned by default; --no-zips ignores entries).
  • All 99 dif + debug-files tests pass; typecheck, lint, check:deps, check:patches, check:fragments clean.

Add ZIP archive scanning to `debug-files upload`, matching the legacy
sentry-cli behavior. `.zip` files encountered during a scan are expanded
in memory and their entries run through the same filter pipeline as
on-disk files. Disable with `--no-zips`.

- New `src/lib/dif/zip.ts`: `readZipDifEntries()` detects archives by
  `.zip` extension + `PK` magic, extracts entries via `fflate.unzipSync`,
  and bounds decompression with a pre-decompression size filter
  (zip-bomb guard). Directory and empty entries are dropped; nested
  archives are not recursed.
- `prepareDifs()` gains a `scanZips` option (default true); the per-file
  path is extracted into `prepareFileDif()` and the parse step into the
  shared `difFromBuffer()` so on-disk and in-memory entries share logic.
- `upload` command: add `--no-zips` flag; thread `scanZips` through.

Tests: 12 zip unit tests + 2 command wiring tests. Docs/skill updated.
@github-actions github-actions Bot added the risk: high PR risk score: high label Jun 25, 2026
@github-actions

github-actions Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor
PR Preview Action v1.8.1

QR code for preview link

🚀 View preview at
https://cli.sentry.dev/_preview/pr-1141/

Built to branch gh-pages at 2026-06-26 10:02 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

Comment thread src/lib/dif/zip.ts
Comment thread src/lib/dif/scan.ts
@github-actions

github-actions Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Codecov Results 📊

✅ Patch coverage is 89.80%. Project has 5114 uncovered lines.
✅ Project coverage is 81.52%. Comparing base (base) to head (head).

Files with missing lines (2)
File Patch % Lines
src/lib/dif/zip.ts 89.83% ⚠️ 6 Missing and 1 partials
src/lib/dif/scan.ts 89.74% ⚠️ 4 Missing and 2 partials
Coverage diff
@@            Coverage Diff             @@
##          main       #PR       +/-##
==========================================
+ Coverage    81.48%    81.52%    +0.04%
==========================================
  Files          396       397        +1
  Lines        27588     27671       +83
  Branches     17912     17966       +54
==========================================
+ Hits         22478     22557       +79
- Misses        5110      5114        +4
- Partials      1866      1868        +2

Generated by Codecov Action

Comment thread src/lib/dif/scan.ts Outdated
Source bundles (and JVM bundles / .src.zip) are a ZIP preceded by an
8-byte SYSB+version header, so they start with SYSB, not PK. Verify
readZipDifEntries returns null for them so they upload as sourcebundle
DIFs rather than being expanded into their inner source files.

@cursor cursor Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 4a41f2e. Configure here.

Comment thread src/lib/dif/zip.ts
@BYK

BYK commented Jun 25, 2026

Copy link
Copy Markdown
Member Author

CI status: all green except an external Anthropic API outage

Every check passes except E2E Tests, which fails only on test/e2e/skill-eval.test.ts (the LLM-driven skill eval). The failure is an upstream outage, not a code issue:

[planner] API error: Invalid response body while trying to fetch
https://api.anthropic.com/v1/messages: Premature close
  • All 126 non-LLM E2E tests pass; only the 2 skill-eval cases fail, both with the same Premature close error talking to api.anthropic.com.
  • Reproduced identically across 3 runs over ~25 min — api.anthropic.com is currently flaky/down.
  • Green here: Lint & Typecheck, Unit Tests, Validate generated files, Build npm (22/24), Build Binary, Build Docs, warden, CodeQL, dependency-review.

This PR does not touch the skill or any LLM path. Will re-run once the Anthropic API recovers.

…ession

Node 24.17.0 / 22.23.0 (CVE-2026-48931 http.Agent fix) added a `data`
listener on idle sockets that makes keep-alive fetch reuse throw false
ERR_STREAM_PREMATURE_CLOSE errors. The skill-eval E2E planner hits this
as 'Premature close' talking to api.anthropic.com. Fixed in 24.18.0
(nodejs/node#64004). A floating `node-version: "24"` silently reuses the
runner's pre-cached buggy 24.17.0, so pin the exact patched version.
@BYK

BYK commented Jun 25, 2026

Copy link
Copy Markdown
Member Author

Correction: the Premature close E2E failure was not an Anthropic outage — it's the Node.js ERR_STREAM_PREMATURE_CLOSE regression from the CVE-2026-48931 security release (24.17.0 / 22.23.0), the same bug we hit in Craft/GCS. The http.Agent fix added a data listener on idle sockets that makes keep-alive fetch reuse throw a false Premature close, which the skill-eval planner trips. Fixed in 24.18.0 (nodejs/node#64004).

The E2E job used floating node-version: "24", which silently reuses the runner's pre-cached buggy 24.17.0. Pinned it to 24.18.0 in d7175e2. Re-running now.

Addresses Seer/Bugbot/warden comments and an adversarial review of the
in-place .zip scanner:

- Bound cumulative decompression per archive (maxTotalSize, default 2 GiB).
  unzipSync materializes every accepted entry at once, so the per-entry gate
  alone could not stop a flat zip-bomb or a huge legit archive from
  exhausting memory.
- Gate the container's on-disk size before reading it into memory, mirroring
  the on-disk peek-then-read discipline (warden M2).
- Skip entries using a compression method fflate cannot inflate instead of
  letting unzipSync throw and discard the whole archive (and its valid
  sibling DIFs).
- Make oversized zip entries advisory-only: they warn per entry but no longer
  feed the exit-driving oversizedCount. A compressed entry's format is
  unknown pre-decompression, so counting it produced false "all matched files
  too large" failures when an unrelated large asset sat inside a .zip
  (Bugbot/H2).
- Guard non-finite originalSize as defense-in-depth (Seer).

Adds regression tests for unsupported-compression survival, the cumulative
budget, the container gate, and the advisory oversized accounting.
@BYK BYK merged commit 0cf501a into main Jun 26, 2026
30 checks passed
@BYK BYK deleted the byk/feat/debug-files-upload-zips branch June 26, 2026 10:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

risk: high PR risk score: high

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant