Skip to content

Latest commit

 

History

History
46 lines (44 loc) · 3.48 KB

File metadata and controls

46 lines (44 loc) · 3.48 KB

Perry Performance Run Log

2026-06-17 - Typed feedback registration hoist

  • Start revision: e816fc3e4af1
  • Branch: codex/perry-performance-20260617
  • Worker assignment: single Codex pass in this worktree
  • Benchmark environment: Linux /usr/bin/time -v; local node cannot execute .ts benchmark inputs, so Node columns and correctness comparisons were skipped by the harness
  • Baseline commands:
    • cargo build --release
    • ./benchmarks/quick.sh
    • ./benchmarks/compare.sh --quick --runs 3 --warn-only --json-out /tmp/perry-baseline-e816fc3e4.json
  • Baseline results:
    • quick: fibonacci 260ms/18MB, math_intensive 73ms/18MB, nested_loops 3508ms/17MB, factorial 95ms/18MB, matrix_multiply 6462ms/27MB
    • compare quick medians: loop_overhead 74ms/18772KB, fibonacci 262ms/18696KB, math_intensive 70ms/18696KB, nested_loops 3383ms/17724KB, factorial 96ms/18836KB
  • Selected gap and evidence:
    • nested_loops dominated the quick compare set at 3383ms; matrix_multiply was the slowest quick.sh case at 6462ms.
    • LLVM trace for benchmarks/suite/10_nested_loops.ts showed js_typed_feedback_register_site(...) emitted inside the hot for.body.21 inner loop before each typed-feedback array guard.
  • Change:
    • Added LlFunction::entry_setup_call_void and changed typed-feedback site registration to emit once in function-entry setup instead of at every guard use site.
    • Kept guard, fallback, pass, and counter calls at original use sites so runtime evidence semantics remain per-use.
    • Updated benchmark harnesses to support Linux RSS measurement and skip Node .ts columns when the installed Node cannot run TypeScript directly.
  • Post-change benchmark commands:
    • cargo build --release
    • ./benchmarks/compare.sh --quick --runs 3 --warn-only --json-out /tmp/perry-final-e816fc3e4.json
    • ./benchmarks/quick.sh
  • Post-change results:
    • compare quick medians: loop_overhead 74ms/18768KB, fibonacci 261ms/18920KB, math_intensive 69ms/18944KB, nested_loops 956ms/19152KB, factorial 94ms/18896KB
    • quick: fibonacci 262ms/18MB, math_intensive 55ms/18MB, nested_loops 965ms/18MB, factorial 75ms/18MB, matrix_multiply 1842ms/28MB
  • Measured impact:
    • 10_nested_loops compare median: 3383ms -> 956ms, 71.7% faster
    • 16_matrix_multiply quick: 6462ms -> 1842ms, 71.5% faster
  • Verification:
    • bash -n benchmarks/quick.sh
    • bash -n benchmarks/compare.sh
    • cargo fmt --check
    • cargo test -p perry-codegen --test typed_feedback
    • PERRY_BIN=target/release/perry python3 tests/test_typed_feedback_runtime_evidence.py
    • tests/test_benchmark_output_verifier.sh
    • target/release/perry compile --no-cache benchmarks/suite/10_nested_loops.ts -o /tmp/perry-nested-loops-final --trace llvm --quiet; trace confirmed registration calls in entry setup only and no registration calls in for.body.21
    • /tmp/perry-nested-loops-final produced nested_loops:963 and sum:26991000000
    • target/release/perry compile --no-cache benchmarks/suite/16_matrix_multiply.ts -o /tmp/perry-matrix-multiply-final --quiet && /tmp/perry-matrix-multiply-final produced matrix_multiply:1778 and checksum:41079519680
  • Notes:
    • benchmarks/baseline.json is stale for this Linux environment; compare was run with --warn-only and the before/after comparison above uses the captured local baseline JSON.
    • Follow-up candidates remain in typed array and numeric array hot paths, but this cycle stopped at the isolated registration-hoist optimization.
  • PR: #5295