Cache CLI extractor paths across Actions steps#3950

Open
mario-campos wants to merge 9 commits into main from mario-campos/cache-cli-resolve-langs

Conversation

@mario-campos

@mario-campos mario-campos commented Jun 4, 2026

Contributor

Similar to #3943, this PR caches the output of codeql resolve languages, which contains the paths to the various extractors, so that repeated calls to resolveLanguages() run the CLI only once. Additionally, it re-implements resolveExtractor() as a wrapper over resolveLanguages() (to reuse the cached output) rather than shelling out to codeql resolve extractor.

In one experiment, I counted seven instances of shelling out to codeql resolve extractor. When you dig into the code, you can see why: resolveExtractor() is not called often or from many places, but one caller is isTracedLanguage(), which is wrapped by isScannedLanguage(). These functions are often used in a loop or map over some or all languages, which explains the consecutive executions of codeql resolve extractor.
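To make the effect concrete, here is a minimal sketch, not the PR's actual code. It assumes a simplified output shape for codeql resolve languages (an `extractors` map from language to entries with an `extractor_root`), and `runResolveLanguages` is a hypothetical stand-in for shelling out to the CLI. It shows that once resolveExtractor() reads from a memoized resolveLanguages(), mapping over several languages triggers a single invocation.

```typescript
// Assumed, simplified shape of the `codeql resolve languages` JSON output.
interface ResolveLanguagesOutput {
  extractors: Record<string, Array<{ extractor_root: string }>>;
}

// Hypothetical stand-in for shelling out to the CLI; it counts invocations.
let cliInvocations = 0;
function runResolveLanguages(): ResolveLanguagesOutput {
  cliInvocations++;
  return {
    extractors: {
      javascript: [{ extractor_root: "/opt/codeql/javascript" }],
      cpp: [{ extractor_root: "/opt/codeql/cpp" }],
    },
  };
}

// In-process memoization: the CLI stand-in runs at most once.
let memo: ResolveLanguagesOutput | undefined;
function resolveLanguages(): ResolveLanguagesOutput {
  if (memo === undefined) {
    memo = runResolveLanguages();
  }
  return memo;
}

// Derives the extractor root from the memoized output instead of
// running a separate command per language.
function resolveExtractor(language: string): string {
  const entries = resolveLanguages().extractors[language];
  if (entries === undefined || entries.length === 0) {
    throw new Error(`No extractor found for language '${language}'`);
  }
  return entries[0].extractor_root;
}

// A loop over languages, similar in spirit to the callers described above.
const roots = ["javascript", "cpp", "javascript"].map((lang) =>
  resolveExtractor(lang),
);
```

With the memo in place, the three lookups above cost one call to the stand-in rather than three.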

In support of the above goals, this PR also adds some additional functions to the json module, to enable validation of the codeql version output.
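As an illustration of what such validators could look like, here is a sketch under assumptions: the names `Validator`, `isString`, and the `VersionInfo` fields are hypothetical, and the real json module may be structured differently. It shows an `object` validator built from per-property checks and an `undefinable()` wrapper that accepts a missing property but rejects `null`.

```typescript
// Hypothetical validator type: a runtime check that doubles as a type guard.
type Validator<T> = (value: unknown) => value is T;

const isNumber: Validator<number> = (v): v is number => typeof v === "number";
const isString: Validator<string> = (v): v is string => typeof v === "string";

// Accepts `undefined` (property absent) but still rejects `null`.
function undefinable<T>(inner: Validator<T>): Validator<T | undefined> {
  return (v): v is T | undefined => v === undefined || inner(v);
}

// Builds an object validator from one validator per property.
function object<T>(schema: { [K in keyof T]-?: Validator<T[K]> }): Validator<T> {
  // The cast only erases per-key types so the checks can be looped over.
  const checks = schema as unknown as Record<string, Validator<unknown>>;
  return (value): value is T => {
    if (typeof value !== "object" || value === null || Array.isArray(value)) {
      return false;
    }
    const record = value as Record<string, unknown>;
    return Object.keys(checks).every((key) => checks[key](record[key]));
  };
}

// Illustrative shape only; the real `codeql version` output has more fields.
interface VersionInfo {
  version: string;
  overlayVersion?: number;
}

const isVersionInfo = object<VersionInfo>({
  version: isString,
  overlayVersion: undefinable(isNumber),
});
```

The distinction matters when parsing JSON: an absent `overlayVersion` passes, while an explicit `null` is treated as malformed output.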

Risk assessment

For internal use only. Please select the risk level of this change:

  • Low risk: Changes are fully under feature flags, or have been fully tested and validated in pre-production environments and are highly observable, or are documentation or test only.

Which use cases does this change impact?

Workflow types:

  • Advanced setup - Impacts users who have custom CodeQL workflows.
  • Managed - Impacts users with dynamic workflows (Default Setup, Code Quality, ...).

Products:

  • Code Scanning - The changes impact analyses when analysis-kinds: code-scanning.
  • Code Quality - The changes impact analyses when analysis-kinds: code-quality.
  • Other first-party - The changes impact other first-party analyses.
  • Third-party analyses - The changes affect the upload-sarif action.

Environments:

  • Dotcom - Impacts CodeQL workflows on github.com and/or GitHub Enterprise Cloud with Data Residency.
  • GHES - Impacts CodeQL workflows on GitHub Enterprise Server.
  • Testing/None - This change does not impact any CodeQL workflows in production.

How did/will you validate this change?

  • Unit tests - I am depending on unit test coverage (i.e. tests in .test.ts files).
  • End-to-end tests - I am depending on PR checks (i.e. tests in pr-checks).
  • Other - Manual/local testing

If something goes wrong after this change is released, what are the mitigation and rollback strategies?

  • Feature flags - All new or changed code paths can be fully disabled with corresponding feature flags.
  • Rollback - Change can only be disabled by rolling back the release or releasing a new version with a fix.
  • Development/testing only - This change cannot cause any failures in production.
  • Other - Please provide details.

How will you know if something goes wrong after this change is released?

  • Telemetry - I rely on existing telemetry or have made changes to the telemetry.
    • Dashboards - I will watch relevant dashboards for issues after the release. Consider whether this requires this change to be released at a particular time rather than as part of a regular release.
    • Alerts - New or existing monitors will trip if something goes wrong with this change.
  • Other - Please provide details.

Are there any special considerations for merging or releasing this change?

  • No special considerations - This change can be merged at any time.
  • Special considerations - This change should only be merged once certain preconditions are met. Please provide details of those or link to this PR from an internal issue.

Merge / deployment checklist

  • Confirm this change is backwards compatible with existing workflows.
  • Consider adding a changelog entry for this change.
  • Confirm the readme and docs have been updated if necessary.

@github-actions github-actions Bot added the size/S Should be easy to review label Jun 4, 2026

@henrymercer henrymercer left a comment

Contributor


Caching these invocations makes a lot of sense! I have a high level comment and a couple of lower level comments.

The main point is that now that we're caching multiple invocations, it might be a good opportunity to generalise the design. For instance, you could imagine something like:

const versionCache = createPersistedCliCache({ envVar: EnvVar.CODEQL_VERSION_INFO, validate: isVersionInfo });
const resolveLanguagesCache = createPersistedCliCache({ envVar: EnvVar.CODEQL_RESOLVE_LANGUAGES, validate: isResolveLanguagesOutput });

where createPersistedCliCache handles memoising in the Action and persisting between Actions steps with an environment variable.
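One possible reading of that suggestion, as a rough sketch only: `createPersistedCliCache` is the reviewer's proposed name, but the `getOrRun` method, the option types, and the `EXAMPLE_VERSION_INFO` variable are assumptions made for illustration. The sketch memoizes in-process and falls back to a validated environment variable. It assigns `process.env` directly to stay self-contained; an Action would need something like `core.exportVariable` from `@actions/core` for the value to reach later steps.

```typescript
interface PersistedCliCacheOptions<T> {
  envVar: string;
  validate: (value: unknown) => value is T;
}

interface PersistedCliCache<T> {
  getOrRun(run: () => T): T;
}

function createPersistedCliCache<T>(
  options: PersistedCliCacheOptions<T>,
): PersistedCliCache<T> {
  let memo: T | undefined;
  return {
    getOrRun(run: () => T): T {
      // 1. In-process memo.
      if (memo !== undefined) {
        return memo;
      }
      // 2. Value persisted by an earlier step, if it parses and validates.
      const persisted = process.env[options.envVar];
      if (persisted !== undefined) {
        try {
          const parsed: unknown = JSON.parse(persisted);
          if (options.validate(parsed)) {
            memo = parsed;
            return parsed;
          }
        } catch {
          // Malformed JSON: ignore it and run the command again.
        }
      }
      // 3. Run the command and persist the result.
      const fresh = run();
      memo = fresh;
      process.env[options.envVar] = JSON.stringify(fresh);
      return fresh;
    },
  };
}

// Usage with a hypothetical validator and a fake command runner.
const isVersionInfo = (v: unknown): v is { version: string } =>
  typeof v === "object" &&
  v !== null &&
  typeof (v as { version?: unknown }).version === "string";

delete process.env["EXAMPLE_VERSION_INFO"];
let runs = 0;
const runVersion = () => {
  runs++;
  return { version: "2.20.0" };
};

const options = { envVar: "EXAMPLE_VERSION_INFO", validate: isVersionInfo };
const stepOne = createPersistedCliCache(options);
stepOne.getOrRun(runVersion); // runs the command
stepOne.getOrRun(runVersion); // served from the in-process memo
const stepTwo = createPersistedCliCache(options); // simulates a later step
const fromEnv = stepTwo.getOrRun(runVersion); // served from the variable
```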

Some smaller things:

  • Ideally the cache entry would also depend on getExtraOptionsFromEnv(["resolve","languages"])
  • We should remove the cache in testing-utils.ts like we do for the CodeQL version cache
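For the first point, a small sketch of one way to fold extra options into the cache key. The `cacheKey` helper is hypothetical; in the Action, the `extraOptions` argument would presumably come from getExtraOptionsFromEnv(["resolve","languages"]), which is not reproduced here.

```typescript
import { createHash } from "crypto";

// Hypothetical helper: derives a key from everything that can change the
// command's output, so different extra options never share an entry.
function cacheKey(cmd: string, args: string[], extraOptions: string[]): string {
  const hash = createHash("sha256");
  hash.update(JSON.stringify({ cmd, args, extraOptions }));
  return hash.digest("hex");
}

const plain = cacheKey("/opt/codeql/codeql", ["resolve", "languages"], []);
const withExtra = cacheKey("/opt/codeql/codeql", ["resolve", "languages"], [
  "--verbose",
]);
// `plain` and `withExtra` differ, so a cached result for one is not reused
// for the other.
```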

Comment thread src/codeql.ts Outdated

@mbg mbg left a comment

Member


I agree with @henrymercer's comments regarding a more generalised design for this. I am wondering about the use of environment variables here vs using a file on disk. I don't know if you have already considered this, but we store e.g. the Action configuration on disk as a file. Perhaps that would make sense for these cached CLI results as well.

A general point: could we also make sure to add doc comments for new top-level definitions before merging?

Comment thread src/environment.ts Outdated
Comment thread src/util.ts Outdated
Repeated calls to `resolveLanguages()` will only pay the performance penalty of executing `codeql resolve languages` once.
By wrapping `resolveLanguages()`, which is memoized, we can avoid executing `codeql resolve extractor` several times over the course of an analysis.
This commit adds a `number` validator, an `object` validator, an `isNumber` predicate, and `undefinable()` to test optional-but-not-null properties.
This provides a separation of concerns between the memoization and the execution.
@mario-campos mario-campos force-pushed the mario-campos/cache-cli-resolve-langs branch from c218fd6 to b18df17 Compare June 18, 2026 15:25
@github-actions github-actions Bot added size/XL May be very hard to review and removed size/S Should be easy to review labels Jun 18, 2026
@mario-campos

Contributor Author

I've taken your comments into consideration and overhauled the design to be more comprehensive and unified. The cache is now backed by a temporary file instead of the environment. I also identified a few opportunities to refactor some duplicated code into helper functions.

I kept the use of cmd as a key in the cache, but I question whether it's really necessary. I think it's safe to assume that, in most cases, there will only be one instance of codeql in use per job. And, even in the event that there's more than one instance, how likely is it that init would use a different version than autobuild or analyze? If it's not necessary, I would opt to delete it to simplify the code a bit.

@mario-campos mario-campos marked this pull request as ready for review June 18, 2026 15:44
@mario-campos mario-campos requested a review from a team as a code owner June 18, 2026 15:44
Copilot AI review requested due to automatic review settings June 18, 2026 15:44

Copilot AI left a comment

Contributor


Warning

  • Copilot's review of this pull request may be incomplete because some of the changed files are excluded by your Copilot content exclusion settings. See Excluding content from Copilot for details.

Pull request overview

This PR introduces a cross-step cache for selected CodeQL CLI command outputs (notably codeql version and codeql resolve languages) to reduce repeated JVM startups and improve performance across GitHub Actions steps. It also refactors extractor resolution to derive extractor roots from resolve languages (reusing the cached output) and extends the internal JSON validation helpers to support stronger runtime validation of CLI JSON output.

Changes:

  • Add a new 2-tier command-output cache (in-memory + temp-file) and wire it into codeql.ts for version and resolve languages.
  • Refactor resolveExtractor() to use resolveLanguages() rather than invoking codeql resolve extractor.
  • Extend src/json validation helpers (number/object validators and undefinable) and add unit tests; remove now-obsolete util-based version cache.
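The two-tier lookup described in the first bullet could be sketched roughly as follows. This is an illustration, not the contents of src/cache.ts: the signature of `getCachedOrRun` (including the explicit `validate` parameter), the file name, and `resetInMemoryCache` are assumptions. `RUNNER_TEMP` is the runner's temporary directory on GitHub Actions, with `os.tmpdir()` as a fallback elsewhere.

```typescript
import * as fs from "fs";
import * as os from "os";
import * as path from "path";

type Validator<T> = (value: unknown) => value is T;

// Tier 1: in-process memo. Tier 2: a JSON file that later steps can read.
const inMemoryCache = new Map<string, unknown>();
const cacheFile = path.join(
  process.env["RUNNER_TEMP"] ?? os.tmpdir(),
  "example-codeql-command-cache.json", // hypothetical file name
);

function readFileTier(): Record<string, unknown> {
  try {
    const parsed: unknown = JSON.parse(fs.readFileSync(cacheFile, "utf8"));
    return typeof parsed === "object" && parsed !== null && !Array.isArray(parsed)
      ? (parsed as Record<string, unknown>)
      : {};
  } catch {
    return {}; // Missing or corrupt file: treat it as empty.
  }
}

function getCachedOrRun<T>(
  commandKey: string,
  cmd: string,
  run: () => T,
  validate: Validator<T>,
): T {
  const key = `${commandKey}:${cmd}`;

  // Tier 1: the in-memory map.
  const memoized = inMemoryCache.get(key);
  if (memoized !== undefined && validate(memoized)) {
    return memoized;
  }

  // Tier 2: the temp file, validated before use.
  const entries = readFileTier();
  const persisted = entries[key];
  if (validate(persisted)) {
    inMemoryCache.set(key, persisted);
    return persisted;
  }

  // Miss in both tiers: run the command and fill both.
  const fresh = run();
  inMemoryCache.set(key, fresh);
  fs.writeFileSync(cacheFile, JSON.stringify({ ...entries, [key]: fresh }));
  return fresh;
}

// Clears tier 1 only, e.g. between tests or to simulate a new step.
function resetInMemoryCache(): void {
  inMemoryCache.clear();
}

// Usage with a fake command runner.
fs.rmSync(cacheFile, { force: true }); // start the demo from a clean slate
const isStringArray = (v: unknown): v is string[] =>
  Array.isArray(v) && v.every((x) => typeof x === "string");
let runs = 0;
const run = () => {
  runs++;
  return ["cpp", "javascript"];
};
getCachedOrRun("resolve-languages", "/opt/codeql/codeql", run, isStringArray);
resetInMemoryCache(); // simulates the next Actions step
const second = getCachedOrRun("resolve-languages", "/opt/codeql/codeql", run, isStringArray);
```

Validating on read means a stale or hand-edited file degrades to a cache miss instead of returning malformed data.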
Show a summary per file
| File | Description |
| --- | --- |
| src/util.ts | Removes the prior in-process/env-var version cache helpers. |
| src/util.test.ts | Removes tests for the old version-caching behavior. |
| src/testing-utils.ts | Updates test setup to reset the new command-output cache between tests. |
| src/status-report.ts | Switches telemetry version lookup to the new cache + isVersionInfo guard. |
| src/json/index.ts | Adds number, object, and undefinable validators to support schema checks. |
| src/json/index.test.ts | Adds tests for undefinable semantics (rejecting null). |
| src/environment.ts | Removes the env var used for the old persisted version cache. |
| src/codeql.ts | Adds caching wrappers/type guards and refactors extractor resolution and JSON parsing. |
| src/cache.ts | New: implements the command-output cache (memo + temp file). |
| src/cache.test.ts | New: tests cache persistence/memo behavior and validation. |
| lib/entry-points.js | Generated output (content excluded by policy; not reviewed). |

Copilot's findings

Files excluded by content exclusion policy (1)
  • lib/entry-points.js
  • Files reviewed: 10/11 changed files
  • Comments generated: 3

Comment thread src/cache.ts
Comment on lines +119 to +123
    // Tier 1: the in-memory variable.
    const memoized = inMemoryCache.get(key);
    if (memoized !== undefined) {
      return memoized.output as T;
    }
Comment thread src/codeql.ts
Comment thread src/codeql.ts
Comment on lines +794 to +798
    return getCachedOrRun(
      CommandCacheKey.ResolveLanguages,
      cmd,
      () =>
        runCliJson<ResolveLanguagesOutput>(cmd, [

Labels

size/XL May be very hard to review

4 participants