Skip to content

sonic: Enforce config table ownership statically#2339

Merged
berendt merged 1 commit into
mainfrom
sonic-ownership-static-guard
Jun 12, 2026
Merged

sonic: Enforce config table ownership statically#2339
berendt merged 1 commit into
mainfrom
sonic-ownership-static-guard

Conversation

@ideaship

@ideaship ideaship commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Purpose

This PR establishes a single, enforced ownership model for the SONiC config
generator so that all forthcoming PRs touching the generator converge on one
consistent approach rather than each inventing its own. Several SONiC features
are in flight (ACL hardening, BFD, EVPN, static routes, …); without a
self-enforcing model they drift — adding tables the generator does not
classify, which silently reintroduces the stale-config bug the ownership model
exists to prevent. This change turns the model from documentation into a CI
gate: a new table that is not classified fails the build with a message
pointing at the right list.

This PR implements Decision C of the SONiC ownership tracking issue. Read it
for the full context, the surrounding decisions, and how this change fits in:

That issue records the agreed ownership model and the open decisions steering
the in-flight SONiC PRs; Decision C (the static ownership guard + four-category
taxonomy) is realized here. This PR stands on its own as the enforcement
mechanism.

What

The generator owns selected config_db.json tables (dropped and rebuilt every
regen), but the ownership registry was a hand-maintained allowlist with no
enforcement: a table the generator references but forgets to classify falls
into the unpoliced pass-through tier and silently accumulates stale config. The
existing tests only checked relationships between the registered constants, so
they could not catch an omission, and the model lived in a single docstring far
from the helpers that must honour it.

Changes

  • Static guard test. Parses config_generator.py and collects every
    top-level config_db table referenced on configconfig["X"]
    subscripts, the defensive config.get/setdefault/pop("X") accessors (a
    .get() read is a real dependency), and every literal-keyed merge form
    (update({...}), update(X=...), update(**{...}), |=, config | {...},
    {**config, ...}). The test fails when any referenced table is not
    classified as owned, inherited, or image-consumed, naming the table and the
    list to add it to. Static rather than runtime so it catches tables that emit
    only when NetBox carries their data, which a generate-and-inspect test would
    miss.

  • Mutation backstop. A second guard rejects any config mutation the
    collector cannot statically read — an unrecognized method call or a
    reassignment that is not a dict display, a dict-merge, or a whitelisted base
    load — so coverage cannot erode silently. The one accepted blind spot is a
    dynamic key in a recognized form (config[var], config.update(name)),
    which the scaffold and drop loops rely on.

  • Fourth ownership category, IMAGE_CONSUMED_TABLE_KEYS. For tables read
    from the image base config but never modified (read-only inputs that pass
    through to output). Empty for now; the first consumer is the gNMI listen port
    read from TELEMETRY. Disjointness invariants complete pairwise across
    owned / inherited / image-consumed.

  • Docstring + comments rewritten around the four categories
    (owned / inherited / image-consumed / pass-through), with the inherited
    category corrected to "preserves base content, updates selected fields in
    place" rather than "emitted unchanged".

Notes

  • Verified clean on the current tree: all referenced tables classify, the
    backstop passes, full suite green.
  • Several open SONiC PRs add tables that are not yet registered (e.g. BFD,
    EVPN); once this lands they will get a red check pointing at the right list
    rather than shipping latent stale-config data loss.

🤖 Generated with Claude Code

The SONiC config generator owns selected config_db.json tables, but the
ownership registry was a hand-maintained allowlist with no enforcement.
Nothing forced a newly handled table into it, so a table the generator
references but does not classify would fall into the unpoliced
pass-through tier and reintroduce the stale-config bug the ownership
model exists to prevent. The existing taxonomy tests only check
relationships between the registered constants, so they cannot catch an
omission. The model also lived only in a docstring far from the helpers
that must honour it.

Add a static guard test that parses config_generator.py and collects
every top-level config_db table referenced on the config dict: config["X"]
subscripts, the defensive config.get/setdefault/pop("X") accessors (a
.get() read is a real dependency on a table), and every literal-keyed
merge form -- config.update({"X": ...}), config.update(X=...),
config.update(**{"X": ...}), config |= {"X": ...}, config = config |
{"X": ...}, and config = {**config, "X": ...}. The test fails when any
referenced table is not classified as owned, inherited, or
image-consumed, naming the offending table and the list to add it to. It
is static rather than runtime because some tables emit only when NetBox
carries their data -- as forthcoming features will (e.g. the BFD and EVPN
support proposed in open PRs, not yet in tree) -- which a
generate-and-inspect test would miss unless its fixture happened to
trigger every table.

So that coverage cannot erode silently, a second guard rejects any
config mutation the collector cannot read: a method call outside the
recognized set, or a reassignment whose right-hand side is not a dict
display, a dict-merge, or a whitelisted base load (copy.deepcopy and
friends). A merge through an arbitrary call -- config = merge(config,
{...}), config = dict(config, X={}) -- is rejected rather than allowed to
hide a table. The one accepted blind spot is a dynamic key in an
otherwise recognized form (config[var], config.setdefault(var, {}),
config.update(name)): the table name is not a literal at the call site,
the generator relies on this for the up-front scaffold and drop loops,
and resolving it would need data-flow analysis.

Introduce a fourth ownership category, IMAGE_CONSUMED_TABLE_KEYS, for
tables read from the image base config but never modified by the
generator (so they pass through to the output unchanged, like a
pass-through table, the difference being only that the generator depends
on one as an input). The defining property is behavioural (read-only),
not the access syntax: inherited tables are read via config.get() too,
so .get() alone does not distinguish them. It is empty for now; the
first consumer is the gNMI listen port read from TELEMETRY. Disjointness
invariants pin that an image-consumed table is never also owned (which
would drop it up front and leave the consuming helper reading an empty
table) nor inherited (read-only contradicts read-and-update), completing
pairwise disjointness across the three classified categories.

Rewrite the generate_sonic_config ownership-model docstring around the
same four categories (owned / inherited / image-consumed / pass-through)
and point it at the guard test, so the documented model and the enforced
model stay in sync. The inherited category, in both the docstring and
the INHERITED_TABLE_KEYS comment, is described as preserving image
content while updating selected fields in place (e.g. the DEVICE_METADATA
localhost attributes and VERSIONS.DATABASE.VERSION), correcting earlier
wording that said inherited content is emitted unchanged or not emitted
by the generator at all.

Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Roger Luethi <luethi@osism.tech>
@ideaship ideaship force-pushed the sonic-ownership-static-guard branch from 08577d9 to 0e28d6f Compare June 10, 2026 13:39
@ideaship ideaship marked this pull request as ready for review June 10, 2026 13:50
@ideaship ideaship requested a review from berendt June 10, 2026 13:50

@sourcery-ai sourcery-ai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've left some high level feedback:

  • Both _config_table_keys_referenced_in_source and _unanalyzable_config_mutations independently parse config_generator.py; consider parsing once and sharing the AST to avoid duplicated work and keep the analysis in sync.
  • Instead of hard-coding _BASE_LOAD_CALLS as strings and comparing via ast.unparse(value.func), consider resolving allowed base-load functions via their module and name (e.g. matching on ast.Attribute/ast.Name) to make the guard more robust to formatting changes and imports.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- Both `_config_table_keys_referenced_in_source` and `_unanalyzable_config_mutations` independently parse `config_generator.py`; consider parsing once and sharing the AST to avoid duplicated work and keep the analysis in sync.
- Instead of hard-coding `_BASE_LOAD_CALLS` as strings and comparing via `ast.unparse(value.func)`, consider resolving allowed base-load functions via their module and name (e.g. matching on `ast.Attribute`/`ast.Name`) to make the guard more robust to formatting changes and imports.

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@ideaship ideaship moved this from Ready to In review in Human Board Jun 10, 2026
@berendt berendt merged commit 8718b2f into main Jun 12, 2026
3 checks passed
@berendt berendt deleted the sonic-ownership-static-guard branch June 12, 2026 06:59
@github-project-automation github-project-automation Bot moved this from In review to Done in Human Board Jun 12, 2026
berendt pushed a commit that referenced this pull request Jun 14, 2026
ACL_TABLE and ACL_RULE are owned by more than one control-plane
section helper (SSH, SNMP, gNMI). The decided ownership model
(generate_sonic_config docstring, locked by #2297 and the static
classification guard from #2339) settles whole-table ownership but not
how several helpers share one owned table. The open ACL PRs each rebind
the whole table -- config["ACL_TABLE"] = {...} -- so whichever helper
runs second drops the first's entries, leaving SSH or SNMP/gNMI
unrestricted. Each PR passes its own tests because neither calls the
other helper, so the collision is invisible when the PRs are reviewed
in isolation.

The fix is a rule that co-owned tables must be merged per named entry,
never rebound. Following the approach of the #2339 classification
guard, the rule is locked in code rather than left to review, so a
divergent PR fails CI before it can merge.

Add MULTI_OWNER_OWNED_TABLE_KEYS listing ACL_TABLE/ACL_RULE, and a
static guard (TestMultiOwnerTableGuard) that parses the generator
source and fails on every form that rebinds a listed table wholesale on
config: config["X"] = <expr>, config.update({"X": ...}),
config |= {"X": ...}, and config = {**config, "X": ...}. These mirror
the wholesale-write forms the #2339 mutation backstop recognizes, so no
analyzable write can rebind a listed table without being caught. Forms
that mutate the table dict in place are left alone: a nested-subscript
write config["X"]["k"] = ..., config["X"].update(...),
config["X"] |= ..., and config.setdefault("X", {}). Synthetic-source
detector tests prove the guard catches each rebind form and passes the
in-place forms, so it cannot pass vacuously.

The tables are listed ahead of the ACL helpers that introduce them, so
the first such helper to land is forced onto the per-key pattern rather
than establishing the unsafe one. Two invariants tie the list to the
rest of the model: a referenced multi-owner table must also be in
OWNED_TABLE_KEYS (so the central up-front drop clears it and no stale
entry survives the merge), and multi-owner tables must not be inherited
or image-consumed (those regimes are not dropped). Both hold now and
after the ACL helpers add ACL_TABLE/ACL_RULE to ON_DEMAND_OWNED.

Assisted-by: Claude:claude-fable-5

Signed-off-by: Roger Luethi <luethi@osism.tech>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

3 participants