sonic: Enforce config table ownership statically#2339
Merged
Conversation
The SONiC config generator owns selected config_db.json tables, but the
ownership registry was a hand-maintained allowlist with no enforcement.
Nothing forced a newly handled table into it, so a table the generator
references but does not classify would fall into the unpoliced
pass-through tier and reintroduce the stale-config bug the ownership
model exists to prevent. The existing taxonomy tests only check
relationships between the registered constants, so they cannot catch an
omission. The model also lived only in a docstring far from the helpers
that must honour it.
Add a static guard test that parses config_generator.py and collects
every top-level config_db table referenced on the config dict: config["X"]
subscripts, the defensive config.get/setdefault/pop("X") accessors (a
.get() read is a real dependency on a table), and every literal-keyed
merge form -- config.update({"X": ...}), config.update(X=...),
config.update(**{"X": ...}), config |= {"X": ...}, config = config |
{"X": ...}, and config = {**config, "X": ...}. The test fails when any
referenced table is not classified as owned, inherited, or
image-consumed, naming the offending table and the list to add it to. It
is static rather than runtime because some tables emit only when NetBox
carries their data -- as forthcoming features will (e.g. the BFD and EVPN
support proposed in open PRs, not yet in tree) -- which a
generate-and-inspect test would miss unless its fixture happened to
trigger every table.
So that coverage cannot erode silently, a second guard rejects any
config mutation the collector cannot read: a method call outside the
recognized set, or a reassignment whose right-hand side is not a dict
display, a dict-merge, or a whitelisted base load (copy.deepcopy and
friends). A merge through an arbitrary call -- config = merge(config,
{...}), config = dict(config, X={}) -- is rejected rather than allowed to
hide a table. The one accepted blind spot is a dynamic key in an
otherwise recognized form (config[var], config.setdefault(var, {}),
config.update(name)): the table name is not a literal at the call site,
the generator relies on this for the up-front scaffold and drop loops,
and resolving it would need data-flow analysis.
Introduce a fourth ownership category, IMAGE_CONSUMED_TABLE_KEYS, for
tables read from the image base config but never modified by the
generator (so they pass through to the output unchanged, like a
pass-through table, the difference being only that the generator depends
on one as an input). The defining property is behavioural (read-only),
not the access syntax: inherited tables are read via config.get() too,
so .get() alone does not distinguish them. It is empty for now; the
first consumer is the gNMI listen port read from TELEMETRY. Disjointness
invariants pin that an image-consumed table is never also owned (which
would drop it up front and leave the consuming helper reading an empty
table) nor inherited (read-only contradicts read-and-update), completing
pairwise disjointness across the three classified categories.
Rewrite the generate_sonic_config ownership-model docstring around the
same four categories (owned / inherited / image-consumed / pass-through)
and point it at the guard test, so the documented model and the enforced
model stay in sync. The inherited category, in both the docstring and
the INHERITED_TABLE_KEYS comment, is described as preserving image
content while updating selected fields in place (e.g. the DEVICE_METADATA
localhost attributes and VERSIONS.DATABASE.VERSION), correcting earlier
wording that said inherited content is emitted unchanged or not emitted
by the generator at all.
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Roger Luethi <luethi@osism.tech>
08577d9 to
0e28d6f
Compare
There was a problem hiding this comment.
Hey - I've left some high level feedback:
- Both
_config_table_keys_referenced_in_sourceand_unanalyzable_config_mutationsindependently parseconfig_generator.py; consider parsing once and sharing the AST to avoid duplicated work and keep the analysis in sync. - Instead of hard-coding
_BASE_LOAD_CALLSas strings and comparing viaast.unparse(value.func), consider resolving allowed base-load functions via their module and name (e.g. matching onast.Attribute/ast.Name) to make the guard more robust to formatting changes and imports.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Both `_config_table_keys_referenced_in_source` and `_unanalyzable_config_mutations` independently parse `config_generator.py`; consider parsing once and sharing the AST to avoid duplicated work and keep the analysis in sync.
- Instead of hard-coding `_BASE_LOAD_CALLS` as strings and comparing via `ast.unparse(value.func)`, consider resolving allowed base-load functions via their module and name (e.g. matching on `ast.Attribute`/`ast.Name`) to make the guard more robust to formatting changes and imports.Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
berendt
pushed a commit
that referenced
this pull request
Jun 14, 2026
ACL_TABLE and ACL_RULE are owned by more than one control-plane section helper (SSH, SNMP, gNMI). The decided ownership model (generate_sonic_config docstring, locked by #2297 and the static classification guard from #2339) settles whole-table ownership but not how several helpers share one owned table. The open ACL PRs each rebind the whole table -- config["ACL_TABLE"] = {...} -- so whichever helper runs second drops the first's entries, leaving SSH or SNMP/gNMI unrestricted. Each PR passes its own tests because neither calls the other helper, so the collision is invisible when the PRs are reviewed in isolation. The fix is a rule that co-owned tables must be merged per named entry, never rebound. Following the approach of the #2339 classification guard, the rule is locked in code rather than left to review, so a divergent PR fails CI before it can merge. Add MULTI_OWNER_OWNED_TABLE_KEYS listing ACL_TABLE/ACL_RULE, and a static guard (TestMultiOwnerTableGuard) that parses the generator source and fails on every form that rebinds a listed table wholesale on config: config["X"] = <expr>, config.update({"X": ...}), config |= {"X": ...}, and config = {**config, "X": ...}. These mirror the wholesale-write forms the #2339 mutation backstop recognizes, so no analyzable write can rebind a listed table without being caught. Forms that mutate the table dict in place are left alone: a nested-subscript write config["X"]["k"] = ..., config["X"].update(...), config["X"] |= ..., and config.setdefault("X", {}). Synthetic-source detector tests prove the guard catches each rebind form and passes the in-place forms, so it cannot pass vacuously. The tables are listed ahead of the ACL helpers that introduce them, so the first such helper to land is forced onto the per-key pattern rather than establishing the unsafe one. Two invariants tie the list to the rest of the model: a referenced multi-owner table must also be in OWNED_TABLE_KEYS (so the central up-front drop clears it and no stale entry survives the merge), and multi-owner tables must not be inherited or image-consumed (those regimes are not dropped). Both hold now and after the ACL helpers add ACL_TABLE/ACL_RULE to ON_DEMAND_OWNED. Assisted-by: Claude:claude-fable-5 Signed-off-by: Roger Luethi <luethi@osism.tech>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Purpose
This PR establishes a single, enforced ownership model for the SONiC config
generator so that all forthcoming PRs touching the generator converge on one
consistent approach rather than each inventing its own. Several SONiC features
are in flight (ACL hardening, BFD, EVPN, static routes, …); without a
self-enforcing model they drift — adding tables the generator does not
classify, which silently reintroduces the stale-config bug the ownership model
exists to prevent. This change turns the model from documentation into a CI
gate: a new table that is not classified fails the build with a message
pointing at the right list.
This PR implements Decision C of the SONiC ownership tracking issue. Read it
for the full context, the surrounding decisions, and how this change fits in:
That issue records the agreed ownership model and the open decisions steering
the in-flight SONiC PRs; Decision C (the static ownership guard + four-category
taxonomy) is realized here. This PR stands on its own as the enforcement
mechanism.
What
The generator owns selected
config_db.jsontables (dropped and rebuilt everyregen), but the ownership registry was a hand-maintained allowlist with no
enforcement: a table the generator references but forgets to classify falls
into the unpoliced pass-through tier and silently accumulates stale config. The
existing tests only checked relationships between the registered constants, so
they could not catch an omission, and the model lived in a single docstring far
from the helpers that must honour it.
Changes
Static guard test. Parses
config_generator.pyand collects everytop-level
config_dbtable referenced onconfig—config["X"]subscripts, the defensive
config.get/setdefault/pop("X")accessors (a.get()read is a real dependency), and every literal-keyed merge form(
update({...}),update(X=...),update(**{...}),|=,config | {...},{**config, ...}). The test fails when any referenced table is notclassified as owned, inherited, or image-consumed, naming the table and the
list to add it to. Static rather than runtime so it catches tables that emit
only when NetBox carries their data, which a generate-and-inspect test would
miss.
Mutation backstop. A second guard rejects any
configmutation thecollector cannot statically read — an unrecognized method call or a
reassignment that is not a dict display, a dict-merge, or a whitelisted base
load — so coverage cannot erode silently. The one accepted blind spot is a
dynamic key in a recognized form (
config[var],config.update(name)),which the scaffold and drop loops rely on.
Fourth ownership category,
IMAGE_CONSUMED_TABLE_KEYS. For tables readfrom the image base config but never modified (read-only inputs that pass
through to output). Empty for now; the first consumer is the gNMI listen port
read from
TELEMETRY. Disjointness invariants complete pairwise acrossowned / inherited / image-consumed.
Docstring + comments rewritten around the four categories
(owned / inherited / image-consumed / pass-through), with the inherited
category corrected to "preserves base content, updates selected fields in
place" rather than "emitted unchanged".
Notes
backstop passes, full suite green.
EVPN); once this lands they will get a red check pointing at the right list
rather than shipping latent stale-config data loss.
🤖 Generated with Claude Code