Return non-zero exit from get on query failure by ideaship · Pull Request #2313 · osism/python-osism

ideaship · 2026-06-01T13:07:01Z

What

osism get hostvars and osism get hosts logged an error but then returned None when the underlying ansible-inventory subprocess failed. cliff turns a take_action return value of None into exit code 0 (return_code = self.take_action(parsed_args) or 0), so these commands exited successfully even when the inventory query could not be run at all — making them unsafe in set -e scripts and && chains.

Both commands now return 1 on subprocess.CalledProcessError, i.e. when the query itself could not be executed.

Missing host

With the pinned ansible-core (2.18.9 — release repo latest/base.yml), ansible-inventory --host <unknown> does not return an empty result. It refuses the query with ERROR! You must pass a single valid host to --host parameter and exits 5. So osism get hostvars <unknown-host> now exits 1, matching its existing Host not found in inventory. error message.

What stays exit 0 (empty results)

Empty results are deliberately left as success — a query that ran fine but had nothing to show:

a host that exists but has no host vars (ansible-inventory prints {}, exits 0)
a requested variable/fact absent from an otherwise successfully retrieved data set
an inventory that lists no matching hosts

A note on the convention

There is no documented exit-code contract for the osism CLI — the behaviour is emergent and inconsistent across commands. This PR does not attempt to define or enforce one. It only tries to make osism get match what looks like the prevailing convention elsewhere in the CLI: a non-zero exit signals an operational failure (the query could not be run), while an empty result set is treated as a valid, successful answer.

Tests

Adds unit tests for both commands covering the failure path (raised CalledProcessError → exit 1) and the empty-result path (successful query with nothing to show → exit 0).

osism get hostvars and osism get hosts logged an error but then returned None when the underlying ansible-inventory subprocess failed. cliff turns a take_action return value of None into exit code 0 (return_code = self.take_action(parsed_args) or 0), so these commands exited successfully even though the inventory query could not be run at all. That makes them unsafe to use in set -e scripts or && chains, where a failed lookup silently looks like success. Both commands now return 1 when subprocess.CalledProcessError is raised, i.e. when the query itself could not be executed. A missing host is one such failure. With the pinned ansible-core (2.18.9, release latest/base.yml), ansible-inventory --host on a host that is not in the inventory does not return an empty result; it refuses the query with "ERROR! You must pass a single valid host to --host parameter" and exits 5. osism get hostvars for an unknown host therefore now exits 1, which matches its existing "Host not found in inventory." error message. Empty results are deliberately left as exit 0: a host that exists but has no host vars (ansible-inventory prints {} and exits 0), a requested variable or fact that is absent from an otherwise successfully retrieved data set, or an inventory that lists no matching hosts. These are valid answers to a query that did run. This keeps the behaviour consistent with the convention used elsewhere in the CLI, where a non-zero status signals an operational failure rather than an empty result set. Adds unit tests for both commands covering the failure path (a raised CalledProcessError yields exit code 1) and the empty-result path (a successful query with nothing to show yields exit code 0). AI-assisted: Claude Code Signed-off-by: Roger Luethi <luethi@osism.tech>

ideaship · 2026-06-02T05:44:38Z

Follow-up: #2314 catalogs other osism subcommands with the same exit-0-on-failure pattern this PR fixes in osism get (timeouts, failed inventory queries, and not-found lookups across reconciler, validate, wait, report, baremetal, compute, server, volume, netbox).

sourcery-ai

Hey - I've reviewed your changes and they look great!

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

Several commands wait for a Celery task's output, hit TimeoutError, log the timeout, and then fall through to the end of take_action with an implicit None return. cliff turns that into exit code 0 (return_code = self.take_action(parsed_args) or 0), so a command that gave up waiting for its task still reports success. That makes the affected commands unsafe in set -e scripts and && chains, where a timed-out sync silently looks like it completed. This is the same class of bug PR #2313 fixed for osism get, but on the timeout path rather than the inventory-query path. The affected sites: - reconciler.Sync - validate.Run._handle_task (the value take_action returns) - netbox.Ironic - netbox.Sync Each now returns 1 from the TimeoutError handler, so a timeout yields a non-zero exit. Adds unit tests asserting exit 1 when fetch_task_output raises TimeoutError, mirroring PR #2313's failure-path tests. Related-Bug: #2314 AI-assisted: Claude Code Signed-off-by: Roger Luethi <luethi@osism.tech>

osism report memory, lldp, bgp and status each load the Ansible inventory by running ansible-inventory via subprocess.run. On a non-zero return code ("Error loading inventory.") and on subprocess.TimeoutExpired ("Timeout loading inventory.") they logged the error and then bare-returned None. cliff turns that into exit code 0 (return_code = self.take_action(parsed_args) or 0), so a report whose inventory could not be loaded at all still exited successfully. That makes these commands unsafe in set -e scripts and && chains. These are the direct twins of the bug PR #2313 fixed for osism get. All eight inventory-load failure paths now return 1. The neighbouring "No hosts found in inventory." branch is deliberately left at exit 0: the query ran fine and simply returned no hosts, which is a valid empty result rather than an operational failure, matching the convention established in PR #2313. Adds unit tests for all four commands covering the failure path (a non-zero ansible-inventory return code yields exit 1), the timeout path (TimeoutExpired yields exit 1) and the empty-result path (an empty inventory yields exit 0). Part of issue #2314 (Category B). Related-Bug: #2314 AI-assisted: Claude Code Signed-off-by: Roger Luethi <luethi@osism.tech>

Several commands look up a resource, log an error or warning when it cannot be found, and then bare-return None. cliff turns that into exit code 0 (return_code = self.take_action(parsed_args) or 0), so a failed lookup silently looks like success and the commands are unsafe in set -e scripts and && chains. All the affected not-found / unresolved-lookup paths now return 1: - compute.ComputeMigrationList: no user domain, no user, no project domain, no project, multiple servers matched, no server found. - server.ServerList: no user domain, no user, domain not found, project domain not found, project not found. - volume.VolumeList: domain not found, project domain not found, project not found. - netbox.Console / netbox.Dump: NetBox integration not configured (the operation cannot run without it), and Dump's device not found. - baremetal Deploy, Undeploy, BurnIn, Clean, Provide, MaintenanceSet, MaintenanceUnset, PowerOn, PowerOff and Delete: the single named node could not be found (conn.baremetal.find_node returned nothing). Adds unit tests for each command driving the not-found path and asserting exit 1, mirroring PR #2313. Part of issue #2314 (Category C and Category D not-found lookups). Related-Bug: #2314 AI-assisted: Claude Code Signed-off-by: Roger Luethi <luethi@osism.tech>

osism-agent added this to Human Board Jun 1, 2026

github-project-automation Bot moved this to Ready in Human Board Jun 1, 2026

ideaship mentioned this pull request Jun 2, 2026

CLI commands log an error but exit 0 on failure #2314

Closed

ideaship requested a review from berendt June 2, 2026 14:59

ideaship marked this pull request as ready for review June 2, 2026 15:00

ideaship moved this from Ready to In review in Human Board Jun 2, 2026

sourcery-ai Bot reviewed Jun 2, 2026

View reviewed changes

berendt merged commit 44674c3 into main Jun 2, 2026
4 checks passed

berendt deleted the get-nonzero-exit-on-query-failure branch June 2, 2026 15:37

github-project-automation Bot moved this from In review to Done in Human Board Jun 2, 2026

ideaship mentioned this pull request Jun 8, 2026

Return non-zero exit codes on CLI failure paths #2331

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Return non-zero exit from get on query failure#2313

Return non-zero exit from get on query failure#2313
berendt merged 1 commit into
mainfrom
get-nonzero-exit-on-query-failure

ideaship commented Jun 1, 2026

Uh oh!

ideaship commented Jun 2, 2026

Uh oh!

sourcery-ai Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

ideaship commented Jun 1, 2026

What

Missing host

What stays exit 0 (empty results)

A note on the convention

Tests

Uh oh!

ideaship commented Jun 2, 2026

Uh oh!

sourcery-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants