Add ask_vlm_verify in ExperimentalApi for cloud VLM alert verification#442
Add ask_vlm_verify in ExperimentalApi for cloud VLM alert verification#442srnangi wants to merge 28 commits into
Conversation
Add Groundlight.ask_vlm(images, query, model_id) which verifies one or two images against a natural-language query by calling POST /v1/vlm-queries. Returns a VLMVerificationResult dataclass with verdict (YES/NO/UNSURE), confidence, reasoning, and token cost. - Accepts a single image or [full_frame, roi] for the dual-image strategy, reusing parse_supported_image_types for encoding. - Moves the requests import to module level. - Exports VLMVerificationResult from the package. - Unit tests with mocked HTTP. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- POST query and model_id as multipart form fields (data=) instead of query-string params, matching the updated endpoint and keeping long prompts out of URLs and access logs. - model_id is now a friendly alias (e.g. "gpt-5.4", "claude-sonnet-4.5") resolved server-side, not a raw Bedrock model ID. - Tests updated to assert form-field transport. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Drop the gpt-5.4 example (OpenAI models on Bedrock are text-only and cannot do image verification); use claude-sonnet-4.5 / nova-pro instead. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Match the generalized endpoint: param images -> media, multipart field 'media', guard raised from 2 to 8. The query should describe each media item (server makes no frame/ROI assumption). Docstring + tests updated. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Endpoint renamed server-side from vlm-queries to vlm-verifications. Update the SDK POST path and test fixtures accordingly. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
sanitize_endpoint_url() strips the trailing slash from self.endpoint, so joining without "/" produced ".../device-apiv1/vlm-verifications" instead of ".../device-api/v1/vlm-verifications". Added test_url_has_correct_path to pin the correct URL shape. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
b2b0755 to
263808d
Compare
… tests - model_id docstring now lists all current supported aliases with a note that the server is the source of truth (400 on unknown alias) - documents that corrupted bytes are validated server-side -> HTTPError 400 - rewrites test_ask_vlm.py as module-level functions matching repo convention - adds test_timeout_passed_to_requests: verifies timeout kwarg forwarded - adds test_corrupted_image_bytes_raises_http_error: server 400 -> HTTPError Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Drop tests that only verify kwarg passthroughs or mock server-side behavior (timeout forwarding, corrupted-image 400, dual-image loop, model_id omission). Keep the five that catch real issues or verify non-obvious invariants: result parsing, image encoding, form-field vs URL security property, >8 guard, and the URL path bug. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…suppression - Extract MAX_VLM_MEDIA_ITEMS = 8 constant (fixes PLR2004 magic value in client.py) - test_ask_vlm: use optional numpy import + @skipif(MISSING_NUMPY) so test collection doesn't fail in envs without numpy (fixes ModuleNotFoundError) - Replace _FAKE_JPEG bytes for tests that don't exercise encoding (removes numpy dependency from 4 of 5 tests entirely) - Remove magic 400 assertion from result-parsing test (fixes PLR2004 in test) - Add pylint: disable comments on VLMVerificationResult (too-many-instance-attributes) and ask_vlm (too-many-locals) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Adds a new SDK surface for VLM-based alert verification against the Groundlight cloud by introducing Groundlight.ask_vlm(...), returning a structured VLMVerificationResult, and exporting that result type from the package root. The PR also adds mocked unit tests and includes a regression test for correct URL path construction.
Changes:
- Add
Groundlight.ask_vlm(media, query, model_id)that uploads 1–8 images as multipart/form-data to/v1/vlm-verificationsand parses the response intoVLMVerificationResult. - Export
VLMVerificationResultfromgroundlight.__init__. - Add unit tests for request construction (multipart fields/files, URL path) and response parsing.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
src/groundlight/client.py |
Adds VLMVerificationResult, MAX_VLM_MEDIA_ITEMS, and the new ask_vlm method using requests multipart upload. |
src/groundlight/__init__.py |
Re-exports VLMVerificationResult from the package root. |
test/unit/test_ask_vlm.py |
Adds mocked unit tests validating request/response handling and URL path behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Fix type annotation: List now typed as List[Union[...]] to match all supported per-item types, not just List[np.ndarray] - Add empty-list guard: raise ValueError before any network call - Fix URL version doubling: strip trailing /vN from endpoint before appending /v1/vlm-verifications (sanitize_endpoint_url allows /v1 paths) - Use time.time_ns() for request ID to avoid ms-precision collisions - Clarify docstring: bytes/streams accept any common format (JPEG/PNG/WEBP), server normalises to JPEG server-side Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Corrects the accepted filename types (JPEG/PNG only; WEBP filenames unsupported by bytestream_from_filename) and clarifies that raw bytes/BytesIO/BufferedReader are sent as-is (server normalises via ensure_jpeg_format regardless of declared content-type). - Adds test_url_no_version_duplication_for_versioned_endpoint to guard against the /v1/v1/ double-version regression. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The CLI auto-registers all public Groundlight methods but raises KeyError if a method is missing from _COMMAND_GROUPS. Add ask_vlm to _COMMAND_GROUPS and _GROUP_ORDER under a new "VLM Verification" panel. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
| MAX_VLM_MEDIA_ITEMS = 8 | ||
|
|
||
|
|
||
| @dataclass |
There was a problem hiding this comment.
This should be a Pydantic model. That matches the pattern we have established in this SDK.
Consistency aside, I think there will also be a problem with the CLI if we use a dataclass. The CLI has special logic to pretty print objects, and I don't believe it has any handling for dataclasses.
There was a problem hiding this comment.
We shouldn't be hand writing a class like this, should we? Most classes like this are autogenerated by the API spec.
There was a problem hiding this comment.
Fixed: converted VLMVerificationResult from a dataclass to a pydantic BaseModel, matching the established SDK pattern and enabling proper CLI pretty-printing via model_dump_json.
🤖 Addressed by Claude Code
There was a problem hiding this comment.
Acknowledged — the ideal end-state is for VLMVerificationResult to be generated from the OpenAPI spec (alongside VlmVerification in zuuul). The janzu endpoint is not deployed yet, so wiring up spec generation is a follow-on task. For now the hand-written Pydantic BaseModel gives correct CLI behaviour and the same interface, and can be replaced by the generated model once the spec is stable.
🤖 Addressed by Claude Code
timmarkhuff
left a comment
There was a problem hiding this comment.
I think there are some consistency issues with this PR that I have noted in my comments. We'll want Brandon to weigh in on these too.
…pe annotations
- VLMVerificationResult is now a pydantic BaseModel (matching SDK patterns
and enabling proper CLI pretty-printing via model_dump_json).
- Remove double-quoted forward refs ("np.ndarray", "Image.Image") in
ask_vlm signature — consistent with every other method in client.py
that uses the same Image/np imports.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Thanks @timmarkhuff for the comments. My claude agent was on auto-reply mode for this PR, so it replied on my behalf without me looking at it, sorry about that. I'll take a look at your comments again carefully. |
|
A few more things that just occurred to me:
|
brandon-wada
left a comment
There was a problem hiding this comment.
This is a complete break from the OpenApi modeling that underpins most of the SDK. We can leave this on branch and test with it, but once we're confident that this is the feature we want then we should model the endpoint
Alternately, we have the make_generic_api_request method, we could just leverage that for the moment
There was a problem hiding this comment.
Should probably go in experimental
| "create_roi", # returns an ROI object that must be passed to another API call; not useful standalone | ||
| "get_raw_headers", # returns the API token in plaintext | ||
| "make_generic_api_request", | ||
| } |
There was a problem hiding this comment.
Do you want this in the cli? Definitely needs tests if so
There was a problem hiding this comment.
I see this to be primarily used by vision-engine service or to run some experiments, so I dont think we have a CLI use case here. So, I'm not adding it in this PR.
There was a problem hiding this comment.
Added CLI and tests, but it only supports 1 image for now.
paulina-positronix
left a comment
There was a problem hiding this comment.
Get Brandon's help to learn how to update the SDK using the right swagger tools etc. and otherwise I have one comment/musing on the naming of the method.
| "get_priming_group": "ML Pipelines & Priming", | ||
| "delete_priming_group": "ML Pipelines & Priming", | ||
| # VLM Verification | ||
| "ask_vlm": "VLM Verification", |
There was a problem hiding this comment.
On the one hand, similar to my comments on the zuuul PR, if this is specific to verification then maybe the method should be ask_vlm_verify or similar, since the returned result will be verification-specific. On the other hand, I kinda like ask_vlm because it could be also used to just get the answer in the first place. But maybe we want to discourage such use due to potential for high cost. Hm... I'm of two minds about it, what do you think?
There was a problem hiding this comment.
This got me thinking about the name VLMVerificationResult. Maybe something like VlmResult would be better. I get that we might encourage people to only use this for verification because it's expensive, but it isn't inherently a "verification"; it's only a verification if you use it that way.
Also, this is an opinionated nit, but I think for class names we should use Vlm.. rather than VLM.... The argument is that Pascal case is about word boundaries, not capitalization. We already follow this pattern in classes like ExperimentalApi so maybe it is worth continuing that pattern.
There was a problem hiding this comment.
These are both great points, and they made me think about the API design a bit more. I'm leaning towards making this more specific and renaming it to ask_vlm_verify and use VlmVerificationResult.
The main reason is that, in Zuuul, we built the Bedrock client and the vlm-verification endpoint primarily for verification tasks, with a system prompt and answer parsing (verdict, confidence, reasoning) specifically tailored for alert verification. While we could make it more generic by allowing callers to pass their own system prompts and get raw answers, I think it's better to keep this endpoint focused on verification for now to get structured output. This also helps discourage using it for general-purpose VLM calls, given the relatively high cost.
Additionally, I've added restrictions in the Zuuul endpoint so that only certain customer types can access it. Given that context, I'd prefer to optimize for the specific use case we're building today and generalize it later if and when we have a clear need.
…erimental Reworks the VLM verification feature to follow the SDK codegen process per review feedback (brandon-wada, timmarkhuff): - Add /v1/vlm-verifications + VlmVerification/VlmVerificationResult/ VlmVerificationCost/VerdictEnum schemas to spec/public-api.yaml (synced from zuuul#6519) and regenerate generated/model.py. The result model is now generated, not hand-written. - Add ExperimentalApi.ask_vlm_verify returning the generated VlmVerification (nested result/cost), reusing the create_note multipart pattern, parse_supported_image_types, and _generate_request_id. - Remove the hand-written ask_vlm method, VLMVerificationResult class, and the now-unused re/requests/BaseModel imports from client.py; fix __init__ export. - Drop the CLI additions (cli.py reverted to match main). - Replace test_ask_vlm.py with test_ask_vlm_verify.py covering the nested VlmVerification shape, media validation, and the /v1 dedup regression. Note: the Java openapi-generator step of `make generate` still needs to run in CI/an env with a JDK to regenerate the groundlight_openapi_client package. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…odel files Runs the openapi-generator (Java) step that was skipped in the previous commit due to missing JDK. Adds the generated groundlight_openapi_client API class and model files for VLM verification: - api/vlm_verifications_api.py - model/vlm_verification.py - model/vlm_verification_cost.py - model/vlm_verification_result.py - model/verdict_enum.py - docs/ and test/ scaffolding for the above Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The previous "drop CLI" commit reverted cli.py against a stale local main (at #412), which wiped a CLI refactor that had already landed on main before this branch was cut and was legitimately inherited by the branch. Restore cli.py to current origin/main so the PR makes no cli.py changes at all (the unrelated refactor stays, only the 3-line ask_vlm command-group mapping this PR added is removed). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ripping) The /v\d+$ stripping handled only a theoretical endpoint=".../v1" config that no other SDK method supports — create_note and every generated endpoint append /v1/... directly on self.endpoint, which is assumed to end at /device-api. Match that established pattern and drop the now-moot regex, re import, and the version-duplication regression test. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
groundlight/__init__.py re-exports generated models via `from model import *`, which mypy can't resolve to concrete attributes — so `from groundlight import VlmVerification` failed type checking. Import it from `model` directly, matching the convention in the other unit tests (Detector, ImageQuery, etc.). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The CLI auto-registers every public ExperimentalApi method and looks up its group in _COMMAND_GROUPS with a hard [] — so a new method with no entry crashes the entire CLI (KeyError) at startup, breaking all CLI tests. Since we're not exposing this in the CLI (takes image media, billable per call), add it to the purpose-built _CLI_EXCLUDED_METHODS set rather than _COMMAND_GROUPS. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Expose ask_vlm_verify under a new "VLM Verification" group instead of excluding it. The CLI auto-rewrites the media Union to str, so the command takes a single image filepath + query (the Python API keeps full multi-image support). Adds a registration test. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
|
Updated the PR with OpenAPI generated SDK and moved |
ask_vlm_verify now goes through the generated client (self.vlm_verifications_api .submit_vlm_verification) and validates the response into the pydantic model — the same pattern as submit_image_query — instead of a hand-rolled requests.post. This drops the manual URL building, auth header, SSL handling, and JSON parsing. The endpoint is modeled as multipart/form-data with a `media` file array, so the generated files_parameters needs each file to have a `.name` (to derive the part filename + content-type). Rather than a new wrapper class, give the SDK's existing ByteStreamWrapper an optional `name`. (submit_image_query needs no name because it sends the image as a single binary body, not a multipart file field.) Tests now mock the transport (RESTClientObject.request) so they exercise the real multipart assembly + response parsing. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Matches the "Image 1", "Image 2", ... labels the server presents to the model. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
What
Adds
ExperimentalApi.ask_vlm_verify(media, query, model_id)for VLM-based alert verification. CallsPOST /v1/vlm-verificationson the Groundlight cloud (AWS Bedrock) and returns a generatedVlmVerificationmodel with a nestedresult(verdict=YES/NO/UNSURE,confidence,reasoning) andcost(input_tokens,output_tokens,total_cost_usd).Pairs with the janzu PR (zuuul#6519). No local inference — the VLM runs entirely in the cloud.
Generated from the OpenAPI spec
Per review feedback, the response model is not hand-written — it follows the SDK codegen process:
VlmVerification,VlmVerificationResult,VlmVerificationCost,VerdictEnum) were added tospec/public-api.yaml, synced from zuuul#6519's regeneratedpublic-api.yaml.make generateregenerates bothgenerated/model.py(the PydanticVlmVerificationthe SDK returns) and thegroundlight_openapi_clientpackage (vlm_verifications_api.py+ model classes).Experimental surface
The method lives on
ExperimentalApi, not the coreGroundlightclient. It returns the generatedVlmVerificationmodel directly.API
mediaaccepts 1–8 images (single or list). Accepts numpy BGR arrays, PIL Images, bytes, BytesIO/BufferedReader, or filename strings — encoded via the existingparse_supported_image_typesutility.multipart/form-data:mediaparts asimage/jpegfiles;queryandmodel_idas form fields (not URL params, so the prompt never leaks into access logs).model_idis a friendly alias (e.g."gpt-5.4","claude-sonnet-4.5"). The server maps it to the real Bedrock model id and returns 400 on an unknown alias. Defaults to the server-configured default.VlmVerificationsApi.submit_vlm_verification) — the same pattern assubmit_image_query— and the response is validated into the generatedVlmVerificationmodel viamodel_validate. The generated multipart helper derives eachmediapart's filename/content-type from the file's.name, soByteStreamWrapper(the SDK's existing image-bytes carrier) gains an optionalname.Usage
Changes
spec/public-api.yaml: add/v1/vlm-verificationspath +VlmVerification/VlmVerificationResult/VlmVerificationCost/VerdictEnumschemasgenerated/: regenerated client (generated/model.py+groundlight_openapi_clientVLM api/model files) viamake generatesrc/groundlight/experimental_api.py:ask_vlm_verifymethod (calls the generatedVlmVerificationsApi)src/groundlight/images.py:ByteStreamWrappergains an optionalname(needed for multipart file parts)src/groundlight/cli.py: registerask_vlm_verifyas an experimental CLI command under a new "VLM Verification" group (groundlight exp ask-vlm-verify MEDIA QUERY). Note: only a single image is supported via the CLI for now — themediaUnion collapses to astrfilepath. The Python API (ExperimentalApi.ask_vlm_verify) keeps full 1–8 image support.test/unit/test_ask_vlm_verify.py+test/unit/test_cli.py: unit tests (mocked HTTP) and a CLI registration testThis PR makes no changes to
client.pyor__init__.py.Testing
The endpoint isn't deployed yet, so tests are mocked (no live server). Coverage:
VlmVerification(nestedresult.verdict/result.confidence,cost.total_cost_usd)mediapartquery/model_idsent as form fields, not URL params (prompt must not appear in logs)ValueErrorbefore any network call/device-api/v1/vlm-verifications)Follow-ups before merge / deploy
spec/public-api.yamlif the server's final schema changes.🤖 Generated with Claude Code