TSG: SBEHealth Test-Endpoint-Connectivity (Environment Validator remediation guide) #316

Open
1008covingtonlane wants to merge 3 commits into Azure:main from 1008covingtonlane:tsg-sbehealth-test-endpoint-connectivity

@1008covingtonlane

Collaborator

Summary

Adds a troubleshooting guide for the AzStackHci_SBEHealth_Test-Endpoint-Connectivity pre-update SBE health check, and indexes it in the Environment Validator README.

This check confirms that the node can reach the hardware partner (OEM) Solution Builder Extension (SBE) manifest endpoint over HTTPS. It discovers the endpoint URL from solution discovery (for example, an aka.ms link) and makes an HTTPS request. A 200 response is SUCCESS; a connection error, a non-200 response, no response, or a redirect to a search engine is FAILURE. The check is Informational, so it does not block the update, but a failure means the node cannot reach the SBE content source. The cause is almost always an outbound firewall or proxy block on port 443.
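
The outcome rules described above can be sketched as a small pure function. This is only an illustration of the decision logic, not the validator's actual code: the function name `Get-SbeEndpointOutcome` and the search-engine host pattern are assumptions made for the example, and the real Test-SBEEndpointConnectivity helper may detect a search-engine redirect differently.

```powershell
# Hypothetical helper for illustration; not part of the Environment Validator module.
function Get-SbeEndpointOutcome {
    param(
        [Nullable[int]] $StatusCode,  # $null models "no response" or a connection error
        [string] $FinalUri            # URI after any redirects, if known
    )

    # Assumption: which hosts count as a "search engine" is invented for this sketch.
    $searchHostPattern = '(^|\.)(bing|google)\.com$'

    # No response, or any status other than 200, is a failure.
    if ($null -eq $StatusCode) { return 'FAILURE' }
    if ($StatusCode -ne 200)   { return 'FAILURE' }

    # A 200 that was served by a search engine (after a redirect) is also a failure.
    if ($FinalUri) {
        $uri = $null
        $parsed = [Uri]::TryCreate($FinalUri, [UriKind]::Absolute, [ref] $uri)
        if ($parsed -and $uri.Host -match $searchHostPattern) { return 'FAILURE' }
    }

    return 'SUCCESS'
}
```

Keeping the classification separate from the HTTP call makes it possible to reason about each FAILURE branch without network access.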

The TSG covers:

  • The three outcomes and that a failure is a non-blocking connectivity warning (not a hard gate).
  • Where it appears: the portal Updates view and EventID 17205 on AzStackHciEnvironmentChecker (with the -like '*Test-Endpoint-Connectivity*' match).
  • How to reproduce from the node with Invoke-WebRequest (including surfacing the aka.ms redirect target via FinalUri).
  • Remediation: allow outbound HTTPS (443) to the endpoint host and its redirect target, fix any proxy block (including the 407 case), then re-run the pre-update health check with Invoke-SolutionUpdatePrecheck -SystemHealth.
  • Per-node scope, ownership (network / OEM), and when to escalate (including the "unable to determine endpoint" LCM case).
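
As a rough sketch of the manual reproduction step listed above, the following wraps `Invoke-WebRequest` so that both successful and failed requests return one result object. The wrapper name `Test-SbeEndpointManually` is hypothetical, the URL must be supplied by the operator (no endpoint is assumed here), and the redirect target is read from `.BaseResponse.ResponseUri`, which is populated on Windows PowerShell 5.1 and may be empty on other PowerShell versions.

```powershell
# Hypothetical wrapper for illustration; not part of the Environment Validator module.
function Test-SbeEndpointManually {
    param(
        [Parameter(Mandatory)] [string] $Url,  # the SBE endpoint reported by the check
        [int] $TimeoutSec = 15
    )

    $statusCode = $null
    $finalUri   = $null
    $errorText  = $null

    try {
        $resp = Invoke-WebRequest -Uri $Url -UseBasicParsing -TimeoutSec $TimeoutSec -ErrorAction Stop
        $statusCode = [int] $resp.StatusCode
        # On Windows PowerShell 5.1 this is the URI after redirects (for example, the aka.ms target).
        if ($resp.BaseResponse.ResponseUri) {
            $finalUri = $resp.BaseResponse.ResponseUri.AbsoluteUri
        }
    }
    catch {
        $errorText = $_.Exception.Message
        # Non-2xx replies (such as a proxy 407) carry a response object; connection errors do not.
        $errResp = $_.Exception.Response
        if ($errResp) { $statusCode = [int] $errResp.StatusCode }
    }

    [pscustomobject]@{
        Url        = $Url
        StatusCode = $statusCode
        FinalUri   = $finalUri
        Error      = $errorText
    }
}
```

A result with an empty `StatusCode` and a populated `Error` points to a blocked or dropped connection, while a 407 points to proxy authentication. After the firewall or proxy change, the check itself is re-run with `Invoke-SolutionUpdatePrecheck -SystemHealth`.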

Grounded on the ASZ-EnvironmentValidator AzStackHciSBEHealth source (Test-AzStackHciSBEHealth + the Test-SBEEndpointConnectivity helper).

INTERNAL grade (tsg-forge): static lint A; 13-persona usability panel 5/5; tier-1 validated live (L4) by driving the real Test-SBEEndpointConnectivity probe with a reversible outbound-firewall-block inject on the real default SBE endpoint.

…diation guide)

New troubleshooting guide for the AzStackHci_SBEHealth_Test-Endpoint-Connectivity
pre-update SBE health check. The check confirms the node can reach the OEM Solution
Builder Extension manifest endpoint over HTTPS; a failure (Informational, non-blocking)
means the endpoint is unreachable, almost always an outbound firewall/proxy block on
443. Documents where it surfaces (portal Updates + EventID 17205), how to reproduce
with Invoke-WebRequest, the firewall/proxy remediation (including the aka.ms
redirect-target and proxy 407 cases), per-node scope, and the -SystemHealth re-run.
Adds the README index entry.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Contributor

Pull request overview

This PR adds a new Environment Validator troubleshooting guide for the AzStackHci_SBEHealth_Test-Endpoint-Connectivity pre-update SBE health check and indexes it in the Environment Validator README. The guide explains that the check verifies the node can reach the hardware partner's Solution Builder Extension (SBE) manifest endpoint over HTTPS, describes the SUCCESS/FAILURE outcomes, shows how to find the failure on the portal and via the AzStackHciEnvironmentChecker event log, and walks through firewall/proxy remediation, re-running the pre-update check, and escalation paths. It fits the repository's existing pattern of per-check Environment Validator TSGs and complements the existing SBE firewall guides under TSG/SolutionExtension/.

Changes:

  • Adds Troubleshooting-SBEHealth-Test-Endpoint-Connectivity.md with metadata table, overview, event-log lookup, manual reproduction, remediation, and escalation guidance.
  • Adds a single index entry to the Environment Validator README.md.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| TSG/EnvironmentValidator/Troubleshooting-SBEHealth-Test-Endpoint-Connectivity.md | New TSG for the SBE manifest endpoint connectivity check; content and structure follow existing guides, but the manual reproduction snippet uses an Invoke-WebRequest -PassThru/-OutFile form that fails on the default host shell. |
| TSG/EnvironmentValidator/README.md | Adds a link to the new guide in the existing flat TSG list (placement and relative path are correct). |

Notes: the external Learn links (azure-stack/hci/...) match the convention used across existing repo guides, and the -like '*Test-Endpoint-Connectivity*' event filter matches the documented AzStackHci_<Component>_<Test> name form. The main actionable item is the reproduction command, which will error out on Windows PowerShell 5.1 as written.

Comment thread TSG/EnvironmentValidator/Troubleshooting-SBEHealth-Test-Endpoint-Connectivity.md Outdated
1008covingtonlane and others added 2 commits July 4, 2026 15:17
…E firewall guides (PR review)

Address the two Copilot review comments:
- Step 2 reproduction command now uses only 'Invoke-WebRequest -UseBasicParsing -TimeoutSec 15'
  (dropped -OutFile/-PassThru). This is more portable and returns a response object on the
  default node shell (Windows PowerShell 5.1) so .StatusCode and
  .BaseResponse.ResponseUri.AbsoluteUri are populated, and it writes no temp file.
- Related now cross-links the internal SBE connectivity guides
  (SolutionExtension/Firewall-blocks-update-discovery.md and Firewall-blocks-SBE-validation.md),
  which detail the per-vendor aka.ms endpoints, the redirect target, and endpoint discovery.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…17205 query (PR review parity)

Same fix as the Endpoint-Matches-ModelSKU TSG (Copilot review on Azure#317): in the EventID
17205 / HealthCheckResult JSON the top-level Status and Severity are numeric enums and
Description is generic; the human-readable FAILURE/SUCCESS is AdditionalData.Status and the
specific message is AdditionalData.Detail (top-level Remediation is human-readable). The
on-box query now projects AdditionalData.Status + AdditionalData.Detail (+ Remediation), and
the prose reads the result from AdditionalData. Verified against the repo convention and a
live 17205 record (top Status=0 Int32, AdditionalData.Status='SUCCESS').
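
A minimal sketch of that projection, assuming each 17205 record's message body is the HealthCheckResult JSON and that the check name is in a top-level `Name` field (both are assumptions, as is the helper name `Select-HealthCheckResult`):

```powershell
# Hypothetical helper for illustration; field names other than Status, Severity,
# Description, Remediation, AdditionalData.Status and AdditionalData.Detail are assumed.
function Select-HealthCheckResult {
    param([Parameter(Mandatory)] [string] $Json)

    $result = $Json | ConvertFrom-Json

    # Top-level Status is a numeric enum, so the readable result comes from AdditionalData.
    [pscustomobject]@{
        Name        = $result.Name
        Status      = $result.AdditionalData.Status
        Detail      = $result.AdditionalData.Detail
        Remediation = $result.Remediation
    }
}

# Usage on a node (assumes the log name and that Message holds the JSON):
# Get-WinEvent -FilterHashtable @{ LogName = 'AzStackHciEnvironmentChecker'; Id = 17205 } |
#     ForEach-Object { Select-HealthCheckResult -Json $_.Message } |
#     Where-Object Name -like '*Test-Endpoint-Connectivity*'
```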

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>