Skip to content

Add Environment Validator TSG: AzStackHci_DNS_ExternalDnsResolution (External DNS Resolution)#319

Open
1008covingtonlane wants to merge 2 commits into
Azure:mainfrom
1008covingtonlane:tsg-dns-external-dns-resolution
Open

Add Environment Validator TSG: AzStackHci_DNS_ExternalDnsResolution (External DNS Resolution)#319
1008covingtonlane wants to merge 2 commits into
Azure:mainfrom
1008covingtonlane:tsg-dns-external-dns-resolution

Conversation

@1008covingtonlane

Copy link
Copy Markdown
Collaborator

What

Adds an Environment Validator remediation guide for the dedicated external-DNS check AzStackHci_DNS_ExternalDnsResolution (also reported as AzStackHci_DNS_Test_External_Hostname_Resolution on current builds).

The check

This validator confirms each Azure Local node can resolve an external (public) DNS name. The dedicated DNS validator resolves management.azure.com, retries up to three times before failing, and lists each failing node as its own bullet with an (Attempt: n/3) suffix. It runs during pre-deployment readiness, deployment, add-node, and the pre-update health check, and it is Critical (a failure blocks a pending Azure Local update).

It is the successor of the legacy connectivity test AzStackHci_Connectivity_Test_Dns (which resolved microsoft.com). The root cause and fix are identical.

Design: focused + cross-linked

Rather than duplicate the full connectivity-DNS guide, this TSG is focused on what differs for the dedicated validator (queried hostname, retry/Attempt suffix, per-node bullet layout, result-name filter) and cross-links the companion Connectivity Test DNS TSG for the shared root cause, the full multi-entry-point discovery options, the per-node fan-out, and the DNS glossary. It is still self-sufficient for the core per-node fix (identify the management adapter, test each DNS server, set the correct servers or add a forwarder, verify). The companion link resolves once that guide merges.

Notes

  • Lint Grade A (H1 = canonical check name, severity, where-it-appears, verify-the-fix, balanced PowerShell fences).
  • Reads the on-box result from AdditionalData.Status/Detail and filters the result Name with a leading-and-trailing wildcard, and the verify step uses Invoke-SolutionUpdatePrecheck -SystemHealth + Get-SolutionUpdateEnvironment.
  • Prose follows the no spaced double-hyphen convention.
  • Internally validated end-to-end (Grade A) via the SRE embedded-test harness (inject a non-resolving DNS server, detect the failure, restore, revalidate). INTERNAL USE ONLY -- findings must be independently validated before action.

…External DNS Resolution)

Remediation guide for the dedicated external-DNS Environment Validator check
AzStackHci_DNS_ExternalDnsResolution (also reported as
AzStackHci_DNS_Test_External_Hostname_Resolution on current builds). This validator
confirms each node can resolve an external public DNS name (management.azure.com),
retries up to three times, and lists each failing node as a bullet.

It is the successor of the legacy connectivity test AzStackHci_Connectivity_Test_Dns;
the root cause and fix are identical, so this focused guide documents what differs
(the queried hostname, the retry/Attempt suffix, the per-node bullet layout, and the
result-name filter) and cross-links the Connectivity Test DNS TSG for the shared
full walkthrough and DNS glossary.

Adds the TSG markdown and a README index entry.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a new Environment Validator troubleshooting guide (TSG) for the dedicated external-DNS readiness check AzStackHci_DNS_ExternalDnsResolution (also reported as AzStackHci_DNS_Test_External_Hostname_Resolution), which verifies each Azure Local node can resolve the public name management.azure.com. It fits into the existing TSG/EnvironmentValidator/ collection of community-driven supportability content used by CSS, engineering, and customers. The guide is intentionally focused on what differs from the legacy connectivity DNS test and cross-links a companion guide for shared background.

Changes:

  • Adds Troubleshooting-DNS-External-DNS-Resolution.md covering symptoms, where the failure appears, consequences, per-node remediation (re-point DNS client or add a forwarder), and verification steps.
  • Reads the on-box result from AdditionalData.Status/Detail, filters the result Name by wildcard, and verifies via Invoke-SolutionUpdatePrecheck -SystemHealth + Get-SolutionUpdateEnvironment.
  • Registers the new guide in the Environment Validator README.md table of contents.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
TSG/EnvironmentValidator/Troubleshooting-DNS-External-DNS-Resolution.md New TSG for the external-DNS validator; well-structured and consistent with sibling guides, but depends on a companion guide that does not yet exist and has a minor event-log filter-pattern gap.
TSG/EnvironmentValidator/README.md Adds the new TSG to the table of contents using the established relative-link format.

Comment on lines +54 to +55
> connectivity test and are documented once, in depth, in the companion TSG
> [Troubleshooting AzStackHci_Connectivity_Test_Dns](https://github.com/Azure/AzureLocal-Supportability/blob/main/TSG/EnvironmentValidator/Troubleshooting-Connectivity-Test-Dns.md).

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved in d3e70af. The guide is now self-sufficient: the per-node fan-out and a full DNS glossary are inline, and the event-log / portal discovery alternatives are covered inline, so no core content is delegated to the not-yet-merged connectivity guide. The companion is reduced to a single optional "Related guide" pointer for the legacy connectivity check, honestly labelled as a separate in-progress PR that this guide does not depend on.


Each row is one currently-failing DNS server on one node. On the Windows event log the
same record is written to `AzStackHciEnvironmentChecker` as Event ID 17205; filter its
`Name` the same way (`-like '*ExternalDnsResolution*'`) and read `AdditionalData.Detail`.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved in d3e70af. The event-log guidance and the new per-node fan-out now filter BOTH result names (*ExternalDnsResolution* OR *Test_External_Hostname_Resolution*), consistent with the health-check-file query, so a build reporting the second name is not missed.

Addresses the two Copilot review findings on PR Azure#319:

1. Broken/pending companion links + delegated core content. The guide no longer
   depends on the not-yet-merged Connectivity Test DNS guide: the per-node fan-out
   and a full DNS glossary are now inline, and the event-log / portal discovery
   alternatives are covered inline. The companion is reduced to a single optional
   "Related guide" pointer for the legacy connectivity check, honestly labelled as a
   separate in-progress PR that this guide does not depend on.

2. Event-log filter gap. The event-log guidance and the per-node fan-out now match
   BOTH result names (*ExternalDnsResolution* OR *Test_External_Hostname_Resolution*),
   consistent with the health-check-file query, so a build reporting the second name
   is not missed.

Lint A; prose has no spaced clause separators; fences balanced.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants