DNS name tracking blog #32
Open
memodi
wants to merge
5
commits into
netobserv:main
from
memodi:dnsname-tracking
70 changes: 70 additions & 0 deletions
content/posts/2026-03-12-dns-name-tracking/dns-response-packet.svg
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,119 @@ | ||
| --- | ||
| layout: :theme/post | ||
| title: "DNS name tracking with Network Observability" | ||
| description: "Overview of DNS name tracking feature" | ||
| tags: network,observability,DNS,loki,troubleshooting | ||
| authors: [memodi, jpinsonneau] | ||
| --- | ||
|
|
||
| Network Observability has long reported DNS latencies and response codes for | ||
| the DNS resolutions in your Kubernetes cluster. You can enable the | ||
| `DNSTracking` feature in the FlowCollector config as shown below. | ||
|
|
||
| ```yaml | ||
| spec: | ||
|   agent: | ||
|     ebpf: | ||
|       features: | ||
|         - DNSTracking | ||
| ``` | ||
|
|
||
| In the most recent 1.11 release, the existing `DNSTracking` feature gained a | ||
| major enhancement: it now also reports DNS query names, without any additional | ||
| FlowCollector configuration. | ||
|
|
||
| The current implementation captures DNS latencies, response codes, and query | ||
| names from DNS response packets. To understand this better, let's examine the | ||
| structure of a standard DNS response packet: | ||
|
|
||
|  | ||
|
|
||
| As you may have guessed, the DNS query name is captured from the Question | ||
| section of a response packet. DNS resolution is the first step for most | ||
| application network requests in Kubernetes. In this blog, we demonstrate how | ||
| this information can help you troubleshoot DNS configuration issues and detect | ||
| suspicious network activity. | ||
|
|
||
| We're running an OpenShift cluster on AWS with a simple test setup: a `client` | ||
| pod making requests to an `nginx` service in a different namespace. The nginx | ||
| service runs in the `server` namespace, while the client pod runs in the | ||
| `client` namespace. The client pod runs the following loop: | ||
|
|
||
| ```bash | ||
| while : ; do | ||
|   curl nginx.server.svc:80/data/100K > /dev/null 2>&1 | ||
|   sleep 5 | ||
| done | ||
| ``` | ||
|
|
||
| While the requests to fetch the 100K object do succeed, can you spot the | ||
| configuration issue in the curl command above? Let's look at what we see in the | ||
| flow logs: | ||
|
|
||
|  | ||
|
|
||
| We see several requests failing with the `NXDOMAIN` response code, and the ones | ||
| that succeed have the query name `nginx.server.svc.cluster.local`. Since we | ||
| used the short DNS name `nginx.server.svc` in the curl command, the resolver | ||
| tries each domain from the `search` directive in /etc/resolv.conf in turn until | ||
| the cluster DNS service returns an answer. | ||
|
|
||
| ```bash | ||
| cat /etc/resolv.conf | ||
| search server.svc.cluster.local svc.cluster.local cluster.local us-east-2.compute.internal | ||
| nameserver 172.30.0.10 | ||
| options ndots:5 | ||
| ``` | ||
|
|
||
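To make the search-path behaviour concrete, here is a minimal Python sketch of how a stub resolver builds its list of candidate names from `search` and `ndots`. The function name `candidate_queries` is hypothetical, and the ordering rule follows the common glibc-style behaviour, which is an assumption about the resolver in the pod. A real resolver also stops at the first candidate that returns an answer.

```python
def candidate_queries(name, search, ndots):
    """Return the names a stub resolver would try, in order (illustrative only)."""
    # A trailing dot marks the name as absolute, so the search list is skipped.
    if name.endswith("."):
        return [name]
    expanded = [f"{name}.{domain}." for domain in search]
    as_is = f"{name}."
    # With fewer dots than ndots, the search domains are tried before the name itself.
    if name.count(".") < ndots:
        return expanded + [as_is]
    return [as_is] + expanded


# Values copied from the /etc/resolv.conf shown above.
search = [
    "server.svc.cluster.local",
    "svc.cluster.local",
    "cluster.local",
    "us-east-2.compute.internal",
]

for query in candidate_queries("nginx.server.svc", search, ndots=5):
    print(query)
```

With these inputs, the third candidate is `nginx.server.svc.cluster.local.`, which matches the flow logs: the first two candidates are answered with NXDOMAIN before one resolves. Passing `nginx.server.svc.cluster.local.` (with the trailing dot) yields a single candidate.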
| Short DNS names for cluster services put extra load on the cluster DNS service, | ||
| resulting in higher latencies, negative caching (where DNS servers cache | ||
| negative responses, like NXDOMAIN, until the TTL expires), and increased DNS | ||
| traffic. You can prevent this by using a Fully Qualified Domain Name (FQDN) in | ||
| the requests. After updating the hostname to | ||
| `nginx.server.svc.cluster.local.` (note the trailing dot) in the curl requests, | ||
| we no longer see any NXDOMAIN responses, and the unnecessary DNS traffic in our | ||
| cluster is reduced. You can imagine the performance impact if such a | ||
| configuration issue propagated to hundreds of services in your cluster. | ||
|
|
||
|  | ||
|
|
||
| The web console also has new Overview panels that show the top 5 most queried | ||
| DNS names: | ||
|
|
||
|  | ||
|
|
||
| Note that the `pod` filters are removed in the image above, since the DNS | ||
| traffic is reported by the DNS `Service` in the cluster. This visualization can | ||
| help identify suspicious domain name activity in your cluster, and with the | ||
| table view you can narrow it down to the resource it is coming from. | ||
|
|
||
| While DNS name decoding has great use cases in identifying and troubleshooting | ||
| issues, it comes with some caveats to favor performance. This feature isn't | ||
| supported with Prometheus as the datastore, since storing DNS names as metric | ||
| label values could cause high cardinality. That means that to use this feature, | ||
| you must use Loki as your datastore. We're actively working to measure the | ||
| performance impact and expose DNS names as Prometheus metrics, though. | ||
|
|
||
| Captured DNS names are truncated at 32 bytes to limit the | ||
| netobserv-ebpf-agent's memory utilization. This length should cover most | ||
| practical scenarios. | ||
|
|
||
| DNS name tracking currently does not support DNS compression pointers, a | ||
| space-saving technique defined in | ||
| [RFC 1035 section 4.1.4](https://www.rfc-editor.org/rfc/rfc1035.html#section-4.1.4). | ||
| While this is a known limitation, it has minimal practical impact since | ||
| compression is rarely used in the Question section where queries are tracked. | ||
| Compression pointers are predominantly used in Answer sections to reference the | ||
| queried domain name. | ||
|
|
||
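As an illustration of the wire format involved, the sketch below decodes the name from the Question section of a DNS message and refuses compression pointers, which RFC 1035 marks by setting the top two bits of the length byte. This is plain Python written for this post's example, not the agent's eBPF code. The helper names `read_qname` and `encode_qname` are hypothetical, and the sample packet is built by hand with an all-zero header.

```python
HEADER_LEN = 12  # fixed DNS header size (RFC 1035 section 4.1.1)


def read_qname(packet: bytes, offset: int = HEADER_LEN) -> str:
    """Decode the first question name; raise on a compression pointer."""
    labels = []
    while True:
        length = packet[offset]
        if length == 0:  # zero-length root label terminates the name
            break
        if (length & 0xC0) == 0xC0:  # top two bits set: compression pointer
            raise ValueError("compression pointer in QNAME is not handled")
        offset += 1
        labels.append(packet[offset:offset + length].decode("ascii"))
        offset += length
    return ".".join(labels)


def encode_qname(name: str) -> bytes:
    """Encode a dotted name as length-prefixed labels (test helper)."""
    out = b""
    for label in name.rstrip(".").split("."):
        out += bytes([len(label)]) + label.encode("ascii")
    return out + b"\x00"


# Hand-built message: zeroed header, one question with QTYPE=A and QCLASS=IN.
header = bytes(HEADER_LEN)
question = encode_qname("nginx.server.svc.cluster.local") + b"\x00\x01\x00\x01"
print(read_qname(header + question))
```

A name that starts with a pointer, such as the two bytes `0xC0 0x0C`, makes `read_qname` raise `ValueError` instead of following the offset, which mirrors the limitation described above.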
| To conclude, in combination with other Network Observability features such as | ||
| built-in alerts for overall network health, DNS name tracking will help you | ||
| identify real-world issues faster. Before wrapping up, we'd like to thank Amogh | ||
| Rameshappa Devapura, Mike Fiedler, and Joel Takvorian for reviewing this blog. | ||
|
|
||
| If you'd like to share feedback or engage with us, feel free to ping us on | ||
| [Slack](https://cloud-native.slack.com/archives/C08HHHDA9ND) or start a | ||
| [discussion](https://github.com/orgs/netobserv/discussions). | ||
|
|
||
| Thank you for reading! | ||
On cardinality, maybe we can tell that we're currently evaluating the impact - see my comment here - I think eventually we can add that to the metrics
added a statement