DNS for Cloud Infrastructure — Best Practices and Architecture
Learn cloud DNS best practices including service discovery, multi-cloud strategies, automation with Terraform, and TTL optimization for dynamic infrastructure.
Cloud infrastructure lives and dies by DNS. Every microservice call, every load balancer health check, every failover event depends on DNS resolving the right address at the right time. Yet DNS is often treated as an afterthought — configured manually, left with default TTLs, and tied to a single cloud provider.
This guide covers DNS architecture patterns for cloud environments, from basic service discovery to multi-cloud strategies that keep your infrastructure resilient and portable.
What You'll Learn
- How DNS enables service discovery, load balancing, and failover in cloud environments
- Patterns for structuring DNS across development, staging, and production environments
- Multi-cloud DNS strategies that prevent vendor lock-in
- Automating DNS management with Infrastructure as Code tools like Terraform and DNSControl
Why DNS Matters in Cloud Infrastructure
In traditional infrastructure, servers had static IPs and DNS was simple: point www at a fixed address and forget about it. Cloud infrastructure is different. IPs change when instances restart, services scale horizontally, and infrastructure spans multiple regions. DNS becomes the glue that holds everything together.
Service Discovery
When a web application needs to reach a database, it doesn't hardcode 10.0.3.47. It resolves db.internal.example.com. When the database moves to a new instance, you update the A record and every service finds the new location automatically.
Cloud-native service discovery extends this further. Tools like Consul and Kubernetes DNS create records dynamically as services start and stop. But even with these tools, external DNS records still need to point to the right ingress points.
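The resolve-at-call-time pattern can be sketched in a few lines of Python. The hostname in the comment is illustrative; the example resolves localhost so it runs without network access:

```python
import socket

def resolve_service(hostname: str) -> list[str]:
    """Resolve a service hostname to its IPv4 addresses at call time,
    so a record change is picked up on the next lookup, no redeploy."""
    infos = socket.getaddrinfo(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    addrs = []
    for info in infos:
        addr = info[4][0]  # sockaddr is (host, port) for AF_INET
        if addr not in addrs:
            addrs.append(addr)
    return addrs

# In application code this would be resolve_service("db.internal.example.com");
# localhost is used here only so the example runs anywhere.
print(resolve_service("localhost"))
```

Because the lookup happens per call (subject to resolver caching), moving the database to a new instance requires only a record update, not a client change.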
Load Balancing
DNS-based load balancing distributes traffic by returning different A records for the same hostname. A query for api.example.com might return 203.0.113.10 one time and 203.0.113.11 the next, spreading requests across backend servers.
This works at a coarse level — DNS round-robin doesn't account for server load or health. For production traffic, pair DNS-based distribution with application-level load balancers. DNS handles geographic routing; the load balancer handles instance-level routing.
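A toy model shows what round-robin looks like from the client's side: every lookup returns the same record set, rotated by one, so the first answer varies between queries. This is a sketch of the behavior, not how any particular DNS server is implemented:

```python
from itertools import cycle

class RoundRobinAnswers:
    """Toy model of DNS round-robin: each lookup returns the full
    record set, rotated by one, so the first answer (the one most
    clients connect to) changes from query to query."""

    def __init__(self, addrs):
        self._addrs = list(addrs)
        self._offsets = cycle(range(len(self._addrs)))

    def lookup(self):
        k = next(self._offsets)
        return self._addrs[k:] + self._addrs[:k]

pool = RoundRobinAnswers(["203.0.113.10", "203.0.113.11"])
print(pool.lookup())  # first answer rotates on each call
print(pool.lookup())
```

Note the rotation only spreads first-choice connections; it says nothing about whether the chosen server is up or overloaded, which is exactly why the pairing with application-level load balancers matters.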
Failover
When a primary server goes down, DNS can redirect traffic to a standby. The key is TTL: if your A record has a TTL of 3600 seconds, it can take up to an hour before clients stop hitting the dead server. With a TTL of 60 seconds, failover happens within a minute.
Short TTLs enable fast failover but increase query volume on your authoritative DNS servers. For critical services, 60–300 seconds is a practical range that balances responsiveness with DNS load.
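The trade-off can be made concrete with a back-of-the-envelope bound. This simplified model (it ignores resolvers that clamp or violate TTLs, and the time to publish the record change itself) adds health-check detection time to one full TTL:

```python
def worst_case_failover_seconds(ttl, check_interval, failure_threshold):
    """Rough upper bound on time from an outage to all clients reaching
    the standby: the health check must fail `failure_threshold` times in
    a row (detection), and a client that cached the record just before
    the switch waits out one full TTL."""
    detection = check_interval * failure_threshold
    return detection + ttl

# A 60 s TTL with 30 s checks and 3 required failures:
print(worst_case_failover_seconds(ttl=60, check_interval=30, failure_threshold=3))  # 150
```

The same parameters with a 3600 s TTL push the bound past an hour, which is why the TTL, not the health check, usually dominates failover time.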
Cloud Provider DNS vs. External Managed DNS
Every major cloud provider includes a DNS service: AWS Route 53, Google Cloud DNS, Azure DNS. These integrate tightly with their respective platforms but come with trade-offs.
Cloud-Native DNS: Advantages
- Tight integration — Route 53 can automatically create records for ALBs, CloudFront distributions, and other AWS resources
- Health checks — Cloud DNS services often include health-check-based routing tied to their monitoring infrastructure
- IAM integration — DNS changes go through the same permission model as your other cloud resources
- Low latency — DNS queries from within the cloud network resolve faster when using the provider's own DNS
Cloud-Native DNS: Drawbacks
- Vendor lock-in — Your DNS configuration is expressed in a provider-specific format, making migration painful
- Single point of failure — If your cloud provider has a major outage, your DNS goes down with everything else
- Multi-cloud complexity — Managing DNS across AWS Route 53 and Google Cloud DNS means duplicating configuration in two different systems
- Limited record types — Some providers don't support all DNS record types or advanced features like DNSSEC
External Managed DNS: When It Makes Sense
Using an external DNS provider like DNScale decouples your DNS from any single cloud provider. This is the right call when:
- You run infrastructure across multiple clouds (or cloud plus on-premises)
- DNS uptime is critical and you want independence from cloud provider outages
- You need features your cloud provider doesn't offer (anycast, multi-provider failover, advanced DNSSEC)
- You want a single pane of glass for DNS across all environments
For a deeper comparison, see Managed DNS vs. Self-Hosted DNS.
DNS Patterns for Cloud Environments
Subdomain Delegation for Environments
One of the most effective patterns in cloud DNS is using subdomains to separate environments. Instead of managing entirely different domains for dev, staging, and production, delegate subdomains to different DNS zones:
example.com → Production zone
dev.example.com → Development zone
staging.example.com → Staging zone

Set up delegation with NS records in the parent zone:
# In the example.com zone, delegate dev to its own nameservers
dev.example.com. 86400 IN NS ns1.dnscale.eu.
dev.example.com. 86400 IN NS ns2.dnscale.eu.

Each environment gets its own zone with independent records. Development teams can modify dev.example.com freely without risking production DNS. Verify the delegation is working:
dig NS dev.example.com +short
# ns1.dnscale.eu.
# ns2.dnscale.eu.

This pattern also lets you apply different access controls per environment — a junior developer can have full access to dev.example.com without touching production records.
Split-Horizon DNS
Split-horizon DNS returns different answers depending on where the query originates. Internal users querying app.example.com get a private IP (10.0.1.50), while external users get the public-facing IP (203.0.113.50).
This is common in cloud environments where services need to communicate over private networks internally but remain accessible externally:
# Internal view
app.example.com. 300 IN A 10.0.1.50
# External view
app.example.com. 300 IN A 203.0.113.50

Cloud providers implement this through private DNS zones (AWS Route 53 private hosted zones, GCP Cloud DNS private zones). For external DNS providers, you can achieve a similar effect by using different subdomains: app.internal.example.com for private resources and app.example.com for public ones.
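The split-horizon decision itself is a simple source-address check. A minimal sketch, assuming a 10.0.0.0/8 internal network (real deployments implement this inside the DNS server or provider, not in application code):

```python
import ipaddress

# Assumed internal address range for this sketch.
INTERNAL_NETS = [ipaddress.ip_network("10.0.0.0/8")]

def split_horizon_answer(client_ip, internal_ip, external_ip):
    """Return the private IP for clients inside the internal networks
    and the public IP for everyone else."""
    addr = ipaddress.ip_address(client_ip)
    if any(addr in net for net in INTERNAL_NETS):
        return internal_ip
    return external_ip

print(split_horizon_answer("10.0.1.7", "10.0.1.50", "203.0.113.50"))      # internal view
print(split_horizon_answer("198.51.100.9", "10.0.1.50", "203.0.113.50"))  # external view
```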
Health-Check-Based Failover
Cloud DNS services can monitor endpoint health and remove unhealthy records from responses automatically. Here is the pattern:
- Define a primary A record pointing to your main server
- Define a secondary A record pointing to your failover server
- Attach health checks to both endpoints
- DNS returns only healthy endpoints
api.example.com. 60 IN A 203.0.113.10 ; Primary (healthy → returned)
api.example.com. 60 IN A 203.0.113.20 ; Secondary (returned if primary fails)

Keep TTLs low (60 seconds) on records with health-check failover. A long TTL defeats the purpose — clients will keep using the cached record long after the health check has marked the endpoint as down.
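The selection logic is simple to model. This sketch returns only healthy records and, when every health check fails, falls back to the full set rather than an empty answer (providers vary on this fallback behavior):

```python
def healthy_answers(records, is_healthy):
    """Return only the records whose endpoints pass their health check.
    If every check fails, fall back to the full set; answering with
    nothing would make the outage worse (provider behavior varies)."""
    healthy = [r for r in records if is_healthy(r)]
    return healthy if healthy else records

records = ["203.0.113.10", "203.0.113.20"]
down = {"203.0.113.10"}  # pretend the primary just failed its check
print(healthy_answers(records, lambda r: r not in down))  # ['203.0.113.20']
```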
Blue-Green Deployments with DNS
Blue-green deployments use DNS to switch traffic between two identical environments:
- Blue (current production) runs at 203.0.113.10
- Green (new version) is deployed and tested at 203.0.113.20
- Update the DNS record from blue to green
- Traffic shifts as DNS caches expire
# Before cutover
dig app.example.com +short
203.0.113.10
# Update the A record via DNScale API or Terraform
# After TTL expires, traffic goes to green
dig app.example.com +short
203.0.113.20

For blue-green to work smoothly, lower the TTL well before the cutover. Drop it from 3600 to 60 seconds a day in advance, perform the switch, then raise the TTL back afterward. This approach also works with CNAME records pointing to load balancer hostnames.
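The timing can be written down as a simple schedule. This simplified model assumes resolvers honor TTLs exactly: a resolver that cached the record just before the TTL drop can hold it for the full old TTL, so that is the earliest clean switch point:

```python
def cutover_schedule(old_ttl, new_ttl):
    """Timeline (seconds, relative to lowering the TTL) for a blue-green
    DNS switch. A resolver that cached the record just before the drop
    can hold it for the full old TTL, so the earliest clean switch is
    old_ttl seconds later; stragglers then drain within new_ttl."""
    return {
        "lower_ttl": 0,
        "earliest_clean_switch": old_ttl,
        "traffic_fully_on_green": old_ttl + new_ttl,
    }

print(cutover_schedule(old_ttl=3600, new_ttl=60))
```

Lowering the TTL "a day in advance" comfortably clears the 3600-second window; waiting less than the old TTL risks a cutover that some resolvers do not see for up to an hour.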
Multi-Cloud DNS Strategy
Running infrastructure across AWS, GCP, and Azure is increasingly common, but each cloud has its own DNS service with its own API. An external DNS provider eliminates this fragmentation.
Avoid Vendor Lock-in
If all your DNS is in Route 53 and you want to move a service to GCP, you need to either keep managing some records in Route 53 or migrate everything. With an external provider, your DNS is independent of where the infrastructure runs:
# Same DNS configuration regardless of cloud provider
resource "dnscale_record" "api_aws" {
  zone_id = dnscale_zone.main.id
  name    = "api-us"
  type    = "A"
  content = "203.0.113.10" # AWS instance
  ttl     = 300
}
resource "dnscale_record" "api_gcp" {
  zone_id = dnscale_zone.main.id
  name    = "api-eu"
  type    = "A"
  content = "198.51.100.10" # GCP instance
  ttl     = 300
}

Geographic Routing
Use DNS to route users to the nearest cloud region. A user in Europe resolves api.example.com to the GCP Frankfurt instance, while a user in the US resolves it to the AWS us-east-1 instance. DNScale's anycast network handles this automatically — see Multi-Provider DNS Deployment for redundancy patterns across providers.
Multi-Provider Redundancy
For critical domains, serve DNS from multiple providers simultaneously. If one provider goes down, the other keeps serving. Set NS records at your registrar pointing to nameservers from both providers. For a full walkthrough with Terraform and DNSControl, see Multi-Provider DNS Deployment.
TTL Strategies for Cloud
Cloud infrastructure changes more frequently than traditional setups, which means TTL strategy matters more. For a comprehensive guide on TTL values, see DNS TTL Best Practices.
Short TTLs for Dynamic Resources
Resources that change frequently — auto-scaling groups, container IPs, failover targets — need short TTLs:
| Resource Type | Recommended TTL | Reason |
|---|---|---|
| Auto-scaling instances | 60s | IPs change with scale events |
| Failover targets | 60s | Fast cutover on failure |
| Blue-green deployments | 60s (during cutover) | Minimize stale caches |
| Container/pod IPs | 30–60s | Pods are ephemeral |
Long TTLs for Stable Resources
Not everything in the cloud changes. Static assets, MX records, and stable load balancer endpoints benefit from longer caching:
| Resource Type | Recommended TTL | Reason |
|---|---|---|
| CDN endpoints | 3600s | Rarely change |
| MX records | 3600–86400s | Mail server changes are planned |
| NS records | 86400s | Delegation should be stable |
| TXT records (SPF, DKIM) | 3600s | Infrequent changes |
Automating DNS with Infrastructure as Code
Manual DNS management does not scale. A single typo in a record can take down a service, and there is no audit trail when someone edits a record through a web dashboard. Infrastructure as Code brings version control, peer review, and automated deployments to DNS.
Terraform
The DNScale Terraform provider lets you manage zones and records alongside your cloud infrastructure:
resource "dnscale_zone" "main" {
  name   = "example.com"
  region = "eu"
}
resource "dnscale_record" "web" {
  zone_id = dnscale_zone.main.id
  name    = "www"
  type    = "A"
  content = aws_instance.web.public_ip
  ttl     = 300
}

The critical advantage is referencing cloud resource attributes directly. When aws_instance.web.public_ip changes, Terraform updates the DNS record automatically.
DNSControl
DNSControl takes a DNS-first approach with JavaScript configuration. It is particularly strong for multi-provider setups:
var DSP_DNSCALE = NewDnsProvider("dnscale");
D("example.com", REG_NONE,
  DnsProvider(DSP_DNSCALE),
  A("@", "203.0.113.10", TTL(300)),
  A("api", "203.0.113.20", TTL(60)),
  CNAME("www", "example.com.", TTL(3600)),
  MX("@", 10, "mail.example.com.", TTL(3600)),
END);

CI/CD for DNS
Automate DNS deployments with CI/CD pipelines. Every change goes through a pull request, gets reviewed, and is applied automatically on merge:
# .github/workflows/dns.yml
name: DNS Deploy
on:
  push:
    branches: [main]
    paths: ["dns/**"]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init && terraform apply -auto-approve
        working-directory: dns/
        env:
          TF_VAR_dnscale_api_key: ${{ secrets.DNSCALE_API_KEY }}

DNS for Containers and Kubernetes
Kubernetes has its own internal DNS (CoreDNS) for service discovery within the cluster. But services that need to be reachable from outside the cluster still need external DNS records.
external-dns is a Kubernetes controller that watches for Ingress and Service resources and automatically creates DNS records in your external provider:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api
  annotations:
    external-dns.alpha.kubernetes.io/hostname: api.example.com
spec:
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: api-service
                port:
                  number: 80

When this Ingress is created, external-dns automatically creates an A record for api.example.com pointing to the ingress controller's IP. When the Ingress is deleted, the record is cleaned up. This closes the gap between Kubernetes internal DNS and your authoritative DNS. SRV records can also be used for service discovery in environments that support them.
Monitoring DNS in Cloud Environments
DNS failures in the cloud are often silent — services degrade slowly as cached records expire and new queries fail. Proactive monitoring catches issues before users notice.
What to Monitor
- Resolution time — Are queries resolving within acceptable latency?
- Record accuracy — Do records return the expected IPs? Use dig to verify:
dig api.example.com A +short
# Expected: 203.0.113.10
- Propagation — After a change, how quickly do global resolvers see the update?
- DNSSEC validation — If you use DNSSEC, verify signatures are valid
- Zone expiry — Ensure SOA record serial numbers increment on updates
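A minimal record-accuracy check can be scripted with the standard library. This is a sketch: production monitoring would typically query specific resolvers from multiple vantage points with a DNS library such as dnspython; localhost is used here only so the example runs anywhere:

```python
import socket

def record_matches(hostname, expected_ips):
    """Drift check: does the name resolve to exactly the IPv4 addresses
    we expect? A mismatch, or a resolution failure, is an alert condition."""
    try:
        infos = socket.getaddrinfo(hostname, None, socket.AF_INET)
    except socket.gaierror:
        return False  # resolution failure is itself worth alerting on
    resolved = {info[4][0] for info in infos}
    return resolved == set(expected_ips)

# In practice the hostname and expected IPs come from your IaC state;
# localhost keeps the example self-contained.
print(record_matches("localhost", {"127.0.0.1"}))
```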
Alerting
Set up alerts for:
- DNS resolution failures from multiple vantage points
- Unexpected record changes (drift detection via terraform plan)
- DNSSEC signature expiry warnings
- Abnormal query volume spikes (potential DDoS)
Common Mistakes
Long TTLs Blocking Failover
A 3600-second TTL on a load balancer record means it takes up to an hour for clients to see a failover change. If your failover strategy depends on DNS, your TTL must be short enough to support it.
Single-Provider DNS
Running DNS on the same cloud provider as your infrastructure means a provider outage takes down both your services and your ability to redirect traffic. Consider multi-provider DNS for production domains, or at minimum use a primary/secondary DNS configuration.
No Automation
Manual DNS changes are error-prone and lack audit trails. Every DNS record should be defined in code, reviewed in a pull request, and applied through automation. See the Terraform provider guide or DNSControl guide to get started.
Ignoring DNS During DR Planning
Disaster recovery plans often focus on compute and data but forget DNS. If your primary region goes down, you need DNS records pointing to the DR region — and those records need to propagate fast enough to be useful.
Not Using Subdomain Delegation
Managing hundreds of records in a single flat zone makes it hard to apply per-environment access controls and increases the blast radius of mistakes. Delegate subdomains to separate zones for each environment.
Conclusion
DNS in cloud infrastructure is not a set-and-forget configuration. It is an active part of your architecture that enables service discovery, drives failover, and connects multi-cloud deployments. Treat DNS records with the same rigor as application code: version-controlled, reviewed, tested, and automated. Use short TTLs for dynamic resources, delegate subdomains for environment isolation, and avoid tying your DNS to a single cloud provider. With the right patterns and tooling, DNS becomes a reliable foundation rather than a fragile dependency.
Ready to manage your DNS with confidence?
DNScale provides anycast DNS hosting with a global network, real-time analytics, and an easy-to-use API.
Start free