Is DNS failover instant?

No. Authoritative DNS can change an answer quickly, but recursive resolvers cache old answers until TTL expiry, and some clients cache too. Low TTLs reduce the window but do not eliminate it.

What TTL should I use for DNS failover?

For active failover records, 60-300 seconds is common. Lower TTLs increase authoritative query volume and may be clamped by resolvers. Use 300 seconds unless you have a clear need and monitoring for the extra query load.

What should health checks test?

Test the user-visible service, not just ping. For a web API, check HTTPS, certificate validity, response code, and a lightweight dependency path. For mail, check SMTP readiness and MX target reachability. Avoid checks that are so deep they fail during minor dependency blips.

Can DNS failover replace a load balancer?

No. DNS failover is coarse and cached. Load balancers make per-request decisions and can remove backends immediately. DNS failover is useful for region, provider, and disaster-recovery switching, not single-request balancing.

How does secondary DNS fit into failover?

Secondary DNS keeps authoritative service available if one DNS provider fails. It does not automatically fail over your application endpoints unless the zone data or policies also change. Use secondary DNS for DNS-provider resilience and record failover for application resilience.


DNS Failover Design Patterns - Health Checks, TTLs, and Multi-Provider Resilience

Learn practical DNS failover patterns for active-passive, active-active, regional failover, multi-provider DNS, and disaster recovery.

Updated 5 May 2026


Answer snapshot

DNS failover changes answers or delegation when an endpoint, region, or provider fails. The main patterns are active-passive records, active-active regional pools, GeoDNS fallback, secondary DNS/multi-provider delegation, and manual disaster-recovery cutovers. DNS failover is constrained by TTLs and resolver caching, so it is not instant. Design health checks carefully, keep TTLs realistic, avoid flapping, and pair DNS failover with application retries.

What you'll learn

Choose the right DNS failover pattern for an application
Understand TTL and resolver-cache constraints
Design health checks that avoid false failovers
Combine DNS failover with secondary DNS and application resilience

DNS failover is the practice of changing DNS behaviour when something breaks: an endpoint, a region, a network, or an entire DNS provider.

It is powerful because every internet client already uses DNS. It is limited because DNS is cached. A correct design respects both facts.

What DNS Failover Can and Cannot Do

DNS failover can:

Move new resolver lookups away from unhealthy endpoints
Route regions to fallback regions
Support disaster-recovery cutovers
Keep DNS available across providers with secondary DNS
Reduce manual incident work when health checks are reliable

DNS failover cannot:

Instantly move every active user
Override cached answers before TTL expiry
Make unsafe application failover safe
Replace database replication or session strategy
Replace per-request load balancing

Pattern 1: Active-Passive Endpoint Failover

One endpoint serves traffic. Another is standby.

api.example.com.  60  IN  CNAME  api-primary.example.net.

On failure:

api.example.com.  60  IN  CNAME  api-standby.example.net.

Use when:

The standby can serve the same traffic
Recovery time of minutes is acceptable
Writes are replicated or paused safely

Risks:

Cached primary answers continue until TTL expiry
Standby may be cold or under-tested
Split-brain writes if the primary recovers while some clients still resolve to it and others resolve to the standby
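The record swap above can be sketched as a small decision function. This is a minimal sketch, not any provider's API: the `CnameRecord` type and `desired_record` helper are hypothetical names, the health flags are assumed to come from your own checks, and automatic failback to the primary is an assumption you may want to replace with a manual step.

```python
from dataclasses import dataclass

PRIMARY = "api-primary.example.net."
STANDBY = "api-standby.example.net."


@dataclass(frozen=True)
class CnameRecord:
    """Hypothetical in-memory form of one CNAME record."""
    name: str
    ttl: int
    target: str

    def to_zone_line(self) -> str:
        return f"{self.name}  {self.ttl}  IN  CNAME  {self.target}"


def desired_record(primary_healthy: bool, standby_healthy: bool,
                   current_target: str) -> CnameRecord:
    """Pick the CNAME target for api.example.com.

    Assumption: when both endpoints look unhealthy, keep the current
    answer instead of bouncing between two broken targets.
    """
    if primary_healthy:
        target = PRIMARY          # assumption: automatic failback is acceptable
    elif standby_healthy:
        target = STANDBY
    else:
        target = current_target   # nothing better to offer; do not change DNS
    return CnameRecord("api.example.com.", 60, target)
```

Calling `desired_record(False, True, PRIMARY).to_zone_line()` produces the standby record shown in this pattern. If writes are not safe to move back automatically, drop the first branch and fail back by hand.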

Pattern 2: Active-Active Regional Pools

Multiple regions serve traffic at the same time.

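api.example.com.  300  IN  A  192.0.2.10     ; EU endpoint (documentation address)
api.example.com.  300  IN  A  198.51.100.10  ; US endpoint (documentation address)

A name can hold only one CNAME record, so a plain zone publishes one address record per region rather than two CNAMEs at the same name. Better implementations use GeoDNS or latency-based routing to return region-appropriate answers.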

Use when:

Regions are independently healthy
Data and sessions are region-safe
You want lower latency and resilience

Risks:

More application complexity
Harder data consistency model
Region-specific incidents affect only some users, which makes them harder to detect and diagnose
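One way to picture an active-active pool is as a filter over regional addresses. The sketch below is illustrative only: `pool_answers` is a hypothetical helper, the addresses are documentation placeholders, and the fail-open behaviour (return the whole pool when nothing looks healthy) is an assumption rather than a rule.

```python
def pool_answers(pool: dict[str, str], healthy: set[str]) -> list[str]:
    """Return the addresses of healthy regions.

    Assumption: if no region passes its checks, fail open and return the
    full pool, since an empty answer guarantees an outage while a stale
    answer might still work.
    """
    live = [addr for region, addr in pool.items() if region in healthy]
    return live if live else list(pool.values())


# Placeholder pool; region names and addresses are examples only.
POOL = {"eu": "192.0.2.10", "us": "198.51.100.10"}
```

With `healthy={"us"}` only the US address is published; with an empty set both addresses stay in the answer.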

Pattern 3: GeoDNS Fallback

Regional users normally get regional endpoints, with fallback rules.

User region	Normal answer	Fallback
EU	`api-eu.example.com`	`api-us.example.com`
US	`api-us.example.com`	`api-eu.example.com`
APAC	`api-apac.example.com`	`api-us.example.com`

Use when:

Regional latency matters
Regional outages should drain to another region
Compliance allows the fallback path

Compliance-sensitive systems need explicit fallback rules. "Fail EU to US" may be unacceptable for some data classes.
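The routing table and the compliance caveat can be combined in one lookup. This is a sketch under stated assumptions: `ROUTES` mirrors the table above, while `FALLBACK_ALLOWED` and `answer_for` are hypothetical names, and the policy that EU traffic must stay in the EU is only an example.

```python
from typing import Optional

# (normal answer, fallback answer) per user region, taken from the table.
ROUTES = {
    "EU": ("api-eu.example.com", "api-us.example.com"),
    "US": ("api-us.example.com", "api-eu.example.com"),
    "APAC": ("api-apac.example.com", "api-us.example.com"),
}

# Hypothetical policy: regions whose traffic may leave the region on failover.
FALLBACK_ALLOWED = {"US", "APAC"}


def answer_for(region: str, healthy: set[str]) -> Optional[str]:
    """Return the endpoint to serve for a region, or None if no allowed one is healthy."""
    normal, fallback = ROUTES[region]
    if normal in healthy:
        return normal
    if region in FALLBACK_ALLOWED and fallback in healthy:
        return fallback
    return None  # caller decides: keep the current answer and page a human
```

Returning `None` instead of silently picking another region keeps the compliance decision explicit.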

Pattern 4: Multi-Provider DNS

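List nameservers from more than one DNS provider at the registrar, and publish the same NS set inside the zone. One provider acts as primary; the other runs as secondary and pulls zone data from it via AXFR/IXFR zone transfers.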

example.com.  NS  ns1.dnscale.eu.
example.com.  NS  ns2.other-provider.net.

Use when:

DNS-provider outage is a business risk
You need independent authoritative networks
You can keep zone data synchronized

This pattern is covered in Primary DNS vs Secondary DNS and Multi-provider DNS deployment.
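Keeping zone data synchronized is usually verified by comparing SOA serials across providers. The sketch below assumes you have already collected one serial per nameserver (for example with `dig +short SOA example.com @ns1.dnscale.eu`); `lagging_providers` is a hypothetical helper, and it deliberately ignores serial-number wraparound.

```python
def lagging_providers(serials: dict[str, int], max_lag: int = 0) -> list[str]:
    """Return nameservers whose SOA serial is behind the newest one seen.

    Assumption: serials only ever increase, so plain integer comparison
    is enough. Zones that wrap their serial need serial arithmetic instead.
    """
    newest = max(serials.values())
    return sorted(ns for ns, serial in serials.items() if newest - serial > max_lag)
```

An empty result means every provider serves the same zone version. A secondary that stays in the list for longer than its refresh interval is worth an alert.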

Pattern 5: Manual Disaster-Recovery Cutover

Some systems should not auto-fail over. A manual DNS cutover may be safer.

Use manual cutover when:

Data recovery point matters more than speed
Failover can cause split-brain writes
Human validation is required
Legal/compliance review is needed before moving regions

Prepare the DNS pieces before the incident:

Low-enough TTLs on DR names
Standby records pre-created
Runbook with exact commands
Access to registrar and DNS provider
Rollback plan

Health Check Design

Bad health checks cause bad failovers.

Check:

DNS target resolves
TCP/TLS connection works
Certificate is valid and not expired
HTTP status is expected
A lightweight dependency path works
Response latency is below a threshold

Avoid:

ICMP-only checks for web services
Deep checks that fail during harmless dependency noise
Probing from a single location
Failing over or recovering without hysteresis
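The checks above can be reduced to a per-probe verdict plus a quorum across probe locations. This is a minimal sketch: `ProbeResult`, `probe_passed`, and `endpoint_healthy` are hypothetical names, the thresholds are example values, and the probing itself (TLS handshake, HTTP request) is assumed to happen elsewhere.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeResult:
    """One check from one probe location (hypothetical shape)."""
    location: str
    tls_ok: bool        # TCP connect and TLS handshake succeeded with a valid certificate
    status: int         # HTTP status code, 0 if there was no response
    latency_ms: float


def probe_passed(r: ProbeResult, expected_status: int = 200,
                 max_latency_ms: float = 2000.0) -> bool:
    """A probe passes only if TLS, status code, and latency all look right."""
    return r.tls_ok and r.status == expected_status and r.latency_ms <= max_latency_ms


def endpoint_healthy(results: list[ProbeResult], quorum: float = 0.5) -> bool:
    """Healthy when more than `quorum` of probe locations pass.

    Assumption: missing data is not evidence of failure, so an empty
    result set raises instead of reporting the endpoint as down.
    """
    if not results:
        raise ValueError("no probe results")
    passed = sum(probe_passed(r) for r in results)
    return passed / len(results) > quorum
```

Requiring a majority of locations means one broken probe network cannot trigger a failover on its own.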

TTL Strategy

Record type	Suggested TTL
Active failover alias	60-300 seconds
Regional routing records	300 seconds
Stable MX records	1800-3600 seconds
NS delegation at registrar	Set by the parent zone, not by you; plan for hours to days

Remember: NS delegation changes are slower than record changes because parent-zone and resolver caches are involved. For fast application failover, change records inside the already-delegated zone, not registrar delegation.
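A quick way to sanity-check a TTL choice is to add up the pieces of the failover window. The function below is a rough sketch with hypothetical parameter names; it assumes resolvers honour the published TTL, so clamping resolvers or clients with their own caches can push the real figure higher.

```python
def worst_case_failover_seconds(check_interval: int, failures_required: int,
                                ttl: int, update_delay: int = 0) -> int:
    """Nominal worst-case time until cached answers for the failed endpoint expire.

    detection    = consecutive failed checks needed before the record changes
    update_delay = time for the provider to publish the new answer
    ttl          = longest a resolver may keep the old answer after that
    """
    detection = check_interval * failures_required
    return detection + update_delay + ttl
```

With 30-second checks, three required failures, and a 5-second publish delay, a 60-second TTL gives 155 seconds and a 300-second TTL gives 395 seconds. Check settings often matter as much as the TTL itself.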

Anti-Flapping Controls

Failover systems need dampening:

Require multiple failed checks before removal
Require sustained recovery before re-adding
Set minimum time between state changes
Use weighted ramp-up after recovery
Alert humans on every automatic failover

Flapping is worse than a clean outage because it creates inconsistent client behaviour.
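The first three controls can be sketched as a small state machine. This is an illustration, not a reference implementation: `Dampener` and its thresholds are hypothetical, timestamps are plain seconds supplied by the caller, and weighted ramp-up and alerting are left out.

```python
from dataclasses import dataclass


@dataclass
class Dampener:
    """Turns raw check results into a published healthy/unhealthy state."""
    fail_threshold: int = 3          # consecutive failures before removal
    recover_threshold: int = 5       # consecutive successes before re-adding
    min_hold_seconds: float = 300.0  # minimum time between state changes
    healthy: bool = True
    _streak: int = 0
    _last_change: float = float("-inf")

    def observe(self, check_ok: bool, now: float) -> bool:
        """Feed one check result; return the (possibly updated) published state."""
        if check_ok == self.healthy:
            self._streak = 0         # result agrees with the current state
            return self.healthy
        self._streak += 1
        needed = self.recover_threshold if check_ok else self.fail_threshold
        held_long_enough = now - self._last_change >= self.min_hold_seconds
        if self._streak >= needed and held_long_enough:
            self.healthy = check_ok
            self._streak = 0
            self._last_change = now
        return self.healthy
```

A single failed or successful check never changes the published state, and a recovery that arrives too soon after a failover is held back until the minimum hold time has passed.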

