Hot-Warm (Active-Standby)
Active-standby deployment with warm validators and continuous database replication. Provides geographic redundancy with an RTO of 30–180 minutes.
Purpose: Describe the hot-warm active-standby deployment pattern.
- Doc type: Reference
- Related: HA Overview, Cloud-native, Hot-cold
Active-standby deployment with warm validators and continuous database replication across two clusters.
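Architecture
Two Kubernetes clusters run in separate locations. The active cluster hosts the primary PostgreSQL database (managed by CloudNativePG) and the active validators. The standby cluster hosts a continuously replicated PostgreSQL replica and warm validators. Velero is installed in both clusters for backups.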
When to use hot-warm
- An RTO of 30–180 minutes is acceptable
- Regulatory requirements call for geographic redundancy
- Consortium networks where validator keys can be pre-staged
- Lower cost than a full hot-hot deployment is a priority
Recovery metrics
| Metric | Target | Notes |
|---|---|---|
| RTO (recovery time objective) | 30–180 minutes | Depends on the level of failover automation |
| RPO (recovery point objective) | 5–60 minutes | Determined by replication lag at the time of failure |
| RTT | 1–6 hours | Including validation and testing |
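Because RPO is driven by replication lag, a rough compliance check compares the worst lag observed over a window against the RPO target. The sketch below assumes you already collect lag samples in seconds, for example from the `replay_lag` column of PostgreSQL's `pg_stat_replication` view. The helper name `rpo_report` and the 5-minute default target are illustrative assumptions, not part of DALP.

```python
def rpo_report(lag_samples_s: list[float], rpo_target_s: float = 300.0) -> dict:
    """Summarize replication lag samples against an RPO target (hypothetical helper).

    If the primary fails at the moment of maximum lag, roughly that much
    committed data may be missing on the standby, so the worst sample is
    the figure to compare against the target.
    """
    if not lag_samples_s:
        raise ValueError("need at least one lag sample")
    worst = max(lag_samples_s)
    breaches = sum(1 for lag in lag_samples_s if lag > rpo_target_s)
    return {
        "worst_lag_s": worst,
        "breach_count": breaches,
        "meets_target": worst <= rpo_target_s,
    }


if __name__ == "__main__":
    # Illustrative samples only: one spike above the 300 s target.
    print(rpo_report([2.0, 4.5, 310.0, 12.0]))
```

A single spike is enough to break the target, which is why the daily replication lag check should look at peaks rather than averages.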
Important: Failover is manual and requires a trained operator to be available. The achieved RTO therefore depends on staff availability and time zones. Run regular failover drills to keep the procedure current.
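A manual failover runs its phases one after another, so the achieved RTO is roughly the sum of the phase durations. The sketch below shows how operator response time alone can move RTO within the 30–180 minute range. The phase names and minute values are hypothetical placeholders; replace them with timings measured during your own drills.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FailoverPhase:
    name: str
    minutes: float


def estimate_rto(phases: list[FailoverPhase]) -> float:
    """Phases of a manual failover run sequentially, so RTO is their sum."""
    return sum(p.minutes for p in phases)


# Hypothetical durations for illustration only.
business_hours = [
    FailoverPhase("detect and declare outage", 10),
    FailoverPhase("on-call operator engaged", 10),
    FailoverPhase("promote standby database", 15),
    FailoverPhase("activate warm validators", 15),
    FailoverPhase("verify and reopen traffic", 20),
]

# Off-hours: same steps, but reaching an operator in another time zone takes longer.
off_hours = [
    replace(p, minutes=60) if p.name == "on-call operator engaged" else p
    for p in business_hours
]

if __name__ == "__main__":
    print(estimate_rto(business_hours), estimate_rto(off_hours))
```

With these placeholder numbers, only the engagement phase changes, yet RTO grows from 70 to 120 minutes.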
Setup and maintenance
| Task | Time estimate | Client role |
|---|---|---|
| Two-cluster provisioning | 1 day | Client platform engineer |
| Network connectivity setup | 4–8 hours | Client platform engineer |
| CloudNativePG setup (two clusters) | 1 day | Client platform engineer |
| PostgreSQL primary and replica configuration | 1–2 days | Client platform engineer |
| Replication verification | 4–8 hours | Client platform engineer |
| Velero installation (two clusters) | 4–8 hours | Client platform engineer |
| Warm validator configuration | 1 day | Client platform engineer |
| Key management setup | 4–8 hours | Client security engineer |
| Failover scripts and automation | 1–2 days | Client platform engineer |
| Failover drill and validation | 1 day | Client platform team |
| Total initial setup | 2–3 weeks | 1–2 client engineers |
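As a sanity check on the total, the individual task estimates in the setup table can be summed. This sketch assumes an 8-hour working day and a 5-day working week. Under those assumptions the tasks add up to about 1.6–2.4 weeks of sequential effort, a little under the stated 2–3 weeks, which leaves some slack for coordination between steps.

```python
HOURS_PER_DAY = 8  # assumption: one working day = 8 hours
DAYS_PER_WEEK = 5  # assumption: five working days per week

# (task, low_hours, high_hours) transcribed from the setup table; "1 day" -> 8 hours.
SETUP_TASKS = [
    ("Two-cluster provisioning", 8, 8),
    ("Network connectivity setup", 4, 8),
    ("CloudNativePG setup (two clusters)", 8, 8),
    ("PostgreSQL primary and replica configuration", 8, 16),
    ("Replication verification", 4, 8),
    ("Velero installation (two clusters)", 4, 8),
    ("Warm validator configuration", 8, 8),
    ("Key management setup", 4, 8),
    ("Failover scripts and automation", 8, 16),
    ("Failover drill and validation", 8, 8),
]


def total_weeks(tasks: list[tuple[str, int, int]]) -> tuple[float, float]:
    """Return (low, high) total effort in working weeks, assuming tasks run back to back."""
    low = sum(t[1] for t in tasks)
    high = sum(t[2] for t in tasks)
    per_week = HOURS_PER_DAY * DAYS_PER_WEEK
    return low / per_week, high / per_week


if __name__ == "__main__":
    low, high = total_weeks(SETUP_TASKS)
    print(f"{low:.1f}-{high:.1f} working weeks")
```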
Ongoing maintenance
| Activity | Frequency | Time per cycle |
|---|---|---|
| Replication lag monitoring | Daily | 15 minutes |
| Standby health verification | Daily | 15 minutes |
| Backup verification | Weekly | 1 hour |
| Helm chart updates (2 clusters) | Monthly | 2–4 hours |
| Failover drill (full) | Quarterly | 1 day |
| Security patching (2 clusters) | Monthly | 4–8 hours |
| Total effort | Monthly | 25–40 hours |
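The monthly total can be reproduced by multiplying each activity's time per cycle by how often it occurs in a month. This sketch makes three assumptions: daily checks run every calendar day (30 per month), a month has 52/12 weeks, and the quarterly drill takes an 8-hour day. On that basis the activities come to about 28–34 hours per month, inside the 25–40 hour range.

```python
# Occurrences per month for each frequency (assumes checks run every calendar day).
PER_MONTH = {"daily": 30, "weekly": 52 / 12, "monthly": 1, "quarterly": 1 / 3}

# (activity, frequency, low_hours, high_hours) per cycle, transcribed from the table.
ACTIVITIES = [
    ("Replication lag monitoring", "daily", 0.25, 0.25),
    ("Standby health verification", "daily", 0.25, 0.25),
    ("Backup verification", "weekly", 1, 1),
    ("Helm chart updates (2 clusters)", "monthly", 2, 4),
    ("Failover drill (full)", "quarterly", 8, 8),  # assumes 1 day = 8 hours
    ("Security patching (2 clusters)", "monthly", 4, 8),
]


def monthly_hours(activities: list[tuple[str, str, float, float]]) -> tuple[float, float]:
    """Return (low, high) hours per month across all recurring activities."""
    low = sum(PER_MONTH[freq] * lo for _, freq, lo, _ in activities)
    high = sum(PER_MONTH[freq] * hi for _, freq, _, hi in activities)
    return low, high


if __name__ == "__main__":
    low, high = monthly_hours(ACTIVITIES)
    print(f"{low:.1f}-{high:.1f} hours per month")
```

Counting only business days for the daily checks lowers the total by a few hours.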
Team requirements
- Minimum: 0.75–1 FTE dedicated platform engineer
- Recommended: 1–1.5 FTE with on-call rotation
- Critical: A documented failover procedure that on-call staff can execute
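One way to keep a failover procedure executable by on-call staff is to encode it as ordered steps that require explicit confirmation and stop at the first failure. The sketch below is a generic runbook runner, not a DALP tool. The step names are hypothetical, and each action is a stub returning `True`; in practice each action would wrap your own scripts for promoting the standby database and activating the warm validators.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    name: str
    action: Callable[[], bool]  # returns True on success


def run_failover(steps: list[Step], confirm: Callable[[str], bool]) -> list[str]:
    """Run steps in order, stopping at the first declined or failed step.

    Returns the names of completed steps so the operator knows where to resume.
    """
    done: list[str] = []
    for step in steps:
        if not confirm(step.name):
            print(f"stopped before: {step.name} (operator declined)")
            break
        start = time.monotonic()
        ok = step.action()
        elapsed = time.monotonic() - start
        print(f"{step.name}: {'ok' if ok else 'FAILED'} ({elapsed:.1f}s)")
        if not ok:
            break
        done.append(step.name)
    return done


if __name__ == "__main__":
    # Hypothetical steps; every action is a stub that always succeeds.
    names = (
        "confirm primary site is down",
        "check standby replication lag",
        "promote standby PostgreSQL",
        "activate warm validators",
        "redirect client traffic",
        "run post-failover checks",
    )
    steps = [Step(n, lambda: True) for n in names]
    print(run_failover(steps, confirm=lambda name: True))
```

For interactive use, `confirm` could prompt the operator, for example `lambda name: input(f"Run '{name}'? [y/N] ").strip().lower() == "y"`. The per-step timings also give the drill a measured RTO breakdown.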
Related patterns
- Cloud-Native (Recommended): Single-region multi-AZ deployment using managed Kubernetes services, managed PostgreSQL, and Velero backups. The recommended HA approach for most self-hosted DALP deployments.
- Hot-Cold (Backup-Based Recovery): Backup-based disaster recovery with a cold standby cluster. The lowest-cost option, with a significant RPO (4–24 hours) and RTO (8–72 hours). Use it only when cost constraints outweigh availability requirements.