Cloud-Native (Recommended)
Single-region multi-AZ deployment using managed Kubernetes services, managed PostgreSQL, and Velero backups. The recommended HA approach for most self-hosted DALP deployments.
Purpose: Describe the recommended cloud-native HA deployment pattern.
- Doc type: Reference
- Related: HA Overview, Hot-warm, Backup & Recovery
Single-region, multi-AZ deployment using managed services and Velero backups. This is the recommended approach for most deployments.
Architecture
Recovery metrics
| Metric | Target | Notes |
|---|---|---|
| RTO | 2–15 minutes | Automatic failover for most failures |
| RPO | Seconds–1 minute | Synchronous replication |
| RTT | 15–60 minutes | Including verification |
Setup and maintenance
| Task | Time estimate | Client role |
|---|---|---|
| Kubernetes cluster provisioning | 2–4 hours | Client platform engineer |
| Managed PostgreSQL setup | 1–2 hours | Client platform engineer |
| Velero installation and config | 2–4 hours | Client platform engineer |
| Backup verification | 2–4 hours | Client platform engineer |
| Monitoring and alerting | 2–4 hours | Client platform engineer |
| Documentation and runbooks | 4–8 hours | Client platform engineer |
| Total initial setup | 2–3 days | 1 client engineer |
| Activity | Frequency | Time per cycle |
|---|---|---|
| Backup verification | Weekly | 30 minutes |
| Helm chart updates | Monthly | 1–2 hours |
| DR drill / restore test | Quarterly | 4–8 hours |
| Security patching | Monthly | 2–4 hours |
| Capacity review | Quarterly | 2–4 hours |
| Monthly effort | 8–16 hours |
Team requirements
- Minimum: Part of platform team responsibilities (~0.25 FTE)
- Recommended: Dedicated on-call rotation for production incidents
Required skills: Kubernetes/OpenShift administration (intermediate), Helm chart management (basic), cloud provider managed services, basic PostgreSQL operations, Prometheus/Grafana monitoring.
Cloud provider configurations
AWS (EKS): EKS control plane multi-AZ by default, Auto Scaling Groups across AZs, RDS Multi-AZ, ElastiCache Multi-AZ, S3 (99.999999999% durability).
Azure (AKS): Zone-redundant control plane, Availability Zones for workers, Azure Database zone-redundant HA, Azure Cache zone redundancy, Blob Storage ZRS or GRS.
GCP (GKE): Regional multi-zone control plane, multi-zone node pool, Cloud SQL Regional HA, Memorystore Standard tier, Cloud Storage multi-regional.
OpenShift (OCP/OKD): Multi-master with etcd quorum, workers across failure domains, OpenShift Data Foundation (Ceph) for storage, router sharding for HA.
High Availability
HA and DR philosophy for self-hosted DALP deployments. Covers RTO/RPO/RTT definitions and a scenario selection guide to help you choose the right deployment pattern.
Hot-Warm (Active-Standby)
Active-standby deployment with warm validators and continuous database replication. Provides geographic redundancy with RTO of 30–180 minutes.