Hot-Cold (Backup-Based Recovery)
Backup-based disaster recovery with a cold standby cluster. This is the lowest-cost option, with a significant RPO (4–24 hours) and RTO (8–72 hours). Use it only when cost constraints outweigh availability requirements.
- Purpose: Describe the hot-cold backup-based recovery pattern and its tradeoffs.
- Doc type: Reference
- Related: HA Overview, Hot-warm, Backup & Recovery
Hot-cold deployments accept significant data loss (RPO 4–24 hours) and long recovery times (RTO 8–72 hours). Use this pattern only when cost constraints outweigh availability requirements.
The pattern relies on backups alone: a cold standby cluster is provisioned on demand and restored from the most recent backups.
When to use hot-cold
- Cost is the primary constraint — only one cluster runs continuously
- The system is not business-critical (development, staging, non-production)
- Acceptable RPO is 4+ hours and RTO is 8+ hours
- Data can be rebuilt by replaying blockchain events (re-indexing is acceptable)
Do not use hot-cold for production financial systems where data loss or multi-hour outages are unacceptable.
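
The criteria above can be expressed as a simple pre-check. The following is a minimal sketch, not part of any product tooling: the `Requirements` class, its field names, and `hot_cold_blockers` are hypothetical names introduced for illustration, and the thresholds are the 4-hour RPO and 8-hour RTO lower bounds quoted on this page.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Requirements:
    """Availability requirements for one system (hypothetical model)."""
    max_data_loss_hours: float    # RPO the business can tolerate
    max_downtime_hours: float     # RTO the business can tolerate
    business_critical: bool
    rebuildable_from_chain: bool  # data can be re-indexed from blockchain events


# Lower bounds quoted on this page for the hot-cold pattern.
HOT_COLD_MIN_RPO_HOURS = 4
HOT_COLD_MIN_RTO_HOURS = 8


def hot_cold_blockers(req: Requirements) -> List[str]:
    """Return the reasons hot-cold is a poor fit; an empty list means it may fit."""
    blockers = []
    if req.business_critical:
        blockers.append("system is business-critical")
    if req.max_data_loss_hours < HOT_COLD_MIN_RPO_HOURS:
        blockers.append("tolerated data loss is below 4 hours")
    if req.max_downtime_hours < HOT_COLD_MIN_RTO_HOURS:
        blockers.append("tolerated downtime is below 8 hours")
    if not req.rebuildable_from_chain:
        blockers.append("data cannot be rebuilt by replaying blockchain events")
    return blockers


if __name__ == "__main__":
    staging = Requirements(24, 72, business_critical=False, rebuildable_from_chain=True)
    print(hot_cold_blockers(staging))
```

An empty result only means none of the listed criteria rule the pattern out; cost still has to be the deciding constraint.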
Architecture
Recovery metrics
| Metric | Target | Notes |
|---|---|---|
| RTO | 8–72 hours | Highly variable by chain size |
| RPO | 4–24 hours | Depends on backup frequency |
| RTT | 12–96 hours | Including resync and reindex |
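
Because RPO depends on backup frequency, it can help to work out the worst case for a given schedule. The sketch below assumes that only completed scheduled backups can be restored and that each backup reflects the state at its start time; `worst_case_rpo` and `meets_rpo` are hypothetical helper names. If continuous log archiving is also available, the real figure may be lower.

```python
from datetime import timedelta


def worst_case_rpo(backup_interval: timedelta, backup_duration: timedelta) -> timedelta:
    """Upper bound on data loss when restoring from the newest completed backup.

    Worst case: the failure happens just before the next backup completes, so
    the newest usable backup started one full interval plus one backup
    duration earlier.
    """
    return backup_interval + backup_duration


def meets_rpo(backup_interval: timedelta, backup_duration: timedelta,
              rpo_target: timedelta) -> bool:
    """True if the schedule keeps worst-case data loss within the target."""
    return worst_case_rpo(backup_interval, backup_duration) <= rpo_target


if __name__ == "__main__":
    interval = timedelta(hours=6)
    duration = timedelta(minutes=30)
    print(worst_case_rpo(interval, duration))
    print(meets_rpo(interval, duration, rpo_target=timedelta(hours=24)))
```

Under these assumptions, a 6-hour schedule with 30-minute backups fits a 24-hour RPO target but not a 4-hour one.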
Recovery time breakdown
| Phase | Duration | Notes |
|---|---|---|
| Cluster provisioning | 15–60 minutes | If not pre-provisioned |
| Operator installation | 5–15 minutes | Operator readiness checks |
| PostgreSQL restore | 30–120 minutes | Depends on database size |
| Velero restore | 15–60 minutes | Depends on resources |
| Blockchain resync | 4–48+ hours | Depends on chain size |
| Indexer rebuild | 1–24 hours | Depends on events to process |
| Total | 8–72 hours | Matches the RTO target; highly variable |
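
To see how the phases add up, the bounds in the table can be summed directly. This is a rough sketch that assumes the phases run strictly one after another and treats the open-ended "48+" resync bound as 48 hours; `PHASES` and `total_hours` are hypothetical names.

```python
# (phase, min_minutes, max_minutes), copied from the table above.
PHASES = [
    ("Cluster provisioning", 15, 60),
    ("Operator installation", 5, 15),
    ("PostgreSQL restore", 30, 120),
    ("Velero restore", 15, 60),
    ("Blockchain resync", 4 * 60, 48 * 60),  # "48+" treated as 48 (assumption)
    ("Indexer rebuild", 1 * 60, 24 * 60),
]


def total_hours(phases, skip=()):
    """Sum lower and upper bounds in hours, ignoring any phase named in `skip`."""
    low = sum(lo for name, lo, hi in phases if name not in skip)
    high = sum(hi for name, lo, hi in phases if name not in skip)
    return low / 60, high / 60


if __name__ == "__main__":
    low, high = total_hours(PHASES)
    print(f"Sequential recovery: {low:.1f} to {high:.1f} hours")

    # A pre-provisioned cold cluster removes the first phase.
    low, high = total_hours(PHASES, skip=("Cluster provisioning",))
    print(f"Pre-provisioned cluster: {low:.1f} to {high:.1f} hours")
```

A straight sum gives roughly 6 to 76 hours, with blockchain resync and indexer rebuild dominating both ends. Pre-provisioning the cluster saves at most an hour.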
Cost advantage
Hot-cold is significantly cheaper than other patterns:
- Only one active cluster runs continuously
- The cold cluster is provisioned on demand (you pay for it only during recovery)
- Minimal cross-region networking costs
Trade-off: Longer recovery time and potential data loss.
Setup and maintenance
| Task | Time estimate | Client role |
|---|---|---|
| Active cluster provisioning | 4–8 hours | Client platform engineer |
| Cold cluster IaC preparation | 4–8 hours | Client platform engineer |
| CloudNativePG setup | 4–8 hours | Client platform engineer |
| PostgreSQL backup configuration | 4–8 hours | Client platform engineer |
| Velero installation | 2–4 hours | Client platform engineer |
| Recovery script development | 1 day | Client platform engineer |
| Recovery procedure testing | 1–2 days | Client platform engineer |
| Total initial setup | 1–1.5 weeks | 1 client engineer |

Ongoing maintenance:

| Activity | Frequency | Time per cycle |
|---|---|---|
| Backup verification | Daily | 15 minutes |
| Backup integrity testing | Weekly | 1–2 hours |
| Helm chart updates | Monthly | 1–2 hours |
| Recovery drill (full) | Quarterly | 1–2 days |
| Cold cluster IaC validation | Quarterly | 2–4 hours |
| Security patching | Monthly | 2–4 hours |
| Total effort | Monthly | 10–20 hours |
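
Part of the daily backup verification can be automated with a freshness check. The sketch below only compares timestamps and assumes the completion time of the newest backup has already been obtained from the backup tooling; how to fetch it is out of scope here. `backup_age` and `is_backup_fresh` are hypothetical helper names.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def backup_age(last_completed: datetime, now: Optional[datetime] = None) -> timedelta:
    """Time elapsed since the newest backup completed (timezone-aware datetimes)."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_completed


def is_backup_fresh(last_completed: datetime, max_age: timedelta,
                    now: Optional[datetime] = None) -> bool:
    """True if the newest backup is no older than `max_age`."""
    return backup_age(last_completed, now) <= max_age


if __name__ == "__main__":
    # Example values only; in practice `last` comes from the backup tooling.
    last = datetime(2024, 1, 1, 2, 0, tzinfo=timezone.utc)
    check_time = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
    print(backup_age(last, check_time))
    print(is_backup_fresh(last, timedelta(hours=24), check_time))
```

Setting `max_age` to the RPO target turns the check into an early warning that the backup schedule has fallen behind. It does not replace the weekly integrity test or the quarterly recovery drill.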
Hot-Warm (Active-Standby)
Active-standby deployment with warm validators and continuous database replication. Provides geographic redundancy with RTO of 30–180 minutes.
Hot-Hot (Active-Active)
Multi-cluster active-active deployment for consortium and public blockchain networks. Provides the lowest RTO (1–10 minutes) at the highest operational cost.