Hot-Cold (Backup-Based Recovery)

Backup-based disaster recovery with a cold standby cluster. This is the lowest-cost option, with a significant RPO (4–24 hours) and RTO (8–72 hours). Use it only when cost constraints outweigh availability requirements.

Purpose: Describe the hot-cold backup-based recovery pattern and its tradeoffs.


Hot-cold deployments accept significant data loss (RPO 4–24 hours) and long recovery times (RTO 8–72 hours). The cold standby cluster is provisioned on demand and restored from backups, so use this pattern only when cost constraints outweigh availability requirements.

When to use hot-cold

  • Cost is the primary constraint — only one cluster runs continuously
  • The system is not business-critical (development, staging, non-production)
  • Acceptable RPO is 4+ hours and RTO is 8+ hours
  • Data can be rebuilt by replaying blockchain events (re-indexing is acceptable)

Do not use hot-cold for production financial systems where data loss or multi-hour outages are unacceptable.
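
The criteria above can be sketched as a small check. This is only an illustration: the `Requirements` type and `hot_cold_blockers` function are hypothetical names, and the sketch assumes every criterion must hold, which the list does not state explicitly. Cost is left out because it is a judgment call rather than a threshold.

```python
from dataclasses import dataclass

# Thresholds copied from the criteria above.
MIN_RPO_HOURS = 4
MIN_RTO_HOURS = 8


@dataclass
class Requirements:  # hypothetical type, for illustration only
    acceptable_rpo_hours: float  # data loss the business tolerates
    acceptable_rto_hours: float  # downtime the business tolerates
    business_critical: bool
    reindex_acceptable: bool     # data can be rebuilt from blockchain events


def hot_cold_blockers(req: Requirements) -> list[str]:
    """Return the reasons hot-cold does not fit; an empty list means it fits."""
    blockers = []
    if req.business_critical:
        blockers.append("system is business-critical")
    if req.acceptable_rpo_hours < MIN_RPO_HOURS:
        blockers.append(f"acceptable RPO is below {MIN_RPO_HOURS} hours")
    if req.acceptable_rto_hours < MIN_RTO_HOURS:
        blockers.append(f"acceptable RTO is below {MIN_RTO_HOURS} hours")
    if not req.reindex_acceptable:
        blockers.append("data cannot be rebuilt by re-indexing")
    return blockers


staging = Requirements(24, 48, business_critical=False, reindex_acceptable=True)
print(hot_cold_blockers(staging))  # prints []
```

A production financial system with a one-hour RPO would return several blockers, which matches the warning above.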

Architecture

[Architecture diagram]

Recovery metrics

| Metric | Target      | Notes                          |
|--------|-------------|--------------------------------|
| RTO    | 8–72 hours  | Highly variable by chain size  |
| RPO    | 4–24 hours  | Depends on backup frequency    |
| RTT    | 12–96 hours | Including resync and reindex   |
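
To make the link between RPO and backup frequency concrete, here is a minimal sketch. It assumes periodic backups are the only recovery source and that a backup becomes usable only once it has finished. The function names are hypothetical.

```python
def worst_case_rpo_hours(backup_interval_hours: float,
                         backup_duration_hours: float = 0.0) -> float:
    """Worst case: the failure hits just before the next backup completes."""
    return backup_interval_hours + backup_duration_hours


def max_backup_interval_hours(target_rpo_hours: float,
                              backup_duration_hours: float = 0.0) -> float:
    """Longest backup interval that still meets the target RPO."""
    interval = target_rpo_hours - backup_duration_hours
    if interval <= 0:
        raise ValueError("backup takes longer than the target RPO allows")
    return interval


print(worst_case_rpo_hours(6, 0.5))       # prints 6.5
print(max_backup_interval_hours(4, 0.5))  # prints 3.5
```

Under these assumptions, meeting the 4-hour end of the RPO range requires backups more often than every 4 hours, while a daily backup lands at the 24-hour end.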

Recovery time breakdown

| Phase                 | Duration       | Notes                        |
|-----------------------|----------------|------------------------------|
| Cluster provisioning  | 15–60 minutes  | If not pre-provisioned       |
| Operator installation | 5–15 minutes   | Operator readiness checks    |
| PostgreSQL restore    | 30–120 minutes | Depends on database size     |
| Velero restore        | 15–60 minutes  | Depends on resources         |
| Blockchain resync     | 4–48+ hours    | Depends on chain size        |
| Indexer rebuild       | 1–24 hours     | Depends on events to process |
| Total RTT             | 8–72 hours     | Highly variable              |
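
As a rough cross-check of the breakdown, the phase durations can be added up. The sketch below assumes the phases run strictly one after another and treats the open-ended "48+" for blockchain resync as 48 hours. Both are simplifications.

```python
# (low, high) duration per phase in minutes, taken from the breakdown above.
PHASES_MINUTES = {
    "Cluster provisioning": (15, 60),
    "Operator installation": (5, 15),
    "PostgreSQL restore": (30, 120),
    "Velero restore": (15, 60),
    "Blockchain resync": (4 * 60, 48 * 60),  # "48+" treated as 48 hours
    "Indexer rebuild": (1 * 60, 24 * 60),
}


def total_hours(phases: dict[str, tuple[int, int]]) -> tuple[float, float]:
    """Sum the low and high ends of every phase and convert to hours."""
    low = sum(lo for lo, _ in phases.values())
    high = sum(hi for _, hi in phases.values())
    return low / 60, high / 60


low, high = total_hours(PHASES_MINUTES)
print(f"{low:.2f} to {high:.2f} hours")  # prints 6.08 to 76.25 hours
```

The sequential sum comes out in the same range as the stated total. Blockchain resync and indexer rebuild dominate both ends, so chain size is the main driver of recovery time.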

Cost advantage

Hot-cold is significantly cheaper than other patterns:

  • Only 1 active cluster running
  • Cold cluster provisioned on-demand (pay only during recovery)
  • Minimal cross-region networking costs

Trade-off: Longer recovery time and potential data loss.

Setup and maintenance

| Task                            | Time estimate | Client role              |
|---------------------------------|---------------|--------------------------|
| Active cluster provisioning     | 4–8 hours     | Client platform engineer |
| Cold cluster IaC preparation    | 4–8 hours     | Client platform engineer |
| CloudNativePG setup             | 4–8 hours     | Client platform engineer |
| PostgreSQL backup configuration | 4–8 hours     | Client platform engineer |
| Velero installation             | 2–4 hours     | Client platform engineer |
| Recovery script development     | 1 day         | Client platform engineer |
| Recovery procedure testing      | 1–2 days      | Client platform engineer |
| Total initial setup             | 1–1.5 weeks   | 1 client engineer        |

| Activity                    | Frequency | Time per cycle |
|-----------------------------|-----------|----------------|
| Backup verification         | Daily     | 15 minutes     |
| Backup integrity testing    | Weekly    | 1–2 hours      |
| Helm chart updates          | Monthly   | 1–2 hours      |
| Recovery drill (full)       | Quarterly | 1–2 days       |
| Cold cluster IaC validation | Quarterly | 2–4 hours      |
| Security patching           | Monthly   | 2–4 hours      |
| Monthly effort              |           | 10–20 hours    |
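
Part of the daily backup verification could be automated with a freshness check like the sketch below. It assumes you can already obtain the completion timestamps of recent backups from your backup tooling. How to fetch them is out of scope here, and `latest_backup_is_fresh` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional


def latest_backup_is_fresh(backup_times: Iterable[datetime],
                           max_age: timedelta,
                           now: Optional[datetime] = None) -> bool:
    """Return True if the newest backup is no older than max_age."""
    times = list(backup_times)
    if not times:
        return False  # no backups at all is always a failure
    if now is None:
        now = datetime.now(timezone.utc)
    return now - max(times) <= max_age


# Example with fixed timestamps so the result is reproducible.
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
backups = [
    datetime(2024, 1, 1, 6, 0, tzinfo=timezone.utc),
    datetime(2024, 1, 2, 6, 0, tzinfo=timezone.utc),
]
print(latest_backup_is_fresh(backups, timedelta(hours=24), now))  # prints True
print(latest_backup_is_fresh(backups, timedelta(hours=4), now))   # prints False
```

Setting `max_age` to the target RPO turns the check into an early warning that the RPO can no longer be met. It does not replace the weekly integrity test, because a fresh backup can still be unrestorable.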
