Hot-Hot (Active-Active)
Multi-cluster active-active deployment for consortium and public blockchain networks. Provides the lowest RTO (1–10 minutes) at the highest operational cost.
Purpose: Describe both hot-hot deployment variants: consortium networks and public networks.
- Doc type: Reference
- Related: HA Overview, Hot-warm, Backup & Recovery
There are two variants of active-active multi-cluster deployment, depending on whether you run consortium or public blockchain networks.
Consortium networks
Multi-cluster active-active deployment for consortium blockchain networks where you manage validators.
Architecture
Recovery metrics (consortium)
| Metric | Target | Notes |
|---|---|---|
| RTO | 1–10 minutes | Automatic failover via GSLB |
| RPO | Seconds–minutes | Async replication lag |
| RTT | 10–60 minutes | Including traffic rerouting |
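The RPO in this topology is bounded by asynchronous replication lag, which PostgreSQL exposes as write-ahead log positions (LSNs, e.g. from `pg_stat_replication`). A minimal sketch, assuming you already have the primary's current LSN and the replica's replay LSN as strings, of converting them to a byte lag you can alert on:

```python
# Sketch: compute replication lag in bytes from PostgreSQL LSN strings.
# An LSN such as "16/B374D848" is a 64-bit WAL position written as
# two hexadecimal halves: high32/low32.

def lsn_to_bytes(lsn: str) -> int:
    """Convert a textual LSN ("high/low" hex) to an absolute byte offset."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)

def replication_lag_bytes(primary_lsn: str, replica_lsn: str) -> int:
    """Bytes of WAL the replica has not yet replayed."""
    return lsn_to_bytes(primary_lsn) - lsn_to_bytes(replica_lsn)

# Example values as they might appear in pg_stat_replication
# (pg_current_wal_lsn() on the primary vs. replay_lsn on the standby).
lag = replication_lag_bytes("16/B374D848", "16/B3740000")
```

Feeding this number into your monitoring system lets you express the "seconds–minutes" RPO target as a concrete byte (or time) threshold.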
Setup and maintenance (consortium)
| Task | Time estimate | Client role |
|---|---|---|
| Cluster provisioning (four clusters) | 1–2 days | Client platform engineer |
| Network connectivity (peering or VPN) | 1–2 days | Client network engineer |
| CloudNativePG setup (four clusters) | 1–2 days | Client platform engineer |
| PostgreSQL distributed topology | 2–3 days | Client DBA or platform engineer |
| Failover automation and testing | 2–3 days | Client platform engineer |
| End-to-end DR drill | 1–2 days | Client platform team |
| Total initial setup | 3–5 weeks | 2–3 client engineers |
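The failover automation task above ultimately reduces to a health-based routing decision: the GSLB (or a script driving it) probes each cluster and removes unhealthy ones from rotation. A minimal, illustrative sketch of that decision logic — the cluster names, health fields, and the 300-second lag budget are assumptions, not part of any product API:

```python
from dataclasses import dataclass

@dataclass
class ClusterHealth:
    name: str
    api_ok: bool              # ingress / API endpoint responding
    db_ok: bool               # CloudNativePG primary reachable
    replication_lag_s: float  # seconds behind the fleet leader

def routable(c: ClusterHealth, max_lag_s: float = 300.0) -> bool:
    """A cluster receives traffic only if its API and database are up
    and its replication lag stays within the RPO budget."""
    return c.api_ok and c.db_ok and c.replication_lag_s <= max_lag_s

def pick_targets(clusters: list[ClusterHealth]) -> list[str]:
    """Active-active: every healthy cluster stays in rotation.
    If none qualify, fail open rather than blackhole all traffic."""
    healthy = [c.name for c in clusters if routable(c)]
    return healthy or [c.name for c in clusters]
```

Quarterly DR drills should exercise exactly this path: mark one cluster unhealthy, confirm traffic drains within the 1–10 minute RTO target, then restore it.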
| Activity | Frequency | Time per cycle |
|---|---|---|
| Cross-cluster replication monitoring | Daily | 30 minutes |
| Backup verification (all clusters) | Weekly | 2 hours |
| Helm chart updates (4 clusters) | Monthly | 4–8 hours |
| DR drill / failover test | Quarterly | 1–2 days |
| Security patching (4 clusters) | Monthly | 1–2 days |
| Total monthly effort | — | 40–60 hours |
Team requirements: 1.5–2 FTE dedicated platform engineers plus 0.5 FTE of DBA support, with a 24/7 on-call rotation.
Public networks
Multi-cluster active-active deployment for public blockchain networks where on-chain data can be re-derived.
Key differences from consortium
- No validators to manage — the chain is external
- Data is re-derivable — indexed data can be rebuilt by re-indexing
- Lower operational overhead than consortium
- Focus on minimizing user-facing downtime
Architecture
Recovery metrics (public networks)
| Scenario | RTO | RPO | Notes |
|---|---|---|---|
| Single pod failure | <1 minute | 0 | Kubernetes reschedules automatically |
| Database failover | 1–5 minutes | Seconds | CloudNativePG automatic failover |
| Cluster failover (GSLB) | 1–10 minutes | 1–5 minutes | Traffic shifts to healthy cluster |
| Full re-index required | 5–60 minutes | N/A | Depends on chain size |
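The "full re-index" RTO in the table scales with chain size and indexer throughput. A back-of-the-envelope estimator — the throughput and block-count figures are placeholders, not measurements:

```python
def reindex_minutes(blocks_behind: int, blocks_per_second: float) -> float:
    """Wall-clock time for an indexer to clear a given block deficit."""
    if blocks_per_second <= 0:
        raise ValueError("throughput must be positive")
    return blocks_behind / blocks_per_second / 60

# Placeholder example: 1.8M blocks at 1,000 blocks/s takes ~30 minutes,
# which sits inside the 5–60 minute range above.
estimate = reindex_minutes(1_800_000, 1_000.0)
```

Measuring your indexer's real sustained throughput against the target chain is what turns the table's 5–60 minute range into a number you can commit to.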
Setup and maintenance (public networks)
| Task | Time estimate | Client role |
|---|---|---|
| Cluster provisioning (two clusters) | 1 day | Client platform engineer |
| CloudNativePG setup (two clusters) | 1 day | Client platform engineer |
| DIDX setup | 1–2 days | Client platform engineer |
| Global traffic management | 4–8 hours | Client platform engineer |
| Total initial setup | 1.5–2 weeks | 1–2 client engineers |
| Activity | Frequency | Time per cycle |
|---|---|---|
| Replication lag monitoring | Daily | 15 minutes |
| Indexer sync verification | Daily | 15 minutes |
| DR drill / failover test | Quarterly | 4–8 hours |
| Security patching (2 clusters) | Monthly | 4–8 hours |
| Total monthly effort | — | 20–30 hours |
Team requirements: 0.5–1 FTE dedicated platform engineer; effort is lower than the consortium variant because the topology is simpler (no validators to run, two clusters instead of four).
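The daily indexer sync verification above compares the chain head reported by public RPC endpoints with the highest block the indexer has committed. A sketch of the gap check; the function names and the 50-block threshold are assumptions for illustration:

```python
def sync_gap(chain_head: int, indexed_head: int) -> int:
    """Blocks the indexer still has to process; never negative
    (the indexer can briefly report ahead during reorg handling)."""
    return max(chain_head - indexed_head, 0)

def sync_ok(chain_head: int, indexed_head: int, max_gap: int = 50) -> bool:
    """Healthy if the indexer is within max_gap blocks of the chain head."""
    return sync_gap(chain_head, indexed_head) <= max_gap
```

Wiring this into a daily check (or a continuous probe) keeps the 15-minute manual task mostly automated: operators only look when `sync_ok` turns false.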
Hot-Cold (Backup-Based Recovery)
Backup-based disaster recovery with cold standby cluster. Lowest cost option with significant RPO (4–24 hours) and RTO (8–72 hours). Use only when cost constraints outweigh availability requirements.
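For hot-cold, the worst-case RPO is roughly the time since the last restorable backup plus any WAL not yet archived. A small sketch of that arithmetic, assuming a fixed backup interval (the example figures are illustrative, not recommendations):

```python
def worst_case_rpo_hours(backup_interval_hours: float,
                         wal_archive_lag_hours: float = 0.0) -> float:
    """Maximum data-loss window: a failure just before the next backup
    loses one full backup interval plus any unarchived WAL."""
    return backup_interval_hours + wal_archive_lag_hours

# Daily base backups with up to 15 minutes of unarchived WAL:
# worst case is just over a day of data loss.
rpo = worst_case_rpo_hours(24.0, 0.25)
```

This is why the hot-cold RPO lands in the 4–24 hour range: it tracks the backup schedule directly, unlike the replication-driven RPO of the hot-hot variants.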
Backup & Recovery
Backup strategy, tiered schedules, PostgreSQL PITR, DR testing requirements, and cloud provider-specific HA configurations for self-hosted DALP deployments.