SettleMint

Failure Modes

Catalog of architecture-level failure modes across DALP components, documenting degradation behavior, detection mechanisms, and recovery strategies for each failure scenario.

Purpose

Catalogs architecture-level failure modes, their impact, and how DALP components degrade and recover.

  • Doc type: Reference
  • What you'll find here:
    • Failure modes per component layer
    • Degradation behavior (fail-open vs fail-closed)
    • Detection and recovery mechanisms
    • Impact on user-facing operations

Failure mode catalog

Blockchain layer

| Failure | Impact | Behavior | Recovery |
| --- | --- | --- | --- |
| RPC node unreachable | Reads and writes routed to that node fail | Chain Gateway fails over to the next node in the pool | Automatic (health check + failover) |
| All RPC nodes down | Complete blockchain outage | Transactions queue in Restate; reads return stale data | Manual: restore node connectivity |
| Block reorg | Indexed data may reflect reverted transactions | Reorg detection via block hash comparison (infrastructure in place, not yet active) | Future: automatic rollback and reprocessing |
| Gas price spike | Transaction submission may fail or be slow | Automatic gas estimation; transactions retry with updated gas | Automatic retry via nonce manager |
| Nonce conflict | Transaction rejected by the network | Nonce manager queues and reorders; Restate retries | Automatic |
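
The failover behavior in the first row can be modeled as an ordered node pool that skips unhealthy nodes. The following is a minimal sketch, not the Chain Gateway's implementation: `RpcPool`, `markDown`, and `markUp` are hypothetical names, health is tracked with explicit flags instead of real health-check probes, and calls are synchronous only to keep the example short.

```typescript
// Hypothetical model of RPC failover (not the Chain Gateway's code): nodes are
// tried in pool order, unhealthy nodes are skipped, and a failing node is
// marked down so later calls go straight to the next node.
type RpcCall<T> = (url: string) => T;

class RpcPool {
  private readonly healthy = new Map<string, boolean>();

  constructor(private readonly urls: string[]) {
    for (const url of urls) this.healthy.set(url, true);
  }

  // In a real gateway, a periodic health check would call these.
  markDown(url: string): void {
    this.healthy.set(url, false);
  }

  markUp(url: string): void {
    this.healthy.set(url, true);
  }

  // Returns the first successful result. Throws only when no healthy node
  // succeeds, which corresponds to the "All RPC nodes down" row.
  call<T>(fn: RpcCall<T>): T {
    const errors: string[] = [];
    for (const url of this.urls) {
      if (!this.healthy.get(url)) continue;
      try {
        return fn(url);
      } catch (err) {
        this.markDown(url); // fail over to the next node in the pool
        errors.push(`${url}: ${String(err)}`);
      }
    }
    throw new Error(`no healthy RPC node available [${errors.join("; ")}]`);
  }
}
```

Marking a node down on its first error is a deliberately aggressive choice for the sketch; a production pool would typically require several failed checks before removing a node and would probe it again before calling `markUp`.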

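Reorg detection is listed as infrastructure in place but not yet active. A block hash comparison can be sketched as follows, assuming the indexer keeps a map from block height to the hash it indexed (a hypothetical structure, not DALP's schema). The automatic rollback and reprocessing step is left out because the table lists it as future work.

```typescript
// Hypothetical block hash comparison for reorg detection. `stored` maps each
// indexed block height to the hash the indexer saw at that height.
interface BlockHeader {
  number: number;
  hash: string;
  parentHash: string;
}

// Returns true when `incoming` does not extend the chain the indexer stored.
function isReorg(stored: Map<number, string>, incoming: BlockHeader): boolean {
  // Same height, different block: the indexed block was replaced.
  const knownAtHeight = stored.get(incoming.number);
  if (knownAtHeight !== undefined && knownAtHeight !== incoming.hash) {
    return true;
  }
  // New block points to a parent other than the one the indexer stored.
  const knownParent = stored.get(incoming.number - 1);
  return knownParent !== undefined && knownParent !== incoming.parentHash;
}
```

Checking both the block's own hash and its parent hash catches a replaced block whether the indexer re-reads a height it already processed or moves on to the next one.
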
Execution engine layer

| Failure | Impact | Behavior | Recovery |
| --- | --- | --- | --- |
| Restate server crash | In-flight workflows pause | Journaled steps are preserved; workflows resume automatically on restart | Automatic (Restate journal replay) |
| Workflow step failure | A single step in a multi-step workflow fails | Restate retries with configurable backoff | Automatic retry; manual intervention if retries are exhausted |
| Database connection lost | Cannot checkpoint or read state | Restate retries database operations | Automatic retry; manual if the failure persists |
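
Restate performs these retries itself; the sketch below does not use the Restate SDK and only illustrates what "configurable backoff" means. `RetryPolicy`, `backoffDelay`, and `runWithRetry` are hypothetical names, and the `wait` parameter exists so the delays can be observed without real timers.

```typescript
// Hypothetical retry policy with capped exponential backoff. This is an
// illustration of the idea, not the Restate SDK's API.
interface RetryPolicy {
  initialDelayMs: number;
  factor: number;
  maxDelayMs: number;
  maxAttempts: number; // total attempts, including the first one
}

// Delay to wait after the given number of consecutive failures (1-based).
function backoffDelay(policy: RetryPolicy, failures: number): number {
  const raw = policy.initialDelayMs * Math.pow(policy.factor, failures - 1);
  return Math.min(raw, policy.maxDelayMs);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs `step` until it succeeds or attempts run out. The last error is
// rethrown so the workflow can be flagged for manual intervention.
async function runWithRetry<T>(
  step: () => Promise<T>,
  policy: RetryPolicy,
  wait: (ms: number) => Promise<void> = sleep,
): Promise<T> {
  let failures = 0;
  while (true) {
    try {
      return await step();
    } catch (err) {
      failures++;
      if (failures >= policy.maxAttempts) throw err;
      await wait(backoffDelay(policy, failures));
    }
  }
}
```

With `initialDelayMs: 100`, `factor: 2`, and `maxDelayMs: 300` (illustrative values), the waits between attempts are 100 ms, 200 ms, then 300 ms once the cap applies. Rethrowing after the final attempt corresponds to the "manual intervention if retries are exhausted" case.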

Indexer layer

| Failure | Impact | Behavior | Recovery |
| --- | --- | --- | --- |
| Indexer crash during block processing | Gap in indexed data | Resume from the last checkpoint; idempotent event processing | Automatic (checkpoint-based) |
| Event handler failure | A single event type is not processed | The processedEvents table prevents duplicate processing on retry | Automatic retry |
| RPC rate limit exceeded | Indexer sync slows | Batch size and concurrency respect configured limits | Automatic backoff |
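
Checkpoint-based resume and duplicate suppression work together: the checkpoint skips whole blocks that were finished, and the processed-event record skips individual events that were handled before a crash. In this minimal in-memory sketch, a `Set` stands in for the processedEvents table, and the key format (block number plus log index) is an assumption, not DALP's schema.

```typescript
// Hypothetical in-memory model of checkpoint resume plus idempotent handlers.
interface ChainEvent {
  blockNumber: number;
  logIndex: number;
  name: string;
}

class Indexer {
  checkpoint = -1; // last fully processed block
  readonly processedEvents = new Set<string>(); // stands in for the table
  readonly applied: string[] = []; // records handler side effects

  private key(e: ChainEvent): string {
    return `${e.blockNumber}:${e.logIndex}`;
  }

  // Processes one block, then advances the checkpoint. After a crash the
  // block is replayed and events already recorded are skipped.
  processBlock(blockNumber: number, events: ChainEvent[]): void {
    if (blockNumber <= this.checkpoint) return; // covered by the checkpoint
    for (const e of events) {
      const k = this.key(e);
      if (this.processedEvents.has(k)) continue; // duplicate on retry
      // In a real store, the side effect and the processedEvents insert
      // would need to commit atomically; this sketch assumes that.
      this.applied.push(e.name);
      this.processedEvents.add(k);
    }
    this.checkpoint = blockNumber;
  }
}
```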

API layer

| Failure | Impact | Behavior | Recovery |
| --- | --- | --- | --- |
| API server crash | Requests fail | Load balancer routes to healthy instances | Automatic (horizontal scaling) |
| Authentication service down | No new sessions | Existing sessions continue (cached); new logins fail | Restart the auth service |
| Database unreachable | API returns errors | Fail-closed: operations that require data return HTTP 503 | Restore the database connection |
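
Fail-closed here means an endpoint that needs data reports a temporary outage instead of returning a guess. A minimal sketch, using a hypothetical `handleGetBalance` endpoint and `DatabaseUnavailableError` type that are not part of DALP's API:

```typescript
// Hypothetical endpoint illustrating fail-closed behavior: when the database
// cannot be reached, respond 503 instead of returning guessed or partial data.
interface ApiResponse {
  status: number;
  body: unknown;
}

class DatabaseUnavailableError extends Error {
  constructor() {
    super("database unavailable");
    // Keeps `instanceof` working when compiled to older JS targets.
    Object.setPrototypeOf(this, DatabaseUnavailableError.prototype);
  }
}

function handleGetBalance(
  readBalance: (account: string) => string,
  account: string,
): ApiResponse {
  try {
    return { status: 200, body: { account, balance: readBalance(account) } };
  } catch (err) {
    if (err instanceof DatabaseUnavailableError) {
      // Fail closed: report a temporary outage; the client can retry later.
      return { status: 503, body: { error: "service unavailable" } };
    }
    throw err; // unrelated errors are not masked as outages
  }
}
```

Only the known outage error maps to 503; rethrowing anything else keeps genuine bugs from being reported as a transient outage.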

Custody layer

| Failure | Impact | Behavior | Recovery |
| --- | --- | --- | --- |
| Custody provider unreachable | Cannot sign transactions | Transactions queue in Restate and retry when the provider is available | Automatic retry |
| Policy engine blocks transaction | Transaction held for approval | DALP surfaces the pending approval in the operator interface | Manual approval or policy adjustment |
| MPC signing timeout | Transaction delayed | Restate retries the signing request | Automatic retry |
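
An MPC signing timeout can be bounded per attempt and retried while the transaction itself stays queued. This sketch is an illustration only: `sign` stands in for a custody provider call, and `withTimeout`, `signWithRetry`, and `SigningTimeoutError` are hypothetical names, not a custody SDK.

```typescript
// Hypothetical per-attempt timeout plus retry for a signing request.
class SigningTimeoutError extends Error {
  constructor(ms: number) {
    super(`signing timed out after ${ms} ms`);
    Object.setPrototypeOf(this, SigningTimeoutError.prototype);
  }
}

// Rejects with SigningTimeoutError if `p` does not settle within `ms`.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new SigningTimeoutError(ms)), ms);
    p.then(
      (v) => {
        clearTimeout(timer);
        resolve(v);
      },
      (e) => {
        clearTimeout(timer);
        reject(e);
      },
    );
  });
}

// Retries up to `maxAttempts` times (expected to be at least 1). A real
// integration would need retries to be idempotent (same transaction and
// nonce), because a timed-out request may still complete at the provider.
async function signWithRetry(
  sign: () => Promise<string>,
  timeoutMs: number,
  maxAttempts: number,
): Promise<string> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await withTimeout(sign(), timeoutMs);
    } catch (err) {
      lastError = err; // the transaction stays queued; it is never skipped
    }
  }
  throw lastError;
}
```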

Degradation philosophy

DALP fails closed for security-sensitive operations:

  • Compliance checks that cannot complete → transfer blocked (not allowed by default)
  • Authentication failures → access denied
  • Signing failures → transaction queued, not skipped
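
The fail-closed rule for compliance can be expressed as "allow only on an explicit positive result". The sketch below assumes a hypothetical synchronous `check` callback and `decideTransfer` function; any exception, including a timeout surfaced as an error, blocks the transfer.

```typescript
// Hypothetical fail-closed gate: a transfer proceeds only when the
// compliance check positively returns "allow".
type ComplianceResult = "allow" | "deny";

interface TransferDecision {
  allowed: boolean;
  reason: string;
}

function decideTransfer(check: () => ComplianceResult): TransferDecision {
  let result: ComplianceResult;
  try {
    result = check();
  } catch (err) {
    // The check could not complete: block instead of allowing by default.
    return { allowed: false, reason: `compliance check unavailable: ${String(err)}` };
  }
  return result === "allow"
    ? { allowed: true, reason: "compliance check passed" }
    : { allowed: false, reason: "compliance check denied transfer" };
}
```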

Read-only operations degrade gracefully:

  • Stale indexed data is served with freshness indicators
  • Cached API responses continue to be served during database outages
  • Dashboard shows last-known state with staleness warnings
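
Serving stale data with a freshness indicator amounts to returning the last indexed value together with its age. The field names and the `staleAfterMs` threshold below are hypothetical; DALP's actual response shape is not described on this page.

```typescript
// Hypothetical read path that degrades gracefully: serve the last indexed
// value with its age, and flag it as stale past a configurable threshold.
interface IndexedValue<T> {
  value: T;
  indexedAtBlock: number;
  indexedAtMs: number;
}

interface FreshnessWrapped<T> {
  value: T;
  indexedAtBlock: number;
  ageMs: number;
  stale: boolean;
}

function withFreshness<T>(
  cached: IndexedValue<T>,
  nowMs: number,
  staleAfterMs: number,
): FreshnessWrapped<T> {
  const ageMs = Math.max(0, nowMs - cached.indexedAtMs); // guard clock skew
  return {
    value: cached.value,
    indexedAtBlock: cached.indexedAtBlock,
    ageMs,
    stale: ageMs > staleAfterMs,
  };
}
```

Returning the block number alongside the age lets a dashboard show both "last indexed at block N" and a time-based staleness warning.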
