DALP failure modes: degradation and recovery behaviour

Architecture-level failure-mode reference for DALP deployments, covering how platform components degrade, what operators can detect, and which recovery path applies when dependencies are unavailable.

DALP failure handling separates three questions you need to answer during an incident: what is affected, whether the platform can continue safely, and who must restore the dependency. Security-sensitive work fails closed when required checks or signatures cannot complete. Durable workflows, idempotent processing, and RPC failover limit the blast radius where the platform has enough state to retry safely.

Use this page as an architecture reference, not as an incident runbook. The catalog helps you connect alerts, logs, workflow state, and high availability plans to the affected component.

Related pages:
Observability: metrics, logs, traces, dashboards, and alerts.
Database: PostgreSQL persistence, backups, and restore planning.
High availability: deployment patterns, RTO, RPO, and recovery drills.
Signing flow: transaction durability and retry behaviour.

Failure response model

DALP uses four recovery responses depending on the affected layer and whether continuing would be safe:

Response	When it applies	Operator expectation
Fail over	A configured equivalent dependency can handle the request, such as another RPC endpoint or healthy application instance	Confirm the healthy target is serving traffic and investigate the failed dependency
Retry	The platform has enough durable state to repeat a workflow step, signing request, transaction submission, or event handler safely	Monitor retry exhaustion, dependency recovery, and duplicate-suppression evidence
Fail closed	A required identity, compliance, authentication, signing, or data check cannot finish safely	Treat the blocked request as protective behaviour, not a successful business operation
Manual recovery	The dependency, policy approval, database, or operating environment must be restored outside the affected workflow	Follow the deployment runbook and verify state before resuming normal operations
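The table can be read as a decision procedure. The sketch below is hypothetical; it is not a DALP API. `Failure`, its three boolean signals and `expected_response` are invented names, and the precedence order (fail over, then fail closed, then retry, then manual recovery) is an assumption chosen for illustration. Real components can combine responses. For example, a signing request can stay blocked while its workflow keeps retrying.

```python
from dataclasses import dataclass
from enum import Enum


class Response(Enum):
    FAIL_OVER = "fail over"
    RETRY = "retry"
    FAIL_CLOSED = "fail closed"
    MANUAL_RECOVERY = "manual recovery"


@dataclass(frozen=True)
class Failure:
    # Hypothetical signals an operator or alert rule might derive from telemetry.
    healthy_equivalent_available: bool  # e.g. another RPC URL or app instance is healthy
    blocks_required_check: bool         # identity, compliance, auth, signing, or data check
    durable_state_available: bool       # the step can be repeated safely from persisted state


def expected_response(failure: Failure) -> Response:
    """Map a failure to the response the model predicts (assumed precedence)."""
    if failure.healthy_equivalent_available:
        # An equivalent dependency can serve the request, so route around the failure.
        return Response.FAIL_OVER
    if failure.blocks_required_check:
        # A required check cannot finish: block the request rather than continue.
        return Response.FAIL_CLOSED
    if failure.durable_state_available:
        # Enough persisted state exists to repeat the step without duplicating effects.
        return Response.RETRY
    # Nothing inside the platform can restore the dependency on its own.
    return Response.MANUAL_RECOVERY
```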


Failure mode catalog

Blockchain layer

Failure	User-facing impact	DALP behaviour	Recovery path
One RPC endpoint is unreachable	Chain reads or writes may slow down	The network transport can use configured fallback RPC URLs with retry and backoff	Automatic failover when another configured endpoint is healthy
All configured RPC endpoints are unavailable	New chain reads and transaction submission cannot complete	Work that depends on chain access waits or fails according to the calling workflow	Restore RPC connectivity, then verify queued or retried operations
Block reorganisation	Indexed data can temporarily reflect reverted transactions	The indexer handles reorganisations by idempotently replaying affected event state; indexer tests cover this path	Reprocess from the corrected chain state and verify indexed data
Gas price spike or transaction submission failure	Transaction confirmation may be delayed	Signing and submission flows estimate gas and retry failed submission steps where safe	Automatic retry where configured; operator review if retries exhaust
Nonce conflict	A transaction can be rejected by the network	The signing flow serialises and retries transaction work instead of treating the conflict as success	Retry from the signing workflow and verify the final on-chain transaction
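The "fallback RPC URLs with retry and backoff" behaviour follows a common pattern. In each round the caller tries every configured endpoint in order. If all of them fail, it waits an exponentially growing delay before the next round. The sketch below illustrates that pattern under stated assumptions. `call_with_fallback`, `AllEndpointsFailed` and the default round count and delay are hypothetical, not DALP's transport configuration. Only `ConnectionError` triggers failover. Other errors, such as a reverted call, propagate, because they are not endpoint failures.

```python
import time
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


class AllEndpointsFailed(Exception):
    """Raised when every configured endpoint failed in every round."""


def call_with_fallback(
    endpoints: Sequence[str],
    request: Callable[[str], T],
    rounds: int = 3,
    base_delay: float = 0.2,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Try each endpoint in order; back off exponentially between full rounds."""
    errors: List[Exception] = []
    for round_no in range(rounds):
        for url in endpoints:
            try:
                return request(url)
            except ConnectionError as exc:
                # Connection-level failure: fall through to the next configured endpoint.
                errors.append(exc)
        if round_no < rounds - 1:
            # Every endpoint failed this round; wait base_delay, 2*base_delay, ...
            sleep(base_delay * 2 ** round_no)
    raise AllEndpointsFailed(
        f"{len(errors)} failed calls across {len(endpoints)} endpoints"
    )
```

If the primary endpoint is down, the request is served by the next URL in the list without any backoff delay. The backoff applies only once every endpoint has failed.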

Workflow and execution layer

Failure	User-facing impact	DALP behaviour	Recovery path
Durable workflow runtime restart	In-flight workflow steps pause	Persisted workflow state lets work resume after the runtime is available again	Restart the runtime and verify the workflow resumes or reaches a terminal state
Workflow step failure	One multi-step operation is delayed or blocked	The failed step retries according to the workflow policy and preserves previous completed steps	Automatic retry first; manual intervention if the step cannot complete
Workflow database connection loss	Workflow state cannot be checkpointed or read	Workflow operations that need state cannot safely advance	Restore database connectivity, then verify workflow state before retrying
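Resuming after a runtime restart depends on one property: a step whose result is already persisted is skipped, not executed again. The sketch below shows that idea with an in-memory dict standing in for the workflow database. `run_workflow` and the `(name, action)` step shape are hypothetical and are not the API of DALP's workflow runtime.

```python
from typing import Callable, Dict, List, Tuple

# A step is a (name, action) pair; names must be unique and stable across restarts.
Step = Tuple[str, Callable[[], object]]


def run_workflow(steps: List[Step], checkpoint: Dict[str, object]) -> Dict[str, object]:
    """Run steps in order, skipping steps whose results are already checkpointed.

    `checkpoint` stands in for persisted workflow state; in a real deployment it
    would live in the workflow database, not in process memory.
    """
    for name, action in steps:
        if name in checkpoint:
            # Completed before a restart or an earlier failure: do not run it again.
            continue
        # If the action raises, earlier results stay recorded and the error
        # propagates, so a later call resumes at this step.
        checkpoint[name] = action()
    return checkpoint
```

A step can finish its side effect and then crash before its result is checkpointed. That step runs again on resume, so individual steps still need to be idempotent. This is also why losing the workflow database blocks progress: without the checkpoint, the runtime cannot tell which steps are already done.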

Indexer layer

Failure	User-facing impact	DALP behaviour	Recovery path
Indexer stops during block processing	Read models and dashboards can lag behind chain state	The indexer resumes from persisted progress and avoids duplicate event effects during replay	Restart the indexer and compare indexed state with chain state
Event handler failure	One event family may be stale while others continue	Handler retries are idempotent, so reprocessing an event does not create a second business event	Fix the handler or dependency, then replay and verify the affected records
RPC rate limit during indexing	Indexing slows down	Network configuration includes retry backoff and rate-limit settings for log fetching	Reduce concurrency, raise provider limits, or add capacity, then monitor catch-up
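"Resumes from persisted progress and avoids duplicate event effects" combines two mechanisms: a progress cursor, and an idempotency key per event. The sketch below assumes the key is the pair (transaction hash, log index). It uses a toy balance read model; `TransferEvent`, `IndexerState` and `apply_events` are hypothetical names. Reorganisation rollback is out of scope here, and the unbounded `applied` set is a simplification.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set, Tuple


@dataclass(frozen=True)
class TransferEvent:
    block_number: int
    tx_hash: str
    log_index: int
    account: str
    delta: int


@dataclass
class IndexerState:
    # In a real indexer these fields would be persisted together, atomically.
    last_block: int = -1
    applied: Set[Tuple[str, int]] = field(default_factory=set)
    balances: Dict[str, int] = field(default_factory=dict)

    def resume_from(self) -> int:
        # Re-read the last processed block: it may have been only partly applied.
        return max(self.last_block, 0)


def apply_events(state: IndexerState, events: Iterable[TransferEvent]) -> None:
    """Apply events in chain order, skipping any that were already applied."""
    for ev in sorted(events, key=lambda e: (e.block_number, e.log_index)):
        key = (ev.tx_hash, ev.log_index)
        if key in state.applied:
            continue  # already applied before the restart; replay is a no-op
        state.balances[ev.account] = state.balances.get(ev.account, 0) + ev.delta
        state.applied.add(key)
        state.last_block = max(state.last_block, ev.block_number)
```

Because replayed events are skipped, the indexer can safely re-fetch from the last recorded block after a restart. The overlap costs RPC calls but does not double-count state.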

API and application layer

Failure	User-facing impact	DALP behaviour	Recovery path
API instance is unhealthy	Requests routed to that instance fail	Readiness and health endpoints let the platform route traffic only to healthy instances	Restart or replace the unhealthy instance and confirm readiness
Authentication or authorisation cannot complete	Users cannot start new protected operations	Access fails closed rather than granting unauthenticated or unauthorised access	Restore the identity dependency and confirm the user's effective permissions
Database is unreachable	API operations that need current data fail	Data-dependent operations return errors instead of inventing state	Restore database connectivity and check the affected operation again
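Readiness-based routing usually happens in the load balancer or orchestrator, not in application code. The logic is still worth seeing on its own. The sketch below scans instances round-robin and treats a probe error as "not ready". When nothing is ready, it raises an explicit error rather than guessing. `route_request`, `NoHealthyInstance` and the `is_ready` callback are hypothetical; in practice the callback would wrap a call to each instance's readiness endpoint.

```python
from typing import Callable, Sequence


class NoHealthyInstance(Exception):
    """No instance passed its readiness probe; the caller receives an error."""


def route_request(
    instances: Sequence[str],
    is_ready: Callable[[str], bool],
    start: int = 0,
) -> str:
    """Return the first ready instance, scanning round-robin from `start`."""
    n = len(instances)
    for offset in range(n):
        candidate = instances[(start + offset) % n]
        try:
            ready = is_ready(candidate)
        except Exception:
            # A probe that errors or times out counts as not ready.
            ready = False
        if ready:
            return candidate
    # Fail with an explicit error instead of sending traffic to an unready instance.
    raise NoHealthyInstance(f"none of {n} instances reported ready")
```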

Custody and signing layer

Failure	User-facing impact	DALP behaviour	Recovery path
Custody provider is unreachable	Transactions that require a signature cannot proceed	Signing work waits or retries; DALP does not skip the signature requirement	Restore the provider connection and verify the pending transaction state
Custody policy blocks a transaction	The transaction remains pending or rejected	DALP surfaces the policy state instead of bypassing the provider policy	Approve, reject, or adjust the policy in the custody system according to the operating procedure
Signing timeout	Transaction submission is delayed	The signing flow can retry the signing request where the workflow has preserved state	Confirm whether a signature was produced, then retry or reconcile the transaction
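The recovery path for a signing timeout starts with "confirm whether a signature was produced". The reason is that a request that timed out on the caller's side may still have succeeded at the provider. The sketch below reconciles first and resends only when no signature exists. `handle_signing_timeout`, `find_signature` and `resend` are hypothetical hooks; each custody provider exposes its own status API. A gap remains if the provider signs after the lookup. A provider-side idempotency key, where supported, covers that case better.

```python
from enum import Enum
from typing import Callable, Optional, Tuple


class SigningOutcome(Enum):
    RECONCILED = "reconciled"  # the provider had already produced a signature
    RETRIED = "retried"        # no signature existed, so the request was sent again


def handle_signing_timeout(
    request_id: str,
    find_signature: Callable[[str], Optional[str]],
    resend: Callable[[str], str],
) -> Tuple[SigningOutcome, str]:
    """Check for an existing signature before resending a timed-out request."""
    # If the status lookup itself fails, the exception propagates and the
    # transaction stays pending: no blind retry, no unsigned shortcut.
    existing = find_signature(request_id)
    if existing is not None:
        return SigningOutcome.RECONCILED, existing
    return SigningOutcome.RETRIED, resend(request_id)
```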

Degradation principles

DALP favours protective degradation over silent continuation:

Compliance and eligibility checks that cannot finish block the affected transfer or issuance request.
Authentication and authorisation failures deny access instead of granting temporary privileges.
Signing failures keep the transaction pending or failed; they do not create an unsigned shortcut.
Read models can be stale during indexing or RPC disruption, so operators should compare freshness signals before acting on dashboards.
External dependencies such as RPC providers, custody providers, identity providers, and database infrastructure must be restored by the deployment operator or provider owner.
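Two of these principles can be expressed as small checks. First, a fail-closed evaluator: it separates "the check ran and said no" (`DENIED`) from "the check could not finish" (`BLOCKED`), and only an explicit `True` allows the request. Second, a freshness test that compares indexer progress with the chain head. All names and the block-lag threshold are hypothetical illustrations, not DALP configuration.

```python
from enum import Enum
from typing import Callable


class Decision(Enum):
    ALLOWED = "allowed"
    DENIED = "denied"    # the check ran and rejected the request
    BLOCKED = "blocked"  # the check could not finish; protective, not a business result


def evaluate_required_check(check: Callable[[], bool]) -> Decision:
    """Fail closed: anything other than an explicit True does not allow the request."""
    try:
        result = check()
    except Exception:
        # Dependency timeout, network error, malformed response, ...
        return Decision.BLOCKED
    return Decision.ALLOWED if result is True else Decision.DENIED


def read_model_is_fresh(indexed_block: int, chain_head: int, max_lag_blocks: int) -> bool:
    """Compare indexer progress with the chain head before acting on a dashboard."""
    return chain_head - indexed_block <= max_lag_blocks
```

Keeping `BLOCKED` distinct from `DENIED` lets operators tell a protective block apart from a real compliance rejection when they later review audit records.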

How to use this page during a review

Identify the affected layer from observability evidence.
Check whether the expected response is failover, retry, fail-closed blocking, or manual recovery.
Follow the matching operational runbook for the deployment environment.
Verify recovery with current telemetry, workflow state, indexed data, and audit records.
Use the high availability pages to compare the measured recovery against the deployment's RTO and RPO targets.
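To compare a measured recovery with the targets, take two intervals from the incident timeline. Downtime (failure to verified restoration) is checked against the RTO. The data-loss window (newest write that survived recovery to the failure) is checked against the RPO. The sketch below is a hypothetical helper with invented field names. It approximates the data-loss window by assuming writes continued until the moment of failure.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class RecoveryMeasurement:
    failure_at: datetime            # when the dependency or service failed
    restored_at: datetime           # when verified normal operation resumed
    last_recovered_write: datetime  # newest write present after recovery


def compare_to_targets(m: RecoveryMeasurement, rto: timedelta, rpo: timedelta) -> dict:
    """Return measured downtime and data-loss window and whether each meets its target."""
    downtime = m.restored_at - m.failure_at
    # Clamp at zero in case the newest recovered write is timestamped after the failure.
    data_loss_window = max(m.failure_at - m.last_recovered_write, timedelta(0))
    return {
        "downtime": downtime,
        "data_loss_window": data_loss_window,
        "rto_met": downtime <= rto,
        "rpo_met": data_loss_window <= rpo,
    }
```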
