DALP operability: telemetry, persistence, resilience

Architecture-level map for operating DALP deployments: telemetry, PostgreSQL persistence, workflow durability, high availability handoffs, and failure behaviour.

Operational posture

DALP operability connects four production concerns: visibility, durable state, recovery planning, and failure handling. Operators use telemetry to detect incidents, PostgreSQL and workflow checkpoints to preserve state, high availability patterns to set recovery targets, and failure-mode guidance to decide whether the platform retries, fails over, blocks unsafe work, or needs manual intervention.

Start here when you need a production-readiness view rather than a deployment recipe. The operability section helps buyers assess resilience, operators plan support, and security reviewers check monitoring, access, and recovery responsibilities from the same evidence chain.

Reader	Start with	Direct answer
Buyers	High availability	Resilience depends on the selected deployment pattern, measured recovery targets, and restore evidence.
Operators	Observability	Enabled telemetry gives the team metrics, logs, traces, dashboards, and alerts for incident response.
Data owners	Database	PostgreSQL stores application, workflow, indexed chain, and audit data that must survive disruption.
Security reviewers	Failure modes	Security-sensitive operations fail closed when required checks cannot finish.

What this section covers

DALP keeps deployments observable, durable, and recoverable through deployment telemetry, PostgreSQL application stores, workflow checkpoints, transaction checkpoints, component failure behaviour, and the selected high availability pattern.

Responsibility splits by operating concern:

Concern	DALP documents	Deployment operator owns
Telemetry	Signals emitted by platform components and the observability chart components that can collect and display them	Enabling the stack, configuring sinks, alert routing, retention, dashboard access, and incident response
State and recovery	Data domains, workflow checkpoints, idempotent retry behaviour, and recovery patterns	PostgreSQL topology, backups, restore drills, infrastructure failover, measured RTO/RPO, and access controls
Failure handling	Component degradation behaviour, retry paths, failover paths, and fail-closed controls	Runbooks, escalation paths, external dependency restoration, and manual approvals when a provider or policy gate requires them
Asset control policy	Where operability evidence supports audits and production readiness	Asset rules, custody policy approvals, compliance policy design, and business workflow changes

This section does not define asset issuance design, compliance rule authoring, custody policy governance, chain validator or RPC provider service levels, legal retention commitments, privacy programme design, bridge behaviour, or non-EVM network support. Use the product, compliance, custody, and integration pages for those decisions.

Operating model

DALP's operating model has four linked concerns:

Visibility: operators inspect platform health through metrics, structured logs, traces, dashboards, and alerts when the observability stack is enabled.
Persistence: PostgreSQL stores application data such as identity, asset configuration, indexed chain state, workflow state, and audit records.
Recovery planning: self-hosted deployments choose a high availability pattern, assign owners, set RTO and RPO targets, and prove those targets by measuring recovery time in restore drills.
Failure handling: workflow checkpoints, idempotent processing, retries, failover, and fail-closed controls limit the effect of failures to the affected component or workflow where the platform can do so safely.

These concerns work together. Telemetry tells operators which component or chain is affected. PostgreSQL and the execution engine preserve state across restarts. High availability docs define the selected recovery pattern and evidence. Failure-mode documentation identifies whether the expected recovery path is automatic retry, failover, manual intervention, or restoring an external dependency.
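To make the interaction between checkpoints and retries concrete, here is a minimal sketch of checkpoint-based resume. It is not DALP's execution engine: `CheckpointStore`, `run_workflow`, and the step names are hypothetical, and the in-memory dictionary stands in for a durable store such as a PostgreSQL table.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class CheckpointStore:
    """Hypothetical in-memory stand-in for a durable checkpoint table."""

    completed: dict[str, set[str]] = field(default_factory=dict)

    def is_done(self, workflow_id: str, step: str) -> bool:
        return step in self.completed.get(workflow_id, set())

    def mark_done(self, workflow_id: str, step: str) -> None:
        self.completed.setdefault(workflow_id, set()).add(step)


def run_workflow(
    workflow_id: str,
    steps: list[tuple[str, Callable[[], None]]],
    store: CheckpointStore,
) -> list[str]:
    """Run steps in order and return the names executed in this attempt.

    Steps that already have a checkpoint are skipped, so a restart resumes
    from the last recorded checkpoint instead of starting over.
    """
    executed: list[str] = []
    for name, action in steps:
        if store.is_done(workflow_id, name):
            continue
        action()  # may raise; the checkpoint is written only after success
        store.mark_done(workflow_id, name)
        executed.append(name)
    return executed
```

In this sketch, a crash between `action()` and `mark_done()` causes the step to run again on the next attempt. That gap is why checkpointing is paired with idempotent processing: a repeated step must not duplicate its effect.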

Evidence chain

For production reviews, treat operability as an evidence chain instead of a set of isolated pages:

Observability shows what the deployment can detect.
PostgreSQL and workflow checkpoints show which state survives restart, failover, or restore.
High availability planning shows the recovery pattern and measured recovery targets.
Failure-mode guidance shows what the platform retries, what it blocks, and what the operator must restore.

That chain helps operators and reviewers separate platform behaviour from deployment responsibilities without turning the overview into an incident runbook.
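The "what it blocks" link in the chain can be illustrated with a small fail-closed gate. This is a sketch under assumptions, not DALP's implementation: `Decision`, `gate`, and the check callables are hypothetical names. It shows the pattern that a required check which fails, or which cannot finish, results in a block rather than a silent pass.

```python
from enum import Enum
from typing import Callable


class Decision(Enum):
    ALLOW = "allow"
    BLOCK = "block"


def gate(required_checks: list[Callable[[], bool]]) -> Decision:
    """Allow the operation only if every required check completes and passes."""
    for check in required_checks:
        try:
            passed = check()
        except Exception:
            # The check could not finish (for example, a dependency is down).
            # Failing closed means treating that as a block, not a pass.
            return Decision.BLOCK
        if not passed:
            return Decision.BLOCK
    return Decision.ALLOW
```

A fail-open design would return `ALLOW` from the `except` branch. The fail-closed choice trades availability for safety, which is why restoring the unavailable dependency falls to the operator.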

Review path

Use the operability pages as a sequence when you need to prove production readiness.

Question	Evidence to inspect	Page
How will the team know something is unhealthy?	Metrics, logs, traces, dashboards, alert labels, and deployment telemetry configuration	Observability
Which state must survive restart, failover, or restore?	PostgreSQL data domains, database HA, backup layers, audit logging, and access controls	Database
Which recovery target drives the infrastructure design?	Cloud-native, hot-warm, hot-cold, or hot-hot pattern; RTO, RPO, measured recovery time, drills	High availability
What happens when a dependency or component is unavailable?	Component failure modes, degraded behaviour, detection path, retry or manual recovery expectation	Failure modes
Which transaction or signing state can resume after disruption?	Workflow checkpointing, idempotent processing, signing durability, and transaction retry paths	Signing flow

Key reliability characteristics

Characteristic	Operator value	Related page
Workflow durability	Restarts can resume from the last recorded checkpoint	Execution Engine
Idempotent processing	Retries avoid duplicate transaction or event effects	Signing flow
Fail-closed controls	Security-sensitive operations are blocked when required checks cannot finish	Failure modes
Deployment visibility	Metrics, logs, traces, dashboards, and alerts are available when the observability stack is enabled	Observability
Recovery evidence	Restore drills compare measured recovery time to RTO	High availability
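As an illustration of recovery evidence, the sketch below records one restore drill and compares the measured values to the targets. The `DrillResult` type and its field names are hypothetical. The RPO comparison is an added assumption: it treats the data-loss window observed in the drill as the value to check against RPO.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class DrillResult:
    """Hypothetical record of one restore drill."""

    measured_recovery: timedelta  # time from failure to restored service
    measured_data_loss: timedelta  # window of writes lost in the drill
    rto: timedelta  # recovery time objective
    rpo: timedelta  # recovery point objective

    def meets_targets(self) -> bool:
        """True when both measured values are within their targets."""
        return (
            self.measured_recovery <= self.rto
            and self.measured_data_loss <= self.rpo
        )


drill = DrillResult(
    measured_recovery=timedelta(minutes=20),
    measured_data_loss=timedelta(minutes=2),
    rto=timedelta(minutes=30),
    rpo=timedelta(minutes=5),
)
print(drill.meets_targets())
```

Keeping the measured values next to the targets in one record makes each drill reviewable as evidence, not only as a pass or fail flag.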

Where to go next

Plan monitoring and alert routing with observability.
Review data storage, replication, backups, and retention in database.
Map incident response paths with failure modes.
Choose deployment resilience patterns with high availability.
Review deployment prerequisites in self-hosting.
