DALP backup and recovery for self-hosted deployments

Backup scope, recovery dependencies, PostgreSQL point-in-time recovery, namespace snapshots, monitoring signals, and disaster recovery drills for self-hosted DALP deployments.

Self-hosted DALP recovery depends on five surfaces: the database, Kubernetes resources, object storage, observability data, and configuration history. Platform operators verify the full set before treating an environment as production-ready.

This page is a recovery reference for self-hosted deployments, not an SLA.

System context

The recovery boundary spans live DALP services, their state stores, the Kubernetes namespace, configuration history, observability data, and the external EVM networks used to reconcile restored state. Test recovery in an isolated environment before routing clients back to the restored stack.

Recovery boundary

DALP provides chart-level backup resources and deployment guidance, but the recovery promise belongs to the operated environment. Recovery time objective and recovery point objective values depend on the selected infrastructure, object storage, PostgreSQL setup, restore automation, and runbook staffing. Set those targets for the deployment, then prove them through restore tests.

Do not publish an RTO or RPO as an external commitment until the same target has passed a drill using the target backup location, database restore path, object storage configuration, and route-switch procedure.

What the DALP chart contributes

When backups are enabled, the DALP chart can create the Velero backup storage location and schedule for the release namespace. The default schedule runs daily at 02:00, targets DALP-labelled resources, includes persistent volumes through filesystem backup, excludes Kubernetes event resources, and derives the Velero TTL from the configured retention period.

These chart resources give operators a repeatable backup mechanism. They do not prove disaster recovery on their own.

Production evidence still requires every restored surface to work: the database, Kubernetes resources, object storage, application health checks, and reconciliation against the relevant EVM networks.

What gets backed up

Component	Backup method	Frequency	Retention	Recovery purpose
PostgreSQL data	Managed PITR or CNPG WAL shipping to object storage	Continuous	30 days	Restore application state to a selected point in time.
Kubernetes resources	Velero backups, with snapshots when available	Hourly/Daily/Weekly	48h/7d/30d	Recreate namespace resources after cluster loss or drift.
Object storage	Bucket versioning	Automatic	90 days	Recover files, backups, and exported artifacts.
Observability data	Velero backups when self-hosted	Daily	3 days	Preserve enough telemetry for incident review.
Configuration	Helm values in Git	Each committed values update	Indefinite	Rebuild the same deployment shape after an outage.

Chart-backed backup resources

The DALP chart can create Velero backup resources when `backup.enabled` is set to `true`. The chart configures a BackupStorageLocation and, when scheduled backups are enabled, a Velero Schedule for the release namespace plus any configured additional namespaces.

Chart setting	Default	Recovery meaning
`backup.enabled`	`false`	Backup resources are opt-in and require a Velero-compatible environment.
`backup.storage.provider`	`s3`	Backup storage can use S3-compatible storage, AWS S3, Azure Blob, or GCS.
`backup.schedule.cron`	`0 2 * * *`	The DALP chart schedule runs daily at 02:00 when scheduled backups are enabled.
`backup.retention.days`	`30`	Velero backup TTL is derived from this value.
`backup.includeAllPVCs`	`true`	Velero uses filesystem backup for volumes in the included namespace set.
`backup.labelSelector.matchLabels.kots.io/app-slug`	`settlemint-dalp`	Backups select DALP-labelled resources instead of every resource in the cluster.
`backup.schedule.paused`	`false`	Operators can pause the schedule without deleting the backup definition.
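
The relationship between `backup.retention.days` and the Velero TTL can be sketched as a simple conversion. Velero expresses a backup TTL as a duration string such as `720h0m0s`. The chart template itself is not shown on this page, so the function below is a hypothetical helper that assumes a straight days-to-hours conversion rather than the chart's actual logic.

```python
def velero_ttl(retention_days: int) -> str:
    """Render a retention period in days as a Go-style duration string.

    Assumption: the TTL is exactly retention_days * 24 hours.
    """
    if retention_days <= 0:
        raise ValueError("retention_days must be a positive number of days")
    return f"{retention_days * 24}h0m0s"


# Chart default of 30 days
print(velero_ttl(30))  # 720h0m0s
```

Comparing this value with the TTL on a backup created by the schedule is a quick way to confirm that a retention override in the Helm values actually reached Velero.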

The support chart also carries a Velero schedule for platform support backups. In that chart, the default schedule runs every 4 hours with a 7-day TTL and excludes Kubernetes event resources. Treat the application chart and support chart as separate backup surfaces when you test recovery.

PostgreSQL PITR

For CloudNativePG deployments:

WAL shipping to object storage is continuous.
Base backups run daily.
Point-in-time recovery can restore to a moment within the retention window.
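
Before starting a restore, it helps to confirm that the requested timestamp is actually reachable. The sketch below is a hypothetical pre-flight check, not part of DALP or CloudNativePG. It assumes the 30-day retention listed in the backup table and, optionally, the completion time of the oldest retained base backup, since recovery cannot reach a point earlier than the first base backup that is still available.

```python
from datetime import datetime, timedelta, timezone


def in_pitr_window(target, now, retention_days=30, oldest_base_backup=None):
    """Return True when `target` falls inside the recoverable window.

    The window starts at `now - retention_days`, or at the oldest retained
    base backup if that is later, and ends at `now`.
    """
    if target.tzinfo is None or now.tzinfo is None:
        raise ValueError("use timezone-aware timestamps")
    window_start = now - timedelta(days=retention_days)
    if oldest_base_backup is not None:
        window_start = max(window_start, oldest_base_backup)
    return window_start <= target <= now


now = datetime(2024, 6, 30, 12, 0, tzinfo=timezone.utc)
print(in_pitr_window(datetime(2024, 6, 15, 8, 0, tzinfo=timezone.utc), now))  # True
print(in_pitr_window(datetime(2024, 5, 1, 8, 0, tzinfo=timezone.utc), now))   # False
```

A check like this only narrows the question. The restore itself is still the proof that the WAL archive and base backups are complete.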

Velero can use CSI snapshots when a compatible CSI driver and VolumeSnapshot CRDs are installed. Without that support, Velero performs file-level backups.

Recovery checks

Run restore tests against an isolated environment, not the production namespace. A usable drill confirms these facts:

PostgreSQL restores to the selected timestamp inside the PITR window.
Kubernetes resources restore with the expected secrets, config maps, services, ingress, and persistent volumes.
Object storage data required by the restored environment is present at the expected version.
DALP services start against the restored database and configuration.
Operators record the achieved RTO and RPO, then compare them with the deployment target.
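
The last check can be made mechanical. The sketch below is one possible way to compute the achieved values from three timestamps recorded during a drill. The class and field names are hypothetical, and it assumes RPO is measured as the gap between the simulated incident and the restored point, and RTO as the gap between the incident and the moment the restored stack serves traffic.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class DrillTimes:
    incident_at: datetime   # simulated moment of failure
    restored_to: datetime   # point in time the data was restored to
    serving_at: datetime    # health checks pass and routes are switched

    def __post_init__(self):
        if not (self.restored_to <= self.incident_at <= self.serving_at):
            raise ValueError("expected restored_to <= incident_at <= serving_at")


def achieved_rpo(t: DrillTimes) -> timedelta:
    """Data lost: time between the restored point and the incident."""
    return t.incident_at - t.restored_to


def achieved_rto(t: DrillTimes) -> timedelta:
    """Downtime: time between the incident and serving traffic again."""
    return t.serving_at - t.incident_at


def drill_passes(t: DrillTimes, rto_target: timedelta, rpo_target: timedelta) -> bool:
    return achieved_rto(t) <= rto_target and achieved_rpo(t) <= rpo_target


drill = DrillTimes(
    incident_at=datetime(2024, 6, 1, 10, 0, tzinfo=timezone.utc),
    restored_to=datetime(2024, 6, 1, 9, 55, tzinfo=timezone.utc),
    serving_at=datetime(2024, 6, 1, 11, 30, tzinfo=timezone.utc),
)
print(achieved_rpo(drill), achieved_rto(drill))  # 0:05:00 1:30:00
print(drill_passes(drill, timedelta(hours=2), timedelta(minutes=15)))  # True
```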

[Diagram: restore drill loop]

Treat the diagram as the minimum drill loop. A restore that stops at database recovery is incomplete: the drill counts only once DALP services start, indexed state is reconciled with chain state, and the measured recovery time is recorded against the target.

If a restore test needs manual changes that are not in Git or the runbook, treat the drill as failed until the missing step is documented and repeated.

Restore evidence to keep

Evidence	Why it matters
Backup name and creation time	Shows which recovery point was used.
Restore target timestamp	Lets operators compare the intended RPO with the achieved restore point.
Database checkpoint or WAL end	Confirms the database restored to the expected point before services started.
Restored namespace inventory	Confirms workloads, services, secrets, ingress, and persistent volumes exist.
Object version check	Confirms uploaded files and exported artifacts match the restored environment.
Application health checks	Confirms DALP services can read the restored state and serve traffic.
Reconciliation result	Confirms indexed, off-chain, and on-chain state are consistent enough to run.
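
A drill record can be checked for completeness against the evidence list above. The sketch below is a hypothetical helper. The key names are invented for illustration and simply mirror the table rows. They are not a DALP schema.

```python
# One key per row of the evidence table (names are illustrative only).
REQUIRED_EVIDENCE = (
    "backup_name_and_creation_time",
    "restore_target_timestamp",
    "database_checkpoint_or_wal_end",
    "restored_namespace_inventory",
    "object_version_check",
    "application_health_checks",
    "reconciliation_result",
)


def missing_evidence(record: dict) -> list:
    """Return the evidence items that are absent or empty in a drill record."""
    return [key for key in REQUIRED_EVIDENCE if not record.get(key)]


record = {
    "backup_name_and_creation_time": "example-backup @ 2024-06-01T02:00:00Z",
    "restore_target_timestamp": "2024-06-01T09:55:00Z",
    "application_health_checks": "all passing",
}
print(missing_evidence(record))
```

An empty result means every row of the table has an entry. It says nothing about whether the recorded evidence is correct.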

For Velero filesystem backups, include the pod resources in the restore. Restoring only persistent volume claims can recreate empty volumes, because the node agent restores filesystem data into a volume only when a restored pod mounts it and runs Velero's restore-wait init container.
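
The rule above can be enforced when a restore request is generated. The sketch below builds a Velero `Restore` object as a plain dictionary and refuses any resource filter that leaves pods out. The function name, backup name, and namespace names are hypothetical, the `velero` install namespace is an assumption, and the filter check is a simple string match on `pods`. The `namespaceMapping` field is used here to send the restore into an isolated drill namespace instead of the production one.

```python
import json


def build_restore(name, backup_name, namespace_mapping=None,
                  included_resources=None, velero_namespace="velero"):
    """Build a Velero Restore manifest as a dict.

    With included_resources=None no filter is set, so every resource in the
    backup (pods included) is restored.
    """
    spec = {"backupName": backup_name}
    if included_resources is not None:
        resources = list(included_resources)
        # Filesystem-backed volumes are repopulated through their pods, so a
        # filter without pods would bring back empty volumes.
        if "pods" not in resources:
            raise ValueError("included_resources must contain 'pods'")
        spec["includedResources"] = resources
    if namespace_mapping:
        spec["namespaceMapping"] = dict(namespace_mapping)
    return {
        "apiVersion": "velero.io/v1",
        "kind": "Restore",
        "metadata": {"name": name, "namespace": velero_namespace},
        "spec": spec,
    }


manifest = build_restore(
    "dalp-drill-001",
    "dalp-daily-example",
    namespace_mapping={"dalp": "dalp-drill"},
)
print(json.dumps(manifest, indent=2))
```

The JSON output can be reviewed and committed alongside the runbook so that the drill does not depend on undocumented manual steps.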