SettleMint
Developer guidesOperations

Restate workflow recovery

Recover Restate-backed DALP workflows with operator-only DAPI routes for health checks, redeployment, stale deployment cleanup, and stuck workflow recovery.

Run the doctor route first. It reports Restate ingress, admin API, deployment, service, and invocation health without changing workflow state.

Use the write routes only after you know which condition you are fixing. Each route requires an authenticated operator with the system operate permission. DALP records operator route attempts for audit, including the actor, route, request arguments, reason when supplied, outcome, and error message when a route fails.

Before you start

Confirm you have:

  • access to the DALP admin API for the environment you are operating
  • an operator account or API key with system operate permission
  • the Restate service URL used by the durable workflow service, when you need to redeploy or clean deployments
  • the Restate workflow serviceName and serviceKey, when you need to clear a blocked workflow

Set a base URL for the examples:

export DALP_API_URL="https://your-platform.example.com"
export DALP_API_KEY="sm_dalp_xxxxxxxxxxxxxxxx"

1. Check workflow health

Run the doctor route first:

curl -X POST "$DALP_API_URL/api/v2/admin/operator/restate/doctor" \
  -H "X-Api-Key: $DALP_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{}'

The response has five independent components:

ComponentWhat it tells you
ingressWhether the Restate ingress URL responds and how long it took.
adminWhether the Restate admin API responds and which version it reports.
deploymentsRegistered Restate deployments and their service URLs.
servicesRegistered Restate service names and revisions.
invocationsInvocation counts by status and up to 20 recent failures.

Each component has its own status: ok, unreachable, or degraded. A failed sub-check degrades only that component, so read the whole response before choosing a recovery action.

Example response shape:

{
  "ingress": { "status": "ok", "latencyMs": 12 },
  "admin": { "status": "ok", "latencyMs": 9, "version": "1.4.0" },
  "deployments": {
    "status": "ok",
    "items": [
      {
        "id": "dp_01j...",
        "serviceUrl": "http://ddwf:9080",
        "createdAt": "2026-05-09T10:00:00.000Z"
      }
    ],
    "error": null
  },
  "services": { "status": "ok", "items": [{ "name": "IdentityRecoveryWorkflow", "revision": 3 }], "error": null },
  "invocations": {
    "status": "ok",
    "byStatus": { "invoked": 2, "suspended": 1 },
    "recentFailures": [],
    "error": null
  }
}

2. Re-register the durable workflow service

Use force-redeploy when the service needs to be registered again with Restate.

curl -X POST "$DALP_API_URL/api/v2/admin/operator/restate/force-redeploy" \
  -H "X-Api-Key: $DALP_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "serviceUrl": "http://ddwf:9080",
    "force": true
  }'

A successful response acknowledges the registration and returns the Restate deployment id:

{
  "acknowledged": true,
  "deploymentId": "dp_01j..."
}

force defaults to true. This route does not remove old deployments. If doctor still shows old deployments after redeploying, run stale deployment cleanup next.

3. Remove stale deployments

Use cleanup-stale-deployments after you know which service URL should remain active. DALP keeps the deployment matching serviceUrl and drains every other registered deployment.

curl -X POST "$DALP_API_URL/api/v2/admin/operator/restate/cleanup-stale-deployments" \
  -H "X-Api-Key: $DALP_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "serviceUrl": "http://ddwf:9080",
    "forceDrain": false
  }'

A successful response is:

{ "acknowledged": true }

Use forceDrain: true only when stale deployments point to dead services and still have pending invocations that cannot drain normally. Confirm the result with doctor or the Restate admin deployment list.

4. Clear a blocked workflow for retry

Use recover-stuck-workflow when a specific workflow key is wedged and must start again from a blank state.

curl -X POST "$DALP_API_URL/api/v2/admin/operator/restate/recover-stuck-workflow" \
  -H "X-Api-Key: $DALP_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "serviceName": "InvitationWorkflow",
    "serviceKey": "invitation_01j8m7k2q3r4s5t6u7v8w9x0y1_1770000000000"
  }'

serviceName and serviceKey may contain letters, numbers, underscores, and hyphens. A successful response is:

{ "acknowledged": true }

This route kills and purges prior invocations for the supplied workflow key, then clears keyed workflow state. The next workflow submission starts from a blank state.

DALP refuses to clear a workflow when recovery would hide a still-active invocation or a workflow that already succeeded. In that case, the route returns RESTATE_WORKFLOW_RETRY_BLOCKED with a structured reason and the relevant invocation ids.

Recovery error codes

Error codeMeaningWhat to do
OFFCHAIN_USER_PERMISSION_REQUIREDThe caller does not have system operate permission.Use an operator account or update the caller's role before retrying.
RESTATE_ADMIN_UNREACHABLEDALP could not resolve or reach the Restate admin API.Check Restate admin connectivity, then rerun doctor.
RESTATE_DEPLOYMENT_NOT_FOUNDCleanup or redeploy could not find the expected deployment for the supplied service URL.Run doctor and use the exact registered service URL.
RESTATE_WORKFLOW_RETRY_BLOCKEDWorkflow recovery found an active invocation, an already succeeded invocation, or a purge/query condition that prevents safe retry.Inspect the returned reason and invocationIds before deciding whether to retry later or investigate manually.

On this page