# Restate workflow recovery

Source: https://docs.settlemint.com/docs/developer-guides/operations/restate-workflow-recovery
Recover Restate-backed DALP workflows with operator-only DAPI routes for health checks, redeployment, stale deployment cleanup, and stuck workflow recovery.



Run the `doctor` route first. It reports Restate ingress, admin API, deployment, service, and invocation health without changing workflow state.

Use the write routes only after you know which condition you are fixing. Each route requires an authenticated operator with the system operate permission. DALP records operator route attempts for audit, including the actor, route, request arguments, reason when supplied, outcome, and error message when a route fails.

## Before you start [#before-you-start]

Confirm you have:

* access to the DALP admin API for the environment you are operating
* an operator account or API key with system operate permission
* the Restate service URL used by the durable workflow service, when you need to redeploy or clean deployments
* the Restate workflow `serviceName` and `serviceKey`, when you need to clear a blocked workflow

Set a base URL for the examples:

```bash
export DALP_API_URL="https://your-platform.example.com"
export DALP_API_KEY="sm_dalp_xxxxxxxxxxxxxxxx"
```

## 1. Check workflow health [#1-check-workflow-health]

Run the doctor route first:

```bash
curl -X POST "$DALP_API_URL/api/v2/admin/operator/restate/doctor" \
  -H "X-Api-Key: $DALP_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{}'
```

The response has five independent components:

| Component     | What it tells you                                                    |
| ------------- | -------------------------------------------------------------------- |
| `ingress`     | Whether the Restate ingress URL responds and how long it took.       |
| `admin`       | Whether the Restate admin API responds and which version it reports. |
| `deployments` | Registered Restate deployments and their service URLs.               |
| `services`    | Registered Restate service names and revisions.                      |
| `invocations` | Invocation counts by status and up to 20 recent failures.            |

Each component has its own `status`: `ok`, `unreachable`, or `degraded`. A failed sub-check degrades only that component, so read the whole response before choosing a recovery action.

Example response shape:

```json
{
  "ingress": { "status": "ok", "latencyMs": 12 },
  "admin": { "status": "ok", "latencyMs": 9, "version": "1.4.0" },
  "deployments": {
    "status": "ok",
    "items": [
      {
        "id": "dp_01j...",
        "serviceUrl": "http://ddwf:9080",
        "createdAt": "2026-05-09T10:00:00.000Z"
      }
    ],
    "error": null
  },
  "services": { "status": "ok", "items": [{ "name": "IdentityRecoveryWorkflow", "revision": 3 }], "error": null },
  "invocations": {
    "status": "ok",
    "byStatus": { "invoked": 2, "suspended": 1 },
    "recentFailures": [],
    "error": null
  }
}
```

## 2. Re-register the durable workflow service [#2-re-register-the-durable-workflow-service]

Use `force-redeploy` when the service needs to be registered again with Restate.

```bash
curl -X POST "$DALP_API_URL/api/v2/admin/operator/restate/force-redeploy" \
  -H "X-Api-Key: $DALP_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "serviceUrl": "http://ddwf:9080",
    "force": true
  }'
```

A successful response acknowledges the registration and returns the Restate deployment id:

```json
{
  "acknowledged": true,
  "deploymentId": "dp_01j..."
}
```

`force` defaults to `true`. This route does not remove old deployments. If doctor still shows old deployments after redeploying, run stale deployment cleanup next.

## 3. Remove stale deployments [#3-remove-stale-deployments]

Use `cleanup-stale-deployments` after you know which service URL should remain active. DALP keeps the deployment matching `serviceUrl` and drains every other registered deployment.

```bash
curl -X POST "$DALP_API_URL/api/v2/admin/operator/restate/cleanup-stale-deployments" \
  -H "X-Api-Key: $DALP_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "serviceUrl": "http://ddwf:9080",
    "forceDrain": false
  }'
```

A successful response is:

```json
{ "acknowledged": true }
```

Use `forceDrain: true` only when stale deployments point to dead services and still have pending invocations that cannot drain normally. Confirm the result with doctor or the Restate admin deployment list.

## 4. Clear a blocked workflow for retry [#4-clear-a-blocked-workflow-for-retry]

Use `recover-stuck-workflow` when a specific workflow key is wedged and must start again from a blank state.

```bash
curl -X POST "$DALP_API_URL/api/v2/admin/operator/restate/recover-stuck-workflow" \
  -H "X-Api-Key: $DALP_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "serviceName": "InvitationWorkflow",
    "serviceKey": "invitation_01j8m7k2q3r4s5t6u7v8w9x0y1_1770000000000"
  }'
```

`serviceName` and `serviceKey` may contain letters, numbers, underscores, and hyphens. A successful response is:

```json
{ "acknowledged": true }
```

This route kills and purges prior invocations for the supplied workflow key, then clears keyed workflow state. The next workflow submission starts from a blank state.

DALP refuses to clear a workflow when recovery would hide a still-active invocation or a workflow that already succeeded. In that case, the route returns `RESTATE_WORKFLOW_RETRY_BLOCKED` with a structured reason and the relevant invocation ids.

## Recovery error codes [#recovery-error-codes]

| Error code                          | Meaning                                                                                                                             | What to do                                                                                                        |
| ----------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------- |
| `OFFCHAIN_USER_PERMISSION_REQUIRED` | The caller does not have system operate permission.                                                                                 | Use an operator account or update the caller's role before retrying.                                              |
| `RESTATE_ADMIN_UNREACHABLE`         | DALP could not resolve or reach the Restate admin API.                                                                              | Check Restate admin connectivity, then rerun doctor.                                                              |
| `RESTATE_DEPLOYMENT_NOT_FOUND`      | Cleanup or redeploy could not find the expected deployment for the supplied service URL.                                            | Run doctor and use the exact registered service URL.                                                              |
| `RESTATE_WORKFLOW_RETRY_BLOCKED`    | Workflow recovery found an active invocation, an already succeeded invocation, or a purge/query condition that prevents safe retry. | Inspect the returned `reason` and `invocationIds` before deciding whether to retry later or investigate manually. |

## Related pages [#related-pages]

* [Blockchain monitoring](/docs/developer-guides/operations/blockchain-monitoring)
* [Transaction tracking](/docs/developer-guides/operations/transaction-tracking)
* [DAPI error reference](/docs/developer-guides/api-integration/dapi-error-reference)
* [Authorization](/docs/architecture/security/authorization)
