# Durable Execution Engine operator API

Source: https://docs.settlemint.com/docs/developer-guides/api-integration/durable-execution-engine-operator-api
Reference for DALP operator API routes that check Durable Execution Engine health, re-register workflow services, remove stale deployments, and prepare a stuck workflow for retry.



The Durable Execution Engine operator API is an operator-only route set for workflow health checks and narrow recovery actions. Start with the doctor route, choose one write route that matches the failing component, then run doctor again to confirm the result.

These routes require an authenticated caller with the system operate permission. They are intended for platform operators and trusted automation, not tenant-facing product flows.

## Recovery order [#recovery-order]

| Step | Route                                        | Purpose                                                                                                  |
| ---- | -------------------------------------------- | -------------------------------------------------------------------------------------------------------- |
| 1    | `POST /api/v2/admin/operator/restate/doctor` | Inspect ingress, admin API, deployments, services, and invocation state without changing workflow state. |
| 2    | One write route                              | Re-register the service, remove stale deployments, or prepare one workflow key for retry.                |
| 3    | `POST /api/v2/admin/operator/restate/doctor` | Verify that the affected component moved back to `ok` or that the remaining failure is understood.       |

Every route path contains the `restate` segment, which is the current public API service segment. Treat it as part of the endpoint path, not as product terminology.
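The three-step order can be sketched as a small helper. This is a minimal sketch, not DALP client code: `run_recovery` and the `post` callable are hypothetical names, and `post` stands in for whatever authenticated HTTP client your automation already uses.

```python
from typing import Any, Callable, Dict

# Route prefix taken from the recovery order table.
BASE = "/api/v2/admin/operator/restate"

# Hypothetical transport: sends an authenticated POST and returns decoded JSON.
Post = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def run_recovery(post: Post, write_route: str, write_body: Dict[str, Any]) -> Dict[str, Any]:
    """Run doctor, exactly one write route, then doctor again."""
    before = post(f"{BASE}/doctor", {})
    write_result = post(f"{BASE}/{write_route}", write_body)
    after = post(f"{BASE}/doctor", {})
    return {"before": before, "write": write_result, "after": after}
```

Keeping the `before` and `after` reports side by side makes it easier to see whether the affected component actually changed status.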

## Component statuses [#component-statuses]

Doctor responses use the same status vocabulary for each component.

| Status        | Meaning                                                                                      |
| ------------- | -------------------------------------------------------------------------------------------- |
| `ok`          | The component responded and its payload matched the expected shape.                          |
| `degraded`    | The component responded, but it returned an error status or an unexpected payload.           |
| `unreachable` | DALP could not reach the component within the route timeout or could not complete the probe. |

A degraded sub-check does not fail the whole doctor route. Read every component before choosing a recovery action.
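Because one degraded sub-check does not fail the route, automation should inspect each component rather than rely on the HTTP status alone. The following sketch assumes the doctor response is already decoded into a dictionary keyed by `ingress`, `admin`, `deployments`, `services`, and `invocations`; the function name and the `missing` label are illustrative and not part of the API.

```python
from typing import Any, Dict

# Component keys as they appear in the doctor response.
COMPONENTS = ("ingress", "admin", "deployments", "services", "invocations")


def components_needing_attention(report: Dict[str, Any]) -> Dict[str, str]:
    """Return {component: status} for every component that is not `ok`."""
    findings: Dict[str, str] = {}
    for name in COMPONENTS:
        # "missing" is a sketch-local label for a component absent from the report.
        status = report.get(name, {}).get("status", "missing")
        if status != "ok":
            findings[name] = status
    return findings
```

An empty result means every component reported `ok`.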

## Doctor [#doctor]

`doctor` is the read-only entry point. It probes the Durable Execution Engine ingress URL, admin API, deployment list, service list, and invocation table.

```http
POST /api/v2/admin/operator/restate/doctor
```

### Request body [#request-body]

Send an empty JSON object.

```json
{}
```
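As a rough illustration, the request can be built with the Python standard library. `base_url` and `auth_headers` are assumptions: this page does not specify the host or the authentication header, so the caller supplies both.

```python
import json
import urllib.request
from typing import Dict

DOCTOR_PATH = "/api/v2/admin/operator/restate/doctor"


def build_doctor_request(base_url: str, auth_headers: Dict[str, str]) -> urllib.request.Request:
    """Build (but do not send) the doctor request with an empty JSON body."""
    headers = {"Content-Type": "application/json", **auth_headers}
    return urllib.request.Request(
        base_url.rstrip("/") + DOCTOR_PATH,
        data=json.dumps({}).encode("utf-8"),  # the empty JSON object
        headers=headers,
        method="POST",
    )
```

Pass the result to `urllib.request.urlopen` with an explicit `timeout` to send it.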

### Success response [#success-response]

```json
{
  "ingress": { "status": "ok", "latencyMs": 12 },
  "admin": { "status": "ok", "latencyMs": 9, "version": "1.4.0" },
  "deployments": {
    "status": "ok",
    "items": [
      {
        "id": "dp_01j8m7k2q3r4s5t6u7v8w9x0y1",
        "serviceUrl": "https://workflow-service.example.com",
        "createdAt": "2026-05-09T10:00:00.000Z"
      }
    ],
    "error": null
  },
  "services": {
    "status": "ok",
    "items": [{ "name": "IdentityRecoveryWorkflow", "revision": 3 }],
    "error": null
  },
  "invocations": {
    "status": "ok",
    "byStatus": { "invoked": 2, "suspended": 1 },
    "recentFailures": [],
    "error": null
  }
}
```

`recentFailures` returns up to 20 recent failed invocations. Each item includes `id`, `serviceName`, `serviceKey`, `failedAt`, and `errorMessage`.
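One way to use `recentFailures` is to group failures by `(serviceName, serviceKey)` so that repeatedly failing workflow keys stand out. This is a sketch that assumes each item carries the fields listed above; the function name is hypothetical.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple


def failures_by_workflow_key(report: Dict[str, Any]) -> List[Tuple[Tuple[str, str], int]]:
    """Count recent failures per (serviceName, serviceKey), most frequent first."""
    failures = report.get("invocations", {}).get("recentFailures", [])
    counts = Counter((f["serviceName"], f["serviceKey"]) for f in failures)
    return counts.most_common()
```

A key that appears several times is a candidate for closer inspection, not an automatic trigger for a write route.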

## Force redeploy [#force-redeploy]

`force-redeploy` registers the durable workflow service URL with the Durable Execution Engine admin API. The route does not remove old deployments. Run `cleanup-stale-deployments` afterwards if doctor still shows old deployment records.

```http
POST /api/v2/admin/operator/restate/force-redeploy
```

### Request body [#request-body-1]

| Field        | Type       | Required | Description                                                                                                  |
| ------------ | ---------- | -------- | ------------------------------------------------------------------------------------------------------------ |
| `serviceUrl` | URL string | Yes      | Service URL to register. Use the URL returned by doctor or the configured durable workflow service endpoint. |
| `force`      | boolean    | No       | Defaults to `true`. Passes a forced registration request to the admin API.                                   |

```json
{
  "serviceUrl": "https://workflow-service.example.com",
  "force": true
}
```
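A caller might build this body with a local sanity check before sending it. The sketch below is illustrative: the helper name is hypothetical, and the http(s) check is an assumption made here, not the validation rule the route applies.

```python
from typing import Any, Dict
from urllib.parse import urlparse


def force_redeploy_body(service_url: str, force: bool = True) -> Dict[str, Any]:
    """Build the request body; `force` mirrors the documented default of true."""
    parsed = urlparse(service_url)
    # Local pre-check only (assumption): require an absolute http(s) URL.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not an absolute http(s) URL: {service_url!r}")
    return {"serviceUrl": service_url, "force": force}
```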

### Success response [#success-response-1]

```json
{
  "acknowledged": true,
  "deploymentId": "dp_01j8m7k2q3r4s5t6u7v8w9x0y1"
}
```

## Cleanup stale deployments [#cleanup-stale-deployments]

`cleanup-stale-deployments` keeps the deployment matching `serviceUrl` and drains every other registered deployment. Use this route after doctor or force redeploy confirms the active service URL.

```http
POST /api/v2/admin/operator/restate/cleanup-stale-deployments
```

### Request body [#request-body-2]

| Field        | Type       | Required | Description                                                                                                   |
| ------------ | ---------- | -------- | ------------------------------------------------------------------------------------------------------------- |
| `serviceUrl` | URL string | Yes      | Service URL of the deployment to keep. Every other registered deployment is treated as stale.                 |
| `forceDrain` | boolean    | No       | Defaults to `false`. Use `true` only when stale deployments point to dead services and cannot drain normally. |

```json
{
  "serviceUrl": "https://workflow-service.example.com",
  "forceDrain": false
}
```
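Before calling the route, it can help to preview which deployments would be kept and which would be treated as stale. This sketch works on the `deployments.items` list from a doctor response and assumes an exact string comparison on `serviceUrl`; how DALP matches URLs internally is not specified here, and the function name is hypothetical.

```python
from typing import Any, Dict, List, Tuple

Deployment = Dict[str, Any]


def preview_cleanup(report: Dict[str, Any], service_url: str) -> Tuple[List[Deployment], List[Deployment]]:
    """Split doctor's deployment items into (keep, stale) for the given URL."""
    items = report.get("deployments", {}).get("items", [])
    # Assumption: exact string match on serviceUrl.
    keep = [d for d in items if d.get("serviceUrl") == service_url]
    stale = [d for d in items if d.get("serviceUrl") != service_url]
    return keep, stale
```

An empty `keep` list suggests the supplied URL does not match any registered deployment, so rerun doctor and copy the registered URL exactly before sending the request.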

### Success response [#success-response-2]

```json
{ "acknowledged": true }
```

If DALP cannot list deployments or reach the admin API, the route returns an admin-unreachable error instead of acknowledging cleanup.

## Recover stuck workflow [#recover-stuck-workflow]

`recover-stuck-workflow` prepares one workflow key for retry. DALP kills and purges prior invocations for the supplied `(serviceName, serviceKey)` pair, then clears keyed workflow state so the next submission starts from a blank state.

```http
POST /api/v2/admin/operator/restate/recover-stuck-workflow
```

### Request body [#request-body-3]

| Field         | Type   | Required | Description                                                                             |
| ------------- | ------ | -------- | --------------------------------------------------------------------------------------- |
| `serviceName` | string | Yes      | Workflow service name. It must contain only letters, numbers, underscores, and hyphens. |
| `serviceKey`  | string | Yes      | Workflow service key. It must contain only letters, numbers, underscores, and hyphens.  |

```json
{
  "serviceName": "IdentityRecoveryWorkflow",
  "serviceKey": "invitation_01j8m7k2q3r4s5t6u7v8w9x0y1"
}
```
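The character rule for both fields can be checked locally before the request is sent. This is a minimal sketch that assumes "letters" and "numbers" mean ASCII letters and digits; the helper name is hypothetical.

```python
import re
from typing import Dict

# Assumption: ASCII letters and digits, plus underscore and hyphen.
_ALLOWED = re.compile(r"[A-Za-z0-9_-]+")


def recover_body(service_name: str, service_key: str) -> Dict[str, str]:
    """Validate both fields against the documented character set, then build the body."""
    for field, value in (("serviceName", service_name), ("serviceKey", service_key)):
        if not _ALLOWED.fullmatch(value):
            raise ValueError(f"{field} contains unsupported characters: {value!r}")
    return {"serviceName": service_name, "serviceKey": service_key}
```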

### Success response [#success-response-3]

```json
{ "acknowledged": true }
```

DALP refuses to clear a workflow when an active invocation is still running or when the previous invocation already succeeded. That failure path returns a structured retry-blocked error with `reason` and `invocationIds` so the operator can inspect the active or completed work before trying again.
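Automation should surface a retry-blocked error to the operator rather than retrying in a loop. The sketch below assumes the decoded error payload exposes `reason` and `invocationIds` as top-level keys; the surrounding error envelope is not described on this page, so adjust the lookup to the actual response shape.

```python
from typing import Any, Dict


def describe_retry_blocked(error: Dict[str, Any]) -> str:
    """Turn a retry-blocked payload into a one-line message for the operator."""
    reason = error.get("reason", "unknown")
    ids = error.get("invocationIds", [])
    listed = ", ".join(ids) if ids else "none reported"
    return f"retry blocked ({reason}); inspect invocations: {listed}"
```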

## Error conditions [#error-conditions]

| Condition                         | Meaning                                                                                                                       | Operator response                                                                |
| --------------------------------- | ----------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------- |
| Missing system operate permission | The caller is not authorised for operator routes.                                                                             | Use an operator account or API key with the required permission.                 |
| Admin API unreachable             | DALP could not resolve or reach the Durable Execution Engine admin API.                                                       | Check admin connectivity, then rerun doctor.                                     |
| Deployment not found              | The supplied `serviceUrl` does not match a registered deployment when the route needs that mapping.                           | Run doctor and retry with the exact registered service URL.                      |
| Workflow retry blocked            | Recovery found an active invocation, an already succeeded invocation, or a query or purge condition that prevents safe retry. | Inspect the returned `reason` and `invocationIds` before retrying or escalating. |

## Related pages [#related-pages]

* [Durable Execution Engine recovery](/docs/developer-guides/operations/durable-execution-engine-recovery)
* [API monitoring](/docs/developer-guides/api-integration/api-monitoring)
* [API error reference](/docs/developer-guides/api-integration/dapi-error-reference)
* [Authorization](/docs/architecture/security/authorization)
