Durable Execution Engine operator API reference

Reference for DALP operator API routes that check Durable Execution Engine health, re-register workflow services, remove stale deployments, and prepare a stuck workflow for retry.

The Durable Execution Engine operator API is an operator-only route set for workflow health checks and narrow recovery actions. Start with the doctor route, choose one write route that matches the failing component, then run doctor again to confirm the result.

These routes require an authenticated caller with the system operate permission. They are intended for platform operators and trusted automation, not tenant-facing product flows.

Recovery order

Step	Route	Purpose
1	`POST /api/v2/admin/operator/restate/doctor`	Inspect ingress, admin API, deployments, services, and invocation state without changing workflow state.
2	One write route	Re-register the service, remove stale deployments, or prepare one workflow key for retry.
3	`POST /api/v2/admin/operator/restate/doctor`	Verify that the affected component moved back to `ok` or that the remaining failure is understood.
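
The three steps above can be sketched as a small helper. This is a minimal illustration rather than DALP client code: `post` is a hypothetical transport callable that takes a route path and a JSON body and returns the parsed JSON response, and `recover` is a name chosen for this example.

```python
from typing import Callable

DOCTOR = "/api/v2/admin/operator/restate/doctor"

# Hypothetical transport: (path, body) -> parsed JSON response.
Post = Callable[[str, dict], dict]


def recover(post: Post, write_path: str, write_body: dict) -> dict:
    """Run doctor, exactly one write route, then doctor again."""
    before = post(DOCTOR, {})            # step 1: read-only inspection
    result = post(write_path, write_body)  # step 2: one write route
    after = post(DOCTOR, {})             # step 3: verify the outcome
    return {"before": before, "write": result, "after": after}
```

Keeping both doctor responses lets the operator compare component statuses before and after the write.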

Each route path includes the `restate` segment, which is the current public API service segment. Treat it as part of the endpoint path, not as product terminology.

Component statuses

Doctor responses use the same status vocabulary for each component.

Status	Meaning
`ok`	The component responded and its payload matched the expected shape.
`degraded`	The component responded, but it returned an error status or an unexpected payload.
`unreachable`	DALP could not reach the component within the route timeout or could not complete the probe.

A degraded sub-check does not fail the whole doctor route. Read every component before choosing a recovery action.
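
One way to read every component before acting is to collect each status that is not `ok`. The sketch below assumes the doctor response has the five top-level components shown on this page. The `missing` label for an absent component is an invention of this example, not a DALP status.

```python
COMPONENTS = ("ingress", "admin", "deployments", "services", "invocations")


def non_ok_components(doctor: dict) -> dict:
    """Return {component: status} for every component not reporting "ok"."""
    problems = {}
    for name in COMPONENTS:
        # "missing" is a local placeholder for a component absent from the payload.
        status = doctor.get(name, {}).get("status", "missing")
        if status != "ok":
            problems[name] = status
    return problems
```

An empty result means every component reported `ok`. Anything else names the components to look at before choosing a write route.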

Doctor

`doctor` is the read-only entry point. It probes the Durable Execution Engine ingress URL, admin API, deployment list, service list, and invocation table.

POST /api/v2/admin/operator/restate/doctor

Request body

Send an empty JSON object.

{}

Success response

{
  "ingress": { "status": "ok", "latencyMs": 12 },
  "admin": { "status": "ok", "latencyMs": 9, "version": "1.4.0" },
  "deployments": {
    "status": "ok",
    "items": [
      {
        "id": "dp_01j8m7k2q3r4s5t6u7v8w9x0y1",
        "serviceUrl": "https://workflow-service.example.com",
        "createdAt": "2026-05-09T10:00:00.000Z"
      }
    ],
    "error": null
  },
  "services": {
    "status": "ok",
    "items": [{ "name": "IdentityRecoveryWorkflow", "revision": 3 }],
    "error": null
  },
  "invocations": {
    "status": "ok",
    "byStatus": { "invoked": 2, "suspended": 1 },
    "recentFailures": [],
    "error": null
  }
}

`recentFailures` returns up to 20 recent failed invocations. Each item includes `id`, `serviceName`, `serviceKey`, `failedAt`, and `errorMessage`.
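
Because recovery works on one workflow key at a time, it can help to group recent failures by service name and key. The following is a sketch that relies only on the item fields listed above. The function name is hypothetical.

```python
from collections import defaultdict


def failures_by_workflow(doctor: dict) -> dict:
    """Group failed invocation ids by (serviceName, serviceKey)."""
    grouped = defaultdict(list)
    failures = doctor.get("invocations", {}).get("recentFailures", [])
    for item in failures:
        grouped[(item["serviceName"], item["serviceKey"])].append(item["id"])
    return dict(grouped)
```

Each key in the result is a candidate pair for the stuck-workflow recovery route, subject to the checks that route performs.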

Force redeploy

`force-redeploy` registers the durable workflow service URL with the Durable Execution Engine admin API. The route does not remove old deployments. If doctor still shows old deployment records afterwards, run `cleanup-stale-deployments`.

POST /api/v2/admin/operator/restate/force-redeploy

Request body

Field	Type	Required	Description
`serviceUrl`	URL string	Yes	Service URL to register. Use the URL returned by doctor or the configured durable workflow service endpoint.
`force`	boolean	No	Defaults to `true`. Passes a forced registration request to the admin API.

{
  "serviceUrl": "https://workflow-service.example.com",
  "force": true
}

Success response

{
  "acknowledged": true,
  "deploymentId": "dp_01j8m7k2q3r4s5t6u7v8w9x0y1"
}
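
A request to this route could be built with the Python standard library as sketched below. The base URL `https://dalp.example.com`, the `Authorization: Bearer` header, and the helper name are assumptions for illustration. This page does not state how operator credentials are sent, so adapt the header to your deployment.

```python
import json
import urllib.request

ROUTE_PREFIX = "/api/v2/admin/operator/restate/"


def build_operator_request(base_url: str, route: str, body: dict,
                           api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for an operator route."""
    return urllib.request.Request(
        base_url.rstrip("/") + ROUTE_PREFIX + route,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: header name and scheme depend on your DALP setup.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_operator_request(
    "https://dalp.example.com",  # hypothetical base URL
    "force-redeploy",
    {"serviceUrl": "https://workflow-service.example.com", "force": True},
    api_key="example-key",
)
```

Passing `request` to `urllib.request.urlopen` would send it. The same helper works for the other routes on this page by changing `route` and `body`.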

Cleanup stale deployments

`cleanup-stale-deployments` keeps the deployment matching `serviceUrl`. DALP drains every other registered deployment. Use this route after doctor or force redeploy confirms the active service URL.

POST /api/v2/admin/operator/restate/cleanup-stale-deployments

Request body

Field	Type	Required	Description
`serviceUrl`	URL string	Yes	Service URL of the deployment to keep. Every other registered deployment is treated as stale.
`forceDrain`	boolean	No	Defaults to `false`. Use `true` only when stale deployments point to dead services and cannot drain normally.

{
  "serviceUrl": "https://workflow-service.example.com",
  "forceDrain": false
}

Success response

{ "acknowledged": true }

If DALP cannot list deployments or reach the admin API, the route returns an admin-unreachable error instead of acknowledging cleanup.
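
Since every deployment other than the one matching `serviceUrl` is drained, it may be worth previewing which deployments that covers before calling the route. The sketch below works from the `deployments.items` list in a doctor response. It compares URLs as exact strings, which is an assumption: DALP may match URLs differently, so treat the result as a preview only.

```python
def stale_deployment_ids(doctor: dict, keep_service_url: str) -> list:
    """List ids of deployments whose serviceUrl differs from the one to keep."""
    items = doctor.get("deployments", {}).get("items", [])
    # Assumption: exact string comparison approximates the route's matching.
    return [d["id"] for d in items if d["serviceUrl"] != keep_service_url]
```

If the preview lists every deployment, the supplied URL probably does not match any registered deployment and should be checked against the doctor output first.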

Recover stuck workflow

`recover-stuck-workflow` prepares one workflow key for retry. DALP kills and purges prior invocations for the supplied (`serviceName`, `serviceKey`) pair, then clears keyed workflow state so the next submission starts from a blank state.

POST /api/v2/admin/operator/restate/recover-stuck-workflow

Request body

Field	Type	Required	Description
`serviceName`	string	Yes	Workflow service name. It must contain only letters, numbers, underscores, and hyphens.
`serviceKey`	string	Yes	Workflow service key. It must contain only letters, numbers, underscores, and hyphens.

{
  "serviceName": "IdentityRecoveryWorkflow",
  "serviceKey": "invitation_01j8m7k2q3r4s5t6u7v8w9x0y1"
}
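
The character rule for both fields can be checked on the client before sending the request. This is a minimal sketch. It assumes "letters" and "numbers" mean ASCII characters, which this page does not specify, and the function name is hypothetical.

```python
import re

# Assumption: ASCII letters and digits only; the reference does not say.
_ALLOWED = re.compile(r"[A-Za-z0-9_-]+")


def recover_body(service_name: str, service_key: str) -> dict:
    """Validate both fields and return the request body."""
    fields = (("serviceName", service_name), ("serviceKey", service_key))
    for field, value in fields:
        if not _ALLOWED.fullmatch(value):
            raise ValueError(
                f"{field} may contain only letters, numbers, "
                "underscores, and hyphens"
            )
    return {"serviceName": service_name, "serviceKey": service_key}
```

An empty string fails the check too, because the pattern requires at least one character.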

Success response

{ "acknowledged": true }

DALP refuses to clear a workflow when an active invocation is still running or when the previous invocation already succeeded. That failure path returns a structured retry-blocked error with `reason` and `invocationIds` so the operator can inspect the active or completed work before trying again.
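
Automation that calls this route could surface the retry-blocked details to the operator instead of retrying blindly. The sketch below assumes the error has already been parsed into a dictionary carrying `reason` and `invocationIds`. The surrounding error envelope is not documented on this page, so the extraction step is left out.

```python
def describe_retry_blocked(error: dict) -> str:
    """Format a retry-blocked error for an operator log or alert."""
    reason = error.get("reason", "unknown")
    ids = error.get("invocationIds") or []
    listed = ", ".join(ids) if ids else "none reported"
    return f"Retry blocked ({reason}); inspect invocations: {listed}"
```

The message is meant for a human. Whether to retry or escalate remains an operator decision.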

Error conditions

Condition	Meaning	Operator response
Missing system operate permission	The caller is not authorised for operator routes.	Use an operator account or API key with the required permission.
Admin API unreachable	DALP could not resolve or reach the Durable Execution Engine admin API.	Check admin connectivity, then rerun doctor.
Deployment not found	The supplied `serviceUrl` does not match a registered deployment when the route needs that mapping.	Run doctor and retry with the exact registered service URL.
Workflow retry blocked	Recovery found an active invocation, an already succeeded invocation, or a query or purge condition that prevents safe retry.	Inspect the returned `reason` and `invocationIds` before retrying or escalating.
