Transaction lifecycle engine with eleven named states, DALP 3.0

Every transaction carries explicit state across preparation, approval, broadcast, and confirmation, so a stuck step recovers instead of going dark, and a retry can never duplicate an in-flight send.

Under real concurrency, with custody approval windows measured in minutes and variable chain conditions, a transaction that loses track of its progress will either send twice or disappear entirely. Eleven named states and a persisted checkpoint at every transition close both failure modes.

Transaction Lifecycle Engine

A transaction as a tracked, recoverable process

From receipt through confirmation, every step in a transaction's life is a named, persisted state. When something stalls, the engine resumes from exactly that state instead of retrying from the start or dropping the work entirely.

Nonce: Held for the lifetime
Reverts: Typed
Long writes: Return the result
Resume: From the last step

A blockchain send has always looked simple from the outside: sign it, broadcast it, wait for the receipt. What that hides is a sequence of steps, each of which can fail independently. The gas estimate can time out. The custody provider can hold the transaction pending approval. The broadcast can succeed while the receipt never arrives. If any step goes wrong and you have no record of which step it was, you face the same choice every time: retry and risk sending twice, or do nothing and leave a transfer stuck in an unknown state.

On a regulated platform that asymmetry has real consequences. A duplicate token issuance is a compliance event. An approval that silently expires means an investor transfer that never settled. A failed-but-unconfirmed send leaves a gap in your audit trail. These are not rare edge cases. Under real concurrency, with custody approval windows in the tens of minutes and variable chain conditions, they become near-certain.

DALP 3.0 addresses this by giving every transaction an explicit state machine. The Transaction Lifecycle Engine tracks each send through eleven named states, persists every transition, and holds the resources needed to resume safely when something interrupts.

Eleven states, no silent drops

The lifecycle runs from RECEIVED through QUEUED, PREPARING, SIGNING, BROADCASTING, and CONFIRMING to COMPLETED. Custody workflows that require external sign-off also wait in PENDING_APPROVAL until the transaction is approved or the approval expires. Every path ends in a terminal state: COMPLETED for successful sends, FAILED when retries are exhausted, CANCELLED when an operator or the system aborts, and DEAD_LETTER when the retry budget is gone and an operator must intervene.

Three paths exist, depending on how signing is handled.

1. Standard: QUEUED → PREPARING → SIGNING → BROADCASTING → CONFIRMING → COMPLETED. The platform requests a signature from the signer provider before broadcast.
2. Native broadcast: QUEUED → PREPARING → BROADCASTING → CONFIRMING → COMPLETED. The signing provider handles signing internally; the SIGNING step is skipped.
3. With approval: QUEUED → PREPARING → BROADCASTING → PENDING_APPROVAL → CONFIRMING → COMPLETED. A custody policy gates the transaction after broadcast; it waits in PENDING_APPROVAL until approved or expired.

Three processing paths through the transaction lifecycle, depending on custody configuration.
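As an illustration, the eleven states and the three paths can be written down as a small transition table. This is a minimal sketch, not the engine's actual data model: the names TxState, PATHS, TERMINAL, and next_state are assumptions made for the example, and only the state names and path order come from the text above.

```python
from enum import Enum


class TxState(Enum):
    RECEIVED = "RECEIVED"
    QUEUED = "QUEUED"
    PREPARING = "PREPARING"
    PENDING_APPROVAL = "PENDING_APPROVAL"
    SIGNING = "SIGNING"
    BROADCASTING = "BROADCASTING"
    CONFIRMING = "CONFIRMING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"
    CANCELLED = "CANCELLED"
    DEAD_LETTER = "DEAD_LETTER"


S = TxState

# Every path must end in one of these states.
TERMINAL = {S.COMPLETED, S.FAILED, S.CANCELLED, S.DEAD_LETTER}

# The three happy paths as listed above (hypothetical path keys).
PATHS = {
    "standard": [S.QUEUED, S.PREPARING, S.SIGNING, S.BROADCASTING,
                 S.CONFIRMING, S.COMPLETED],
    "native_broadcast": [S.QUEUED, S.PREPARING, S.BROADCASTING,
                         S.CONFIRMING, S.COMPLETED],
    "with_approval": [S.QUEUED, S.PREPARING, S.BROADCASTING,
                      S.PENDING_APPROVAL, S.CONFIRMING, S.COMPLETED],
}


def next_state(path: str, current: TxState) -> TxState:
    """Return the next happy-path state, or raise if none exists."""
    if current in TERMINAL:
        raise ValueError(f"{current.name} is terminal")
    steps = PATHS[path]
    # list.index raises ValueError when the state is not on this path,
    # for example SIGNING on the native broadcast path.
    return steps[steps.index(current) + 1]
```

Keeping the paths as data rather than branching logic makes an illegal move, such as entering SIGNING on the native broadcast path, fail loudly instead of passing unnoticed.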

At every transition, the engine persists the new state before proceeding. Nothing can move from BROADCASTING to CONFIRMING without the state record reflecting that. If the process crashes between those two steps, it restarts at BROADCASTING, not from the beginning. The state is the checkpoint. Recovery starts where work stopped, with no gap between the two.
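The persist-then-proceed rule can be sketched in a few lines. The example below is an assumption-laden illustration, not the engine's storage layer: CheckpointStore is a hypothetical JSON-file store, the path is shortened to four steps, and run is a made-up driver loop. What it shows is the ordering: the new state is written durably before the work for that state begins.

```python
import json
import os

# Shortened path for the example; the real lifecycle has more states.
STEPS = ["PREPARING", "BROADCASTING", "CONFIRMING", "COMPLETED"]


class CheckpointStore:
    """Hypothetical durable store: one JSON file mapping tx_id to state."""

    def __init__(self, path):
        self.path = path

    def _read(self):
        if not os.path.exists(self.path):
            return {}
        with open(self.path) as f:
            return json.load(f)

    def load(self, tx_id):
        return self._read().get(tx_id)

    def save(self, tx_id, state):
        data = self._read()
        data[tx_id] = state
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)  # atomic swap, never a half-written file


def run(store, tx_id, handlers):
    """Resume from the persisted state and walk forward to COMPLETED."""
    state = store.load(tx_id) or STEPS[0]
    while state != "COMPLETED":
        handlers[state]()                      # work for the current state
        state = STEPS[STEPS.index(state) + 1]
        store.save(tx_id, state)               # persist before proceeding
    return state
```

If the BROADCASTING handler crashes, the stored state is still BROADCASTING, so the next call to run starts there and never repeats PREPARING. The handler that is re-entered has to be safe to repeat, which is where nonce reuse comes in.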

DEAD_LETTER is the safety valve. When a transaction has exhausted its automatic retry budget and cannot progress without human judgment, the engine parks it in DEAD_LETTER and surfaces it through the Platform API and CLI. An operator investigates, resolves the underlying issue, and rescues the transaction back to QUEUED to try again. The escalation path is controlled, not silent. Autonomous recovery and operator-assisted recovery are two distinct states, never conflated.
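The escalation and rescue flow might look like the sketch below. Everything here is hypothetical: the Tx record, the MAX_ATTEMPTS budget of three, and the record_failure and rescue helpers are illustrative names, not the Platform API. The point is that parking in DEAD_LETTER and rescuing to QUEUED are both explicit, recorded transitions.

```python
from dataclasses import dataclass, field

MAX_ATTEMPTS = 3  # assumed retry budget, for illustration only


@dataclass
class Tx:
    tx_id: str
    state: str = "QUEUED"
    attempts: int = 0
    history: list = field(default_factory=list)


def _move(tx, state, note):
    """Apply a transition and keep it in the lifecycle history."""
    tx.state = state
    tx.history.append((state, note))


def record_failure(tx, reason):
    """Retry automatically until the budget is spent, then park the send."""
    tx.attempts += 1
    if tx.attempts >= MAX_ATTEMPTS:
        _move(tx, "DEAD_LETTER", f"retry budget exhausted: {reason}")
    else:
        _move(tx, "QUEUED", f"automatic retry {tx.attempts}: {reason}")


def rescue(tx, operator):
    """Operator action: only a parked transaction can be rescued."""
    if tx.state != "DEAD_LETTER":
        raise ValueError(f"{tx.tx_id} is in {tx.state}, not DEAD_LETTER")
    tx.attempts = 0
    _move(tx, "QUEUED", f"rescued by {operator}")
```

Because the rescue is written into the same history as the automatic retries, the record shows when the send was escalated and who brought it back.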

The nonce problem, solved once

Every Ethereum account sends transactions in strict sequential order. Each send carries a nonce, a counter that increments by one. Submit two transactions with the same nonce and at most one of them is included; the other is dropped. Send with a nonce too low and the transaction is rejected outright.

This is where most fire-and-forget implementations break. Before DALP 3.0, a broadcast timeout followed by a retry could request a new nonce, either colliding with the in-flight send or creating a gap in the sequence that blocked every subsequent transaction from that account. Under real concurrency, with custody approval windows and variable confirmation times, nonce collisions were near-certain, and you had no way to know which failure mode you were in.

The Transaction Lifecycle Engine holds the nonce for the lifetime of the send. The nonce is allocated during PREPARING and held through BROADCASTING and CONFIRMING. Release happens only when the send reaches a terminal state: confirmed on-chain, explicitly failed, or cancelled. A retry against a live send reuses the same nonce rather than requesting a new one. Two in-flight sends from the same signing account are queued so each nonce is allocated in order.
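A minimal sketch of that hold-and-reuse behaviour, assuming one manager per signing account. NonceManager and its methods are hypothetical names, and the sketch leaves out what happens on-chain to a nonce released by a failed or cancelled send.

```python
import threading

# Terminal outcomes that release a held nonce, per the description above.
RELEASING_STATES = {"COMPLETED", "FAILED", "CANCELLED"}


class NonceManager:
    """Hypothetical per-account allocator that holds a nonce per send."""

    def __init__(self, next_nonce=0):
        self._lock = threading.Lock()   # serialises concurrent sends
        self._next = next_nonce
        self._held = {}                 # tx_id -> nonce

    def acquire(self, tx_id):
        """Allocate on first call; a retry gets the same nonce back."""
        with self._lock:
            if tx_id not in self._held:
                self._held[tx_id] = self._next
                self._next += 1
            return self._held[tx_id]

    def release(self, tx_id, state):
        """Release only when the send has reached a terminal state."""
        if state not in RELEASING_STATES:
            raise ValueError(f"cannot release nonce in state {state}")
        with self._lock:
            self._held.pop(tx_id, None)

    def is_held(self, tx_id):
        with self._lock:
            return tx_id in self._held
```

The retry path calls acquire with the same transaction ID and receives the nonce it already holds, so a broadcast timeout can never turn into a second nonce for the same send.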

The sub-status layer records exactly what went wrong when a nonce error does occur. NONCE_CONFLICT and NONCE_TOO_LOW are distinct sub-statuses on a FAILED state, not generic errors. An operator can tell from the record whether the failure was a sequencing error the system could have prevented or an external condition the system caught and reported correctly.

Reverts return typed reasons, not hex

When a transfer reverts on-chain, the failure has a reason. That reason is encoded in the transaction receipt as a selector and parameters. Before DALP 3.0, reading that reason meant taking the four-byte selector, looking it up against the contract ABI, extracting the parameters, and translating the result into something an operator or integration could act on. That decoding step fell to the caller.

The Transaction Lifecycle Engine decodes this automatically. Every on-chain fault type the platform tracks is declared in the contract ABI: frozen addresses, expired identity claims, allowlist misses, supply cap violations, policy blocks.

When a send reverts, the engine matches the returned selector against the faults declared in the ABI, extracts the named parameters, and returns structured metadata in the response. The integration receives a fault name and the specific value that caused the rejection. The operator sees the exact rule that fired, not an opaque byte string.
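The decoding step can be illustrated with a small selector lookup. This is a sketch under stated assumptions: the fault names AddressFrozen and SupplyCapExceeded, their parameters, and both selector values are placeholders invented for the example (real selectors are the first four bytes of the Keccak-256 hash of the error signature), and only static address and uint256 parameters are handled.

```python
# Placeholder registry: selector -> (fault name, [(param name, ABI type)]).
# Selectors and fault definitions are made up for illustration.
REGISTRY = {
    bytes.fromhex("aaaaaaaa"): ("AddressFrozen", [("account", "address")]),
    bytes.fromhex("bbbbbbbb"): (
        "SupplyCapExceeded",
        [("cap", "uint256"), ("requested", "uint256")],
    ),
}


def decode_revert(data: bytes) -> dict:
    """Turn raw revert data into a fault name and named arguments."""
    selector, body = data[:4], data[4:]
    entry = REGISTRY.get(selector)
    if entry is None:
        return {"fault": "UNKNOWN", "raw": "0x" + data.hex()}
    name, params = entry
    if len(body) < 32 * len(params):
        return {"fault": name, "error": "malformed revert data"}
    args = {}
    for i, (pname, ptype) in enumerate(params):
        word = body[32 * i: 32 * (i + 1)]      # each static arg is 32 bytes
        if ptype == "address":
            args[pname] = "0x" + word[12:].hex()   # last 20 bytes
        elif ptype == "uint256":
            args[pname] = int.from_bytes(word, "big")
    return {"fault": name, "args": args}
```

An unknown selector still returns a structured result with the raw bytes attached, so nothing is lost when a contract raises a fault the registry does not know.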

For a regulated institution this matters in two directions. An operator acts on a specific reason immediately, without opening a support ticket to decode the failure. Every typed rejection also becomes a structured audit record: when the transfer was attempted, exactly why it failed, and which rule triggered it. The reason lives in the data, readable directly, without interpreting surrounding log lines. Your audit trail has traceability built in.

Long writes return a settled result, not a timeout

Some writes take longer than an HTTP connection will stay open. Deploying a token contract, settling a multi-step transfer, or running a compliance onboarding workflow can each wait for on-chain confirmation across multiple blocks. Before DALP 3.0, that wait often exceeded the 100-second proxy ceiling, returning a gateway timeout with no indication of whether the operation had succeeded.

The v2 write endpoints return immediately with a correlation handle. The connection does not stay open waiting for the chain. You listen on the status endpoint, which emits the settled outcome the moment confirmation lands: final state, on-chain address for deployments, transaction hash, and the block the send was included in. If you disconnect and reconnect, you get the current state for that handle immediately, with nothing to replay or reconcile.
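The handle-then-status pattern can be sketched in memory. This is not the v2 API: WriteService, submit, settle, and status are hypothetical names, and the field names in the status record are assumptions. It only shows the shape of the contract, where a write returns a handle at once and the outcome is read later by that handle.

```python
import uuid


class WriteService:
    """In-memory stand-in for an asynchronous write endpoint."""

    def __init__(self):
        self._status = {}

    def submit(self, payload):
        """Accept the write and return a correlation handle immediately."""
        handle = str(uuid.uuid4())
        self._status[handle] = {"state": "QUEUED", "payload": payload}
        return handle

    def settle(self, handle, tx_hash, block_number, address=None):
        """Called by the engine side once confirmation lands."""
        self._status[handle].update(
            state="COMPLETED",
            tx_hash=tx_hash,
            block_number=block_number,
            address=address,
        )

    def status(self, handle):
        """Current state for a handle; safe to call after a reconnect."""
        return dict(self._status[handle])
```

A client that loses its connection keeps only the handle. Calling status again returns whatever state the send has reached, with nothing to replay.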

Confirmation logic now lives in one place. An integration that previously needed a polling loop, a timeout handler, and a reconciliation pass to decide whether a timed-out token deployment had actually succeeded now reads a settled result or a typed failure. The guesswork is gone.

Workflows resume where they stopped

Multi-step operations that include on-chain sends are idempotent across restarts. A pod eviction, a rolling upgrade, a deliberate pause, or a crash mid-flight all produce the same outcome: the operation resumes from the last completed step. Steps that finished are not re-executed. Steps that did not complete are retried from scratch until they succeed or exhaust their retry budget.

The mechanism is a persistent journal, where each step is an entry. When a run resumes after an interruption, the engine replays the journal to reconstruct state up to the last committed step, then continues forward. An onboarding that deploys a token, registers an investor identity, and executes an initial transfer can be interrupted at any point: the worst case is one incomplete step retried with the same inputs, and no completed step is ever duplicated.
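Journal replay can be sketched as follows. The run_workflow function and the journal entry shape are assumptions for the example, and the journal here is a plain list standing in for durable storage. Steps found in the journal return their recorded result without running again; the first missing step is where execution continues.

```python
def run_workflow(steps, journal):
    """Run (name, fn) steps in order, skipping any already journaled.

    `journal` is a list of {"step": name, "result": value} entries and
    stands in for a persistent log in this sketch.
    """
    done = {entry["step"]: entry["result"] for entry in journal}
    results = {}
    for name, fn in steps:
        if name in done:
            # Replay: reuse the committed result, do not re-execute.
            results[name] = done[name]
            continue
        result = fn(results)                  # may raise and interrupt the run
        journal.append({"step": name, "result": result})  # commit the step
        results[name] = result
    return results
```

With steps for a token deployment, an identity registration, and an initial transfer, a failure during registration leaves one journal entry. The next run reuses the deployed address and retries registration only.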

This matters most at upgrade boundaries. The platform enforces a disruption budget so in-progress operations drain before the running instance is replaced, and a rolling upgrade mid-onboarding does not leave an investor partially registered.

Stalls surface before anyone has to go looking

The engine watches every active invocation against two thresholds. The first is how long a run has been in a pending state. The second is how long since any state mutation was last observed. Both must cross their threshold before the system flags a stall: a long-running operation that is still making progress does not trigger, even if it has been running for hours.
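The two-threshold rule is a simple conjunction. In the sketch below, the Invocation record, the is_stalled helper, and the default thresholds of 15 and 5 minutes are all assumptions chosen for illustration; the source does not state the actual values.

```python
from dataclasses import dataclass


@dataclass
class Invocation:
    pending_since: float      # when the run entered a pending state (epoch s)
    last_mutation_at: float   # when any state mutation was last observed


def is_stalled(inv, now, max_pending_s=900.0, max_idle_s=300.0):
    """Flag a stall only when BOTH thresholds are crossed."""
    pending_too_long = (now - inv.pending_since) > max_pending_s
    idle_too_long = (now - inv.last_mutation_at) > max_idle_s
    return pending_too_long and idle_too_long
```

A run that has been pending for two hours but mutated state ten seconds ago is not flagged; the same run with no mutation for an hour is.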

When both thresholds fire, the alert surfaces in the console rather than sitting silently in a log file. The team knows a run has stalled, can see why, and can act without trawling through infrastructure logs.

Runs that need a genuine decision land at the same surface. A send that reverted because a compliance rule changed mid-flight, or an approval that was explicitly rejected by a second signer, lands in the operator queue alongside stalled runs. An operator resolves the underlying issue and resumes through the Platform API or CLI. The distinction between automatic recovery and operator-assisted recovery is explicit: the engine handles what it can handle autonomously, surfaces what it cannot, and never conflates the two by silently marking a parked run complete.

When a run needs a hand

Automatic restart covers the common case. When a run has exhausted its retry budget and parked itself, it needs an operator decision, not a database query. The Platform API and CLI expose every paused invocation: list what is stuck, preview a resume as a dry run, resume one by ID or bulk-resume a set. You need no infrastructure access to do any of this.

The same surface covers runs that ended in DEAD_LETTER. An operator who resolves the underlying issue rescues the transaction back to QUEUED through the same interface. The audit record of the rescue is part of the lifecycle history for that transaction: when it was escalated, when it was resolved, who acted on it.

Background schedulers for reconciliation, rate refresh, and on-chain confirmation monitoring recover automatically after a crash. They restart from their last committed state and continue without operator intervention. Upgrades carry a disruption budget so in-progress work drains before the new version takes over.


Transactions that recover themselves