# Platform health in one view

Source: https://docs.settlemint.com/docs/changelog/dalp-3-0/monitoring
API traffic, blockchain health, and a single Platform Status rollup, right inside the product, in terms an operator can act on.



**We brought operational visibility inside the product. Live API traffic, chain health, and a single Platform Status rollup surface the problem before you have to go looking.**

<Video src="/media/dalp-3-0/monitoring.mp4" title="Platform health in one view, DALP 3.0" description="API traffic, chain status, and a single Platform Status view, right inside the product." />

<Spotlight
  eyebrow="Monitoring"
  title="Operator-grade visibility, inside the platform"
  aside="<FactList items={[
  { label: &#x22;Layers&#x22;, value: &#x22;API, chain, status&#x22; },
  { label: &#x22;View&#x22;, value: &#x22;One rollup&#x22; },
  { label: &#x22;Alerts&#x22;, value: &#x22;Threshold-based&#x22; },
  { label: &#x22;Context switch&#x22;, value: &#x22;None&#x22; },
]} />"
>
  API health, blockchain health, and a rollup verdict now live inside the product, expressed in terms operators act on. Leave the infrastructure dashboards for when you need them. The signal that matters is already here.
</Spotlight>

Knowing whether the platform was healthy used to mean leaving it. You would open a separate infrastructure dashboard, find the right panel, translate what it said into something you could act on, and then return to the product. For a team running tokenized assets under operational SLAs, that round trip is slow, and a developing problem is easy to miss along the way.

The problem is not a lack of data. Any modern deployment generates plenty of metrics. The problem is that the data lives in the wrong place, expressed in the wrong terms, for the person who needs to make a call. An on-call operator managing a redemption window does not need a Prometheus graph. They need one answer: can operations proceed right now?

In DALP 3.0, Platform Status gives them that answer directly, as a built-in view rather than a link to something outside the product.

## Three layers, one view [#three-layers-one-view]

Platform monitoring is structured as three layers that compose into a single answer.

<Pipeline
  caption="Three signal layers roll up into one Platform Status: operational, degraded, or outage."
  stages="[
  { title: &#x22;API Monitoring&#x22;, detail: &#x22;Request volume, endpoint health, error rates, and trends across every Platform API call your operators and integrations make.&#x22; },
  { title: &#x22;Blockchain Monitoring&#x22;, detail: &#x22;RPC health, indexer state, sync lag, service health, and empty-block periods across every chain you run on.&#x22; },
  { title: &#x22;Platform Status&#x22;, detail: &#x22;A single operational / degraded / outage rollup that aggregates both layers, with per-panel detail for the signals driving each state.&#x22; },
]"
/>

Each layer stands on its own. Together they answer the question an on-call operator actually faces, which is not "is the infrastructure up?" but "can my asset operations proceed right now?" They answer it without making you leave the product to stitch together signals from three separate tools.

The three-layer structure matters because each layer covers a different scope at a different timescale. API Monitoring reads from the top: are requests succeeding? Blockchain Monitoring reads from the bottom: is the underlying network in a state where new transactions can make progress? Platform Status combines both: given everything each layer reports right now, is this system safe to operate on?

An operator with access to only the API layer might see clean success rates while the chain they settle on has been stalled for eight minutes. The composite view closes that gap.

## API monitoring [#api-monitoring]

The Platform API is the boundary between your operations and everything the platform does on their behalf. Every transfer instruction, every compliance check, every workflow trigger passes through it. When something goes wrong in your integration or in the platform itself, the API layer shows it first.

API monitoring gives you a live read on how that boundary is behaving under real traffic. Request volume, endpoint-level health, error rates, and trend lines are all visible without leaving the product. If a spike of 4xx responses appears, or a workflow path starts timing out, you see it in the same place you manage the assets it affects.

The view covers your full API surface: requests the operations team makes through the console, traffic integrations send programmatically, and calls the platform makes on its own behalf. Everything is grouped by endpoint, so patterns are visible immediately.

![API monitoring overview: request volume, endpoint health, and error rates](/docs/screenshots/monitoring/api-monitoring-overview.webp)

### How verdicts avoid false signals [#how-verdicts-avoid-false-signals]

The verdict logic behind each endpoint panel is calibrated to avoid false signals on both busy and quiet days.

At high traffic volume, an outage verdict requires the 5xx rate to exceed 5% of total requests. That is enough to identify a systemic problem without triggering on a handful of isolated failures or on client-side 4xx errors.

At low volume, percentage thresholds break down. A single failed request on a quiet afternoon can amount to a 100% failure rate arithmetically, yet it says nothing meaningful about system health. Below 500 daily requests, the panel switches to an absolute count: fifty or more 5xx responses flag an outage regardless of the day's total volume. If the day produced no traffic at all, the panel returns a no-data state rather than inferring health from silence.

Thresholds are consistent between the day-level summary and the live snapshot, so the verdict at 9am matches the one the same panel would have shown last night.
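
The threshold rules above can be sketched in a few lines. This is an illustration of the documented numbers only, not the platform's implementation: the function name, the constant names, and the `not_outage` label are hypothetical, and the thresholds for a degraded verdict are not stated here, so the sketch only separates outage from everything else.

```python
from typing import Literal

Verdict = Literal["outage", "no_data", "not_outage"]

# Numbers taken from the text above; the names are illustrative.
LOW_VOLUME_CUTOFF = 500      # below this many daily requests, use an absolute count
OUTAGE_5XX_PERCENT = 5       # at or above the cutoff: outage when the 5xx rate exceeds 5%
OUTAGE_5XX_COUNT = 50        # below the cutoff: outage at fifty or more 5xx responses


def outage_verdict(total_requests: int, server_errors: int) -> Verdict:
    """Classify one day of traffic for a single endpoint panel."""
    if total_requests == 0:
        # No traffic: report no data instead of inferring health from silence.
        return "no_data"
    if total_requests < LOW_VOLUME_CUTOFF:
        # Percentages are unreliable at low volume, so compare the raw count.
        return "outage" if server_errors >= OUTAGE_5XX_COUNT else "not_outage"
    # Integer comparison avoids floating-point edge cases at exactly 5%.
    exceeds = server_errors * 100 > total_requests * OUTAGE_5XX_PERCENT
    return "outage" if exceeds else "not_outage"
```

With these rules, one failed request out of one is not an outage, while 501 failures out of 10,000 requests is.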

This matters most when a problem is localized rather than global. A misconfigured integration producing 4xx errors in volume shows up in the per-endpoint breakdown before it becomes a user-facing incident. A sustained spike of 5xx responses on a single route points to a specific surface area rather than forcing a full triage pass. The trend line makes clear whether you are watching a new problem develop or the recovery tail of something that already peaked.

## Blockchain monitoring [#blockchain-monitoring]

Running tokenized assets means taking operational responsibility for the chains they live on. A block stall, a degraded RPC node, or an indexer that falls behind the chain head can each silently affect whether transfers settle, compliance reads are current, or a scheduled redemption can proceed. The failure mode is subtle: the API layer may return clean responses while the chain underneath is in a state where no new transactions can land.

Blockchain monitoring answers the question most infrastructure dashboards handle poorly for an asset operator: is the chain actually usable right now?

<BeforeAfter
  before="{
  title: &#x22;Finding out from infrastructure&#x22;,
  points: [
    &#x22;Navigate to a separate infrastructure dashboard outside the product.&#x22;,
    &#x22;Find the right panel for the chain you are concerned about.&#x22;,
    &#x22;Translate raw RPC metrics into whether your asset operations are affected.&#x22;,
    &#x22;Repeat for every chain you run on.&#x22;,
  ],
}"
  after="{
  title: &#x22;With Blockchain Monitoring&#x22;,
  points: [
    &#x22;RPC health, indexer state, sync lag, and service health in one panel per chain.&#x22;,
    &#x22;Empty-block periods flagged with documented thresholds, in operator terms.&#x22;,
    &#x22;All chains you run on, side by side, without leaving the product.&#x22;,
    &#x22;Deployment guidance links directly to the underlying dashboards when deeper investigation is needed.&#x22;,
  ],
}"
/>

### Empty-block detection: idle vs. stalled [#empty-block-detection-idle-vs-stalled]

The most useful signal is often the subtlest. An empty-block period, a run of blocks containing no transactions, can mean the network is quiet or it can mean the network has stalled. Those two states look identical from outside. The difference determines whether you are waiting or broken.

<Callout title="'Nothing is happening' vs. 'nothing can happen'">
  Documented empty-block thresholds distinguish a normal quiet period from a stalled chain. Below the threshold, the chain is simply idle. Above it, the status shifts to reflect that transactions cannot progress, so you know whether to wait or intervene.
</Callout>

The empty-block distinction is calibrated to each chain's normal cadence. A private or consortium chain may produce blocks on a steady clock even when no user transactions are present, so a short run of empty blocks is unremarkable. A prolonged run, long enough to exceed the documented threshold for that chain, indicates that the block-production mechanism has likely stalled: the validator set has degraded, connectivity to the RPC node is interrupted, or the chain itself has halted. Once that threshold is crossed, the panel status shifts from idle to blocked, and you know to intervene rather than wait. Below the threshold, the panel stays quiet so a normally low-traffic chain generates no noise.
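
The idle-versus-blocked decision reduces to comparing the current empty-block run against a per-chain threshold. The sketch below assumes the run is measured in seconds and uses made-up chain names and threshold values. The real thresholds are the documented ones for each chain, and the behavior at exactly the threshold is an assumption.

```python
# Hypothetical per-chain thresholds, in seconds of consecutive empty blocks.
# These values are placeholders, not the documented thresholds.
EMPTY_BLOCK_THRESHOLDS_SECONDS = {
    "example-consortium-chain": 300.0,
    "example-public-chain": 120.0,
}


def empty_block_status(chain: str, empty_run_seconds: float) -> str:
    """Return "idle" while the empty run is within the threshold, else "blocked"."""
    threshold = EMPTY_BLOCK_THRESHOLDS_SECONDS[chain]
    # Assumption: a run exactly at the threshold still counts as idle.
    return "blocked" if empty_run_seconds > threshold else "idle"
```

A 60-second empty run on the first chain stays idle, while the same chain after 301 seconds reports blocked.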

### Indexer sync lag [#indexer-sync-lag]

Blockchain monitoring also surfaces indexer state and sync lag. The Ledger Index needs to stay close to the chain head for historical queries and compliance reads to be current. Sync lag gives you a live measure of how close it is.

Sync lag is measured in two complementary ways. The time delta shows how many seconds the latest indexed snapshot is behind the current wall-clock time. The depth count shows how many unprocessed entries the indexer has yet to clear before it reaches the head. Both are visible in the per-chain panel.

They diverge meaningfully in practice. A chain with slow block times can show a small block lag but a large time lag, because each unprocessed block represents more elapsed time. A chain with fast blocks can produce the opposite. This matters for compliance: a check run against an indexer 40 blocks behind may use data that does not reflect recent freezes or allowlist changes. The panel makes this visible so operators can decide whether to wait for the indexer to catch up or investigate what is causing the lag.
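
A small worked example shows how the two measures can point in different directions. It assumes the unprocessed entries are blocks and that the indexer exposes the timestamp of its latest indexed block. The `SyncLag` type, the function, and all the numbers are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncLag:
    seconds_behind: float  # time delta against the wall clock
    blocks_behind: int     # depth count: blocks not yet indexed


def sync_lag(head_block: int, indexed_block: int,
             indexed_block_timestamp: float, now: float) -> SyncLag:
    """Compute both lag measures from one observation of chain and indexer."""
    return SyncLag(
        seconds_behind=max(0.0, now - indexed_block_timestamp),
        blocks_behind=max(0, head_block - indexed_block),
    )


# Slow chain (about 12 s per block): few blocks behind, but a full minute stale.
slow_chain = sync_lag(head_block=1005, indexed_block=1000,
                      indexed_block_timestamp=940.0, now=1000.0)

# Fast chain (about 0.5 s per block): many blocks behind, but only 20 s stale.
fast_chain = sync_lag(head_block=5040, indexed_block=5000,
                      indexed_block_timestamp=980.0, now=1000.0)
```

Here the slow chain is 5 blocks and 60 seconds behind, while the fast chain is 40 blocks and 20 seconds behind, which is why neither measure is sufficient alone.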

## Platform Status [#platform-status]

When you are responsible for a live book of tokenized assets, uncertainty is not just an inconvenience. Every unanswered question is a decision made without enough information. Do you hold a redemption window open or close it? Do you page the on-call team or wait another minute?

Platform Status is designed to remove that uncertainty. It reads signals from both API monitoring and blockchain monitoring and surfaces one of three states: **operational**, **degraded**, or **outage**. Operational means all monitored services are within normal bounds. Degraded means something is outside bounds but operations can continue. Outage means something is preventing transactions from progressing, a state that warrants intervention, not "check back later."

### Four panels, one rollup [#four-panels-one-rollup]

The rollup is panel-driven. Four independent views each cover a distinct signal source.

Data freshness tracks indexer sync state and data currency across chains: which ones are in sync, and how many sync errors occurred in the last 24 hours. Transactions shows in-flight and recent activity across the chains you operate on. Platform API reports request volume, 4xx rate, 5xx rate, and endpoint health over the last 24 hours. Workflows covers engine health and queue depth, including stalled workflow count.

Each panel computes its own verdict against its own thresholds. The overall status is the worst state any panel is in. If three panels are operational and one reports an outage, the rollup reflects outage. You always know the system-level state in a single view, and you always know which signal is driving it.
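
The worst-state rule can be sketched as a maximum over an ordered set of states. The panel keys and the function name are hypothetical, and how a panel in a no-data state contributes to the rollup is not specified here. This sketch assumes such panels are simply skipped.

```python
# Severity ordering for the three documented states.
SEVERITY = {"operational": 0, "degraded": 1, "outage": 2}


def rollup(panel_verdicts: dict[str, str]) -> tuple[str, list[str]]:
    """Return the worst state across panels and the panels that are in it."""
    # Assumption: panels without one of the three states (e.g. no data) are ignored.
    reporting = {name: v for name, v in panel_verdicts.items() if v in SEVERITY}
    if not reporting:
        return "no_data", []
    worst = max(reporting.values(), key=lambda state: SEVERITY[state])
    if worst == "operational":
        return worst, []
    drivers = sorted(name for name, v in reporting.items() if v == worst)
    return worst, drivers
```

Three operational panels and one outage roll up to outage, with the outage panel named as the driver.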

Each state links to the per-panel detail that drove it. You see at a glance that the status is degraded. One click shows which signal crossed its threshold. Panels load independently, so a slow query against one data source does not hold up the others. If a panel fails to load, it falls back to a no-data state rather than blocking the view.

### Querying platform status from an integration [#querying-platform-status-from-an-integration]

The per-panel design also matters for how you consume Platform Status from an integration. Each panel has its own endpoint. An integration that monitors for degraded or outage states can query only the panel it cares about. The `/platform-api` endpoint returns the verdict for API traffic. The `/data-freshness` endpoint returns indexer sync state. An integration that only cares about whether the chain an asset lives on is keeping up with the head does not need to wait on a workflow-engine query unrelated to its concern.
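
A minimal sketch of querying a single panel follows. The endpoint paths come from this page. The base URL is a placeholder, authentication is omitted, and the response body is returned as parsed JSON without assuming any field names, since the response schema is documented in the API reference rather than here.

```python
import json
from typing import Callable
from urllib.request import urlopen

BASE_PATH = "/api/v2/platform-status"
PANELS = ("data-freshness", "transactions", "platform-api", "workflows", "stat-cards")


def panel_url(base_url: str, panel: str) -> str:
    """Build the URL for one Platform Status panel endpoint."""
    if panel not in PANELS:
        raise ValueError(f"unknown panel: {panel}")
    return f"{base_url.rstrip('/')}{BASE_PATH}/{panel}"


def http_get_json(url: str) -> dict:
    """Plain GET with a timeout; a real integration would add its credentials."""
    with urlopen(url, timeout=10) as response:
        return json.load(response)


def fetch_panel(base_url: str, panel: str,
                get_json: Callable[[str], dict] = http_get_json) -> dict:
    """Fetch only the panel an integration cares about."""
    return get_json(panel_url(base_url, panel))
```

Passing `get_json` as a parameter keeps the transport replaceable, so the same function works with an authenticated client or a test double.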

When you need to go deeper, deployment guidance points to the appropriate infrastructure dashboards. The path is documented: which dashboard, which panels, and what to look for.

[Monitor platform status →](/docs/operators/runbooks/monitor-platform-status)

## Compatibility / migration [#compatibility--migration]

The snapshot route that previously served all Platform Status data in a single response is deprecated in DALP 3.0. Each panel now loads through its own endpoint under `/api/v2/platform-status`: `/data-freshness` (indexer sync state across chains), `/transactions` (transaction infrastructure health), `/platform-api` (request volume with 4xx and 5xx rates), `/workflows` (workflow engine health and queue depth), and `/stat-cards` (summary operational metrics). The [platform-status endpoints reference](/docs/api-reference/observability/platform-status) documents each one.

The previous snapshot endpoint (`/api/v2/platform-status/snapshot`) continues to work for one release. To migrate, replace calls to the snapshot endpoint with the per-panel endpoint that returns the data your integration consumes. Per-panel endpoints are faster: they load in parallel, so a slow panel no longer delays the others.
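
One way to replace a single snapshot call is to request the panels concurrently and let each one fail independently. The sketch below assumes a caller-supplied `fetch` function that takes a panel name and enforces its own network timeout. The `{"state": "no-data"}` placeholder is an invented shape for illustration, not the platform's response format.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence

# Hypothetical placeholder for a panel that could not be loaded.
NO_DATA = {"state": "no-data"}


def load_panels(panels: Sequence[str],
                fetch: Callable[[str], dict]) -> dict[str, dict]:
    """Fetch every requested panel in parallel; a failed panel becomes NO_DATA."""
    results: dict[str, dict] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(panels))) as pool:
        futures = {panel: pool.submit(fetch, panel) for panel in panels}
        for panel, future in futures.items():
            try:
                results[panel] = future.result()
            except Exception:
                # One failing panel must not prevent the others from loading.
                results[panel] = dict(NO_DATA)
    return results
```

An integration that only consumed part of the snapshot can pass just those panel names, which keeps the migration proportional to what it actually reads.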
