
Observability

The observability stack provides comprehensive visibility into platform operations through metrics collection, log aggregation, distributed tracing, and pre-built dashboards for proactive monitoring and rapid incident response.

Overview

The observability stack provides complete visibility into DALP platform operations. Metrics, logs, and traces from all components are collected centrally for unified monitoring. Pre-built dashboards surface operational health, while alerting rules detect anomalies before they impact users.

Enterprise platforms require comprehensive observability. Operators need visibility into system health, security teams need audit trails, and developers need debugging capabilities. The observability stack addresses all these requirements through a unified telemetry infrastructure.

Three pillars

[Diagram: API monitoring overview]

Metrics

Time-series metrics capture quantitative measurements over time. Counters, gauges, and histograms represent request counts, resource utilization, and latency distributions.

| Metric category | Examples | Use case |
| --- | --- | --- |
| Request metrics | Rate, latency, errors | Performance monitoring |
| Resource metrics | CPU, memory, connections | Capacity planning |
| Business metrics | Transactions, assets, users | Operational reporting |
| Chain metrics | Block lag, gas prices, confirmations | Blockchain health |
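
To make the histogram type concrete, here is a minimal cumulative-bucket latency histogram in TypeScript. The bucket bounds and class name are illustrative assumptions, not platform defaults.

```typescript
// Minimal latency histogram sketch: cumulative buckets plus sum/count,
// the shape typically used to record request-latency distributions.
// Bucket bounds (in ms) are illustrative, not platform defaults.
class LatencyHistogram {
  private counts: number[];
  private total = 0;

  constructor(private readonly bounds: number[] = [50, 100, 250, 500, 1000]) {
    this.counts = new Array(bounds.length + 1).fill(0);
  }

  observe(ms: number): void {
    // Increment the first bucket whose upper bound contains the value;
    // the final slot acts as the +Inf bucket.
    const idx = this.bounds.findIndex((b) => ms <= b);
    this.counts[idx === -1 ? this.bounds.length : idx] += 1;
    this.total += 1;
  }

  // Approximate a quantile by walking cumulative counts, as a dashboard would.
  quantile(q: number): number {
    const target = Math.ceil(this.total * q);
    let seen = 0;
    for (let i = 0; i < this.bounds.length; i++) {
      seen += this.counts[i];
      if (seen >= target) return this.bounds[i];
    }
    return Infinity; // value fell into the +Inf bucket
  }
}
```

Dashboards derive percentiles such as p99 from these cumulative counts rather than storing every raw sample.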

Logs

Structured logs capture discrete events with rich context. JSON formatting enables efficient parsing and querying. Correlation identifiers link related log entries across components.
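
A structured entry can be sketched as follows; the field names (`ts`, `level`, `traceId`) are assumptions for illustration, not the platform's actual log schema.

```typescript
// Sketch of a structured JSON logger carrying a correlation identifier.
// Field names are illustrative, not the platform schema.
function logEvent(
  level: "debug" | "info" | "warn" | "error",
  message: string,
  traceId: string,
  context: Record<string, unknown> = {},
): string {
  const entry = {
    ts: new Date().toISOString(),
    level,
    message,
    traceId, // links related entries across components
    ...context,
  };
  const line = JSON.stringify(entry);
  console.log(line);
  return line;
}
```

Because every field is a JSON key, a log backend can index and query entries efficiently, and the shared `traceId` ties together entries emitted by different components.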

Traces

Distributed traces follow operations across component boundaries. Spans capture timing and metadata for each step. Trace visualization reveals bottlenecks and failure points in complex operations.
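
The span model can be sketched with a few fields; the interface below is a simplified assumption, not the platform's tracing API.

```typescript
// Minimal span record: each step of a cross-component operation carries
// the shared trace ID, its own span ID, a parent link, and timing.
interface Span {
  traceId: string;
  spanId: string;
  parentId?: string;
  name: string;
  startMs: number;
  endMs: number;
}

// Find the slowest span in a trace — the kind of bottleneck
// a trace visualization surfaces at a glance.
function slowestSpan(spans: Span[]): Span {
  return spans.reduce((worst, s) =>
    s.endMs - s.startMs > worst.endMs - worst.startMs ? s : worst,
  );
}
```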

Pre-built dashboards

| Dashboard | Audience | Key metrics |
| --- | --- | --- |
| Operations overview | Platform operators | Request rates, error rates, latency |
| Transaction monitor | Operations team | Pending transactions, gas usage, confirmations |
| Compliance activity | Compliance officers | Verification volumes, approval rates |
| Security overview | Security team | Authentication events, access patterns |
| Infrastructure health | DevOps | Resource utilization, node health |

[Screenshot: Detailed API request logs]

Alerting

Alert rules trigger notifications when metrics exceed thresholds or exhibit anomalous patterns.

| Alert category | Condition | Severity |
| --- | --- | --- |
| Error rate spike | Error rate > 5% for 5 minutes | Critical |
| Latency degradation | P99 latency > 2x baseline | Warning |
| Resource exhaustion | Memory > 90% for 10 minutes | Warning |
| Chain connectivity | No blocks for 5 minutes | Critical |
| Transaction failure | Failure rate > 1% | Warning |
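
The first rule in the table ("error rate > 5% for 5 minutes") could be evaluated roughly like this; the sample shape and window handling are illustrative assumptions, not the platform's rule engine.

```typescript
// Sketch of the "error rate spike" rule: fire only when every sample
// in the trailing window exceeds the threshold.
interface Sample {
  timestampMs: number;
  errorRate: number; // fraction in [0, 1]
}

function errorRateSpike(
  samples: Sample[],
  nowMs: number,
  threshold = 0.05,
  windowMs = 5 * 60 * 1000,
): boolean {
  // Keep only samples inside the trailing window.
  const windowed = samples.filter((s) => nowMs - s.timestampMs <= windowMs);
  // An empty window must not fire; otherwise require a sustained breach.
  return windowed.length > 0 && windowed.every((s) => s.errorRate > threshold);
}
```

Requiring the condition to hold for the full window, rather than firing on a single sample, is what keeps transient blips from paging the on-call engineer.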

Alert routing delivers notifications through appropriate channels: PagerDuty for critical alerts, Slack for warnings, and email for informational notifications.

Application logging configuration

Application logging can be configured through the config.yml file.

| Setting | Environment variable | Default | Description |
| --- | --- | --- | --- |
| `app.logLevel` | `LOG_LEVEL` or `SETTLEMINT_LOG_LEVEL` | `info` | Minimum log level: `debug`, `info`, `warn`, `warning`, `error`, `fatal` |
| `app.logOrpcRequests` | `LOG_ORPC_REQUESTS` | `false` | Enable verbose ORPC request/response logging |

Note: LOG_LEVEL takes precedence during auto-configuration. Invalid values are silently ignored and fall back to environment defaults (debug for development, info for production, warning for test).
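
The precedence and fallback behaviour described above can be sketched as follows; the helper name and the `NODE_ENV` handling are assumptions for illustration.

```typescript
// Sketch of the documented resolution order: LOG_LEVEL wins over
// SETTLEMINT_LOG_LEVEL, and invalid values are silently ignored in
// favour of the per-environment default.
const LEVELS = ["debug", "info", "warn", "warning", "error", "fatal"] as const;
type Level = (typeof LEVELS)[number];

// Environment defaults from the note above: debug for development,
// info for production, warning for test.
const ENV_DEFAULTS: Record<string, Level> = {
  development: "debug",
  production: "info",
  test: "warning",
};

function resolveLogLevel(env: Record<string, string | undefined>): Level {
  const candidate = env.LOG_LEVEL ?? env.SETTLEMINT_LOG_LEVEL;
  if (candidate && (LEVELS as readonly string[]).includes(candidate)) {
    return candidate as Level;
  }
  // Invalid or missing values fall back to the environment default.
  return ENV_DEFAULTS[env.NODE_ENV ?? "production"] ?? "info";
}
```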

ORPC request logging

When app.logOrpcRequests is enabled, the platform logs detailed information for each API request:

  • Request ID and URL
  • HTTP method and elapsed time
  • Response status codes
  • Procedure execution paths

This setting is disabled by default to keep logs clean in development and production. Enable it for debugging API issues:

```yaml
# config.yml
app:
  logOrpcRequests: true
```

Or via environment variable:

```sh
LOG_ORPC_REQUESTS=true
```

[Screenshot: On-chain transaction monitoring]

Audit logging

Compliance requires comprehensive audit trails. The observability stack captures:

  • All authentication events with outcome and context
  • Authorization decisions with resource and action
  • Data access with query details and results
  • Configuration changes with before/after state
  • Administrative actions with operator identity
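
An audit record covering the fields above might look like this; the interface and field names are illustrative assumptions, not the platform's actual audit schema.

```typescript
// Illustrative audit record mirroring the event categories listed above.
interface AuditEvent {
  actor: string;     // operator identity
  action: string;    // e.g. "config.update"
  resource: string;  // what was accessed or changed
  outcome: "allowed" | "denied";
  before?: unknown;  // prior state, for configuration changes
  after?: unknown;   // resulting state
  timestamp: string; // ISO-8601
}

// Append-only JSON lines are a common format that pairs well with
// tamper-evident storage.
function recordAudit(event: AuditEvent): string {
  return JSON.stringify(event);
}
```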

Audit logs are retained according to regulatory requirements, typically seven years for financial services. Tamper-evident storage ensures log integrity.

Incident response

Observability tooling supports rapid incident response:

Correlation: Trace IDs link logs, metrics, and traces for affected operations.

Timeline reconstruction: Log search with time filters reveals event sequences.

Impact assessment: Metrics dashboards quantify affected users and operations.

Root cause analysis: Trace visualization identifies failing components.

Integration options

| Component | Cloud options | Self-hosted options |
| --- | --- | --- |
| Metrics | Datadog, New Relic | Prometheus, VictoriaMetrics |
| Logs | Datadog, Splunk | Loki, Elasticsearch |
| Traces | Datadog, Jaeger Cloud | Jaeger, Tempo |
| Visualization | Datadog, New Relic | Grafana |

Helm charts include Grafana dashboard configurations for common self-hosted deployments.
