Health Observability

Every service deployed on Sylphx gets a state-of-the-art /healthz endpoint for free: a continuous score in [0, 1], multi-signal composition, OpenTelemetry-native metrics, causality-aware alerting, and an audit-grade history, all built into @sylphx/sdk/health with zero config.

Zero-Config /healthz

Mount one middleware — Hono, Express, Fastify, or Next.js. No probe paths to configure.

Seven Built-in Signals

Event-loop lag, queue depth, error rate, memory pressure, generic ping, database, cache.

Continuous Score

A number in [0, 1] composed across signals. Drain at 0.8, restart at 0.5 — no flapping.

OpenTelemetry-Native

`health.score` + `health.signal.*` metrics emit automatically. Drops into any OTel collector.

Causality Propagation

One database outage = one paged alert + N silent downstream rows. Not 12 red dots.

Tamper-Evident Audit

SHA-256 prev-hash chain on every evaluation. Export and verify offline for compliance.

Quickstart

Add the SDK and mount the framework-native middleware. Every request to /healthz now returns a multi-signal score; the runtime boundary uses the score to make drain and restart decisions automatically.

1. Install
bun add @sylphx/sdk

2. Mount the middleware
import { Hono } from 'hono'
import { withHealth } from '@sylphx/sdk/health'

const app = new Hono()

// One line — every service gets a multi-signal /healthz
app.use('*', withHealth.hono())

app.get('/', (c) => c.text('hello world'))

export default app
When Sylphx detects @sylphx/sdk/health at deploy time, it switches the runtime boundary's probe path to /healthz automatically. Existing services keep their previous probe path until they install the SDK, so there is no forced migration.

Wire shape

Every /healthz response carries the same JSON shape. score in [0, 1] is the headline number; signalFactors is the per-signal decomposition used to derive it. The shape is stable from day one and evolves additively only: new fields may appear, but existing ones keep their meaning.

Response body
{
  "score": 0.92,
  "signals": {
    "eventLoopLagMs": 12,
    "database": "ok",
    "redis": "ok",
    "errorRate": 0.001,
    "memoryPressure": 0.45
  },
  "signalFactors": {
    "eventLoopLagMs": 1,
    "database": 1,
    "redis": 1,
    "errorRate": 0.99,
    "memoryPressure": 0.7
  },
  "lastTickAt": "2026-05-20T12:34:56.789Z"
}
| Property | Type | Description |
| --- | --- | --- |
| score | number | Aggregate health in [0, 1]. 1 = healthy; 0 = dead. |
| signals | Record<string, value> | Raw reading per signal. Numbers (latency, ratios) or short tokens ("ok", "timeout"). |
| signalFactors | Record<string, number> | Each signal mapped to a health factor in [0, 1]; folded by the scorer. |
| lastTickAt | string (ISO-8601) | Timestamp of the most recent evaluation. |
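
For clients that consume /healthz, the shape above can be captured as a TypeScript type with a runtime guard. This is a minimal sketch derived only from the sample response and the property table; the names `HealthzResponse` and `isHealthzResponse` are hypothetical and are not exports of @sylphx/sdk.

```typescript
// Hypothetical client-side types; not part of @sylphx/sdk.
type SignalReading = number | string

interface HealthzResponse {
  score: number
  signals: Record<string, SignalReading>
  signalFactors: Record<string, number>
  lastTickAt: string
}

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === 'object' && v !== null && !Array.isArray(v)
}

// True for finite numbers in [0, 1]; NaN fails both comparisons.
function inUnitInterval(v: unknown): v is number {
  return typeof v === 'number' && v >= 0 && v <= 1
}

function isHealthzResponse(body: unknown): body is HealthzResponse {
  if (!isRecord(body)) return false
  const { score, signals, signalFactors, lastTickAt } = body
  if (!inUnitInterval(score)) return false
  if (!isRecord(signals) || !isRecord(signalFactors)) return false
  const readingsOk = Object.values(signals).every(
    (r) => typeof r === 'number' || typeof r === 'string',
  )
  if (!readingsOk) return false
  if (!Object.values(signalFactors).every(inUnitInterval)) return false
  return typeof lastTickAt === 'string' && !Number.isNaN(Date.parse(lastTickAt))
}
```

The guard ignores unknown top-level fields on purpose, so a client written this way keeps working if the response gains fields later.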

Composing signals

The default withHealth.* middleware registers a single event-loop-lag signal. Apps that care about database, cache, or queue health pass an explicit signals array. Each signal returns a raw reading (a number, or a short token such as "ok") and a health factor; the scorer folds the factors into the headline score.

// `pool`, `redis`, and `app` are assumed to be created elsewhere in the service.
import {
  sylphxHealth,
  eventLoopLagSignal,
  databaseSignal,
  redisSignal,
  errorRateSignal,
  memoryPressureSignal,
} from '@sylphx/sdk/health'

const errors = errorRateSignal({ window: '5s', degradedRate: 0.05 })

const health = sylphxHealth({
  signals: [
    eventLoopLagSignal({ degradedMs: 5000, deadMs: 30000 }),
    databaseSignal({ ping: () => pool.query('SELECT 1') }),
    redisSignal({ ping: () => redis.ping() }),
    errors,
    memoryPressureSignal({ degradedRatio: 0.85 }),
  ],
})

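// Track requests for the error-rate signal (assuming a Hono app, as in the quickstart)
app.use(async (c, next) => {
  try {
    await next()
  } catch (err) {
    errors.recordError()
    throw err
  }
  // Hono routes handler exceptions to onError and exposes them as c.error,
  // so check it here instead of relying on the catch alone.
  if (c.error) errors.recordError()
  else errors.recordSuccess()
})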
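
To make "reading plus health factor, folded into a score" concrete, here is a minimal sketch of one way such a pipeline could work. It is an illustration only: this page does not document the SDK's real mapping or fold, so the linear ramp, the weighted mean, and the names `rampFactor` and `foldScore` are all assumptions.

```typescript
// Illustrative only: the SDK's real mapping and fold are not documented here.

// Map a raw reading to a factor in [0, 1]: 1 at or below `degraded`,
// 0 at or above `dead`, and a linear ramp in between.
function rampFactor(reading: number, degraded: number, dead: number): number {
  if (dead <= degraded) throw new RangeError('dead must be greater than degraded')
  if (reading <= degraded) return 1
  if (reading >= dead) return 0
  return 1 - (reading - degraded) / (dead - degraded)
}

interface WeightedFactor {
  factor: number
  weight: number
}

// Fold factors into one score with a weighted mean (one possible fold).
function foldScore(factors: WeightedFactor[]): number {
  const totalWeight = factors.reduce((sum, f) => sum + f.weight, 0)
  // Assumption: with no weighted signals, nothing indicates a problem.
  if (totalWeight <= 0) return 1
  const weighted = factors.reduce((sum, f) => sum + f.factor * f.weight, 0)
  return weighted / totalWeight
}

// Event-loop lag of 17.5 s with degradedMs = 5000 and deadMs = 30000
// sits halfway along the ramp.
const lagFactor = rampFactor(17_500, 5_000, 30_000)
const foldedScore = foldScore([
  { factor: 1, weight: 1 }, // e.g. a healthy database ping
  { factor: lagFactor, weight: 1 },
])
```

A plain mean of the sample signalFactors in the wire-shape example gives roughly 0.94 rather than the 0.92 shown there, so the real scorer evidently does something other than an unweighted average.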

Score interpretation

The runtime boundary maps the score to a three-tier gate. Apps don't make this decision — they just report the score. The boundary applies the thresholds below.

| Score | Gate | Behavior |
| --- | --- | --- |
| score > 0.8 | Healthy | Normal traffic; service is reachable and responsive. |
| 0.5 < score ≤ 0.8 | Drain | New traffic stops arriving; in-flight requests complete. No restart yet, since the service may recover. |
| score ≤ 0.5 | Restart | Service is unhealthy; the runtime boundary triggers a controlled restart after the configured threshold. |
The thresholds are conservative on purpose. signalFactors shows you exactly which signal pushed the score down — drill in via the Sylphx dashboard or any OTel sink.
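
The three tiers can be expressed as a small pure function, which is handy for dashboards or tests that want to mirror the boundary's decision. This is a sketch of the documented thresholds only; `gateFor` and `weakestSignal` are hypothetical helper names, and the real boundary may apply extra timing rules before it acts.

```typescript
type Gate = 'healthy' | 'drain' | 'restart'

// Mirror the documented tiers: > 0.8 healthy, (0.5, 0.8] drain, <= 0.5 restart.
function gateFor(score: number): Gate {
  if (!(score >= 0 && score <= 1)) {
    throw new RangeError(`score out of range: ${score}`)
  }
  if (score > 0.8) return 'healthy'
  if (score > 0.5) return 'drain'
  return 'restart'
}

// Name of the signal with the lowest factor, or undefined when there are none.
function weakestSignal(signalFactors: Record<string, number>): string | undefined {
  let worst: string | undefined
  let worstFactor = Infinity
  for (const [name, factor] of Object.entries(signalFactors)) {
    if (factor < worstFactor) {
      worst = name
      worstFactor = factor
    }
  }
  return worst
}
```

Applied to the sample response above, `gateFor(0.92)` returns `'healthy'` and `weakestSignal` picks `memoryPressure`, the factor of 0.7.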

Fleet migration codemod

Already running services with hand-rolled /health and /ready? The CLI codemod detects your framework, rewrites the entry file, and removes the legacy routes in one command.

Migrate one service
# Migrate an existing service in one command
bunx @sylphx/cli migrate health-checks

# Preview without writing
bunx @sylphx/cli migrate health-checks --dry-run

# CI guard — exit 1 if pending
bunx @sylphx/cli migrate health-checks --check

Once the SDK is installed across multiple services, audit your fleet to see which ones still need to migrate.

Fleet status
# Audit your fleet — which services already migrated?
bunx @sylphx/cli migrate status

# Output (3-column dashboard):
#   Project    Service    Status
#   proj_abc   web        ✓ migrated
#   proj_abc   worker     ✗ pending
#   proj_xyz   api        ✓ migrated

Audit-grade evidence

Every health evaluation produces a SHA-256 prev-hash chained record. The chain is tamper-evident: rewriting any past evaluation invalidates every record after it. Export to JSON for compliance review and verify offline with the CLI.

Verify an exported chain
# Verify a tamper-evident history chain end-to-end
bunx @sylphx/cli health verify --file ./health-history.json

# Exit codes:
#   0 — chain verified, no tampering
#   1 — break detected (prints first break + sequence number)
#   2 — input file malformed
A pluggable HistoryStore interface lets compliance-grade deployments swap the default in-memory ring buffer for a persistent backend (Postgres, S3, cold storage). The hash chain remains the single source of truth regardless of where records live.
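
To show why a prev-hash chain is tamper-evident, here is a minimal sketch using Node's built-in crypto module. The record layout (`seq`, `score`, `at`), the all-zero genesis hash, and the helper names are assumptions made for illustration; the SDK's actual export format and the CLI's verification logic may differ.

```typescript
import { createHash } from 'node:crypto'

// Hypothetical record layout; the SDK's real export format may differ.
interface Evaluation {
  seq: number
  score: number
  at: string
}

interface ChainedRecord extends Evaluation {
  prevHash: string
  hash: string
}

// Assumed starting value for the first record's prevHash.
const GENESIS = '0'.repeat(64)

function hashRecord(e: Evaluation, prevHash: string): string {
  // A fixed field order keeps the serialization deterministic.
  const payload = JSON.stringify([e.seq, e.score, e.at, prevHash])
  return createHash('sha256').update(payload).digest('hex')
}

// Return a new chain with `e` linked to the previous record's hash.
function append(chain: ChainedRecord[], e: Evaluation): ChainedRecord[] {
  const last = chain[chain.length - 1]
  const prevHash = last ? last.hash : GENESIS
  return [...chain, { ...e, prevHash, hash: hashRecord(e, prevHash) }]
}

// Index of the first broken record, or -1 if the whole chain verifies.
function firstBreak(chain: ChainedRecord[]): number {
  let prevHash = GENESIS
  for (const [i, r] of chain.entries()) {
    if (r.prevHash !== prevHash || r.hash !== hashRecord(r, prevHash)) return i
    prevHash = r.hash
  }
  return -1
}
```

Editing a past record changes its recomputed hash, so verification fails at that record. Recomputing that record's hash to hide the edit only moves the break to the next record, whose prevHash no longer matches.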