TinyCloud nodes expose health endpoints, Prometheus metrics, and structured logging for production observability.
## Health Check

The node exposes a health endpoint that returns `200 OK` when the server is running and ready to accept requests.

```shell
curl http://localhost:8000/healthz
```
Use this endpoint for:

- Load balancer health checks — route traffic only to healthy nodes
- Docker/Kubernetes health probes — automatically restart unhealthy containers
- Uptime monitoring — alert when the node goes down
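Beyond infrastructure probes, deployment scripts can gate on the same endpoint. A minimal Python sketch of a polling helper (the helper and its parameters are illustrative, not part of TinyCloud):

```python
import time
import urllib.request

def wait_until_healthy(url, timeout=30.0, interval=1.0, probe=None):
    """Poll the health endpoint until it returns HTTP 200 or the timeout elapses.

    `probe` can be injected for testing; by default it issues a real HTTP GET.
    """
    if probe is None:
        def probe():
            with urllib.request.urlopen(url, timeout=5) as resp:
                return resp.status
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            if probe() == 200:
                return True
        except OSError:
            pass  # connection refused: node not up yet
        time.sleep(interval)
    return False

# Example: block until the local node is ready, then proceed
# ready = wait_until_healthy("http://localhost:8000/healthz")
```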
### Docker Health Check

```yaml
services:
  tinycloud:
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/healthz"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 10s
```
### Kubernetes Liveness and Readiness Probes

```yaml
livenessProbe:
  httpGet:
    path: /healthz
    port: 8000
  initialDelaySeconds: 10
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /healthz
    port: 8000
  initialDelaySeconds: 5
  periodSeconds: 5
```
## Prometheus Metrics

TinyCloud exposes Prometheus-format metrics on a dedicated port (default 8001).

### Configuration
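A sketch of what enabling the metrics endpoint might look like, following the TOML style used elsewhere in this page. The `[metrics]` table and its keys are assumptions for illustration; check your TinyCloud version for the exact names:

```toml
# Hypothetical keys shown for illustration only
[metrics]
enabled = true  # expose Prometheus metrics
port = 8001     # default metrics port
```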
### Scrape Configuration

Add TinyCloud to your Prometheus `prometheus.yml`:

```yaml
scrape_configs:
  - job_name: "tinycloud"
    static_configs:
      - targets: ["tinycloud:8001"]
    scrape_interval: 15s
```
### Available Metrics

| Metric | Type | Description |
| --- | --- | --- |
| `tinycloud_authorized_invoke_duration_seconds` | Histogram | Duration of authorized API invocations |
| `tinycloud_authorization_duration_seconds` | Histogram | Duration of the authorization/verification step |
These metrics include labels for detailed breakdowns and can be used to build dashboards tracking:

- API response latency (p50, p95, p99)
- Authorization overhead
- Request throughput
### Example PromQL Queries

```promql
# Average API latency over 5 minutes
rate(tinycloud_authorized_invoke_duration_seconds_sum[5m])
  / rate(tinycloud_authorized_invoke_duration_seconds_count[5m])

# 95th percentile API latency
histogram_quantile(0.95,
  rate(tinycloud_authorized_invoke_duration_seconds_bucket[5m])
)

# Authorization overhead as a percentage of total request time
rate(tinycloud_authorization_duration_seconds_sum[5m])
  / rate(tinycloud_authorized_invoke_duration_seconds_sum[5m])
  * 100
```
Do not expose the Prometheus metrics port (8001) publicly. It should only be accessible from your monitoring infrastructure.
## Tracing

TinyCloud supports distributed tracing for debugging request flows across services.

### OpenTelemetry

Send traces to any OpenTelemetry-compatible collector:

```toml
[logging]
tracing = "OpenTelemetry"
```

Configure the collector endpoint via the standard OpenTelemetry environment variable:

```shell
OTEL_EXPORTER_OTLP_ENDPOINT="http://otel-collector:4317"
```
### Jaeger

Send traces directly to a Jaeger instance:

```toml
[logging]
tracing = "Jaeger"
```

```shell
# Jaeger agent endpoint
OTEL_EXPORTER_JAEGER_AGENT_HOST="jaeger"
OTEL_EXPORTER_JAEGER_AGENT_PORT="6831"
```
### Docker Compose with Jaeger

```yaml
services:
  tinycloud:
    environment:
      TINYCLOUD_LOGGING__TRACING: "Jaeger"
      OTEL_EXPORTER_JAEGER_AGENT_HOST: "jaeger"
    depends_on:
      - jaeger
  jaeger:
    image: jaegertracing/all-in-one:latest
    ports:
      - "16686:16686" # Jaeger UI
      - "6831:6831/udp" # Jaeger agent
    restart: unless-stopped
```
Access the Jaeger UI at http://localhost:16686 to inspect traces.
## Structured Logging

### Text (development)

Human-readable logs for local development:

```toml
[logging]
format = "Text"
```

```text
2026-03-07T10:30:00.000Z  INFO tinycloud::server: Starting server on 0.0.0.0:8000
2026-03-07T10:30:00.050Z  INFO tinycloud::storage: Connected to PostgreSQL
2026-03-07T10:30:00.100Z  INFO tinycloud::server: Server ready
2026-03-07T10:30:01.234Z  INFO tinycloud::api: POST /v1/kv/put space=0x1234...abcd-1-default key=greeting status=200 duration=12ms
```

### JSON (production)

Structured JSON logs for log aggregation (Datadog, Loki, CloudWatch):

```toml
[logging]
format = "Json"
```

```json
{"timestamp": "2026-03-07T10:30:01.234Z", "level": "INFO", "target": "tinycloud::api", "message": "POST /v1/kv/put", "space": "0x1234...abcd-1-default", "key": "greeting", "status": 200, "duration_ms": 12}
```
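One advantage of the JSON format is that it can be post-processed directly. A minimal Python sketch that flags slow requests from a log stream (field names are taken from the example line above; the threshold is arbitrary):

```python
import json

def slow_requests(lines, threshold_ms=100):
    """Yield (message, duration_ms) for log entries slower than the threshold."""
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines (e.g. startup banners)
        if entry.get("duration_ms", 0) > threshold_ms:
            yield entry["message"], entry["duration_ms"]

logs = [
    '{"message": "POST /v1/kv/put", "duration_ms": 12}',
    '{"message": "POST /v1/kv/put", "duration_ms": 250}',
    "not json",
]
print(list(slow_requests(logs)))  # [('POST /v1/kv/put', 250)]
```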
### Log Levels

| Level | Description | Use Case |
| --- | --- | --- |
| `error` | Errors that need attention | Production (minimum) |
| `warn` | Warning conditions | Production (recommended) |
| `info` | Informational messages | Production (default) |
| `debug` | Detailed debugging info | Development, troubleshooting |
| `trace` | Very detailed tracing | Deep debugging only |
Use info level in production for a good balance of visibility without excessive log volume. Temporarily switch to debug when troubleshooting specific issues.
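Following the `[logging]` table used above, a level override might look like the sketch below. The `level` key is an assumed name; verify it against your TinyCloud version:

```toml
[logging]
format = "Json"
level = "debug"  # assumed key; revert to "info" after troubleshooting
```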
## Recommended Stack

For a production monitoring setup:

- **Prometheus + Grafana**: Collect metrics from the Prometheus endpoint and build dashboards in Grafana.
- **Jaeger or Tempo**: Distributed tracing for request flow visibility and latency analysis.
- **Loki or Datadog**: Aggregate structured JSON logs for search and alerting.
- **Alertmanager**: Alert on high latency, error rates, or node health failures.
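As a starting point for the alerting piece, a sketch of Prometheus alerting rules built on the metrics described above (alert names, thresholds, and severity labels are illustrative, not recommendations):

```yaml
groups:
  - name: tinycloud
    rules:
      - alert: TinyCloudDown
        # `up` is set by Prometheus itself for each scrape target
        expr: up{job="tinycloud"} == 0
        for: 1m
        labels:
          severity: critical
      - alert: TinyCloudHighLatency
        # p95 API latency above 500ms for 5 minutes
        expr: |
          histogram_quantile(0.95,
            rate(tinycloud_authorized_invoke_duration_seconds_bucket[5m])
          ) > 0.5
        for: 5m
        labels:
          severity: warning
```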