TinyCloud nodes expose health endpoints, Prometheus metrics, and structured logging for production observability.

Health Check

The node exposes a health endpoint that returns 200 OK when the server is running and ready to accept requests.
curl http://localhost:8000/healthz
Use this endpoint for:
  • Load balancer health checks — route traffic only to healthy nodes
  • Docker/Kubernetes health probes — automatically restart unhealthy containers
  • Uptime monitoring — alert when the node goes down
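Deploy scripts can block on this endpoint until the node is ready. A minimal sketch in Python, assuming the default port and path shown above; the retry budget is illustrative:

```python
# Poll the health endpoint until the node reports ready, or give up.
# URL, attempt count, and delay below are illustrative defaults.
import time
import urllib.error
import urllib.request


def wait_for_healthy(url: str, attempts: int = 30, delay: float = 1.0) -> bool:
    """Poll `url` until it returns HTTP 200, or give up after `attempts`."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # node not up yet; retry after a short pause
        time.sleep(delay)
    return False


# Example: wait_for_healthy("http://localhost:8000/healthz")
```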

Docker Health Check

services:
  tinycloud:
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/healthz"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 10s

Kubernetes Liveness Probe

livenessProbe:
  httpGet:
    path: /healthz
    port: 8000
  initialDelaySeconds: 10
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /healthz
    port: 8000
  initialDelaySeconds: 5
  periodSeconds: 5

Prometheus Metrics

TinyCloud exposes Prometheus-format metrics on a dedicated port (default 8001).

Configuration

[prometheus]
port = 8001

Scrape Configuration

Add TinyCloud as a scrape target in your prometheus.yml:
scrape_configs:
  - job_name: "tinycloud"
    static_configs:
      - targets: ["tinycloud:8001"]
    scrape_interval: 15s

Available Metrics

| Metric | Type | Description |
| --- | --- | --- |
| tinycloud_authorized_invoke_duration_seconds | Histogram | Duration of authorized API invocations |
| tinycloud_authorization_duration_seconds | Histogram | Duration of the authorization/verification step |
These metrics include labels for detailed breakdowns and can be used to build dashboards tracking:
  • API response latency (p50, p95, p99)
  • Authorization overhead
  • Request throughput

Example PromQL Queries

# Average API latency over 5 minutes
rate(tinycloud_authorized_invoke_duration_seconds_sum[5m])
  / rate(tinycloud_authorized_invoke_duration_seconds_count[5m])

# 95th percentile API latency
histogram_quantile(0.95,
  rate(tinycloud_authorized_invoke_duration_seconds_bucket[5m])
)

# Authorization overhead as percentage of total request time
rate(tinycloud_authorization_duration_seconds_sum[5m])
  / rate(tinycloud_authorized_invoke_duration_seconds_sum[5m])
  * 100
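Request throughput can be derived from the same histogram, since Prometheus exports a _count series alongside the buckets:

```
# Requests per second over 5 minutes
rate(tinycloud_authorized_invoke_duration_seconds_count[5m])
```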
Do not expose the Prometheus metrics port (8001) publicly. It should only be accessible from your monitoring infrastructure.

Tracing

TinyCloud supports distributed tracing for debugging request flows across services.

OpenTelemetry

Send traces to any OpenTelemetry-compatible collector:
[logging]
tracing = "OpenTelemetry"
Configure the collector endpoint via the standard OpenTelemetry environment variable:
OTEL_EXPORTER_OTLP_ENDPOINT="http://otel-collector:4317"
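If you don't already run a collector, a minimal OpenTelemetry Collector configuration that accepts OTLP on port 4317 might look like the sketch below. The receiver and exporter names follow the upstream collector, not TinyCloud; the debug exporter only prints received spans, so swap in an exporter for your real backend:

```yaml
# Minimal otel-collector config: receive OTLP over gRPC, print spans.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  debug: {}

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [debug]
```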

Jaeger

Send traces directly to a Jaeger instance:
[logging]
tracing = "Jaeger"
Configure the Jaeger agent endpoint via environment variables:
OTEL_EXPORTER_JAEGER_AGENT_HOST="jaeger"
OTEL_EXPORTER_JAEGER_AGENT_PORT="6831"

Docker Compose with Jaeger

services:
  tinycloud:
    environment:
      TINYCLOUD_LOGGING__TRACING: "Jaeger"
      OTEL_EXPORTER_JAEGER_AGENT_HOST: "jaeger"
    depends_on:
      - jaeger

  jaeger:
    image: jaegertracing/all-in-one:latest
    ports:
      - "16686:16686"  # Jaeger UI
      - "6831:6831/udp"  # Jaeger agent
    restart: unless-stopped
Access the Jaeger UI at http://localhost:16686 to inspect traces.

Structured Logging

Log Format

Human-readable logs for local development:
[logging]
format = "Text"
2026-03-07T10:30:00.000Z  INFO tinycloud::server: Starting server on 0.0.0.0:8000
2026-03-07T10:30:00.050Z  INFO tinycloud::storage: Connected to PostgreSQL
2026-03-07T10:30:00.100Z  INFO tinycloud::server: Server ready
2026-03-07T10:30:01.234Z  INFO tinycloud::api: POST /v1/kv/put space=0x1234...abcd-1-default key=greeting status=200 duration=12ms

Log Levels

| Level | Description | Use Case |
| --- | --- | --- |
| error | Errors that need attention | Production (minimum) |
| warn | Warning conditions | Production (recommended) |
| info | Informational messages | Production (default) |
| debug | Detailed debugging info | Development, troubleshooting |
| trace | Very detailed tracing | Deep debugging only |
[logging]
log_level = "info"
Use info level in production for a good balance of visibility without excessive log volume. Temporarily switch to debug when troubleshooting specific issues.
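For quick ad-hoc troubleshooting, the text-format lines above can be split into fields with a short script. A sketch in Python; the field names (`ts`, `level`, `target`, `msg`) are illustrative, not part of TinyCloud:

```python
# Slice a TinyCloud text-format log line into its parts. The line shape
# matches the sample output above; field names here are illustrative.
import re

LOG_RE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<level>[A-Z]+)\s+(?P<target>[^:\s]+(?:::[^:\s]+)*):\s+(?P<msg>.*)$"
)


def parse_line(line: str) -> dict:
    """Split one log line into timestamp, level, module target, and message."""
    m = LOG_RE.match(line)
    if not m:
        raise ValueError(f"unrecognized log line: {line!r}")
    fields = m.groupdict()
    # Collect trailing key=value pairs such as status=200 duration=12ms.
    fields["kv"] = dict(re.findall(r"(\w+)=(\S+)", fields["msg"]))
    return fields
```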
For a production monitoring setup, combine the following components:

Prometheus + Grafana

Collect metrics from the Prometheus endpoint and build dashboards in Grafana.

Jaeger or Tempo

Distributed tracing for request flow visibility and latency analysis.

Loki or Datadog

Aggregate structured JSON logs for search and alerting.

Alertmanager

Alert on high latency, error rates, or node health failures.
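As a starting point, a sketch of Prometheus alerting rules built on the latency histogram and the scrape job defined above; the thresholds, durations, and labels are placeholders to adapt:

```yaml
groups:
  - name: tinycloud
    rules:
      - alert: TinyCloudHighLatency
        # p95 API latency above 500ms for 5 minutes (threshold is illustrative)
        expr: |
          histogram_quantile(0.95,
            rate(tinycloud_authorized_invoke_duration_seconds_bucket[5m])
          ) > 0.5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "TinyCloud p95 API latency above 500ms"

      - alert: TinyCloudDown
        # `up` is set by Prometheus itself for each scrape target
        expr: up{job="tinycloud"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "TinyCloud node is unreachable"
```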