Docker Deployment

This guide walks through building, configuring, and running DataSynth as Docker containers.

Prerequisites

  • Docker Engine 24+ (or Docker Desktop 4.25+)
  • Docker Compose v2
  • 2 GB RAM minimum (4 GB recommended)
  • 10 GB disk for images and generated data
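
To confirm the prerequisites, check the installed versions first:

# Verify Docker Engine and Compose v2 are available
docker --version
docker compose version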

Images

DataSynth provides two container images:

Image                        Dockerfile       Purpose
datasynth/datasynth-server   Dockerfile       Server (REST + gRPC + WebSocket)
datasynth/datasynth-cli      Dockerfile.cli   CLI for batch generation jobs

Multi-Stage Build Walkthrough

The server Dockerfile uses a four-stage build with cargo-chef for dependency caching:

Stage 1: chef       -- installs cargo-chef on rust:1.88-bookworm
Stage 2: planner    -- computes recipe.json from Cargo.lock
Stage 3: builder    -- cooks dependencies (cached), then builds datasynth-server + datasynth-data
Stage 4: runtime    -- copies binaries into gcr.io/distroless/cc-debian12
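
Put together, the stages correspond to a Dockerfile along the following lines. This is an illustrative sketch of the cargo-chef pattern, not a verbatim copy of the repository's Dockerfile; the copy paths and binary names in the runtime stage are assumptions based on the stage descriptions above.

# Stage 1: chef -- Rust toolchain plus cargo-chef
FROM rust:1.88-bookworm AS chef
RUN cargo install cargo-chef
WORKDIR /app

# Stage 2: planner -- capture the dependency graph as recipe.json
FROM chef AS planner
COPY . .
RUN cargo chef prepare --recipe-path recipe.json

# Stage 3: builder -- cook dependencies (cached layer), then build the binaries
FROM chef AS builder
COPY --from=planner /app/recipe.json recipe.json
RUN cargo chef cook --release --recipe-path recipe.json
COPY . .
RUN cargo build --release -p datasynth-server -p datasynth-cli

# Stage 4: runtime -- distroless image containing only the binaries
FROM gcr.io/distroless/cc-debian12 AS runtime
COPY --from=builder /app/target/release/datasynth-server /usr/local/bin/datasynth-server
COPY --from=builder /app/target/release/datasynth-data /usr/local/bin/datasynth-data
ENTRYPOINT ["/usr/local/bin/datasynth-server"]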

Build the server image:

docker build -t datasynth/datasynth-server:0.5.0 .

Build the CLI-only image:

docker build -t datasynth/datasynth-cli:0.5.0 -f Dockerfile.cli .

Build Arguments and Features

To enable optional features (TLS, Redis rate limiting, OpenTelemetry), modify the build command in the builder stage. For example, to enable Redis:

# In the builder stage, replace the cargo build line:
RUN cargo build --release -p datasynth-server -p datasynth-cli --features redis
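
An alternative, sketched here as an option rather than something the shipped Dockerfile provides, is to expose the feature list as a build argument so the Dockerfile does not need editing for every build:

# Hypothetical ARG-based variant of the builder stage
ARG FEATURES=""
RUN cargo build --release -p datasynth-server -p datasynth-cli ${FEATURES:+--features "$FEATURES"}

# Then select features at build time:
# docker build -t datasynth/datasynth-server:0.5.0 --build-arg FEATURES=redis .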

Image Size

The distroless runtime image is approximately 40-60 MB. The build cache layer with cooked dependencies significantly speeds up rebuilds when only application code changes.

Docker Compose Stack

The repository includes a production-ready docker-compose.yml with the full observability stack:

services:
  datasynth-server:
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - "50051:50051"  # gRPC
      - "3000:3000"    # REST
    environment:
      - RUST_LOG=info
      - DATASYNTH_API_KEYS=${DATASYNTH_API_KEYS:-}
    healthcheck:
      test: ["CMD", "/usr/local/bin/datasynth-data", "--help"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 10s
    deploy:
      resources:
        limits:
          memory: 2G
          cpus: "2.0"
    restart: unless-stopped

  redis:
    image: redis:7-alpine
    profiles:
      - redis
    ports:
      - "6379:6379"
    command: >
      redis-server --maxmemory 256mb --maxmemory-policy allkeys-lru
    deploy:
      resources:
        limits:
          memory: 256M
          cpus: "0.5"
    volumes:
      - redis-data:/data
    restart: unless-stopped

  prometheus:
    image: prom/prometheus:v2.51.0
    ports:
      - "9090:9090"
    volumes:
      - ./deploy/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - ./deploy/prometheus-alerts.yml:/etc/prometheus/alerts.yml:ro
      - prometheus-data:/prometheus
    command:
      - "--config.file=/etc/prometheus/prometheus.yml"
      - "--storage.tsdb.retention.time=30d"
    restart: unless-stopped

  grafana:
    image: grafana/grafana:10.4.0
    ports:
      - "3001:3000"
    volumes:
      - ./deploy/grafana/provisioning:/etc/grafana/provisioning:ro
      - grafana-data:/var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD:-admin}
      - GF_USERS_ALLOW_SIGN_UP=false
    restart: unless-stopped

volumes:
  prometheus-data:
  grafana-data:
  redis-data:
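
The ${DATASYNTH_API_KEYS:-} and ${GRAFANA_PASSWORD:-admin} references are resolved from the shell environment or from a .env file next to docker-compose.yml. A minimal .env with placeholder values:

# .env (placeholder values -- substitute your own secrets)
DATASYNTH_API_KEYS=key-one,key-two
GRAFANA_PASSWORD=change-me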

Starting the Stack

Basic server only:

docker compose up -d datasynth-server

Full observability stack (server + Prometheus + Grafana):

docker compose up -d

With Redis for distributed rate limiting:

docker compose --profile redis up -d
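
Whichever variant you start, docker compose ps lists the running services and their health status:

# List services, ports, and health status
docker compose ps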

Verifying the Deployment

# Health check
curl http://localhost:3000/health

# Readiness probe
curl http://localhost:3000/ready

# Prometheus metrics
curl http://localhost:3000/metrics

# Grafana UI
open http://localhost:3001  # admin / admin (or GRAFANA_PASSWORD)
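
If grpcurl is installed, the gRPC port can be probed as well; listing services this way assumes the server exposes gRPC reflection, which this guide does not verify:

# gRPC service listing (requires grpcurl and server-side reflection)
grpcurl -plaintext localhost:50051 list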

Environment Variables

Variable                      Default    Description
RUST_LOG                      info       Log level: trace, debug, info, warn, error
DATASYNTH_API_KEYS            (none)     Comma-separated API keys for authentication
DATASYNTH_WORKER_THREADS      0 (auto)   Tokio worker threads; 0 = CPU count
DATASYNTH_REDIS_URL           (none)     Redis URL for distributed rate limiting
DATASYNTH_TLS_CERT            (none)     Path to TLS certificate (PEM)
DATASYNTH_TLS_KEY             (none)     Path to TLS private key (PEM)
OTEL_EXPORTER_OTLP_ENDPOINT   (none)     OpenTelemetry collector endpoint
OTEL_SERVICE_NAME             (none)     OpenTelemetry service name
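
As an example, the standalone server container can be started with TLS and OpenTelemetry configured. The certificate paths and collector address below are placeholders, and TLS support assumes the image was built with the TLS feature enabled:

docker run -d \
  -p 3000:3000 -p 50051:50051 \
  -v $(pwd)/certs:/certs:ro \
  -e DATASYNTH_TLS_CERT=/certs/server.pem \
  -e DATASYNTH_TLS_KEY=/certs/server-key.pem \
  -e OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317 \
  -e OTEL_SERVICE_NAME=datasynth-server \
  datasynth/datasynth-server:0.5.0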

Resource Limits

Recommended container resource limits by workload:

Workload             CPU         Memory   Notes
Light (dev/test)     1 core      1 GB     Small configs, < 10K entries
Medium (staging)     2 cores     2 GB     Medium configs, up to 100K entries
Heavy (production)   4 cores     4 GB     Large configs, streaming, multiple clients
Batch CLI job        2-8 cores   2-8 GB   Scales linearly with core count
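
These limits can be applied without editing docker-compose.yml by placing a docker-compose.override.yml next to it; Compose merges the override automatically. A sketch for the production row:

# docker-compose.override.yml -- raise limits for a heavy workload
services:
  datasynth-server:
    deploy:
      resources:
        limits:
          memory: 4G
          cpus: "4.0"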

Running CLI Jobs in Docker

Generate data with the CLI image:

docker run --rm \
  -v $(pwd)/output:/output \
  datasynth/datasynth-cli:0.5.0 \
  generate --demo --output /output

Generate from a custom config:

docker run --rm \
  -v $(pwd)/config.yaml:/config.yaml:ro \
  -v $(pwd)/output:/output \
  datasynth/datasynth-cli:0.5.0 \
  generate --config /config.yaml --output /output
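
If the generated files end up owned by root on the host, a common workaround is to run the container as the invoking user; this assumes the CLI image works when run as an arbitrary UID:

docker run --rm \
  --user $(id -u):$(id -g) \
  -v $(pwd)/output:/output \
  datasynth/datasynth-cli:0.5.0 \
  generate --demo --output /output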

Networking

The server binds to 0.0.0.0 by default inside the container. Port mapping:

Container Port   Protocol   Service
3000             TCP        REST API + WebSocket + Prometheus metrics
50051            TCP        gRPC API

For WebSocket connections through a reverse proxy, ensure the proxy supports HTTP Upgrade headers. See TLS & Reverse Proxy for Nginx and Envoy configurations.
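
As a minimal illustration (the complete configurations live in TLS & Reverse Proxy), an Nginx location block that passes WebSocket upgrades through looks like this; the upstream name is a placeholder:

# Minimal Nginx sketch for WebSocket pass-through
location / {
    proxy_pass http://datasynth-server:3000;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
}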

Logging

DataSynth server outputs structured JSON logs to stdout, which integrates with Docker’s logging drivers:

# View logs
docker compose logs -f datasynth-server

# Filter by level
docker compose logs datasynth-server | jq 'select(.level == "ERROR")'
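
Log growth can be bounded per service through the json-file driver options; the sizes below are illustrative:

# docker-compose.yml excerpt -- cap log size for the server container
services:
  datasynth-server:
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "3"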

To change the log format or level, set the RUST_LOG environment variable:

# Debug logging for the server crate only
RUST_LOG=datasynth_server=debug docker compose up -d datasynth-server