Docker Deployment
This guide walks through building, configuring, and running DataSynth as Docker containers.
Prerequisites
- Docker Engine 24+ (or Docker Desktop 4.25+)
- Docker Compose v2
- 2 GB RAM minimum (4 GB recommended)
- 10 GB disk for images and generated data
Images
DataSynth provides two container images:
| Image | Dockerfile | Purpose |
|---|---|---|
| datasynth/datasynth-server | Dockerfile | Server (REST + gRPC + WebSocket) |
| datasynth/datasynth-cli | Dockerfile.cli | CLI for batch generation jobs |
Multi-Stage Build Walkthrough
The server Dockerfile uses a four-stage build with cargo-chef for dependency caching:
- Stage 1 (chef): installs cargo-chef on the rust:1.88-bookworm base image
- Stage 2 (planner): computes recipe.json from Cargo.lock
- Stage 3 (builder): cooks dependencies (cached), then builds datasynth-server and datasynth-data
- Stage 4 (runtime): copies the binaries into gcr.io/distroless/cc-debian12
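For orientation, here is a condensed sketch of that shape. It is illustrative only: the package and binary names, copy paths, and entrypoint follow the description above and may differ from the shipped Dockerfile.

```dockerfile
# Stage 1: chef -- toolchain with cargo-chef preinstalled
FROM rust:1.88-bookworm AS chef
RUN cargo install cargo-chef
WORKDIR /app

# Stage 2: planner -- compute the dependency recipe from the lockfile
FROM chef AS planner
COPY . .
RUN cargo chef prepare --recipe-path recipe.json

# Stage 3: builder -- cook (cache) dependencies, then build the binaries
FROM chef AS builder
COPY --from=planner /app/recipe.json recipe.json
RUN cargo chef cook --release --recipe-path recipe.json
COPY . .
RUN cargo build --release -p datasynth-server -p datasynth-cli

# Stage 4: runtime -- minimal distroless image with only the binaries
# (entrypoint and binary paths are assumptions for this sketch)
FROM gcr.io/distroless/cc-debian12 AS runtime
COPY --from=builder /app/target/release/datasynth-server /usr/local/bin/
COPY --from=builder /app/target/release/datasynth-data /usr/local/bin/
EXPOSE 3000 50051
ENTRYPOINT ["/usr/local/bin/datasynth-server"]
```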
Build the server image:
docker build -t datasynth/datasynth-server:0.5.0 .
Build the CLI-only image:
docker build -t datasynth/datasynth-cli:0.5.0 -f Dockerfile.cli .
Build Arguments and Features
To enable optional features (TLS, Redis rate limiting, OpenTelemetry), modify the build command in the builder stage. For example, to enable Redis:
```dockerfile
# In the builder stage, replace the cargo build line:
RUN cargo build --release -p datasynth-server -p datasynth-cli --features redis
```
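If you would rather not edit the Dockerfile for each build, one option is to thread the feature list through a build argument. This is only a sketch; the FEATURES argument is hypothetical and not part of the shipped Dockerfile:

```dockerfile
# Hypothetical build argument (not in the shipped Dockerfile).
# Usage: docker build --build-arg FEATURES=redis -t datasynth/datasynth-server:0.5.0 .
ARG FEATURES=""
RUN if [ -n "$FEATURES" ]; then \
      cargo build --release -p datasynth-server -p datasynth-cli --features "$FEATURES"; \
    else \
      cargo build --release -p datasynth-server -p datasynth-cli; \
    fi
```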
Image Size
The distroless runtime image is approximately 40-60 MB. Because cargo-chef cooks dependencies into a cached layer, rebuilds that change only application code are significantly faster.
Docker Compose Stack
The repository includes a production-ready docker-compose.yml with the full observability stack:
```yaml
services:
  datasynth-server:
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - "50051:50051"  # gRPC
      - "3000:3000"    # REST
    environment:
      - RUST_LOG=info
      - DATASYNTH_API_KEYS=${DATASYNTH_API_KEYS:-}
    healthcheck:
      test: ["CMD", "/usr/local/bin/datasynth-data", "--help"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 10s
    deploy:
      resources:
        limits:
          memory: 2G
          cpus: "2.0"
    restart: unless-stopped

  redis:
    image: redis:7-alpine
    profiles:
      - redis
    ports:
      - "6379:6379"
    command: >
      redis-server --maxmemory 256mb --maxmemory-policy allkeys-lru
    deploy:
      resources:
        limits:
          memory: 256M
          cpus: "0.5"
    volumes:
      - redis-data:/data
    restart: unless-stopped

  prometheus:
    image: prom/prometheus:v2.51.0
    ports:
      - "9090:9090"
    volumes:
      - ./deploy/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - ./deploy/prometheus-alerts.yml:/etc/prometheus/alerts.yml:ro
      - prometheus-data:/prometheus
    command:
      - "--config.file=/etc/prometheus/prometheus.yml"
      - "--storage.tsdb.retention.time=30d"
    restart: unless-stopped

  grafana:
    image: grafana/grafana:10.4.0
    ports:
      - "3001:3000"
    volumes:
      - ./deploy/grafana/provisioning:/etc/grafana/provisioning:ro
      - grafana-data:/var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD:-admin}
      - GF_USERS_ALLOW_SIGN_UP=false
    restart: unless-stopped

volumes:
  prometheus-data:
  grafana-data:
  redis-data:
```
Starting the Stack
Basic server only:
docker compose up -d datasynth-server
Full observability stack (server + Prometheus + Grafana):
docker compose up -d
With Redis for distributed rate limiting:
docker compose --profile redis up -d
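Note that distributed rate limiting also needs the server image built with the redis feature (see Build Arguments and Features) and DATASYNTH_REDIS_URL pointing at the Redis service (see Environment Variables below). One way to set the latter is a compose override; a minimal sketch, where the URL assumes the service name and default port above:

```yaml
# docker-compose.override.yml (sketch; picked up automatically by docker compose)
services:
  datasynth-server:
    environment:
      - DATASYNTH_REDIS_URL=redis://redis:6379
```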
Verifying the Deployment
```bash
# Health check
curl http://localhost:3000/health

# Readiness probe
curl http://localhost:3000/ready

# Prometheus metrics
curl http://localhost:3000/metrics

# Grafana UI
open http://localhost:3001  # admin / admin (or GRAFANA_PASSWORD)
```
Environment Variables
| Variable | Default | Description |
|---|---|---|
| RUST_LOG | info | Log level: trace, debug, info, warn, error |
| DATASYNTH_API_KEYS | (none) | Comma-separated API keys for authentication |
| DATASYNTH_WORKER_THREADS | 0 (auto) | Tokio worker threads; 0 = CPU count |
| DATASYNTH_REDIS_URL | (none) | Redis URL for distributed rate limiting |
| DATASYNTH_TLS_CERT | (none) | Path to TLS certificate (PEM) |
| DATASYNTH_TLS_KEY | (none) | Path to TLS private key (PEM) |
| OTEL_EXPORTER_OTLP_ENDPOINT | (none) | OpenTelemetry collector endpoint |
| OTEL_SERVICE_NAME | (none) | OpenTelemetry service name |
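As an illustration of passing these through a plain docker run (the API key values and certificate paths are placeholders, and the TLS variables assume an image built with the TLS feature):

```bash
# Run the server with explicit configuration; values below are placeholders
docker run -d --name datasynth-server \
  -p 3000:3000 -p 50051:50051 \
  -e RUST_LOG=debug \
  -e DATASYNTH_API_KEYS=key-one,key-two \
  -e DATASYNTH_WORKER_THREADS=4 \
  -e DATASYNTH_TLS_CERT=/certs/server.crt \
  -e DATASYNTH_TLS_KEY=/certs/server.key \
  -v $(pwd)/certs:/certs:ro \
  datasynth/datasynth-server:0.5.0
```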
Resource Limits
Recommended container resource limits by workload:
| Workload | CPU | Memory | Notes |
|---|---|---|---|
| Light (dev/test) | 1 core | 1 GB | Small configs, < 10K entries |
| Medium (staging) | 2 cores | 2 GB | Medium configs, up to 100K entries |
| Heavy (production) | 4 cores | 4 GB | Large configs, streaming, multiple clients |
| Batch CLI job | 2-8 cores | 2-8 GB | Scales linearly with core count |
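The compose file above already pins limits via deploy.resources. For ad-hoc docker run invocations, such as the CLI jobs in the next section, the equivalent flags are --cpus and --memory; a sketch sized for a batch job per the table:

```bash
# Batch CLI job capped at 4 cores / 4 GB (values chosen from the table above)
docker run --rm \
  --cpus 4 --memory 4g \
  -v $(pwd)/output:/output \
  datasynth/datasynth-cli:0.5.0 \
  generate --demo --output /output
```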
Running CLI Jobs in Docker
Generate data with the CLI image:
```bash
docker run --rm \
  -v $(pwd)/output:/output \
  datasynth/datasynth-cli:0.5.0 \
  generate --demo --output /output
```
Generate from a custom config:
```bash
docker run --rm \
  -v $(pwd)/config.yaml:/config.yaml:ro \
  -v $(pwd)/output:/output \
  datasynth/datasynth-cli:0.5.0 \
  generate --config /config.yaml --output /output
```
Networking
The server binds to 0.0.0.0 by default inside the container. Port mapping:
| Container Port | Protocol | Service |
|---|---|---|
| 3000 | TCP | REST API + WebSocket + Prometheus metrics |
| 50051 | TCP | gRPC API |
For WebSocket connections through a reverse proxy, ensure the proxy supports HTTP Upgrade headers. See TLS & Reverse Proxy for Nginx and Envoy configurations.
Logging
DataSynth server outputs structured JSON logs to stdout, which integrates with Docker’s logging drivers:
```bash
# View logs
docker compose logs -f datasynth-server

# Filter by level (--no-log-prefix keeps the output as parseable JSON)
docker compose logs --no-log-prefix datasynth-server | jq 'select(.level == "ERROR")'
```
To change the log level or per-crate filtering, set the RUST_LOG environment variable:
```bash
# Debug logging for the server crate only
# (requires the compose file to pass RUST_LOG through, e.g. RUST_LOG=${RUST_LOG:-info})
RUST_LOG=datasynth_server=debug docker compose up -d datasynth-server
```
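Because everything goes to stdout, disk usage from Docker's default json-file logging driver can also be capped per service. A minimal compose sketch; the sizes are illustrative:

```yaml
# Cap container log growth for the server (sizes are illustrative)
services:
  datasynth-server:
    logging:
      driver: json-file
      options:
        max-size: "50m"
        max-file: "5"
```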