Global Settings

Global settings control overall generation behavior.

Configuration

global:
  seed: 42                           # Random seed for reproducibility (optional)
  industry: manufacturing            # Industry preset
  start_date: 2024-01-01             # Generation start date
  period_months: 12                  # Duration in months
  group_currency: USD                # Base/reporting currency
  worker_threads: 4                  # Parallel workers (optional)
  memory_limit: 2147483648           # Memory limit in bytes (optional)

Fields

seed

Random number generator seed for reproducible output.

Property  Value
Type      u64
Required  No
Default   Random

global:
  seed: 42  # Same seed = same output

Use cases:

  • Reproducible test datasets
  • Debugging
  • Consistent benchmarks
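
The "same seed = same output" property can be illustrated with any seeded pseudo-random generator. The sketch below uses Python's standard `random` module as a stand-in; it is not the generator the tool itself uses, and `sample_amounts` is a hypothetical helper introduced only for this example.

```python
import random

def sample_amounts(seed, n=5):
    """Draw n pseudo-random amounts from a generator seeded with `seed`."""
    rng = random.Random(seed)  # independent generator; global state is untouched
    return [round(rng.uniform(10.0, 1000.0), 2) for _ in range(n)]

# Two runs with the same seed produce identical sequences.
assert sample_amounts(42) == sample_amounts(42)
```

Leaving the seed unset corresponds to the documented default (Random), where repeated runs are not expected to match.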

industry

Industry preset for domain-specific settings.

Property  Value
Type      string
Required  Yes
Values    See below

Available industries:

Industry            Description
manufacturing       Production, inventory, cost accounting
retail              High volume sales, seasonal patterns
financial_services  Complex IC, regulatory compliance
healthcare          Insurance billing, compliance
technology          SaaS revenue, R&D
energy              Long-term assets, commodity trading
telecom             Subscription revenue, network assets
transportation      Fleet assets, fuel costs
hospitality         Seasonal, revenue management

start_date

Beginning date for generated data.

Property  Value
Type      date (YYYY-MM-DD)
Required  Yes

global:
  start_date: 2024-01-01

Notes:

  • First transaction will be on or after this date
  • Combined with period_months to determine date range
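
As a rough illustration of how start_date and period_months combine into a date range, the sketch below adds whole calendar months to the start date. The helper names `add_months` and `generation_range` are hypothetical, and treating the end date as exclusive (the first day after the period) is an assumption, not documented behavior.

```python
import calendar
from datetime import date

def add_months(start, months):
    """Return the date `months` calendar months after `start`, clamping the day."""
    index = start.year * 12 + (start.month - 1) + months
    year, month = divmod(index, 12)
    month += 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))

def generation_range(start_date, period_months):
    """Return (start, end); end is assumed exclusive (first day after the period)."""
    start = date.fromisoformat(start_date)
    return start, add_months(start, period_months)

start, end = generation_range("2024-01-01", 12)
print(start, end)  # 2024-01-01 2025-01-01
```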

period_months

Duration of generation period.

Property  Value
Type      u32
Required  Yes
Range     1-120

global:
  period_months: 12      # One year
  # period_months: 36    # Alternative: three years
  # period_months: 1     # Alternative: one month

Considerations:

  • Longer periods generate more data
  • Period close features require at least 1 month
  • Year-end close requires at least 12 months

group_currency

Base currency for consolidation and reporting.

Property  Value
Type      string (ISO 4217)
Required  Yes

global:
  group_currency: USD
  # group_currency: EUR    # Alternative
  # group_currency: CHF    # Alternative

Used for:

  • Currency translation
  • Consolidation
  • Intercompany eliminations

worker_threads

Number of parallel worker threads.

Property  Value
Type      usize
Required  No
Default   Number of CPU cores

global:
  worker_threads: 4      # Use 4 threads
  # worker_threads: 1    # Alternative: single-threaded

Guidance:

  • Default (CPU cores) is usually optimal
  • Reduce for memory-constrained systems
  • Increasing the count beyond the number of CPU cores may not improve performance
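
The documented default (number of CPU cores) can be pictured with the small sketch below. `resolve_worker_threads` is a hypothetical helper, not part of the tool; it only shows the fallback from an explicit setting to the detected core count.

```python
import os

def resolve_worker_threads(configured=None):
    """Return the configured thread count, or the CPU core count when unset."""
    if configured is not None:
        return configured
    # os.cpu_count() can return None when the count cannot be determined.
    return os.cpu_count() or 1

print(resolve_worker_threads(4))  # 4
```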

memory_limit

Maximum memory usage in bytes.

Property  Value
Type      u64
Required  No
Default   None (system limit)

global:
  memory_limit: 1073741824      # 1 GiB (1024^3 bytes)
  # memory_limit: 2147483648    # Alternative: 2 GiB
  # memory_limit: 4294967296    # Alternative: 4 GiB

Behavior:

  • Soft limit: Generation slows down
  • Hard limit: Generation pauses until memory is freed
  • Streaming output reduces memory pressure
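
Because memory_limit is given in raw bytes, it can help to derive the number rather than type it by hand. The example values above are multiples of 1024^3 bytes, so the sketch below uses that unit. `gib_to_bytes` is a hypothetical helper for illustration only.

```python
GIB = 1024 ** 3  # bytes in one gibibyte

def gib_to_bytes(gib):
    """Convert a size in GiB to a byte count suitable for memory_limit."""
    return int(gib * GIB)

for size in (1, 2, 4, 8):
    print(f"memory_limit: {gib_to_bytes(size)}  # {size} GiB")
# memory_limit: 1073741824  # 1 GiB
# memory_limit: 2147483648  # 2 GiB
# memory_limit: 4294967296  # 4 GiB
# memory_limit: 8589934592  # 8 GiB
```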

Environment Variable Overrides

Variable                 Setting
SYNTH_DATA_SEED          global.seed
SYNTH_DATA_THREADS       global.worker_threads
SYNTH_DATA_MEMORY_LIMIT  global.memory_limit

SYNTH_DATA_SEED=12345 datasynth-data generate --config config.yaml --output ./output
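
Conceptually, an override replaces the value read from the configuration file when the corresponding variable is set. The sketch below models that precedence in Python. `apply_env_overrides` is a hypothetical helper, and parsing each value as an integer is an assumption based on the field types listed earlier; the tool's real parsing logic may differ.

```python
import os

# Mapping taken from the table above: environment variable -> key under `global`.
ENV_OVERRIDES = {
    "SYNTH_DATA_SEED": "seed",
    "SYNTH_DATA_THREADS": "worker_threads",
    "SYNTH_DATA_MEMORY_LIMIT": "memory_limit",
}

def apply_env_overrides(global_cfg, environ=None):
    """Return a copy of `global_cfg` with any set environment overrides applied."""
    environ = os.environ if environ is None else environ
    merged = dict(global_cfg)
    for var, key in ENV_OVERRIDES.items():
        if var in environ:
            merged[key] = int(environ[var])  # assumed: all three settings are integers
    return merged

cfg = {"seed": 42, "worker_threads": 4}
print(apply_env_overrides(cfg, {"SYNTH_DATA_SEED": "12345"}))
# {'seed': 12345, 'worker_threads': 4}
```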

Examples

Minimal

global:
  industry: manufacturing
  start_date: 2024-01-01
  period_months: 12
  group_currency: USD

Full Control

global:
  seed: 42
  industry: financial_services
  start_date: 2023-01-01
  period_months: 36
  group_currency: USD
  worker_threads: 8
  memory_limit: 8589934592  # 8 GiB (8 × 1024^3 bytes)

Development/Testing

global:
  seed: 42                # Reproducible
  industry: manufacturing
  start_date: 2024-01-01
  period_months: 1        # Short period
  group_currency: USD
  worker_threads: 1       # Single thread for debugging

Validation

Check           Rule
period_months   1 ≤ value ≤ 120
start_date      Valid date
industry        Known industry preset
group_currency  Valid ISO 4217 code
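
The four checks above can be sketched as a small pre-flight validator. This is an illustration under stated assumptions, not the tool's own validation code: `validate_global` is a hypothetical function, the industry set is copied from the list of available industries, and the currency check only tests the three-uppercase-letter shape rather than consulting the actual ISO 4217 code list.

```python
import re
from datetime import date

KNOWN_INDUSTRIES = {
    "manufacturing", "retail", "financial_services", "healthcare", "technology",
    "energy", "telecom", "transportation", "hospitality",
}

def validate_global(cfg):
    """Return a list of human-readable validation errors (empty if valid)."""
    errors = []
    months = cfg.get("period_months")
    if not isinstance(months, int) or not 1 <= months <= 120:
        errors.append("period_months must satisfy 1 <= value <= 120")
    try:
        date.fromisoformat(str(cfg.get("start_date")))
    except ValueError:
        errors.append("start_date must be a valid date (YYYY-MM-DD)")
    if cfg.get("industry") not in KNOWN_INDUSTRIES:
        errors.append("industry must be a known industry preset")
    # Shape check only: a real check would consult the ISO 4217 code list.
    if not re.fullmatch(r"[A-Z]{3}", str(cfg.get("group_currency"))):
        errors.append("group_currency must be a valid ISO 4217 code")
    return errors

minimal = {"industry": "manufacturing", "start_date": "2024-01-01",
           "period_months": 12, "group_currency": "USD"}
print(validate_global(minimal))  # []
```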

See Also