datasynth-config

Configuration schema, validation, and industry presets for synthetic data generation.

Overview

datasynth-config provides the configuration layer for SyntheticData:

  • Schema Definition: Complete YAML configuration schema
  • Validation: Bounds checking, constraint validation, distribution sum verification
  • Industry Presets: Pre-configured settings for common industries
  • Complexity Levels: Small, medium, and large organization profiles

Configuration Sections

| Section | Description |
|---|---|
| global | Industry, dates, seed, performance settings |
| companies | Company codes, currencies, volume weights |
| chart_of_accounts | COA complexity and structure |
| transactions | Line items, amounts, sources, temporal patterns |
| master_data | Vendors, customers, materials, assets, employees |
| document_flows | P2P and O2C configuration |
| intercompany | IC transaction types and transfer pricing |
| balance | Opening balances, trial balance generation |
| subledger | AR, AP, FA, inventory settings |
| fx | Currency and exchange rate settings |
| period_close | Close tasks and schedules |
| fraud | Fraud injection rates and types |
| internal_controls | SOX controls and SoD rules |
| anomaly_injection | Anomaly rates and labeling |
| data_quality | Missing values, typos, duplicates |
| graph_export | ML graph export formats |
| output | Output format and compression |

Industry Presets

| Industry | Description |
|---|---|
| manufacturing | Heavy P2P, inventory, fixed assets |
| retail | High O2C volume, seasonal patterns |
| financial_services | Complex intercompany, high controls |
| healthcare | Regulatory focus, seasonal insurance |
| technology | SaaS revenue patterns, R&D capitalization |
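As a minimal sketch, assuming the YAML `industry` values map one-to-one onto variants of the `Industry` enum used in the examples below (only `Industry::Manufacturing` appears on this page; the other variant names are hypothetical), resolving a preset name could look like this:

```rust
use std::str::FromStr;

/// Hypothetical mirror of the crate's `Industry` enum; variant names other
/// than `Manufacturing` are assumed from the preset table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Industry {
    Manufacturing,
    Retail,
    FinancialServices,
    Healthcare,
    Technology,
}

impl FromStr for Industry {
    type Err = String;

    /// Maps the snake_case preset names used in YAML onto enum variants.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "manufacturing" => Ok(Industry::Manufacturing),
            "retail" => Ok(Industry::Retail),
            "financial_services" => Ok(Industry::FinancialServices),
            "healthcare" => Ok(Industry::Healthcare),
            "technology" => Ok(Industry::Technology),
            other => Err(format!("unknown industry preset: {other}")),
        }
    }
}

fn main() {
    // `str::parse` dispatches to the `FromStr` impl above.
    let industry: Industry = "financial_services".parse().unwrap();
    println!("{industry:?}");
    assert!("aerospace".parse::<Industry>().is_err());
}
```

Returning an error for unknown names, rather than silently falling back to a default industry, keeps a typo in the YAML from producing data for the wrong preset.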

Key Types

Config

pub struct Config {
    pub global: GlobalConfig,
    pub companies: Vec<CompanyConfig>,
    pub chart_of_accounts: CoaConfig,
    pub transactions: TransactionConfig,
    pub master_data: MasterDataConfig,
    pub document_flows: DocumentFlowConfig,
    pub intercompany: IntercompanyConfig,
    pub balance: BalanceConfig,
    pub subledger: SubledgerConfig,
    pub fx: FxConfig,
    pub period_close: PeriodCloseConfig,
    pub fraud: FraudConfig,
    pub internal_controls: ControlConfig,
    pub anomaly_injection: AnomalyConfig,
    pub data_quality: DataQualityConfig,
    pub graph_export: GraphExportConfig,
    pub output: OutputConfig,
}

GlobalConfig

pub struct GlobalConfig {
    pub seed: Option<u64>,
    pub industry: Industry,
    pub start_date: NaiveDate,
    pub period_months: u32,      // 1-120
    pub group_currency: String,
    pub worker_threads: Option<usize>,
    pub memory_limit: Option<u64>,
}
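The validation rules require `period_months` to be 1-120 and `start_date` plus `period_months` to form a valid date range. The sketch below illustrates that arithmetic with plain year/month integers instead of chrono's `NaiveDate`, so it needs only the standard library; the helper `period_end` is hypothetical and not part of the crate.

```rust
/// Hypothetical helper (not part of the crate): returns the last (year, month)
/// covered by a window starting at `start_year`/`start_month` (1-12) and
/// spanning `period_months` months, enforcing the documented 1-120 bound.
fn period_end(start_year: i32, start_month: u32, period_months: u32) -> Result<(i32, u32), String> {
    if !(1..=120).contains(&period_months) {
        return Err(format!("period_months must be 1-120, got {period_months}"));
    }
    if !(1..=12).contains(&start_month) {
        return Err(format!("start month must be 1-12, got {start_month}"));
    }
    // Flatten to a month index so year rollover falls out of integer division.
    let index = start_year * 12 + (start_month as i32 - 1) + (period_months as i32 - 1);
    Ok((index.div_euclid(12), index.rem_euclid(12) as u32 + 1))
}

fn main() {
    // A 12-month window starting January 2024 ends in December 2024.
    println!("{:?}", period_end(2024, 1, 12));
    // 120 months is the maximum: January 2024 through December 2033.
    println!("{:?}", period_end(2024, 1, 120));
    assert!(period_end(2024, 1, 121).is_err());
}
```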

CompanyConfig

pub struct CompanyConfig {
    pub code: String,
    pub name: String,
    pub currency: String,
    pub country: String,
    pub volume_weight: f64,     // Must sum to 1.0 across companies
    pub is_parent: bool,
    pub parent_code: Option<String>,
}
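Because `volume_weight` must sum to 1.0 across companies and company codes must be unique, a check over the company list could look like the following sketch. `Company` is a hypothetical, trimmed-down stand-in for `CompanyConfig`, the ±0.01 tolerance comes from the validation rules, and the crate's own `ConfigValidator` may implement this differently.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for `CompanyConfig`, keeping only the checked fields.
struct Company {
    code: String,
    volume_weight: f64,
}

/// Tolerance taken from the documented validation rules (±0.01).
const WEIGHT_TOLERANCE: f64 = 0.01;

/// Hypothetical check: unique codes, and weights summing to 1.0 within tolerance.
fn check_companies(companies: &[Company]) -> Result<(), String> {
    let mut seen = HashSet::new();
    for c in companies {
        // `insert` returns false when the code is already in the set.
        if !seen.insert(c.code.as_str()) {
            return Err(format!("duplicate company code: {}", c.code));
        }
    }
    let sum: f64 = companies.iter().map(|c| c.volume_weight).sum();
    if (sum - 1.0).abs() > WEIGHT_TOLERANCE {
        return Err(format!("volume weights sum to {sum:.3}, expected 1.0"));
    }
    Ok(())
}

fn main() {
    let companies = vec![
        Company { code: "1000".into(), volume_weight: 0.6 },
        Company { code: "2000".into(), volume_weight: 0.4 },
    ];
    assert!(check_companies(&companies).is_ok());
}
```

Comparing against a tolerance instead of testing `sum == 1.0` matters because weights such as 0.1 and 0.2 are not exactly representable in `f64`, so an exact equality test would reject valid configurations.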

Validation Rules

The ConfigValidator enforces:

| Rule | Constraint |
|---|---|
| period_months | 1-120 (max 10 years) |
| compression_level | 1-9 when compression is enabled |
| Rate fields | 0.0-1.0 |
| Approval thresholds | Strictly ascending order |
| Distribution weights | Sum to 1.0 (±0.01 tolerance) |
| Company codes | Unique within the configuration |
| Dates | start_date + period_months is valid |
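The scalar rules in this table can be expressed as small, independent checks. This is a sketch under stated assumptions: the function names are hypothetical, errors are plain `String`s rather than `ConfigError`, and the range bounds are treated as inclusive.

```rust
/// Hypothetical check: rate fields must lie in 0.0-1.0 (treated as inclusive).
/// NaN fails because it is not contained in any range.
fn check_rate(field: &str, value: f64) -> Result<(), String> {
    if (0.0..=1.0).contains(&value) {
        Ok(())
    } else {
        Err(format!("{field} = {value} is outside 0.0-1.0"))
    }
}

/// Hypothetical check: approval thresholds must be strictly ascending.
fn check_thresholds(thresholds: &[f64]) -> Result<(), String> {
    // `windows(2)` yields each adjacent pair; equal neighbours also fail.
    for pair in thresholds.windows(2) {
        if pair[1] <= pair[0] {
            return Err(format!("threshold {} must exceed {}", pair[1], pair[0]));
        }
    }
    Ok(())
}

/// Hypothetical check: compression_level must be 1-9, but only when
/// compression is enabled.
fn check_compression(enabled: bool, level: u32) -> Result<(), String> {
    if enabled && !(1..=9).contains(&level) {
        return Err(format!("compression_level must be 1-9, got {level}"));
    }
    Ok(())
}

fn main() {
    assert!(check_rate("example_rate", 0.02).is_ok());
    assert!(check_thresholds(&[1_000.0, 10_000.0, 100_000.0]).is_ok());
    assert!(check_compression(false, 0).is_ok());
    // Equal thresholds violate "strictly ascending".
    println!("{:?}", check_thresholds(&[5_000.0, 5_000.0]));
}
```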

Usage Examples

Loading Configuration

use datasynth_config::{Config, ConfigValidator};

// Load from a YAML file (`?` requires an enclosing function that returns a Result)
let config = Config::from_yaml_file("config.yaml")?;

// Validate bounds, constraints, and distribution sums
let validator = ConfigValidator::new();
validator.validate(&config)?;

Using Presets

use datasynth_config::{Config, Industry, Complexity};

// Create from an industry preset at a given complexity level
let mut config = Config::from_preset(Industry::Manufacturing, Complexity::Medium);

// Override individual fields as needed (the binding must be `mut`)
config.transactions.target_count = 50000;

Creating Configuration Programmatically

use chrono::NaiveDate;
use datasynth_config::{Config, GlobalConfig, Industry, TransactionConfig};

let config = Config {
    global: GlobalConfig {
        seed: Some(42),
        industry: Industry::Manufacturing,
        start_date: NaiveDate::from_ymd_opt(2024, 1, 1).unwrap(),
        period_months: 12,
        group_currency: "USD".to_string(),
        // Remaining fields (worker_threads, memory_limit) use their defaults
        ..Default::default()
    },
    transactions: TransactionConfig {
        target_count: 100000,
        ..Default::default()
    },
    // All other sections use their defaults
    ..Default::default()
};
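The `..Default::default()` lines use Rust's struct update syntax: fields named explicitly are set, and every other field is taken from the type's `Default` value. The sketch below shows the mechanism on hypothetical, simplified structs; the default values chosen here are illustrative and are not the crate's actual defaults.

```rust
/// Hypothetical, simplified stand-ins for the crate's config structs.
#[derive(Debug, Clone, PartialEq)]
struct GlobalConfig {
    seed: Option<u64>,
    period_months: u32,
    group_currency: String,
}

impl Default for GlobalConfig {
    // Illustrative defaults only; the crate's real defaults may differ.
    fn default() -> Self {
        GlobalConfig { seed: None, period_months: 12, group_currency: "USD".to_string() }
    }
}

#[derive(Debug, Clone, PartialEq, Default)]
struct TransactionConfig {
    target_count: u64,
}

#[derive(Debug, Clone, PartialEq, Default)]
struct Config {
    global: GlobalConfig,
    transactions: TransactionConfig,
}

fn build() -> Config {
    Config {
        // Only `seed` is set; `period_months` and `group_currency` come from Default.
        global: GlobalConfig { seed: Some(42), ..Default::default() },
        // `transactions` is filled by the outer `..Default::default()`,
        // which uses the derived `TransactionConfig::default()` (target_count = 0).
        ..Default::default()
    }
}

fn main() {
    let config = build();
    println!("{config:?}");
}
```

Because every section implements `Default`, a programmatic configuration only needs to spell out the fields that differ from the baseline, which mirrors how a sparse YAML file relies on defaults.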

Saving Configuration

// To YAML
config.to_yaml_file("config.yaml")?;

// To JSON
config.to_json_file("config.json")?;

// To a YAML string
let yaml = config.to_yaml_string()?;

Configuration Examples

Minimal Configuration

global:
  industry: manufacturing
  start_date: 2024-01-01
  period_months: 12

transactions:
  target_count: 10000

output:
  format: csv

Full Configuration

See the YAML Schema Reference for complete documentation.

Complexity Levels

| Level | Accounts | Vendors | Customers | Materials |
|---|---|---|---|---|
| small | ~100 | 50 | 100 | 200 |
| medium | ~400 | 200 | 500 | 1000 |
| large | ~2500 | 1000 | 5000 | 10000 |
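The complexity table can be encoded as a lookup from a level to approximate master-data sizes. The `Complexity` variants match the `Complexity::Medium` used in the preset example, but the `MasterDataSizes` struct and the `sizes` method are hypothetical; the numbers are the approximate values from the table.

```rust
/// Hypothetical mirror of the `Complexity` levels.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Complexity {
    Small,
    Medium,
    Large,
}

/// Hypothetical container for the approximate sizes in the complexity table.
#[derive(Debug, PartialEq, Eq)]
struct MasterDataSizes {
    accounts: u32,
    vendors: u32,
    customers: u32,
    materials: u32,
}

impl Complexity {
    /// Approximate target sizes; values copied from the complexity table.
    fn sizes(self) -> MasterDataSizes {
        let (accounts, vendors, customers, materials) = match self {
            Complexity::Small => (100, 50, 100, 200),
            Complexity::Medium => (400, 200, 500, 1_000),
            Complexity::Large => (2_500, 1_000, 5_000, 10_000),
        };
        MasterDataSizes { accounts, vendors, customers, materials }
    }
}

fn main() {
    for level in [Complexity::Small, Complexity::Medium, Complexity::Large] {
        println!("{level:?}: {:?}", level.sizes());
    }
}
```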

Validation Error Types

pub enum ConfigError {
    MissingRequiredField(String),
    InvalidValue { field: String, value: String, constraint: String },
    DistributionSumError { field: String, sum: f64 },
    DuplicateCode { field: String, code: String },
    DateRangeError { start: NaiveDate, end: NaiveDate },
    ParseError(String),
}
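This page does not show how `ConfigError` is rendered for users. One plausible approach, sketched here as an assumption rather than the crate's actual implementation, is a `Display` impl with one message per variant. Dates are plain `String`s in this copy (the original uses `NaiveDate`) so the example compiles with only the standard library, and the message wording is hypothetical.

```rust
use std::fmt;

/// Simplified copy of `ConfigError`; dates are `String`s here instead of `NaiveDate`.
#[derive(Debug)]
enum ConfigError {
    MissingRequiredField(String),
    InvalidValue { field: String, value: String, constraint: String },
    DistributionSumError { field: String, sum: f64 },
    DuplicateCode { field: String, code: String },
    DateRangeError { start: String, end: String },
    ParseError(String),
}

impl fmt::Display for ConfigError {
    // Hypothetical message formats, one per variant.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ConfigError::MissingRequiredField(name) => write!(f, "missing required field `{name}`"),
            ConfigError::InvalidValue { field, value, constraint } => {
                write!(f, "`{field}` = {value} violates {constraint}")
            }
            ConfigError::DistributionSumError { field, sum } => {
                write!(f, "weights in `{field}` sum to {sum}, expected 1.0 (±0.01)")
            }
            ConfigError::DuplicateCode { field, code } => write!(f, "duplicate code `{code}` in `{field}`"),
            ConfigError::DateRangeError { start, end } => write!(f, "invalid date range {start} to {end}"),
            ConfigError::ParseError(msg) => write!(f, "parse error: {msg}"),
        }
    }
}

// Implementing `Error` lets the type flow through `?` into `Box<dyn Error>`.
impl std::error::Error for ConfigError {}

fn main() {
    let err = ConfigError::DistributionSumError { field: "companies.volume_weight".to_string(), sum: 0.9 };
    println!("{err}");
}
```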

See Also