
datasynth-generators

Data generators for journal entries, master data, document flows, and anomalies.

Overview

datasynth-generators contains all data generation logic for SyntheticData:

  • Core Generators: Journal entries, chart of accounts, users
  • Master Data: Vendors, customers, materials, assets, employees
  • Document Flows: P2P (Procure-to-Pay), O2C (Order-to-Cash)
  • Financial: Intercompany, balance tracking, subledgers, FX, period close
  • Quality: Anomaly injection, data quality variations
  • Sourcing (S2C): Spend analysis, RFx, bids, contracts, catalogs, scorecards (v0.6.0)
  • HR / Payroll: Payroll runs, time entries, expense reports (v0.6.0)
  • Financial Reporting: Financial statements, bank reconciliation (v0.6.0)
  • Standards: Revenue recognition, impairment testing (v0.6.0)
  • Manufacturing: Production orders, quality inspections, cycle counts (v0.6.0)

Module Structure

Core Generators

| Generator | Description |
|---|---|
| je_generator | Journal entry generation with statistical distributions |
| coa_generator | Chart of accounts with industry-specific structures |
| company_selector | Weighted company selection for transactions |
| user_generator | User/persona generation with roles |
| control_generator | Internal controls and SoD rules |
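The company_selector row describes weighted selection of the posting company. A minimal, std-only sketch of that idea follows, assuming each company code carries a non-negative weight; `WeightedCompanySelector` and the inline SplitMix64 PRNG are hypothetical stand-ins, not the crate's API.

```rust
/// Minimal SplitMix64 PRNG so the sketch needs no external crates.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform float in [0, 1) built from the top 53 bits.
    fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Hypothetical selector: picks company codes in proportion to their weights.
struct WeightedCompanySelector {
    codes: Vec<String>,
    cumulative: Vec<f64>, // running sum of weights
    rng: SplitMix64,
}

impl WeightedCompanySelector {
    fn new(weights: &[(&str, f64)], seed: u64) -> Self {
        let mut total = 0.0;
        let mut codes = Vec::new();
        let mut cumulative = Vec::new();
        for (code, w) in weights {
            assert!(*w >= 0.0, "weights must be non-negative");
            total += w;
            codes.push(code.to_string());
            cumulative.push(total);
        }
        assert!(total > 0.0, "at least one weight must be positive");
        Self { codes, cumulative, rng: SplitMix64(seed) }
    }

    /// Draw a point in [0, total) and binary-search the cumulative weights.
    fn select(&mut self) -> &str {
        let total = *self.cumulative.last().unwrap();
        let target = self.rng.next_f64() * total;
        // First index whose cumulative weight exceeds the target.
        let idx = self.cumulative.partition_point(|&c| c <= target);
        &self.codes[idx.min(self.codes.len() - 1)]
    }
}
```

Cumulative weights plus a binary search keep each draw at O(log n), zero-weight companies are never chosen, and a fixed seed makes the selection sequence reproducible.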

Master Data (master_data/)

| Generator | Description |
|---|---|
| vendor_generator | Vendors with payment terms, bank accounts, behaviors |
| customer_generator | Customers with credit ratings, payment patterns |
| material_generator | Materials/products with BOM, valuations |
| asset_generator | Fixed assets with depreciation schedules |
| employee_generator | Employees with manager hierarchy |
| entity_registry_manager | Central entity registry with temporal validity |

Document Flow (document_flow/)

| Generator | Description |
|---|---|
| p2p_generator | PO → GR → Invoice → Payment flow |
| o2c_generator | SO → Delivery → Invoice → Receipt flow |
| document_chain_manager | Reference chain management |
| document_flow_je_generator | Generate JEs from document flows |
| three_way_match | PO/GR/Invoice matching validation |

Intercompany (intercompany/)

| Generator | Description |
|---|---|
| ic_generator | Matched intercompany entry pairs |
| matching_engine | IC matching and reconciliation |
| elimination_generator | Consolidation elimination entries |
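The three intercompany components fit together as follows: one transaction produces a receivable in the selling company and an equal payable in the buying company, matching checks that the two sides mirror each other, and elimination reverses both at consolidation. The sketch below assumes amounts in integer cents; `IcEntry`, `ic_pair`, `is_matched`, and `eliminations` are hypothetical names, not the crate's types.

```rust
/// One side of an intercompany transaction (hypothetical shape, amounts in cents).
#[derive(Debug, Clone, PartialEq)]
struct IcEntry {
    company: String,
    partner: String,
    ic_reference: String,
    account: &'static str, // "IC_RECEIVABLE" or "IC_PAYABLE"
    amount_cents: i64,     // receivable positive, payable negative
}

/// Build a matched pair: the seller records a receivable, the buyer an equal payable.
fn ic_pair(seller: &str, buyer: &str, reference: &str, amount_cents: i64) -> (IcEntry, IcEntry) {
    let receivable = IcEntry {
        company: seller.into(),
        partner: buyer.into(),
        ic_reference: reference.into(),
        account: "IC_RECEIVABLE",
        amount_cents,
    };
    let payable = IcEntry {
        company: buyer.into(),
        partner: seller.into(),
        ic_reference: reference.into(),
        account: "IC_PAYABLE",
        amount_cents: -amount_cents,
    };
    (receivable, payable)
}

/// Sides match when references agree, companies mirror each other, and amounts net to zero.
fn is_matched(a: &IcEntry, b: &IcEntry) -> bool {
    a.ic_reference == b.ic_reference
        && a.company == b.partner
        && a.partner == b.company
        && a.amount_cents + b.amount_cents == 0
}

/// Consolidation elimination: reverse both sides so the group shows no IC balance.
fn eliminations(a: &IcEntry, b: &IcEntry) -> Vec<IcEntry> {
    let reverse = |e: &IcEntry| IcEntry { amount_cents: -e.amount_cents, ..e.clone() };
    vec![reverse(a), reverse(b)]
}
```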

Balance (balance/)

| Generator | Description |
|---|---|
| opening_balance_generator | Coherent opening balance sheet |
| balance_tracker | Running balance validation |
| trial_balance_generator | Period-end trial balance |
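A period-end trial balance is an aggregation of posted lines per account, and it is in balance when total debits equal total credits. The sketch below shows that aggregation with integer cents; `Line`, `TbRow`, `trial_balance`, and `is_in_balance` are hypothetical names chosen for illustration.

```rust
use std::collections::BTreeMap;

/// A posted journal line in cents (hypothetical shape).
struct Line {
    account: &'static str,
    debit: i64,
    credit: i64,
}

/// Trial balance row per account: total debits and total credits.
#[derive(Debug, Default, PartialEq)]
struct TbRow {
    debit: i64,
    credit: i64,
}

/// Aggregate posted lines into a trial balance keyed (and sorted) by account.
fn trial_balance(lines: &[Line]) -> BTreeMap<&'static str, TbRow> {
    let mut tb: BTreeMap<&'static str, TbRow> = BTreeMap::new();
    for l in lines {
        let row = tb.entry(l.account).or_default();
        row.debit += l.debit;
        row.credit += l.credit;
    }
    tb
}

/// In balance when total debits equal total credits across all accounts.
fn is_in_balance(tb: &BTreeMap<&'static str, TbRow>) -> bool {
    let (d, c) = tb.values().fold((0, 0), |(d, c), r| (d + r.debit, c + r.credit));
    d == c
}
```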

Subledger (subledger/)

| Generator | Description |
|---|---|
| ar_generator | AR invoices, receipts, credit memos, aging |
| ap_generator | AP invoices, payments, debit memos |
| fa_generator | Fixed assets, depreciation, disposals |
| inventory_generator | Inventory positions, movements, valuation |
| reconciliation | GL-to-subledger reconciliation |
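AR aging, listed for ar_generator, groups open items by days past due. A sketch using the common 0/30/60/90-day buckets follows; the bucket boundaries and the names `AgingBucket`, `bucket`, and `aging_report` are assumptions for illustration, not the crate's configuration.

```rust
/// Common AR aging buckets by days past due (assumed boundaries).
#[derive(Debug, PartialEq, Clone, Copy)]
enum AgingBucket {
    Current,
    Days1To30,
    Days31To60,
    Days61To90,
    Over90,
}

/// Classify an open item; zero or negative days means not yet overdue.
fn bucket(days_past_due: i64) -> AgingBucket {
    match days_past_due {
        i64::MIN..=0 => AgingBucket::Current,
        1..=30 => AgingBucket::Days1To30,
        31..=60 => AgingBucket::Days31To60,
        61..=90 => AgingBucket::Days61To90,
        _ => AgingBucket::Over90,
    }
}

/// Sum open amounts (cents) per bucket from (days_past_due, open_amount) pairs.
fn aging_report(items: &[(i64, i64)]) -> [i64; 5] {
    let mut totals = [0i64; 5];
    for &(days, amount) in items {
        // Enum discriminants 0..=4 index the totals array in bucket order.
        totals[bucket(days) as usize] += amount;
    }
    totals
}
```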

FX (fx/)

| Generator | Description |
|---|---|
| fx_rate_service | FX rate generation (Ornstein-Uhlenbeck process) |
| currency_translator | Trial balance translation |
| cta_generator | Currency Translation Adjustment entries |

Period Close (period_close/)

| Generator | Description |
|---|---|
| close_engine | Main orchestration |
| accruals | Accrual entry generation |
| depreciation | Monthly depreciation runs |
| year_end | Year-end closing entries |
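A monthly depreciation run posts one period's charge per asset. Assuming the straight-line method and integer cents (the crate may support other methods), the schedule below lets the final month absorb the rounding remainder so accumulated depreciation lands exactly on cost minus salvage; `straight_line_schedule` is a hypothetical helper.

```rust
/// Straight-line monthly depreciation schedule in cents.
/// The last period absorbs rounding so the schedule sums to cost - salvage.
fn straight_line_schedule(cost: i64, salvage: i64, useful_life_months: u32) -> Vec<i64> {
    assert!(useful_life_months > 0 && cost >= salvage);
    let depreciable = cost - salvage;
    let n = useful_life_months as i64;
    let monthly = depreciable / n; // truncating division
    let mut schedule = vec![monthly; useful_life_months as usize];
    // Put the rounding remainder into the final month.
    *schedule.last_mut().unwrap() += depreciable - monthly * n;
    schedule
}
```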

Anomaly (anomaly/)

| Generator | Description |
|---|---|
| injector | Main anomaly injection engine |
| types | Weighted anomaly type configurations |
| strategies | Injection strategies (amount, date, duplication) |
| patterns | Temporal patterns, clustering, entity targeting |

Data Quality (data_quality/)

| Generator | Description |
|---|---|
| injector | Main data quality injector |
| missing_values | MCAR, MAR, MNAR, Systematic patterns |
| format_variations | Date, amount, identifier formats |
| duplicates | Exact, near, fuzzy duplicates |
| typos | Keyboard-aware typos, OCR errors |
| labels | ML training labels for data quality issues |
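"Keyboard-aware" typos replace a character with one physically adjacent to it on the keyboard rather than a random letter. The sketch below uses a small QWERTY adjacency subset and preserves the original case; the neighbor table and the names `qwerty_neighbors` and `neighbor_typo` are illustrative assumptions, and choosing `pos` and `pick` randomly is left to the caller.

```rust
/// Small QWERTY adjacency subset (lowercase letters only), for illustration.
fn qwerty_neighbors(c: char) -> &'static str {
    match c {
        'a' => "qwsz",
        'e' => "wsdr",
        'i' => "ujko",
        'n' => "bhjm",
        'o' => "iklp",
        'r' => "edft",
        's' => "awedxz",
        't' => "rfgy",
        _ => "",
    }
}

/// Replace the character at `pos` with its `pick`-th keyboard neighbor (wrapping).
/// Returns None if `pos` is out of range or the character has no mapped neighbors.
fn neighbor_typo(text: &str, pos: usize, pick: usize) -> Option<String> {
    let mut chars: Vec<char> = text.chars().collect();
    let original = *chars.get(pos)?;
    let neighbors: Vec<char> = qwerty_neighbors(original.to_ascii_lowercase()).chars().collect();
    if neighbors.is_empty() {
        return None;
    }
    let mut replacement = neighbors[pick % neighbors.len()];
    // Keep capitalization so "Acme" becomes "Scme", not "scme".
    if original.is_ascii_uppercase() {
        replacement = replacement.to_ascii_uppercase();
    }
    chars[pos] = replacement;
    Some(chars.into_iter().collect())
}
```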

Audit (audit/)

ISA-compliant audit data generation.

| Generator | Description |
|---|---|
| engagement_generator | Audit engagement with phases (Planning, Fieldwork, Completion) |
| workpaper_generator | Audit workpapers per ISA 230 |
| evidence_generator | Audit evidence per ISA 500 |
| risk_generator | Risk assessment per ISA 315/330 |
| finding_generator | Audit findings per ISA 265 |
| judgment_generator | Professional judgment documentation per ISA 200 |

LLM Enrichment (llm_enrichment/) – v0.5.0

| Generator | Description |
|---|---|
| VendorLlmEnricher | Generate realistic vendor names by industry, spend category, and country |
| TransactionLlmEnricher | Generate transaction descriptions and memo fields |
| AnomalyLlmExplainer | Generate natural language explanations for injected anomalies |

Sourcing (sourcing/) – v0.6.0

Source-to-Contract (S2C) procurement pipeline generators.

| Generator | Description |
|---|---|
| spend_analysis_generator | Spend analysis records and category hierarchies |
| sourcing_project_generator | Sourcing project lifecycle management |
| qualification_generator | Supplier qualification assessments |
| rfx_generator | RFx events (RFI/RFP/RFQ) with invited suppliers |
| bid_generator | Supplier bids with pricing and compliance data |
| bid_evaluation_generator | Bid scoring, ranking, and award recommendations |
| contract_generator | Procurement contracts with terms and renewal rules |
| catalog_generator | Catalog items linked to contracts |
| scorecard_generator | Supplier scorecards with performance metrics |

Generation DAG: spend_analysis -> sourcing_project -> qualification -> rfx -> bid -> bid_evaluation -> contract -> catalog -> [P2P] -> scorecard
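Because the S2C generators form a DAG, a valid run order is any topological order of it. The sketch below applies Kahn's algorithm to the chain above, representing the [P2P] step as a node named "p2p" (an assumption); `topo_order` is a hypothetical helper that returns None when the graph has a cycle or an edge names an unknown node.

```rust
use std::collections::{HashMap, VecDeque};

/// Kahn's algorithm: a valid generation order, or None if the graph has a cycle.
fn topo_order<'a>(nodes: &[&'a str], edges: &[(&'a str, &'a str)]) -> Option<Vec<&'a str>> {
    let mut indegree: HashMap<&'a str, usize> = nodes.iter().map(|&n| (n, 0)).collect();
    let mut children: HashMap<&'a str, Vec<&'a str>> = HashMap::new();
    for &(from, to) in edges {
        // Unknown target node: treat the graph as invalid.
        *indegree.get_mut(to)? += 1;
        children.entry(from).or_default().push(to);
    }
    // Seed the queue in declaration order so the output is deterministic.
    let mut queue: VecDeque<&'a str> = nodes.iter().copied().filter(|n| indegree[*n] == 0).collect();
    let mut order = Vec::new();
    while let Some(n) = queue.pop_front() {
        order.push(n);
        for &child in children.get(n).map(|v| v.as_slice()).unwrap_or(&[]) {
            let d = indegree.get_mut(child).unwrap();
            *d -= 1;
            if *d == 0 {
                queue.push_back(child);
            }
        }
    }
    // Any node left unvisited sits on a cycle.
    (order.len() == nodes.len()).then_some(order)
}
```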

HR (hr/) – v0.6.0

Hire-to-Retire (H2R) generators for the HR process chain.

| Generator | Description |
|---|---|
| payroll_generator | Payroll runs with employee pay line items (gross, deductions, net, employer cost) |
| time_entry_generator | Employee time entries with regular, overtime, PTO, and sick hours |
| expense_report_generator | Expense reports with categorized line items and approval workflows |
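The payroll line items listed for payroll_generator are tied together by two identities: net pay is gross minus employee deductions, and employer cost is gross plus employer contributions. The sketch below derives both from flat rates in basis points; the rates, the `PayLine` shape, and `pay_line` are illustrative assumptions, not statutory values or the crate's model.

```rust
/// One payroll line item in cents (hypothetical shape).
#[derive(Debug, PartialEq)]
struct PayLine {
    gross: i64,
    deductions: i64,
    net: i64,
    employer_cost: i64,
}

/// Derive net pay and employer cost from gross pay and flat rates in basis points.
fn pay_line(gross: i64, employee_deduction_bp: i64, employer_contribution_bp: i64) -> PayLine {
    let deductions = gross * employee_deduction_bp / 10_000;
    let employer_contrib = gross * employer_contribution_bp / 10_000;
    PayLine {
        gross,
        deductions,
        net: gross - deductions,               // what the employee receives
        employer_cost: gross + employer_contrib, // total cost to the company
    }
}
```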

Standards (standards/) – v0.6.0

Accounting and audit standards generators.

| Generator | Description |
|---|---|
| revenue_recognition_generator | ASC 606/IFRS 15 customer contracts with performance obligations |
| impairment_generator | Asset impairment tests with recoverable amount calculations |
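Two calculations underlie this table. Under ASC 606/IFRS 15, the transaction price is allocated to performance obligations in proportion to their standalone selling prices (SSP). Under IAS 36, the recoverable amount is the higher of fair value less costs of disposal and value in use, and any excess of carrying amount over it is the impairment loss. The helpers below are hypothetical sketches in integer cents, with the allocation rounding remainder assigned to the last obligation.

```rust
/// Allocate a transaction price (cents) by relative standalone selling price.
/// The rounding remainder goes to the last obligation so the total is preserved.
fn allocate_by_ssp(transaction_price: i64, ssp: &[i64]) -> Vec<i64> {
    let total_ssp: i64 = ssp.iter().sum();
    assert!(total_ssp > 0, "standalone selling prices must sum to a positive value");
    let mut alloc: Vec<i64> = ssp.iter().map(|&s| transaction_price * s / total_ssp).collect();
    let allocated: i64 = alloc.iter().sum();
    if let Some(last) = alloc.last_mut() {
        *last += transaction_price - allocated;
    }
    alloc
}

/// IAS 36: impairment loss = max(0, carrying amount - recoverable amount),
/// where recoverable amount = max(fair value less costs of disposal, value in use).
fn impairment_loss(carrying: i64, fv_less_costs: i64, value_in_use: i64) -> i64 {
    let recoverable = fv_less_costs.max(value_in_use);
    (carrying - recoverable).max(0)
}
```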

Period Close Additions – v0.6.0

| Generator | Description |
|---|---|
| financial_statement_generator | Balance sheet, income statement, cash flow, and changes in equity from trial balance data |

Bank Reconciliation – v0.6.0

| Generator | Description |
|---|---|
| bank_reconciliation_generator | Bank reconciliations with statement lines, auto-matching, and reconciling items |
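Auto-matching pairs bank statement lines with book entries, and whatever stays unmatched becomes a reconciling item. The sketch below assumes a simple greedy rule (identical amount, dates within a window); the rule, the day-number dates, and the names `Item` and `auto_match` are assumptions rather than the crate's matching logic.

```rust
/// A statement line or book entry: amount in cents and a day number (hypothetical).
#[derive(Debug, Clone, Copy)]
struct Item {
    id: u32,
    amount: i64,
    day: i32,
}

/// Greedy auto-match: each statement line takes the first unused book entry with the
/// same amount dated within `window_days`. Returns (matched pairs, unmatched statement
/// ids, unmatched book ids); the unmatched lists are the reconciling items.
fn auto_match(statement: &[Item], book: &[Item], window_days: i32) -> (Vec<(u32, u32)>, Vec<u32>, Vec<u32>) {
    let mut used = vec![false; book.len()];
    let mut matches = Vec::new();
    let mut unmatched_statement = Vec::new();
    for s in statement {
        let hit = (0..book.len()).find(|&i| {
            !used[i] && book[i].amount == s.amount && (book[i].day - s.day).abs() <= window_days
        });
        match hit {
            Some(i) => {
                used[i] = true;
                matches.push((s.id, book[i].id));
            }
            None => unmatched_statement.push(s.id),
        }
    }
    let unmatched_book = book
        .iter()
        .enumerate()
        .filter(|(i, _)| !used[*i])
        .map(|(_, b)| b.id)
        .collect();
    (matches, unmatched_statement, unmatched_book)
}
```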

Relationships (relationships/)

| Generator | Description |
|---|---|
| entity_graph_generator | Cross-process entity relationship graphs |
| relationship_strength | Weighted relationship strength calculation |

Audit Engagement Structure:

pub struct AuditEngagement {
    pub engagement_id: String,
    pub client_name: String,
    pub fiscal_year: u16,
    pub phase: AuditPhase,  // Planning, Fieldwork, Completion
    pub materiality: MaterialityLevels,
    pub team_size: usize,
    pub has_fraud_risk: bool,
    pub has_significant_risk: bool,
}

pub struct MaterialityLevels {
    pub primary_materiality: Decimal,        // 0.3-1% of base
    pub performance_materiality: Decimal,    // 50-75% of primary
    pub clearly_trivial: Decimal,            // 3-5% of primary
}
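The ranges in the MaterialityLevels comments translate directly into arithmetic. The sketch below picks percentages inside those ranges and checks that they stay there; it uses integer cents and basis points instead of `Decimal` to remain std-only, and `Materiality` and `materiality` are hypothetical names.

```rust
/// Materiality levels in cents (std-only stand-in for MaterialityLevels).
#[derive(Debug, PartialEq)]
struct Materiality {
    primary: i64,
    performance: i64,
    clearly_trivial: i64,
}

/// Derive materiality from a benchmark `base` using the documented ranges:
/// primary 0.3-1% of base (30-100 bp), performance 50-75% of primary,
/// clearly trivial 3-5% of primary.
fn materiality(base: i64, primary_bp: i64, performance_pct: i64, trivial_pct: i64) -> Materiality {
    assert!((30..=100).contains(&primary_bp), "primary: 0.3-1% of base");
    assert!((50..=75).contains(&performance_pct), "performance: 50-75% of primary");
    assert!((3..=5).contains(&trivial_pct), "clearly trivial: 3-5% of primary");
    let primary = base * primary_bp / 10_000;
    Materiality {
        primary,
        performance: primary * performance_pct / 100,
        clearly_trivial: primary * trivial_pct / 100,
    }
}
```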

Usage Examples

Journal Entry Generation

use datasynth_generators::je_generator::JournalEntryGenerator;

// `config` and `seed` are assumed to be in scope
let mut generator = JournalEntryGenerator::new(config, seed);

// Generate a batch of 1,000 entries at once
let entries = generator.generate_batch(1000)?;

// Or stream entries one at a time; each item is a Result
for entry in generator.generate_stream().take(1000) {
    process(entry?);
}
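Whatever distributions drive the amounts, every generated journal entry must balance. The sketch below shows one way to guarantee that by construction: random debit lines plus a single offsetting credit. It draws amounts uniformly for brevity (the real generator uses statistical distributions), and `SplitMix64`, `JeLine`, and `balanced_entry` are hypothetical names.

```rust
/// Minimal SplitMix64 PRNG so the sketch needs no external crates.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// One journal entry line in cents (hypothetical shape).
#[derive(Debug)]
struct JeLine {
    account: u32,
    debit: i64,
    credit: i64,
}

/// Build `n_debits` random debit lines and one credit line for their total,
/// so total debits always equal total credits.
fn balanced_entry(rng: &mut SplitMix64, n_debits: usize, credit_account: u32) -> Vec<JeLine> {
    let mut lines: Vec<JeLine> = (0..n_debits)
        .map(|i| JeLine {
            account: 5000 + i as u32,
            debit: 100 + (rng.next_u64() % 100_000) as i64, // uniform, for brevity
            credit: 0,
        })
        .collect();
    let total: i64 = lines.iter().map(|l| l.debit).sum();
    lines.push(JeLine { account: credit_account, debit: 0, credit: total });
    lines
}
```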

Master Data Generation

use datasynth_generators::master_data::{VendorGenerator, CustomerGenerator};

// The same seed reproduces the same master data
let mut vendor_gen = VendorGenerator::new(seed);
let vendors = vendor_gen.generate(100);

let mut customer_gen = CustomerGenerator::new(seed);
let customers = customer_gen.generate(200);

Document Flow Generation

use datasynth_generators::document_flow::{P2pGenerator, O2cGenerator};

// Procure-to-Pay: PO → GR → Invoice → Payment
let mut p2p = P2pGenerator::new(config, seed);
let p2p_flows = p2p.generate_batch(500)?;

// Order-to-Cash: SO → Delivery → Invoice → Receipt
let mut o2c = O2cGenerator::new(config, seed);
let o2c_flows = o2c.generate_batch(500)?;
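Each document in a flow carries a reference to its predecessor, which is what lets downstream tools walk a P2P chain end to end. The sketch below builds such a chain and checks that it is intact; the ID format, `DocType`, `Document`, `p2p_chain`, and `chain_is_intact` are hypothetical and not the crate's document model.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum DocType {
    PurchaseOrder,
    GoodsReceipt,
    VendorInvoice,
    Payment,
}

#[derive(Debug)]
struct Document {
    id: String,
    doc_type: DocType,
    reference: Option<String>, // id of the preceding document in the flow
}

/// Build PO -> GR -> Invoice -> Payment, each referencing its predecessor.
fn p2p_chain(flow_no: u32) -> Vec<Document> {
    let steps = [
        (DocType::PurchaseOrder, "PO"),
        (DocType::GoodsReceipt, "GR"),
        (DocType::VendorInvoice, "INV"),
        (DocType::Payment, "PAY"),
    ];
    let mut chain: Vec<Document> = Vec::new();
    for (doc_type, prefix) in steps {
        let reference = chain.last().map(|d| d.id.clone());
        chain.push(Document { id: format!("{prefix}-{flow_no:06}"), doc_type, reference });
    }
    chain
}

/// Intact when the first document has no reference and each later one points at its predecessor.
fn chain_is_intact(chain: &[Document]) -> bool {
    chain.first().map_or(true, |d| d.reference.is_none())
        && chain.windows(2).all(|w| w[1].reference.as_deref() == Some(w[0].id.as_str()))
}
```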

Anomaly Injection

use datasynth_generators::anomaly::AnomalyInjector;

let mut injector = AnomalyInjector::new(config.anomaly_injection, seed);

// Inject into existing entries; returns the modified entries plus ground-truth labels
let (modified_entries, labels) = injector.inject(&entries)?;
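The pattern behind `inject` is: modify a seeded random subset of entries and emit one label per modification, so the labels can serve as ground truth for detection models. The sketch below applies a single amount-scaling strategy tagged as an "UnusualAmount" anomaly; the tenfold factor and the names `Entry`, `AnomalyLabel`, and `inject` are assumptions, not the crate's strategies.

```rust
/// Minimal SplitMix64 PRNG so the sketch needs no external crates.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform float in [0, 1).
    fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

#[derive(Debug, Clone, PartialEq)]
struct Entry {
    id: u32,
    amount: i64,
}

/// Ground-truth label for one injected anomaly (hypothetical shape).
#[derive(Debug, PartialEq)]
struct AnomalyLabel {
    entry_id: u32,
    kind: &'static str,
    original_amount: i64,
}

/// Scale the amount of roughly `rate` of the entries by 10x and label each change.
fn inject(entries: &[Entry], rate: f64, seed: u64) -> (Vec<Entry>, Vec<AnomalyLabel>) {
    let mut rng = SplitMix64(seed);
    let mut out = entries.to_vec();
    let mut labels = Vec::new();
    for e in out.iter_mut() {
        if rng.next_f64() < rate {
            labels.push(AnomalyLabel { entry_id: e.id, kind: "UnusualAmount", original_amount: e.amount });
            e.amount *= 10;
        }
    }
    (out, labels)
}
```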

LLM Enrichment

use datasynth_generators::llm_enrichment::{VendorLlmEnricher, TransactionLlmEnricher};
use datasynth_core::llm::MockLlmProvider;
use std::sync::Arc;

// Deterministic mock provider (seed 42); share it between enrichers via Arc
let provider = Arc::new(MockLlmProvider::new(42));

// Enrich vendor names: industry, spend category, country
let vendor_enricher = VendorLlmEnricher::new(provider.clone());
let name = vendor_enricher.enrich_vendor_name("manufacturing", "raw_materials", "US")?;

// Enrich transaction descriptions and memo fields
let tx_enricher = TransactionLlmEnricher::new(provider);
let desc = tx_enricher.enrich_description("Office Supplies", "1000-5000", "retail", 3)?;
let memo = tx_enricher.enrich_memo("VendorInvoice", "Acme Corp", "2500.00")?;

Three-Way Match

The P2P generator validates matching between the purchase order, goods receipt, and vendor invoice:

use datasynth_generators::document_flow::ThreeWayMatch;

let match_result = ThreeWayMatch::validate(
    &purchase_order,
    &goods_receipt,
    &vendor_invoice,
    tolerance_config,
);

match match_result {
    MatchResult::Passed => { /* Process normally */ }
    MatchResult::QuantityVariance(var) => { /* Handle quantity variance */ }
    MatchResult::PriceVariance(var) => { /* Handle price variance */ }
}
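The variants above correspond to two checks. A simplified single-line sketch follows, assuming the invoiced quantity must equal the received quantity and the invoiced unit price must stay within a percentage tolerance of the PO price; `Line`, `MatchOutcome`, and `three_way_match` are hypothetical names that mirror, but are not, `ThreeWayMatch` and `MatchResult`.

```rust
/// Quantity and unit price (cents) for a single line item (hypothetical shape).
#[derive(Debug, Clone, Copy)]
struct Line {
    quantity: i64,
    unit_price_cents: i64,
}

#[derive(Debug, PartialEq)]
enum MatchOutcome {
    Passed,
    QuantityVariance(i64),
    PriceVariance(i64),
}

/// Simplified three-way match for one line; `price_tolerance_bp` is in basis points.
fn three_way_match(po: Line, gr: Line, invoice: Line, price_tolerance_bp: i64) -> MatchOutcome {
    // Quantity: the vendor may only bill what was actually received.
    let qty_variance = invoice.quantity - gr.quantity;
    if qty_variance != 0 {
        return MatchOutcome::QuantityVariance(qty_variance);
    }
    // Price: |invoice - PO| / PO must not exceed the tolerance.
    let price_variance = invoice.unit_price_cents - po.unit_price_cents;
    if price_variance.abs() * 10_000 > po.unit_price_cents * price_tolerance_bp {
        return MatchOutcome::PriceVariance(price_variance);
    }
    MatchOutcome::Passed
}
```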

Balance Coherence

The balance tracker verifies that the accounting equation holds as entries are posted:

use datasynth_generators::balance::BalanceTracker;

let mut tracker = BalanceTracker::new();

// Iterating `&entries` already yields references, so pass each one directly
for entry in &entries {
    tracker.post(entry)?;
}

// Verify Assets = Liabilities + Equity
assert!(tracker.is_balanced());
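Before period close, revenue and expenses have not yet been closed to retained earnings, so the equation a running tracker can check is Assets = Liabilities + Equity + (Revenue - Expenses). The sketch below keeps one balance per account type, rejects unbalanced entries, and checks that equation; `AccountType`, `Tracker`, and the debit-positive sign convention are assumptions, not `BalanceTracker`'s internals.

```rust
#[derive(Clone, Copy)]
enum AccountType {
    Asset,
    Liability,
    Equity,
    Revenue,
    Expense,
}

/// Running balances in cents, each stored positive on its normal side.
#[derive(Default)]
struct Tracker {
    assets: i64,
    liabilities: i64,
    equity: i64,
    revenue: i64,
    expenses: i64,
}

impl Tracker {
    /// Post an entry given as (account type, amount) lines: debits positive, credits negative.
    fn post(&mut self, lines: &[(AccountType, i64)]) -> Result<(), String> {
        let net: i64 = lines.iter().map(|&(_, amt)| amt).sum();
        if net != 0 {
            return Err(format!("entry out of balance by {net}"));
        }
        for &(ty, amt) in lines {
            match ty {
                AccountType::Asset => self.assets += amt,          // debit-normal
                AccountType::Liability => self.liabilities -= amt, // credit-normal
                AccountType::Equity => self.equity -= amt,         // credit-normal
                AccountType::Revenue => self.revenue -= amt,       // credit-normal
                AccountType::Expense => self.expenses += amt,      // debit-normal
            }
        }
        Ok(())
    }

    /// Assets = Liabilities + Equity + current-period earnings (Revenue - Expenses).
    fn is_balanced(&self) -> bool {
        self.assets == self.liabilities + self.equity + (self.revenue - self.expenses)
    }
}
```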

FX Rate Generation

Uses an Ornstein-Uhlenbeck (mean-reverting) process for realistic rate movements:

use datasynth_generators::fx::FxRateService;

let mut fx_service = FxRateService::new(config.fx, seed);

// Get the EUR/USD rate for a specific date
let rate = fx_service.get_rate("EUR", "USD", date)?;

// Generate daily rates between `start` and `end`
let rates = fx_service.generate_daily_rates(start, end)?;
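An Ornstein-Uhlenbeck process pulls a value back toward a long-run mean at speed θ with volatility σ. The sketch below applies it to the log of the rate (which keeps rates positive) using the exact discretization x' = μ + (x - μ)e^(-θΔt) + σ√((1 - e^(-2θΔt)) / (2θ)) · Z, with Z drawn by Box-Muller. Modeling the log-rate, the 252-trading-day year, and the name `ou_fx_path` are assumptions; `FxRateService` may parameterize the process differently.

```rust
/// Minimal SplitMix64 PRNG with a Box-Muller normal draw (std-only).
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform float in [0, 1).
    fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }

    /// Standard normal draw via Box-Muller.
    fn next_normal(&mut self) -> f64 {
        let u1 = 1.0 - self.next_f64(); // in (0, 1], so ln(u1) is finite
        let u2 = self.next_f64();
        (-2.0 * u1.ln()).sqrt() * (2.0 * std::f64::consts::PI * u2).cos()
    }
}

/// Daily FX path whose log-rate follows an OU process (exact discretization).
fn ou_fx_path(start_rate: f64, long_run_rate: f64, theta: f64, sigma: f64, days: usize, seed: u64) -> Vec<f64> {
    assert!(theta > 0.0 && start_rate > 0.0 && long_run_rate > 0.0);
    let dt = 1.0 / 252.0; // one trading day, in years
    let mu = long_run_rate.ln();
    let decay = (-theta * dt).exp();
    let noise_sd = sigma * ((1.0 - (-2.0 * theta * dt).exp()) / (2.0 * theta)).sqrt();
    let mut rng = SplitMix64(seed);
    let mut x = start_rate.ln();
    let mut path = Vec::with_capacity(days);
    for _ in 0..days {
        // Pull toward mu, then add Gaussian noise.
        x = mu + (x - mu) * decay + noise_sd * rng.next_normal();
        path.push(x.exp());
    }
    path
}
```

Working in log space means the rate can never go negative, and the exact update avoids the step-size bias of a plain Euler scheme.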

Anomaly Types

Fraud Types

  • FictitiousTransaction, RevenueManipulation, ExpenseCapitalization
  • SplitTransaction, RoundTripping, KickbackScheme
  • GhostEmployee, DuplicatePayment, UnauthorizedDiscount

Error Types

  • DuplicateEntry, ReversedAmount, WrongPeriod
  • WrongAccount, MissingReference, IncorrectTaxCode

Process Issues

  • LatePosting, SkippedApproval, ThresholdManipulation
  • MissingDocumentation, OutOfSequence

Statistical Anomalies

  • UnusualAmount, TrendBreak, BenfordViolation, OutlierValue
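A BenfordViolation is measured against Benford's law, under which the first significant digit d appears with probability log10(1 + 1/d). The sketch below computes those expected shares and the mean absolute deviation (MAD) of observed first-digit shares, a common conformity statistic; `benford_expected`, `first_digit`, and `benford_mad` are hypothetical helpers, not the crate's detector.

```rust
/// Expected Benford first-digit probabilities: P(d) = log10(1 + 1/d), d = 1..=9.
fn benford_expected() -> [f64; 9] {
    let mut p = [0.0; 9];
    for d in 1..=9 {
        p[d - 1] = (1.0 + 1.0 / d as f64).log10();
    }
    p
}

/// Leading significant digit (1..=9) of a non-zero amount.
fn first_digit(mut amount: u64) -> Option<usize> {
    if amount == 0 {
        return None;
    }
    while amount >= 10 {
        amount /= 10;
    }
    Some(amount as usize)
}

/// Mean absolute deviation between observed and expected first-digit shares.
fn benford_mad(amounts: &[u64]) -> f64 {
    let mut counts = [0u64; 9];
    let mut n = 0u64;
    for &a in amounts {
        if let Some(d) = first_digit(a) {
            counts[d - 1] += 1;
            n += 1;
        }
    }
    if n == 0 {
        return 0.0;
    }
    let expected = benford_expected();
    counts
        .iter()
        .zip(expected.iter())
        .map(|(&c, &e)| (c as f64 / n as f64 - e).abs())
        .sum::<f64>()
        / 9.0
}
```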

Relational Anomalies

  • CircularTransaction, DormantAccountActivity, UnusualCounterparty
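A CircularTransaction is a flow of funds that returns to its originator (A pays B, B pays C, C pays A). Checking whether injected flows actually form such a loop reduces to cycle detection in a directed payment graph. The sketch below uses depth-first search with three-state marking; `has_cycle` is a hypothetical helper, not part of the crate.

```rust
use std::collections::HashMap;

/// True if the directed payment graph (payer, payee) contains a cycle.
fn has_cycle<'a>(edges: &[(&'a str, &'a str)]) -> bool {
    let mut adj: HashMap<&'a str, Vec<&'a str>> = HashMap::new();
    for &(from, to) in edges {
        adj.entry(from).or_default().push(to);
    }
    // 0 = unvisited, 1 = on the current DFS path, 2 = fully explored.
    let mut state: HashMap<&'a str, u8> = HashMap::new();

    fn visit<'a>(n: &'a str, adj: &HashMap<&'a str, Vec<&'a str>>, state: &mut HashMap<&'a str, u8>) -> bool {
        match state.get(n).copied().unwrap_or(0) {
            1 => return true, // back edge: funds return to a node on the current path
            2 => return false,
            _ => {}
        }
        state.insert(n, 1);
        for &next in adj.get(n).map(|v| v.as_slice()).unwrap_or(&[]) {
            if visit(next, adj, state) {
                return true;
            }
        }
        state.insert(n, 2);
        false
    }

    // Every cycle passes through some payer, so starting from payers suffices.
    let nodes: Vec<&'a str> = adj.keys().copied().collect();
    nodes.into_iter().any(|n| visit(n, &adj, &mut state))
}
```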

See Also