Data generators for journal entries, master data, document flows, and anomalies.
datasynth-generators contains all data generation logic for SyntheticData:
- Core Generators: Journal entries, chart of accounts, users
- Master Data: Vendors, customers, materials, assets, employees
- Document Flows: P2P (Procure-to-Pay), O2C (Order-to-Cash)
- Financial: Intercompany, balance tracking, subledgers, FX, period close
- Quality: Anomaly injection, data quality variations
- Sourcing (S2C): Spend analysis, RFx, bids, contracts, catalogs, scorecards (v0.6.0)
- HR / Payroll: Payroll runs, time entries, expense reports (v0.6.0)
- Financial Reporting: Financial statements, bank reconciliation (v0.6.0)
- Standards: Revenue recognition, impairment testing (v0.6.0)
- Manufacturing: Production orders, quality inspections, cycle counts (v0.6.0)

| Generator | Description |
|-----------|-------------|
| je_generator | Journal entry generation with statistical distributions |
| coa_generator | Chart of accounts with industry-specific structures |
| company_selector | Weighted company selection for transactions |
| user_generator | User/persona generation with roles |
| control_generator | Internal controls and SoD rules |
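
The weighted company selection mentioned in the table can be illustrated with cumulative-weight sampling. This is a minimal sketch, not the crate's implementation: `WeightedSelector` and the small `SplitMix64` generator are hypothetical names introduced here so the example needs no external crates.

```rust
/// Tiny deterministic PRNG (SplitMix64); hypothetical helper for this sketch only.
pub struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    pub fn new(seed: u64) -> Self {
        Self { state: seed }
    }

    pub fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform value in [0, 1).
    pub fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Hypothetical selector: index i is picked with probability w_i / sum(w).
pub struct WeightedSelector {
    cumulative: Vec<f64>,
}

impl WeightedSelector {
    /// Returns None for empty, negative, non-finite, or all-zero weights.
    pub fn new(weights: &[f64]) -> Option<Self> {
        if weights.is_empty() || weights.iter().any(|w| !w.is_finite() || *w < 0.0) {
            return None;
        }
        let mut total = 0.0;
        let mut cumulative = Vec::with_capacity(weights.len());
        for w in weights {
            total += *w;
            cumulative.push(total);
        }
        if total <= 0.0 {
            return None;
        }
        Some(Self { cumulative })
    }

    pub fn pick(&self, rng: &mut SplitMix64) -> usize {
        let last = self.cumulative.len() - 1; // non-empty by construction
        let target = rng.next_f64() * self.cumulative[last];
        // Binary search for the first cumulative weight strictly above the target.
        self.cumulative.partition_point(|&c| c <= target).min(last)
    }
}
```

Because the search skips entries whose cumulative weight does not exceed the target, a company with weight zero is never selected, and the same seed always reproduces the same sequence of picks.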

| Generator | Description |
|-----------|-------------|
| vendor_generator | Vendors with payment terms, bank accounts, behaviors |
| customer_generator | Customers with credit ratings, payment patterns |
| material_generator | Materials/products with BOM, valuations |
| asset_generator | Fixed assets with depreciation schedules |
| employee_generator | Employees with manager hierarchy |
| entity_registry_manager | Central entity registry with temporal validity |

| Generator | Description |
|-----------|-------------|
| p2p_generator | PO → GR → Invoice → Payment flow |
| o2c_generator | SO → Delivery → Invoice → Receipt flow |
| document_chain_manager | Reference chain management |
| document_flow_je_generator | Generate JEs from document flows |
| three_way_match | PO/GR/Invoice matching validation |

| Generator | Description |
|-----------|-------------|
| ic_generator | Matched intercompany entry pairs |
| matching_engine | IC matching and reconciliation |
| elimination_generator | Consolidation elimination entries |

| Generator | Description |
|-----------|-------------|
| opening_balance_generator | Coherent opening balance sheet |
| balance_tracker | Running balance validation |
| trial_balance_generator | Period-end trial balance |

| Generator | Description |
|-----------|-------------|
| ar_generator | AR invoices, receipts, credit memos, aging |
| ap_generator | AP invoices, payments, debit memos |
| fa_generator | Fixed assets, depreciation, disposals |
| inventory_generator | Inventory positions, movements, valuation |
| reconciliation | GL-to-subledger reconciliation |

| Generator | Description |
|-----------|-------------|
| fx_rate_service | FX rate generation (Ornstein-Uhlenbeck process) |
| currency_translator | Trial balance translation |
| cta_generator | Currency Translation Adjustment entries |

| Generator | Description |
|-----------|-------------|
| close_engine | Main period-close orchestration |
| accruals | Accrual entry generation |
| depreciation | Monthly depreciation runs |
| year_end | Year-end closing entries |
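
A monthly depreciation run can be pictured as posting one slice of a precomputed schedule per period. The sketch below assumes the straight-line method, which the text above does not specify, and works in integer cents; `straight_line_schedule` is a hypothetical helper, not part of the crate.

```rust
/// Straight-line monthly depreciation amounts in cents (hypothetical helper).
/// Returns an empty schedule for invalid input.
pub fn straight_line_schedule(cost_cents: i64, salvage_cents: i64, life_months: u32) -> Vec<i64> {
    if life_months == 0 || salvage_cents < 0 || cost_cents < salvage_cents {
        return Vec::new();
    }
    let depreciable = cost_cents - salvage_cents;
    let months = i64::from(life_months);
    let monthly = depreciable / months;

    let mut schedule = vec![monthly; life_months as usize];
    // Put the integer-division remainder into the final month so the
    // schedule sums exactly to the depreciable amount.
    if let Some(last) = schedule.last_mut() {
        *last += depreciable - monthly * months;
    }
    schedule
}
```

Absorbing the remainder in the last period is one common convention; spreading it over the first periods would be equally valid as long as the total matches.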

| Generator | Description |
|-----------|-------------|
| injector | Main anomaly injection engine |
| types | Weighted anomaly type configurations |
| strategies | Injection strategies (amount, date, duplication) |
| patterns | Temporal patterns, clustering, entity targeting |

| Generator | Description |
|-----------|-------------|
| injector | Main data quality injector |
| missing_values | MCAR, MAR, MNAR, Systematic patterns |
| format_variations | Date, amount, identifier formats |
| duplicates | Exact, near, fuzzy duplicates |
| typos | Keyboard-aware typos, OCR errors |
| labels | ML training labels for data quality issues |

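
Of the missing-value patterns listed, MCAR (missing completely at random) is the simplest: every value is dropped independently with the same probability, regardless of its content. The sketch below illustrates only that case and is not the crate's API; `inject_mcar` and `SplitMix64` are hypothetical names. MAR and MNAR differ in that the drop probability depends on other observed fields or on the value itself.

```rust
/// Tiny deterministic PRNG (SplitMix64); hypothetical helper for this sketch only.
pub struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    pub fn new(seed: u64) -> Self {
        Self { state: seed }
    }

    pub fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform value in [0, 1).
    pub fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Replace each value with `None` independently with probability `rate` (MCAR).
pub fn inject_mcar<T: Clone>(values: &[T], rate: f64, seed: u64) -> Vec<Option<T>> {
    let mut rng = SplitMix64::new(seed);
    values
        .iter()
        .map(|v| {
            // The draw ignores the value itself, which is what makes this MCAR.
            if rng.next_f64() < rate {
                None
            } else {
                Some(v.clone())
            }
        })
        .collect()
}
```

Returning `Option<T>` keeps the missingness explicit, so a label for each injected gap can be derived by comparing against the original slice.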
ISA-compliant audit data generation.

| Generator | Description |
|-----------|-------------|
| engagement_generator | Audit engagement with phases (Planning, Fieldwork, Completion) |
| workpaper_generator | Audit workpapers per ISA 230 |
| evidence_generator | Audit evidence per ISA 500 |
| risk_generator | Risk assessment per ISA 315/330 |
| finding_generator | Audit findings per ISA 265 |
| judgment_generator | Professional judgment documentation per ISA 200 |

| Generator | Description |
|-----------|-------------|
| VendorLlmEnricher | Generate realistic vendor names by industry, spend category, and country |
| TransactionLlmEnricher | Generate transaction descriptions and memo fields |
| AnomalyLlmExplainer | Generate natural language explanations for injected anomalies |

Source-to-Contract (S2C) procurement pipeline generators.

| Generator | Description |
|-----------|-------------|
| spend_analysis_generator | Spend analysis records and category hierarchies |
| sourcing_project_generator | Sourcing project lifecycle management |
| qualification_generator | Supplier qualification assessments |
| rfx_generator | RFx events (RFI/RFP/RFQ) with invited suppliers |
| bid_generator | Supplier bids with pricing and compliance data |
| bid_evaluation_generator | Bid scoring, ranking, and award recommendations |
| contract_generator | Procurement contracts with terms and renewal rules |
| catalog_generator | Catalog items linked to contracts |
| scorecard_generator | Supplier scorecards with performance metrics |

Generation DAG: spend_analysis -> sourcing_project -> qualification -> rfx -> bid -> bid_evaluation -> contract -> catalog -> [P2P] -> scorecard
Hire-to-Retire (H2R) generators for the HR process chain.

| Generator | Description |
|-----------|-------------|
| payroll_generator | Payroll runs with employee pay line items (gross, deductions, net, employer cost) |
| time_entry_generator | Employee time entries with regular, overtime, PTO, and sick hours |
| expense_report_generator | Expense reports with categorized line items and approval workflows |

Accounting and audit standards generators.

| Generator | Description |
|-----------|-------------|
| revenue_recognition_generator | ASC 606/IFRS 15 customer contracts with performance obligations |
| impairment_generator | Asset impairment tests with recoverable amount calculations |

| Generator | Description |
|-----------|-------------|
| financial_statement_generator | Balance sheet, income statement, cash flow, and changes in equity from trial balance data |

| Generator | Description |
|-----------|-------------|
| bank_reconciliation_generator | Bank reconciliations with statement lines, auto-matching, and reconciling items |

| Generator | Description |
|-----------|-------------|
| entity_graph_generator | Cross-process entity relationship graphs |
| relationship_strength | Weighted relationship strength calculation |

Audit Engagement Structure:

pub struct AuditEngagement {
    pub engagement_id: String,
    pub client_name: String,
    pub fiscal_year: u16,
    pub phase: AuditPhase, // Planning, Fieldwork, Completion
    pub materiality: MaterialityLevels,
    pub team_size: usize,
    pub has_fraud_risk: bool,
    pub has_significant_risk: bool,
}

pub struct MaterialityLevels {
    pub primary_materiality: Decimal,     // 0.3-1% of base
    pub performance_materiality: Decimal, // 50-75% of primary
    pub clearly_trivial: Decimal,         // 3-5% of primary
}
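
The percentage ranges in the `MaterialityLevels` comments can be turned into a small derivation. This is a sketch under assumptions: it uses integer cents and basis points instead of `Decimal`, and `derive_materiality` and `MaterialityCents` are hypothetical names, not the generator's API.

```rust
/// Materiality thresholds in cents (hypothetical stand-in for `MaterialityLevels`).
#[derive(Debug, PartialEq)]
pub struct MaterialityCents {
    pub primary: i64,
    pub performance: i64,
    pub clearly_trivial: i64,
}

/// Derive thresholds from a base amount. Rates are in basis points
/// (1 bp = 0.01%) and must fall inside the ranges quoted in the struct comments.
pub fn derive_materiality(
    base_cents: i64,
    primary_bps: i64,     // 30..=100    -> 0.3-1% of base
    performance_bps: i64, // 5000..=7500 -> 50-75% of primary
    trivial_bps: i64,     // 300..=500   -> 3-5% of primary
) -> Result<MaterialityCents, String> {
    if base_cents <= 0 {
        return Err("base must be positive".to_string());
    }
    if !(30..=100).contains(&primary_bps) {
        return Err(format!("primary rate {primary_bps} bps outside 0.3-1%"));
    }
    if !(5_000..=7_500).contains(&performance_bps) {
        return Err(format!("performance rate {performance_bps} bps outside 50-75%"));
    }
    if !(300..=500).contains(&trivial_bps) {
        return Err(format!("trivial rate {trivial_bps} bps outside 3-5%"));
    }

    let primary = base_cents * primary_bps / 10_000;
    Ok(MaterialityCents {
        primary,
        performance: primary * performance_bps / 10_000,
        clearly_trivial: primary * trivial_bps / 10_000,
    })
}
```

For a base of 100,000,000.00 with rates of 0.5%, 65%, and 4%, this yields 500,000.00, 325,000.00, and 20,000.00 respectively.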

use synth_generators::je_generator::JournalEntryGenerator;

let mut generator = JournalEntryGenerator::new(config, seed);

// Generate a fixed-size batch
let entries = generator.generate_batch(1000)?;

// Or stream entries lazily, one at a time
for entry in generator.generate_stream().take(1000) {
    process(entry?);
}

use synth_generators::master_data::{CustomerGenerator, VendorGenerator};

let mut vendor_gen = VendorGenerator::new(seed);
let vendors = vendor_gen.generate(100);

let mut customer_gen = CustomerGenerator::new(seed);
let customers = customer_gen.generate(200);

use synth_generators::document_flow::{O2cGenerator, P2pGenerator};

// Clone the config for the first generator so it can be reused below
// (assumes the config type implements `Clone`).
let mut p2p = P2pGenerator::new(config.clone(), seed);
let p2p_flows = p2p.generate_batch(500)?;

let mut o2c = O2cGenerator::new(config, seed);
let o2c_flows = o2c.generate_batch(500)?;

use synth_generators::anomaly::AnomalyInjector;

let mut injector = AnomalyInjector::new(config.anomaly_injection, seed);

// Inject into existing entries; returns the modified entries and their labels
let (modified_entries, labels) = injector.inject(&entries)?;

use synth_generators::llm_enrichment::{TransactionLlmEnricher, VendorLlmEnricher};
use synth_core::llm::MockLlmProvider;
use std::sync::Arc;

let provider = Arc::new(MockLlmProvider::new(42));

// Enrich vendor names
let vendor_enricher = VendorLlmEnricher::new(provider.clone());
let name = vendor_enricher.enrich_vendor_name("manufacturing", "raw_materials", "US")?;

// Enrich transaction descriptions and memo fields
let tx_enricher = TransactionLlmEnricher::new(provider);
let desc = tx_enricher.enrich_description("Office Supplies", "1000-5000", "retail", 3)?;
let memo = tx_enricher.enrich_memo("VendorInvoice", "Acme Corp", "2500.00")?;
The P2P generator validates that the purchase order, goods receipt, and vendor invoice agree (three-way match):

// `MatchResult` is assumed to be exported alongside `ThreeWayMatch`.
use synth_generators::document_flow::{MatchResult, ThreeWayMatch};

let match_result = ThreeWayMatch::validate(
    &purchase_order,
    &goods_receipt,
    &vendor_invoice,
    tolerance_config,
);

match match_result {
    MatchResult::Passed => { /* Process normally */ }
    MatchResult::QuantityVariance(var) => { /* Handle quantity variance */ }
    MatchResult::PriceVariance(var) => { /* Handle price variance */ }
}
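
To make the matching logic concrete, here is a minimal sketch of a tolerance-based three-way match. It is not the crate's implementation: the line types, the basis-point tolerances, and the order of the checks are assumptions, and `MatchOutcome` merely mirrors the variant names used above.

```rust
// Hypothetical, simplified document lines (amounts in cents).
pub struct PoLine {
    pub quantity: i64,
    pub unit_price_cents: i64,
}
pub struct GrLine {
    pub quantity: i64,
}
pub struct InvoiceLine {
    pub quantity: i64,
    pub unit_price_cents: i64,
}

/// Allowed relative deviation in basis points (100 bps = 1%).
pub struct Tolerance {
    pub quantity_bps: i64,
    pub price_bps: i64,
}

#[derive(Debug, PartialEq)]
pub enum MatchOutcome {
    Passed,
    QuantityVariance(i64),
    PriceVariance(i64),
}

/// True when |actual - expected| <= expected * bps / 10_000 (kept in integers).
fn within(actual: i64, expected: i64, bps: i64) -> bool {
    (actual - expected).abs() * 10_000 <= expected.abs() * bps
}

pub fn three_way_match(
    po: &PoLine,
    gr: &GrLine,
    inv: &InvoiceLine,
    tol: &Tolerance,
) -> MatchOutcome {
    // 1. Received quantity against ordered quantity.
    if !within(gr.quantity, po.quantity, tol.quantity_bps) {
        return MatchOutcome::QuantityVariance(gr.quantity - po.quantity);
    }
    // 2. Invoiced quantity against received quantity.
    if !within(inv.quantity, gr.quantity, tol.quantity_bps) {
        return MatchOutcome::QuantityVariance(inv.quantity - gr.quantity);
    }
    // 3. Invoiced unit price against the PO unit price.
    if !within(inv.unit_price_cents, po.unit_price_cents, tol.price_bps) {
        return MatchOutcome::PriceVariance(inv.unit_price_cents - po.unit_price_cents);
    }
    MatchOutcome::Passed
}
```

The sketch reports only the first variance it finds; a production matcher might collect all of them.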
The balance tracker maintains the accounting equation:

use synth_generators::balance::BalanceTracker;

let mut tracker = BalanceTracker::new();

// `entry` is already a reference when iterating over `&entries`
for entry in &entries {
    tracker.post(entry)?;
}

// Verify Assets = Liabilities + Equity
assert!(tracker.is_balanced());
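
The invariant behind `is_balanced` can be sketched with a toy tracker. This is an illustration under assumptions, not the crate's `BalanceTracker`: `SimpleBalanceTracker`, `PostingLine`, and `AccountType` are hypothetical, and amounts are integer cents. It rejects entries whose debits and credits differ, which is what keeps the expanded equation Assets + Expenses = Liabilities + Equity + Revenue true after every posting.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub enum AccountType {
    Asset,
    Liability,
    Equity,
    Revenue,
    Expense,
}

/// One journal entry line (hypothetical shape).
pub struct PostingLine {
    pub account_type: AccountType,
    pub debit: i64,
    pub credit: i64,
}

#[derive(Default)]
pub struct SimpleBalanceTracker {
    // Net debit balance (debits minus credits) per account type.
    balances: HashMap<AccountType, i64>,
}

impl SimpleBalanceTracker {
    pub fn new() -> Self {
        Self::default()
    }

    /// Post one entry; entries whose debits and credits differ are rejected
    /// and leave the tracker unchanged.
    pub fn post(&mut self, lines: &[PostingLine]) -> Result<(), String> {
        let debits: i64 = lines.iter().map(|l| l.debit).sum();
        let credits: i64 = lines.iter().map(|l| l.credit).sum();
        if debits != credits {
            return Err(format!("unbalanced entry: debits {debits} != credits {credits}"));
        }
        for line in lines {
            *self.balances.entry(line.account_type).or_insert(0) += line.debit - line.credit;
        }
        Ok(())
    }

    pub fn net_debit(&self, account_type: AccountType) -> i64 {
        self.balances.get(&account_type).copied().unwrap_or(0)
    }

    /// Debit-normal side must equal credit-normal side.
    pub fn is_balanced(&self) -> bool {
        let debit_side = self.net_debit(AccountType::Asset) + self.net_debit(AccountType::Expense);
        let credit_side = -(self.net_debit(AccountType::Liability)
            + self.net_debit(AccountType::Equity)
            + self.net_debit(AccountType::Revenue));
        debit_side == credit_side
    }
}
```

Revenue and expenses appear in the check because, before period close, they have not yet been rolled into equity.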
The FX rate service uses an Ornstein-Uhlenbeck process to produce realistic, mean-reverting rate movements:

use synth_generators::fx::FxRateService;

let mut fx_service = FxRateService::new(config.fx, seed);

// Get the rate for a single date
let rate = fx_service.get_rate("EUR", "USD", date)?;

// Generate daily rates for a date range
let rates = fx_service.generate_daily_rates(start, end)?;
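
For readers unfamiliar with the Ornstein-Uhlenbeck process, the sketch below simulates it with an Euler-Maruyama step, x[t+1] = x[t] + theta * (mu - x[t]) * dt + sigma * sqrt(dt) * eps, where eps is a standard normal draw. It is a generic illustration, not the `FxRateService` internals: `OuParams`, `simulate_ou`, and the `SplitMix64` generator are hypothetical names, and the process is applied directly to the rate level.

```rust
use std::f64::consts::PI;

/// Tiny deterministic PRNG (SplitMix64); hypothetical helper for this sketch only.
pub struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    pub fn new(seed: u64) -> Self {
        Self { state: seed }
    }

    pub fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform value in [0, 1).
    pub fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Standard normal draw via the Box-Muller transform.
fn standard_normal(rng: &mut SplitMix64) -> f64 {
    let u1 = 1.0 - rng.next_f64(); // shifted into (0, 1] so ln(u1) is finite
    let u2 = rng.next_f64();
    (-2.0 * u1.ln()).sqrt() * (2.0 * PI * u2).cos()
}

pub struct OuParams {
    pub theta: f64, // mean-reversion speed
    pub mu: f64,    // long-run mean rate
    pub sigma: f64, // volatility
    pub dt: f64,    // step size, e.g. 1/252 for daily steps in years
}

/// Simulate `steps` Euler-Maruyama steps; the result includes the start value.
pub fn simulate_ou(x0: f64, p: &OuParams, steps: usize, seed: u64) -> Vec<f64> {
    let mut rng = SplitMix64::new(seed);
    let mut path = Vec::with_capacity(steps + 1);
    let mut x = x0;
    path.push(x);
    for _ in 0..steps {
        let drift = p.theta * (p.mu - x) * p.dt; // pull back toward the mean
        let shock = p.sigma * p.dt.sqrt() * standard_normal(&mut rng);
        x += drift + shock;
        path.push(x);
    }
    path
}
```

Because the process is Gaussian, a rate simulated this way can in principle become negative; simulating the logarithm of the rate instead is a common way to avoid that.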
Anomaly types available for injection:
- FictitiousTransaction, RevenueManipulation, ExpenseCapitalization
- SplitTransaction, RoundTripping, KickbackScheme
- GhostEmployee, DuplicatePayment, UnauthorizedDiscount
- DuplicateEntry, ReversedAmount, WrongPeriod
- WrongAccount, MissingReference, IncorrectTaxCode
- LatePosting, SkippedApproval, ThresholdManipulation
- MissingDocumentation, OutOfSequence
- UnusualAmount, TrendBreak, BenfordViolation, OutlierValue
- CircularTransaction, DormantAccountActivity, UnusualCounterparty