
Advanced Topics

Advanced features for specialized use cases.

Overview

| Topic | Description |
|-------|-------------|
| Anomaly Injection | Fraud, errors, and process issues |
| Data Quality Variations | Missing values, typos, duplicates |
| Graph Export | ML-ready graph formats |
| Intercompany Processing | Multi-entity transactions |
| Period Close Engine | Month-, quarter-, and year-end processes |
| Performance Tuning | Optimization strategies |

Feature Matrix

| Feature | Use Case | Output |
|---------|----------|--------|
| Anomaly Injection | ML training | Labels (CSV) |
| Data Quality | Testing robustness | Varied data |
| Graph Export | GNN training | PyG, Neo4j |
| Intercompany | Consolidation testing | IC pairs |
| Period Close | Full-cycle testing | Closing entries |

Enabling Advanced Features

In Configuration

```yaml
# Anomaly injection
anomaly_injection:
  enabled: true
  total_rate: 0.02
  generate_labels: true

# Data quality variations
data_quality:
  enabled: true
  missing_values:
    rate: 0.01

# Graph export
graph_export:
  enabled: true
  formats:
    - pytorch_geometric
    - neo4j

# Intercompany
intercompany:
  enabled: true

# Period close
period_close:
  enabled: true
  monthly:
    accruals: true
    depreciation: true
```
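The rate settings above appear to be fractions (0.02 ≈ 2%). As a rough sanity check, the Python sketch below assumes each rate is an independent per-record probability; that is an assumption, and the generator may instead inject an exact count or distribute anomaly types differently. It computes the expected number of affected records for a hypothetical run of 10,000 records and simulates one seeded draw. The function names are illustrative and not part of the `datasynth-data` tool.

```python
import random

# Rates copied from the example configuration above.
ANOMALY_TOTAL_RATE = 0.02  # anomaly_injection.total_rate
MISSING_VALUE_RATE = 0.01  # data_quality.missing_values.rate


def expected_affected(n_records: int, rate: float) -> float:
    """Expected number of affected records, assuming each record is hit
    independently with probability `rate` (hypothetical model)."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError(f"rate must be in [0, 1], got {rate}")
    return n_records * rate


def simulate_affected(n_records: int, rate: float, seed: int = 42) -> list[int]:
    """Indices of records selected by one seeded Bernoulli draw per record."""
    rng = random.Random(seed)
    return [i for i in range(n_records) if rng.random() < rate]


if __name__ == "__main__":
    n = 10_000  # hypothetical number of generated records
    for label, rate in [("anomalies", ANOMALY_TOTAL_RATE),
                        ("missing values", MISSING_VALUE_RATE)]:
        hits = simulate_affected(n, rate)
        print(f"{label}: expected {expected_affected(n, rate):.0f}, "
              f"simulated {len(hits)}")
```

Under this model, `total_rate: 0.02` means roughly 200 anomalous records per 10,000, with the simulated count varying around that value from seed to seed.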

Via CLI

Most advanced features are controlled through configuration rather than CLI flags. Use `init` to create a base config, then customize it:

```shell
datasynth-data init --industry manufacturing --complexity medium -o config.yaml
# Edit config.yaml to enable advanced features
datasynth-data generate --config config.yaml --output ./output
```
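If you script this workflow (for example in CI), a small wrapper keeps the init, edit, generate order explicit. The sketch below is hypothetical Python: it uses only the flags shown above, leaves the editing step to a caller-supplied function, and defaults to a dry run that returns the commands instead of executing them.

```python
import shlex
import subprocess
from typing import Callable


def build_commands(industry: str, complexity: str,
                   config_path: str, output_dir: str) -> tuple[list[str], list[str]]:
    """Return the init and generate argv lists, using only the flags shown above."""
    init_cmd = ["datasynth-data", "init", "--industry", industry,
                "--complexity", complexity, "-o", config_path]
    generate_cmd = ["datasynth-data", "generate", "--config", config_path,
                    "--output", output_dir]
    return init_cmd, generate_cmd


def run_workflow(edit_config: Callable[[str], None],
                 industry: str = "manufacturing", complexity: str = "medium",
                 config_path: str = "config.yaml", output_dir: str = "./output",
                 dry_run: bool = True) -> list[str]:
    """Run init, then the caller's edit step, then generate.

    With dry_run=True nothing is executed; the steps are only returned."""
    init_cmd, generate_cmd = build_commands(industry, complexity,
                                            config_path, output_dir)
    steps = [shlex.join(init_cmd)]
    if not dry_run:
        subprocess.run(init_cmd, check=True)

    # Between init and generate: enable the advanced features you need.
    steps.append(f"# edit {config_path}")
    if not dry_run:
        edit_config(config_path)

    steps.append(shlex.join(generate_cmd))
    if not dry_run:
        subprocess.run(generate_cmd, check=True)
    return steps


if __name__ == "__main__":
    for line in run_workflow(edit_config=lambda path: None):
        print(line)
```

Passing argv lists to `subprocess.run` (rather than a single shell string) avoids quoting problems with paths that contain spaces.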

Prerequisites

Some advanced features have dependencies:

| Feature | Requires |
|---------|----------|
| Intercompany | Multiple companies defined |
| Period Close | `period_months` ≥ 1 |
| Graph Export | Transactions generated |
| FX | Multiple currencies |
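These prerequisites can be checked before starting a long generation run. In the sketch below, `ConfigFacts` is a hypothetical summary of your configuration: apart from `period_months`, its field names and the feature keys are illustrative, not the real config schema, and "multiple" is interpreted as at least two.

```python
from dataclasses import dataclass


@dataclass
class ConfigFacts:
    """Hypothetical summary of a config; field names are illustrative."""
    company_count: int
    period_months: int
    currency_count: int
    generates_transactions: bool


# Prerequisites from the table above: feature -> (requirement text, predicate).
PREREQUISITES = {
    "intercompany": ("multiple companies defined", lambda f: f.company_count >= 2),
    "period_close": ("period_months >= 1", lambda f: f.period_months >= 1),
    "graph_export": ("transactions generated", lambda f: f.generates_transactions),
    "fx": ("multiple currencies", lambda f: f.currency_count >= 2),
}


def unmet_prerequisites(enabled: set[str], facts: ConfigFacts) -> list[str]:
    """Return one message per enabled feature whose prerequisite fails."""
    problems = []
    for feature in sorted(enabled):
        if feature not in PREREQUISITES:
            continue  # no prerequisite listed for this feature
        requirement, check = PREREQUISITES[feature]
        if not check(facts):
            problems.append(f"{feature}: requires {requirement}")
    return problems


if __name__ == "__main__":
    facts = ConfigFacts(company_count=1, period_months=12,
                        currency_count=1, generates_transactions=True)
    print(unmet_prerequisites({"intercompany", "period_close"}, facts))
```

With a single company defined, the usage above reports only the intercompany requirement as unmet.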

Output Files

Advanced features produce additional outputs:

```text
output/
├── labels/                      # Anomaly injection
│   ├── anomaly_labels.csv
│   ├── fraud_labels.csv
│   └── quality_issues.csv
├── graphs/                      # Graph export
│   ├── pytorch_geometric/
│   └── neo4j/
├── consolidation/               # Intercompany
│   ├── eliminations.csv
│   └── ic_pairs.csv
└── period_close/                # Period close
    ├── trial_balances/
    ├── accruals.csv
    └── closing_entries.csv
```
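After a run, checking that each enabled feature produced its files can catch misconfiguration early. The sketch below hard-codes the paths from the tree above and groups them by the feature comments shown there. It assumes every listed path is always written when its feature is enabled, which may not hold; for example, `graphs/` probably contains only the formats listed under `graph_export.formats`. The helper names are hypothetical.

```python
import tempfile
from pathlib import Path

# Paths from the output tree above, grouped by the feature that produces them.
EXPECTED_OUTPUTS = {
    "anomaly_injection": ["labels/anomaly_labels.csv", "labels/fraud_labels.csv",
                          "labels/quality_issues.csv"],
    "graph_export": ["graphs/pytorch_geometric", "graphs/neo4j"],
    "intercompany": ["consolidation/eliminations.csv", "consolidation/ic_pairs.csv"],
    "period_close": ["period_close/trial_balances", "period_close/accruals.csv",
                     "period_close/closing_entries.csv"],
}


def missing_outputs(output_dir: str, enabled: set[str]) -> list[str]:
    """Return expected paths (relative to output_dir) that do not exist."""
    root = Path(output_dir)
    missing = []
    for feature in sorted(enabled):
        for rel_path in EXPECTED_OUTPUTS.get(feature, []):
            if not (root / rel_path).exists():
                missing.append(rel_path)
    return missing


if __name__ == "__main__":
    # Demo: a throwaway directory holding only part of the intercompany output.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "consolidation").mkdir()
        (Path(tmp) / "consolidation" / "ic_pairs.csv").write_text("")
        print(missing_outputs(tmp, {"intercompany", "period_close"}))
```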

Performance Impact

| Feature | Impact | Mitigation |
|---------|--------|------------|
| Anomaly Injection | Low | Post-processing |
| Data Quality | Low | Post-processing |
| Graph Export | Medium | Separate phase |
| Intercompany | Medium | Per-transaction |
| Period Close | Low | Per-period |

See Also