Advanced Topics
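This section covers advanced features for specialized use cases such as ML training data, robustness testing, consolidation testing, and full-cycle period close.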
Overview
| Topic | Description |
|---|---|
| Anomaly Injection | Fraud, errors, and process issues |
| Data Quality Variations | Missing values, typos, duplicates |
| Graph Export | ML-ready graph formats |
| Intercompany Processing | Multi-entity transactions |
| Period Close Engine | Month/quarter/year-end processes |
| Performance Tuning | Optimization strategies |
Feature Matrix
| Feature | Use Case | Output |
|---|---|---|
| Anomaly Injection | ML training | Labels (CSV) |
| Data Quality | Testing robustness | Varied data |
| Graph Export | GNN training | PyG, Neo4j |
| Intercompany | Consolidation testing | IC pairs |
| Period Close | Full cycle testing | Closing entries |
Enabling Advanced Features
In Configuration
# Anomaly injection
anomaly_injection:
  enabled: true
  total_rate: 0.02
  generate_labels: true
# Data quality variations
data_quality:
  enabled: true
  missing_values:
    rate: 0.01
# Graph export
graph_export:
  enabled: true
  formats:
    - pytorch_geometric
    - neo4j
# Intercompany
intercompany:
  enabled: true
# Period close
period_close:
  enabled: true
  monthly:
    accruals: true
    depreciation: true
Via CLI
Most advanced features are controlled through configuration. Use init to create a base config, edit it to enable the features you need, then pass it to generate:
datasynth-data init --industry manufacturing --complexity medium -o config.yaml
# Edit config.yaml to enable advanced features
datasynth-data generate --config config.yaml --output ./output
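The manual edit step between init and generate could also be scripted. The sketch below shows one possible approach: a recursive merge that applies advanced-feature overrides to a base config held as a Python dict. The section keys are taken from the configuration example above; the base config contents are hypothetical, and reading or writing config.yaml is left out because it would need a third-party YAML library.

```python
import copy

# Overrides mirroring keys shown in the configuration example.
ADVANCED_OVERRIDES = {
    "anomaly_injection": {"enabled": True, "total_rate": 0.02, "generate_labels": True},
    "graph_export": {"enabled": True, "formats": ["pytorch_geometric", "neo4j"]},
}


def deep_merge(base: dict, overrides: dict) -> dict:
    """Return a copy of base with overrides applied recursively."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars, lists, and new sections replace whatever was there.
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical base config; a real one would come from the init command.
base_config = {
    "anomaly_injection": {"enabled": False, "total_rate": 0.0},
    "period_close": {"enabled": False},
}

config = deep_merge(base_config, ADVANCED_OVERRIDES)
print(config["anomaly_injection"])
```

Sections that the overrides do not mention, such as period_close here, are carried over unchanged, and the base dict itself is not modified.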
Prerequisites
Some advanced features have dependencies:
| Feature | Requires |
|---|---|
| Intercompany | Multiple companies defined |
| Period Close | period_months ≥ 1 |
| Graph Export | Transactions generated |
| FX | Multiple currencies |
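The table above can be turned into a quick pre-flight check. The following is a minimal sketch, not the tool's own validation: it assumes the config is available as a Python dict, that companies live under a hypothetical `companies` list with a hypothetical `currency` field, that `period_months` sits at the top level, and that FX is toggled by a hypothetical `fx` section. The Graph Export prerequisite depends on the generation run rather than the config, so it is not checked here.

```python
def check_prerequisites(config: dict) -> list[str]:
    """Return a message for each enabled feature whose prerequisite is unmet."""
    problems = []
    companies = config.get("companies", [])         # hypothetical key
    period_months = config.get("period_months", 0)  # name from the table, location assumed
    currencies = {c["currency"] for c in companies if c.get("currency")}  # hypothetical field

    def enabled(section: str) -> bool:
        return bool(config.get(section, {}).get("enabled", False))

    if enabled("intercompany") and len(companies) < 2:
        problems.append("intercompany requires multiple companies")
    if enabled("period_close") and period_months < 1:
        problems.append("period_close requires period_months >= 1")
    if enabled("fx") and len(currencies) < 2:       # "fx" section name is an assumption
        problems.append("fx requires multiple currencies")
    return problems


# Hypothetical config with a single company and no periods configured.
example_config = {
    "companies": [{"code": "C001", "currency": "USD"}],
    "period_months": 0,
    "intercompany": {"enabled": True},
    "period_close": {"enabled": True},
}

for problem in check_prerequisites(example_config):
    print(problem)
```

Returning a list of messages instead of raising on the first failure lets a caller report every unmet prerequisite in one pass.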
Output Files
Advanced features produce additional outputs:
output/
├── labels/ # Anomaly injection
│ ├── anomaly_labels.csv
│ ├── fraud_labels.csv
│ └── quality_issues.csv
├── graphs/ # Graph export
│ ├── pytorch_geometric/
│ └── neo4j/
├── consolidation/ # Intercompany
│ ├── eliminations.csv
│ └── ic_pairs.csv
└── period_close/ # Period close
    ├── trial_balances/
    ├── accruals.csv
    └── closing_entries.csv
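After a run, it can be useful to confirm that the expected files are present. The sketch below is one way to do that with the standard library; the relative paths are copied from the tree above, while the feature-to-path grouping and the helper name `missing_outputs` are illustrative assumptions. The demo uses a temporary directory instead of a real output folder.

```python
import tempfile
from pathlib import Path

# Paths relative to the output directory, grouped by the feature that
# produces them according to the tree above.
EXPECTED_OUTPUTS = {
    "anomaly_injection": [
        "labels/anomaly_labels.csv",
        "labels/fraud_labels.csv",
        "labels/quality_issues.csv",
    ],
    "graph_export": ["graphs/pytorch_geometric", "graphs/neo4j"],
    "intercompany": ["consolidation/eliminations.csv", "consolidation/ic_pairs.csv"],
    "period_close": [
        "period_close/trial_balances",
        "period_close/accruals.csv",
        "period_close/closing_entries.csv",
    ],
}


def missing_outputs(output_dir, features):
    """Map each feature to the expected paths not found under output_dir."""
    root = Path(output_dir)
    report = {}
    for feature in features:
        missing = [p for p in EXPECTED_OUTPUTS[feature] if not (root / p).exists()]
        if missing:
            report[feature] = missing
    return report


# Demo against a throwaway directory that only contains intercompany outputs.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "consolidation").mkdir()
    for name in ("eliminations.csv", "ic_pairs.csv"):
        (Path(tmp) / "consolidation" / name).touch()
    demo_report = missing_outputs(tmp, ["intercompany", "period_close"])

print(demo_report)
```

Features whose outputs are all present are omitted from the report, so an empty dict means nothing is missing for the features that were checked.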
Performance Impact
| Feature | Impact | Mitigation |
|---|---|---|
| Anomaly Injection | Low | Post-processing |
| Data Quality | Low | Post-processing |
| Graph Export | Medium | Separate phase |
| Intercompany | Medium | Per-transaction |
| Period Close | Low | Per-period |