Causal Analysis
New in v0.5.0
Use DataSynth’s causal generation capabilities for “what-if” scenario testing and counterfactual analysis in audit and risk management.
When to Use Causal Generation
Causal generation is ideal when you need to:
- Test audit scenarios: “What would happen to fraud rates if we increased the approval threshold?”
- Assess risk: “How would revenue change if we lost our top vendor?”
- Evaluate policy: “What is the causal effect of implementing a new control?”
- Train causal ML models: generate data with a known causal structure for model validation
Setting Up a Fraud Detection SCM
# Generate causally-structured fraud detection data
datasynth-data causal generate \
  --template fraud_detection \
  --samples 50000 \
  --seed 42 \
  --output ./fraud_causal
The fraud_detection template models:
- transaction_amount → approval_level (larger amounts require higher approval)
- transaction_amount → fraud_flag (larger amounts have a higher fraud probability)
- vendor_risk → fraud_flag (riskier vendors are associated with more fraud)
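As a mental model, the template behaves like a small structural causal model (SCM): each variable is generated from its parents plus independent noise. The sketch below illustrates that idea with the same three edges; the distributions, functional forms, and coefficients are assumptions chosen for exposition, not DataSynth’s actual equations.

# Illustrative SCM with the same causal edges as the fraud_detection template.
# All functional forms and coefficients below are assumptions for exposition.
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

# Exogenous (root) variables
transaction_amount = rng.lognormal(mean=8.0, sigma=1.0, size=n)
vendor_risk = rng.beta(2, 5, size=n)

# transaction_amount -> approval_level: larger amounts require higher approval
approval_level = np.digitize(transaction_amount, bins=[1_000, 10_000, 100_000]).astype(float)

# transaction_amount -> fraud_flag and vendor_risk -> fraud_flag
logit = -6.0 + 0.00002 * transaction_amount + 4.0 * vendor_risk
fraud_flag = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))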
Running Interventions
Answer “what if?” questions by forcing variables to specific values:
# What if all transactions were $50,000?
datasynth-data causal intervene \
  --template fraud_detection \
  --variable transaction_amount \
  --value 50000 \
  --samples 10000 \
  --output ./intervention_50k
# What if vendor risk were always high (0.9)?
datasynth-data causal intervene \
  --template fraud_detection \
  --variable vendor_risk \
  --value 0.9 \
  --samples 10000 \
  --output ./intervention_high_risk
Compare the intervention output against the baseline to estimate causal effects.
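For example, if both runs are exported as CSV, a quick effect estimate might look like the sketch below. The file names and the fraud_flag column are assumptions about the output layout; adjust them to whatever your runs actually produced.

# Sketch: estimate the effect of do(transaction_amount=50000) on the fraud rate.
# File names and CSV layout are assumptions for illustration.
import pandas as pd

baseline = pd.read_csv("fraud_causal/transactions.csv")
intervention = pd.read_csv("intervention_50k/transactions.csv")

# Difference in mean fraud rate between the intervention run and the baseline
effect = intervention["fraud_flag"].mean() - baseline["fraud_flag"].mean()
print(f"Estimated change in fraud rate: {effect:+.4f}")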
Counterfactual Analysis for Audit
For individual transaction review:
from datasynth_py import DataSynth
synth = DataSynth()
# Load a specific flagged transaction
factual = {
"transaction_amount": 5000.0,
"approval_level": 1.0,
"vendor_risk": 0.3,
"fraud_flag": 0.0,
}
# What would have happened if the amount were 10x larger?
# The counterfactual preserves the same "noise" (latent factors)
# but propagates the new amount through the causal structure
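To make the “same noise, new amount” idea concrete, the sketch below works the counterfactual out by hand, continuing the example above and reusing the illustrative structural equation from the earlier SCM sketch. The coefficients are still assumptions; consult the datasynth_py reference for the counterfactual calls the SDK actually exposes.

import math

def fraud_probability(transaction_amount: float, vendor_risk: float) -> float:
    # Same illustrative structural equation as in the SCM sketch above (assumed coefficients).
    logit = -6.0 + 0.00002 * transaction_amount + 4.0 * vendor_risk
    return 1.0 / (1.0 + math.exp(-logit))

# Hold the transaction's background factor (vendor_risk) fixed and push the
# 10x-larger amount through the same structural equation.
factual_p = fraud_probability(factual["transaction_amount"], factual["vendor_risk"])
counterfactual_p = fraud_probability(10 * factual["transaction_amount"], factual["vendor_risk"])
print(f"fraud probability: {factual_p:.3f} -> {counterfactual_p:.3f}")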
This helps auditors understand which factors most influence risk assessments.
Configuration Example
global:
  seed: 42
  industry: manufacturing
  start_date: 2024-01-01
  period_months: 12

causal:
  enabled: true
  template: "fraud_detection"
  sample_size: 50000
  validate: true

# Combine with regular generation
transactions:
  target_count: 100000

fraud:
  enabled: true
  fraud_rate: 0.005
Validating Causal Structure
Verify that generated data preserves the intended causal relationships:
datasynth-data causal validate \
  --data ./fraud_causal \
  --template fraud_detection
The validator checks:
- Parent-child correlations match expected directions
- Independence constraints hold for non-adjacent variables
- Intervention effects are consistent with the graph
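Independently of the built-in validator, you can spot-check one parent → child edge directly on the generated data. The file name and column names below are assumptions about the export format:

# Sketch: manual spot-check of one expected parent -> child relationship.
import pandas as pd

df = pd.read_csv("fraud_causal/transactions.csv")  # assumed file name/format

# transaction_amount -> fraud_flag should show a positive association
corr = df["transaction_amount"].corr(df["fraud_flag"])
print(f"corr(transaction_amount, fraud_flag) = {corr:.3f} (expected > 0)")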