EU AI Act Compliance
DataSynth implements technical controls aligned with the EU Artificial Intelligence Act (Regulation (EU) 2024/1689), focusing on Article 50 (transparency for synthetic content) and Article 10 (data governance for high-risk AI systems).
Article 50 — Synthetic Content Marking
Article 50(2) requires providers of AI systems that generate synthetic content to ensure that outputs are marked in a machine-readable format and are detectable as artificially generated.
How DataSynth Complies
DataSynth embeds machine-readable synthetic content credentials in all output files:
- CSV: Comment header lines with C2PA-inspired metadata
- JSON: `_synthetic_metadata` top-level object with credential fields
- Parquet: Key-value metadata pairs in the file footer
Configuration
compliance:
  content_marking:
    enabled: true # Default: true
    format: embedded # embedded, sidecar, or both
  article10_report: true # Generate Article 10 governance report
Marking Formats
| Format | Description |
|---|---|
| `embedded` | Credentials embedded directly in output files (default) |
| `sidecar` | Separate `.synthetic-credential.json` file alongside each output |
| `both` | Both embedded and sidecar credentials |
Credential Fields
Each synthetic content credential contains:
| Field | Description | Example |
|---|---|---|
| `generator` | Tool identifier | "DataSynth" |
| `version` | Generator version | "0.5.0" |
| `timestamp` | ISO 8601 generation time | "2024-06-15T10:30:00Z" |
| `content_type` | Output category | "synthetic_financial_data" |
| `method` | Generation technique | "rule_based_statistical" |
| `config_hash` | SHA-256 of config used | "a1b2c3..." |
| `declaration` | Human-readable statement | "This content is synthetic..." |
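A consumer that receives a credential may want to sanity-check it before trusting it. The sketch below is one possible way to do that in Python. The function name `credential_problems` is hypothetical, and treating every field in the table as required is an assumption of this example, not documented DataSynth behavior.

```python
from datetime import datetime

# Field names come from the credential table; treating all of them as
# required is an assumption of this sketch.
REQUIRED_FIELDS = (
    "generator", "version", "timestamp", "content_type",
    "method", "config_hash", "declaration",
)


def credential_problems(credential: dict) -> list[str]:
    """Return human-readable problems; an empty list means nothing was flagged."""
    problems = [
        f"missing field: {name}"
        for name in REQUIRED_FIELDS
        if name not in credential
    ]
    timestamp = credential.get("timestamp")
    if isinstance(timestamp, str):
        try:
            # Older Python versions reject a trailing "Z" in fromisoformat,
            # so it is rewritten as an explicit UTC offset first.
            datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
        except ValueError:
            problems.append(f"timestamp is not ISO 8601: {timestamp!r}")
    return problems
```

The check deliberately ignores the length of `config_hash`, since the documented example value is abbreviated.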
Programmatic Detection
Third-party systems can detect synthetic DataSynth output by:
- CSV: Checking for `# X-Synthetic-Generator: DataSynth` header lines
- JSON: Checking for `_synthetic_metadata.generator == "DataSynth"`
- Parquet: Reading `synthetic_generator` from file metadata
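The CSV and JSON checks above could be sketched in Python roughly as follows. The function names are hypothetical, and the code assumes that the CSV comment header sits at the top of the file, before any data rows, and that the marker line matches exactly. The Parquet check is omitted because it needs a third-party Parquet reader.

```python
import json
from pathlib import Path

# Marker line as documented for CSV output.
CSV_MARKER = "# X-Synthetic-Generator: DataSynth"


def csv_is_marked(path: Path) -> bool:
    """True if a leading comment line carries the DataSynth generator marker."""
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            if not line.startswith("#"):
                break  # assumed: the comment header ends at the first data row
            if line.strip() == CSV_MARKER:
                return True
    return False


def json_is_marked(path: Path) -> bool:
    """True if _synthetic_metadata.generator equals "DataSynth"."""
    with path.open(encoding="utf-8") as handle:
        document = json.load(handle)
    if not isinstance(document, dict):
        return False
    metadata = document.get("_synthetic_metadata")
    return isinstance(metadata, dict) and metadata.get("generator") == "DataSynth"
```

Both functions return `False` rather than raising when the marker is absent, so they can be used as simple filters over a directory of files.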
Article 10 — Data Governance
Article 10 requires appropriate data governance practices for the training datasets used by high-risk AI systems. When synthetic data from DataSynth is used to train such systems, the Article 10 data governance report documents how that data was generated and validated.
Governance Report Contents
The automated report includes:
- Data Sources: Documentation of all inputs (configuration parameters, seed values, statistical distributions)
- Processing Steps: Complete pipeline documentation (CoA generation, master data, document flows, anomaly injection, quality validation)
- Quality Measures: Statistical validation results (Benford’s Law, balance coherence, distribution fitting)
- Bias Assessment: Known limitations, demographic representation gaps, and mitigation measures
Generating the Report
Enable in configuration:
compliance:
  article10_report: true
The report is written as article10_governance_report.json in the output directory.
Report Structure
{
  "report_version": "1.0",
  "generator": "DataSynth",
  "generated_at": "2024-06-15T10:30:00Z",
  "data_sources": ["configuration_parameters", "statistical_distributions", "deterministic_rng"],
  "processing_steps": [
    "chart_of_accounts_generation",
    "master_data_generation",
    "document_flow_generation",
    "journal_entry_generation",
    "anomaly_injection",
    "quality_validation"
  ],
  "quality_measures": [
    "benfords_law_compliance",
    "balance_sheet_coherence",
    "document_chain_integrity",
    "referential_integrity"
  ],
  "bias_assessment": {
    "known_limitations": [
      "Statistical distributions are parameterized, not learned from real data",
      "Temporal patterns use simplified seasonal models"
    ],
    "mitigation_measures": [
      "Configurable distribution parameters per industry profile",
      "Quality gate validation ensures statistical plausibility"
    ]
  }
}
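A downstream pipeline might confirm that a generated report contains the sections shown in the sample before archiving it. The following Python sketch illustrates one way to do so. The helper name `missing_report_sections` is hypothetical, and the set of expected keys is taken only from the sample report, so a real report version may differ.

```python
import json
from pathlib import Path

# Top-level keys as they appear in the sample report; assumed, not a schema.
EXPECTED_KEYS = {
    "report_version", "generator", "generated_at", "data_sources",
    "processing_steps", "quality_measures", "bias_assessment",
}


def missing_report_sections(output_dir: Path) -> list[str]:
    """Return the expected top-level keys that are absent from the report."""
    report_path = output_dir / "article10_governance_report.json"
    with report_path.open(encoding="utf-8") as handle:
        report = json.load(handle)
    return sorted(EXPECTED_KEYS - set(report))
```

An empty return value means every section from the sample is present. It says nothing about whether the contents of those sections are adequate for a given compliance review.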