GDPR Compliance

This document provides GDPR (General Data Protection Regulation) compliance guidance for DataSynth deployments. DataSynth generates purely synthetic data by default, but certain workflows (fingerprint extraction) may process real personal data.

Synthetic Data and GDPR

Pure Synthetic Generation

When DataSynth generates data from configuration alone (no real data input):

  • No personal data is processed: All names, identifiers, and transactions are algorithmically generated
  • No data subjects exist: Synthetic entities have no real-world counterparts
  • GDPR does not apply to the generated output, as it contains no personal data per Article 4(1)

This is the default operating mode for all datasynth-data generate workflows.

Fingerprint Extraction Workflows

When using datasynth-data fingerprint extract with real data as input:

  • Real personal data may be processed during statistical fingerprint extraction
  • GDPR obligations apply to the extraction phase
  • Differential privacy controls limit information retained in the fingerprint
  • The output fingerprint (.dsf file) contains only aggregate statistics, not individual records

Article 30 — Records of Processing Activities

Template for Pure Synthetic Generation

| Field | Value |
|---|---|
| Purpose | Generation of synthetic financial data for testing, training, and validation |
| Categories of data subjects | None (no real data subjects) |
| Categories of personal data | None (all data is synthetic) |
| Recipients | Internal development, QA, and data science teams |
| Transfers to third countries | Not applicable (no personal data) |
| Retention period | Per project requirements |
| Technical measures | Seed-based deterministic generation, content marking |

Template for Fingerprint Extraction

| Field | Value |
|---|---|
| Purpose | Statistical fingerprint extraction for privacy-preserving data synthesis |
| Legal basis | Legitimate interest (Article 6(1)(f)) or consent |
| Categories of data subjects | As per source dataset (e.g., customers, vendors, employees) |
| Categories of personal data | As per source dataset (aggregate statistics only retained) |
| Recipients | Data engineering team operating DataSynth |
| Transfers to third countries | Assess per deployment topology |
| Retention period | Fingerprint files: per project; source data: minimize retention |
| Technical measures | Differential privacy (configurable epsilon/delta), k-anonymity |

Data Protection Impact Assessment (DPIA)

A DPIA under Article 35 is recommended when fingerprint extraction processes:

  • Large-scale datasets (>100,000 records)
  • Special categories of data (Article 9)
  • Data relating to vulnerable persons

DPIA Template for Fingerprint Extraction

1. Description of Processing

DataSynth extracts statistical fingerprints from source data. The fingerprint captures distribution parameters (means, variances, correlations) without retaining individual records. Differential privacy noise is added with configurable epsilon/delta parameters.
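
As a rough illustration of what this means in practice (a conceptual sketch only, not DataSynth's implementation; the function and values below are hypothetical), the snippet computes a single fingerprint-style aggregate, a clipped column mean, and perturbs it with Laplace noise calibrated to the epsilon budget, so only the noised statistic, never an individual record, leaves the source data:

import numpy as np

def dp_mean(values, lower, upper, epsilon, rng=None):
    """Release an epsilon-differentially-private mean of bounded values.

    Clipping each record to [lower, upper] bounds the sensitivity of the
    mean by (upper - lower) / n; Laplace noise with scale sensitivity/epsilon
    then yields epsilon-DP for this single statistic.
    """
    rng = rng or np.random.default_rng()
    clipped = np.clip(np.asarray(values, dtype=float), lower, upper)
    sensitivity = (upper - lower) / len(clipped)
    return clipped.mean() + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# Hypothetical source column: transaction amounts from real data.
amounts = [120.0, 75.5, 310.0, 42.0, 88.8]
print(dp_mean(amounts, lower=0.0, upper=500.0, epsilon=0.5))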

2. Necessity and Proportionality

  • Purpose: Enable realistic synthetic data generation without accessing source data repeatedly
  • Minimization: Only aggregate statistics are retained
  • Privacy controls: Differential privacy with user-specified budget

3. Risks to Data Subjects

| Risk | Likelihood | Severity | Mitigation |
|---|---|---|---|
| Re-identification from fingerprint | Low | High | Differential privacy, k-anonymity enforcement |
| Membership inference | Low | Medium | MIA AUC-ROC testing in evaluation framework (see the sketch below) |
| Fingerprint file compromise | Medium | Low | Aggregate statistics only, no individual records |
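
The membership-inference row refers to an empirical test of whether an attacker, given the synthetic output or the fingerprint, can tell which records were in the source data. As a conceptual sketch of what an MIA AUC-ROC score captures (not the evaluation framework's actual interface; the attack scores below are made up), an AUC near 0.5 means the attack does no better than guessing, while values approaching 1.0 indicate leakage:

from sklearn.metrics import roc_auc_score

# Hypothetical attack scores: higher = attacker believes the record was
# part of the fingerprint's source data.
member_scores = [0.61, 0.55, 0.58, 0.52, 0.60]      # records actually in the source
non_member_scores = [0.57, 0.54, 0.59, 0.51, 0.56]  # records never seen

labels = [1] * len(member_scores) + [0] * len(non_member_scores)
scores = member_scores + non_member_scores

# AUC near 0.5: attack cannot distinguish members; near 1.0: serious leakage.
print(f"MIA AUC-ROC: {roc_auc_score(labels, scores):.2f}")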

4. Measures to Address Risks

  • Configure fingerprint_privacy.level: high or maximum for sensitive data
  • Set fingerprint_privacy.epsilon within the 0.1-1.0 range (lower = stronger privacy)
  • Enable k-anonymity with fingerprint_privacy.k_anonymity >= 5
  • Use evaluation framework MIA testing to verify privacy guarantees

Privacy Configuration

fingerprint_privacy:
  level: high             # minimal, standard, high, maximum, custom
  epsilon: 0.5            # Privacy budget (lower = stronger)
  delta: 1.0e-5           # Failure probability
  k_anonymity: 10         # Minimum group size
  composition_method: renyi_dp  # naive, advanced, renyi_dp, zcdp
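
The composition_method key determines how the privacy budget is accounted across the many statistics a fingerprint releases from the same source data. As a rough illustration of why the accounting method matters (a conceptual sketch of standard differential-privacy composition bounds, not DataSynth's internal accounting), naive composition simply sums the per-statistic epsilons, whereas the advanced composition theorem gives a tighter total for the same releases; Rényi-DP and zCDP accountants generally tighten this further:

import math

def naive_epsilon(eps, k):
    """Basic composition: k releases at eps each cost k * eps in total."""
    return k * eps

def advanced_epsilon(eps, k, delta_prime):
    """Advanced composition (Dwork-Rothblum-Vadhan): k releases of an
    (eps, delta)-DP mechanism are (eps_total, k*delta + delta_prime)-DP with
    eps_total = eps*sqrt(2k*ln(1/delta_prime)) + k*eps*(e^eps - 1)."""
    return eps * math.sqrt(2 * k * math.log(1 / delta_prime)) + k * eps * math.expm1(eps)

per_stat_eps = 0.01   # budget spent on each released statistic
k = 200               # number of statistics in the fingerprint
print(naive_epsilon(per_stat_eps, k))             # 2.0
print(advanced_epsilon(per_stat_eps, k, 1e-6))    # ~0.76, a noticeably tighter bound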

Privacy Level Presets

| Level | Epsilon | Delta | k-Anonymity | Use Case |
|---|---|---|---|---|
| minimal | 10.0 | 1e-3 | 2 | Non-sensitive aggregates |
| standard | 1.0 | 1e-5 | 5 | General business data |
| high | 0.5 | 1e-6 | 10 | Sensitive financial data |
| maximum | 0.1 | 1e-8 | 20 | Regulated personal data |

Data Subject Rights

Pure Synthetic Mode

Articles 15-22 (access, rectification, erasure, etc.) do not apply, because no real data subjects exist in the synthetic output.

Fingerprint Extraction Mode

  • Right of access (Art. 15): Fingerprints contain only aggregate statistics; individual records cannot be extracted
  • Right to erasure (Art. 17): Delete source data and fingerprint files; regenerate synthetic data with new parameters
  • Right to restriction (Art. 18): Suspend fingerprint extraction pipeline
  • Right to object (Art. 21): Remove individual from source dataset before extraction

International Transfers

  • Synthetic output: Generally not subject to Chapter V transfer restrictions (no personal data)
  • Fingerprint files: Assess whether aggregate statistics constitute personal data in your jurisdiction
  • Source data: Standard GDPR transfer rules apply during fingerprint extraction

NIST SP 800-226 Alignment

DataSynth’s evaluation framework includes NIST SP 800-226 alignment reporting for synthetic data privacy assessment. Enable via:

privacy:
  nist_alignment_enabled: true

See Also