ISO 27001:2022 Alignment

This document maps DataSynth’s technical controls to the ISO/IEC 27001:2022 Annex A controls. DataSynth is a synthetic data generation tool, not a managed service, so this alignment focuses on controls that are directly addressable by the software. Organizational controls (A.5.1 through A.5.37), people controls (A.6), and physical controls (A.7) are primarily the responsibility of the deploying organization and are noted where DataSynth provides supporting capabilities.

Assessment Scope

  • System: DataSynth synthetic financial data generator
  • Version: 0.5.x
  • Standard: ISO/IEC 27001:2022 (Annex A controls from ISO/IEC 27002:2022)
  • Assessment Type: Self-assessment of technical control alignment

A.5 Organizational Controls

A.5.1 Policies for Information Security

DataSynth supports policy-as-code through its configuration management approach:

  • Configuration-as-code: All generation parameters are defined in version-controllable YAML files with typed schema validation. Invalid configurations are rejected before generation begins.
  • Industry presets: Pre-validated configurations for retail, manufacturing, financial services, healthcare, and technology industries reduce misconfiguration risk.
  • CLAUDE.md: The project’s development guidelines are codified and version-controlled alongside the source code, establishing security-relevant coding standards (#[deny(clippy::unwrap_used)], input validation requirements).

Organizations should supplement these technical controls with written information security policies governing DataSynth deployment, access, and data handling.

A.5.12 Classification of Information

DataSynth classifies all generated output as synthetic through the content marking system:

  • Embedded credentials: CSV headers, JSON metadata objects, and Parquet file metadata contain machine-readable ContentCredential records identifying the content as synthetic.
  • Human-readable declarations: Each credential includes a declaration field: “This content is synthetically generated and does not represent real transactions or entities.”
  • Configuration hash: SHA-256 hash of the generation configuration is embedded in output, enabling traceability from any output file back to its generation parameters.
  • Sidecar files: Optional .synthetic-credential.json sidecar files provide classification metadata alongside each output file.
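
The sidecar format described above can be sketched as a small serde structure. The ContentCredential type and field names shown here are illustrative assumptions, not DataSynth's exact schema; only the declaration text and the SHA-256 config hash come from the documentation above.

```rust
// Hypothetical sketch of writing a .synthetic-credential.json sidecar file.
use serde::Serialize;
use std::fs;

#[derive(Serialize)]
struct ContentCredential {
    classification: &'static str, // fixed marker for downstream classification
    declaration: &'static str,    // human-readable declaration
    config_hash: String,          // hex-encoded SHA-256 of the generation config
}

fn write_sidecar(output_path: &str, config_hash: String) -> std::io::Result<()> {
    let credential = ContentCredential {
        classification: "synthetic",
        declaration: "This content is synthetically generated and does not \
                      represent real transactions or entities.",
        config_hash,
    };
    let json = serde_json::to_string_pretty(&credential).map_err(std::io::Error::other)?;
    fs::write(format!("{output_path}.synthetic-credential.json"), json)
}
```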

A.5.23 Information Security for Use of Cloud Services

DataSynth supports cloud deployment through:

  • Kubernetes support: Helm charts and deployment manifests for containerized deployment with health (/health), readiness (/ready), and liveness (/live) probe endpoints.
  • Stateless server: The server component maintains no persistent state beyond in-memory generation jobs. Configuration and output are externalized, supporting cloud-native architectures.
  • TLS termination: Integration with Kubernetes ingress controllers, nginx, Caddy, and Envoy for TLS termination.
  • Secret management: API keys can be injected via environment variables or mounted secrets rather than hardcoded in configuration files.
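
A minimal sketch of the secret-injection pattern above; the variable name DATASYNTH_API_KEY is an illustrative assumption, not a documented setting.

```rust
// Read the API key from the environment (e.g. a Kubernetes Secret exposed as an
// env var) instead of embedding it in a configuration file.
use std::env;

fn load_api_key() -> Result<String, String> {
    env::var("DATASYNTH_API_KEY")
        .map_err(|_| "DATASYNTH_API_KEY is not set; refusing to start without an API key".to_string())
}
```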

A.8 Technological Controls

A.8.1 User Endpoint Devices

The CLI binary (datasynth-data) is a stateless executable:

  • No persistent credentials: The CLI does not store API keys, tokens, or session data on disk.
  • No network access required: The CLI operates entirely offline for generation workflows. Network access is only needed when connecting to a remote DataSynth server.
  • Deterministic output: Given the same configuration and seed, the CLI produces identical output, eliminating concerns about endpoint-specific state affecting results.
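
The determinism property can be illustrated with a small sketch using the rand_chacha crate; the helper below is illustrative, not DataSynth code.

```rust
// Seed-based determinism: the same seed always yields the same random stream,
// so endpoint-local state cannot influence generated output.
use rand::{RngCore, SeedableRng};
use rand_chacha::ChaCha8Rng;

fn sample(seed: u64) -> Vec<u32> {
    let mut rng = ChaCha8Rng::seed_from_u64(seed);
    (0..4).map(|_| rng.next_u32()).collect()
}

fn main() {
    // Identical seeds produce identical output on any endpoint.
    assert_eq!(sample(42), sample(42));
}
```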

A.8.5 Secure Authentication

DataSynth implements multiple authentication mechanisms:

API Key Authentication:

  • Keys are hashed with Argon2id (memory-hard, timing-attack resistant) at server startup.
  • Raw keys are discarded after hashing; only PHC-format hashes are retained in memory.
  • Verification iterates all stored hashes without short-circuiting to prevent timing-based key enumeration.
  • A 5-second TTL cache using FNV-1a fast hashing reduces repeated Argon2id computation overhead.
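
A minimal sketch of the verification loop described above, assuming the argon2 crate's PHC-format API; function and variable names are illustrative.

```rust
// Check the presented key against every stored PHC hash without returning early,
// so response time does not reveal which stored key (if any) matched.
use argon2::{Argon2, PasswordHash, PasswordVerifier};

fn verify_api_key(presented: &str, stored_phc_hashes: &[String]) -> bool {
    let verifier = Argon2::default();
    let mut matched = false;
    for phc in stored_phc_hashes {
        if let Ok(parsed) = PasswordHash::new(phc) {
            // Accumulate instead of short-circuiting: every hash is always processed.
            matched |= verifier.verify_password(presented.as_bytes(), &parsed).is_ok();
        }
    }
    matched
}
```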

JWT/OIDC Integration (optional jwt feature):

  • RS256 token validation with issuer, audience, and expiration checks.
  • Compatible with Keycloak, Auth0, and Microsoft Entra ID.
  • Claims extraction provides subject, email, roles, and tenant ID for downstream RBAC and audit.
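
A sketch of RS256 validation with the jsonwebtoken crate; the issuer, audience, and claim names below are illustrative assumptions.

```rust
// Validate an RS256-signed token and extract claims for downstream RBAC and audit.
use jsonwebtoken::{decode, Algorithm, DecodingKey, Validation};
use serde::Deserialize;

#[derive(Deserialize)]
struct Claims {
    sub: String,           // subject
    email: Option<String>, // optional email claim
    #[serde(default)]
    roles: Vec<String>,    // roles for downstream RBAC
}

fn validate_token(token: &str, rsa_public_pem: &[u8]) -> Result<Claims, jsonwebtoken::errors::Error> {
    let key = DecodingKey::from_rsa_pem(rsa_public_pem)?;
    let mut validation = Validation::new(Algorithm::RS256);
    validation.set_issuer(&["https://idp.example.com/realms/datasynth"]); // assumed issuer
    validation.set_audience(&["datasynth-api"]);                          // assumed audience
    // Expiration (exp) is checked by default.
    decode::<Claims>(token, &key, &validation).map(|data| data.claims)
}
```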

Authentication Bypass:

  • Infrastructure endpoints (/health, /ready, /live, /metrics) are exempt from authentication to support load balancer and orchestrator probes.

A.8.9 Configuration Management

DataSynth enforces configuration integrity through:

  • Typed schema validation: YAML configuration is deserialized into strongly-typed Rust structs. Type mismatches, missing required fields, and constraint violations (e.g., rates outside 0.0–1.0, non-ascending approval thresholds) produce descriptive error messages before generation begins.
  • Complexity presets: Small (~100 accounts), medium (~400), and large (~2500) complexity levels provide pre-validated scaling parameters.
  • Template system: YAML/JSON templates with merge strategies enable configuration reuse while maintaining a single source of truth for shared settings.
  • Configuration hashing: SHA-256 hash of the resolved configuration is computed before generation and embedded in all output metadata, enabling drift detection.
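
A minimal sketch of the typed-validation pattern described above, assuming serde_yaml; the field names and limits are illustrative, not DataSynth's actual schema.

```rust
// Deserialize YAML into a strongly typed struct, then apply range and ordering
// constraints before any generation starts.
use serde::Deserialize;

#[derive(Deserialize)]
struct GenerationConfig {
    accounts: u32,
    anomaly_rate: f64,             // must lie within 0.0..=1.0
    approval_thresholds: Vec<f64>, // must be strictly ascending
}

fn load_config(yaml: &str) -> Result<GenerationConfig, String> {
    let config: GenerationConfig =
        serde_yaml::from_str(yaml).map_err(|e| format!("schema error: {e}"))?;

    if !(0.0..=1.0).contains(&config.anomaly_rate) {
        return Err(format!("anomaly_rate {} is outside 0.0–1.0", config.anomaly_rate));
    }
    if !config.approval_thresholds.windows(2).all(|w| w[0] < w[1]) {
        return Err("approval_thresholds must be strictly ascending".into());
    }
    Ok(config)
}
```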

A.8.12 Data Leakage Prevention

DataSynth’s architecture inherently prevents data leakage:

  • Synthetic-only generation: The default workflow generates data from statistical distributions and configuration parameters. No real data enters the pipeline.
  • Content marking: All output files carry machine-readable synthetic content credentials (EU AI Act Article 50). Third-party systems can detect and flag synthetic content programmatically.
  • Fingerprint privacy: When real data is used as input for fingerprint extraction, differential privacy (Laplace mechanism, configurable epsilon/delta) and k-anonymity suppress individual-level information. The resulting .dsf file contains only aggregate statistics.
  • Quality gate enforcement: The PrivacyMiaAuc quality gate validates that generated data does not memorize real data patterns (MIA AUC-ROC threshold).
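
As a sketch of the Laplace mechanism mentioned above, where the noise scale is sensitivity/epsilon: this is a generic implementation for illustration, not DataSynth's actual code path.

```rust
// Add Laplace-distributed noise to an aggregate statistic so that any single
// underlying record has a bounded influence on the published value.
use rand::RngCore;

fn laplace_noise<R: RngCore>(rng: &mut R, scale: f64) -> f64 {
    // Uniform draw in the open interval (0, 1), mapped through the Laplace inverse CDF.
    let u = ((rng.next_u64() >> 11) as f64 + 0.5) / (1u64 << 53) as f64;
    if u < 0.5 {
        scale * (2.0 * u).ln()
    } else {
        -scale * (2.0 * (1.0 - u)).ln()
    }
}

fn privatize(aggregate: f64, sensitivity: f64, epsilon: f64, rng: &mut impl RngCore) -> f64 {
    aggregate + laplace_noise(rng, sensitivity / epsilon)
}
```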

A.8.16 Monitoring Activities

DataSynth provides monitoring at multiple layers:

Structured Audit Logging: The JsonAuditLogger emits structured JSON events via the tracing crate, recording:

  • Timestamp (UTC), request ID, actor identity
  • Action attempted, resource accessed, outcome (success/denied/error)
  • Tenant ID, source IP, user agent

Events are emitted at INFO level with a dedicated audit_event structured field for log aggregation filtering.
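
A sketch of emitting such an event with the tracing crate; the field set mirrors the attributes listed above, and the helper itself is illustrative.

```rust
// Emit a structured audit event at INFO level with a dedicated audit_event field
// so log aggregation can filter audit records from ordinary application logs.
fn log_audit_event(request_id: &str, actor: &str, action: &str, resource: &str, outcome: &str) {
    tracing::info!(
        audit_event = true,
        request_id,
        actor,
        action,
        resource,
        outcome,
        "audit"
    );
}
```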

Resource Monitoring:

  • Memory guard reads /proc/self/statm (Linux) or ps (macOS) for resident set size tracking.
  • Disk guard uses statvfs (Unix) / GetDiskFreeSpaceExW (Windows) for available space monitoring.
  • CPU monitor tracks CPU utilization and automatically throttles generation when utilization exceeds the 0.95 (95%) threshold.
  • The DegradationController combines all monitors and emits level-change events when resource pressure triggers degradation.
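
A minimal sketch of the Linux memory check described above; the 4096-byte page size is an assumption (a production guard would query the page size at runtime).

```rust
// Read resident set size from /proc/self/statm; the second whitespace-separated
// field is the resident page count.
use std::fs;

fn resident_bytes() -> Option<u64> {
    let statm = fs::read_to_string("/proc/self/statm").ok()?;
    let resident_pages: u64 = statm.split_whitespace().nth(1)?.parse().ok()?;
    Some(resident_pages * 4096) // assumed 4 KiB pages
}
```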

Generation Monitoring:

  • Run manifests capture configuration hash, seed, crate versions, start/end times, record counts, and quality gate results.
  • Prometheus-compatible /metrics endpoint exposes runtime statistics.

A.8.24 Use of Cryptography

DataSynth uses the following cryptographic and hashing primitives:

| Purpose | Algorithm | Implementation |
|---|---|---|
| Deterministic RNG | ChaCha8 (CSPRNG) | rand_chacha crate, configurable seed |
| API key hashing | Argon2id | argon2 crate, random salt, PHC format |
| Configuration integrity | SHA-256 | Config hash embedded in output metadata |
| JWT verification | RS256 (RSA + SHA-256) | jsonwebtoken crate (optional jwt feature) |
| UUID generation | FNV-1a hash | Deterministic collision-free UUIDs with generator-type discriminators |

Cryptographic operations use well-maintained Rust crate implementations. No custom cryptographic algorithms are implemented.
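
As a sketch of the configuration-integrity hashing listed above, assuming the sha2 crate:

```rust
// Hash the resolved configuration text and hex-encode the digest for embedding
// in output metadata.
use sha2::{Digest, Sha256};

fn config_hash(resolved_config_yaml: &str) -> String {
    let digest = Sha256::digest(resolved_config_yaml.as_bytes());
    digest.iter().map(|b| format!("{b:02x}")).collect()
}
```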

A.8.25 Secure Development Lifecycle

DataSynth’s development process includes:

  • Static analysis: cargo clippy with #[deny(clippy::unwrap_used)] enforces safe error handling across the codebase.
  • Test coverage: 2,500+ tests across 15 crates covering unit, integration, and property-based scenarios.
  • Dependency auditing: cargo audit checks for known vulnerabilities in dependencies.
  • Type safety: Rust’s ownership model and type system eliminate entire classes of memory safety and concurrency bugs at compile time.
  • MSRV policy: Minimum Supported Rust Version (1.88) ensures builds use a recent, well-supported compiler.
  • CI/CD: Automated build, test, lint, and audit checks on every commit.

A.8.28 Secure Coding

DataSynth applies secure coding practices:

  • No unwrap() in library code: #[deny(clippy::unwrap_used)] prevents panics from unchecked error handling.
  • Input validation: All user-provided configuration values are validated against typed schemas with range constraints before use.
  • Precise decimal arithmetic: Financial amounts use rust_decimal (serialized as strings) instead of IEEE 754 floating point, preventing rounding errors in financial calculations.
  • No unsafe code: The codebase does not use unsafe blocks in application logic.
  • Timing-safe comparisons: API key verification uses constant-time Argon2id comparison (iterating all hashes) to prevent side-channel attacks.
  • Memory-safe concurrency: Rust’s ownership model prevents data races at compile time. Shared state uses Arc<Mutex<>> or atomic operations.
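
Two of these practices can be sketched together: Result-based error handling instead of unwrap(), and exact decimal arithmetic with rust_decimal. The helper is illustrative, not DataSynth code.

```rust
// Parse and add monetary amounts exactly; parse failures surface as errors
// rather than panics.
use rust_decimal::Decimal;
use std::str::FromStr;

fn add_amounts(a: &str, b: &str) -> Result<Decimal, rust_decimal::Error> {
    let a = Decimal::from_str(a)?;
    let b = Decimal::from_str(b)?;
    Ok(a + b) // "0.10" + "0.20" is exactly "0.30", unlike IEEE 754 floats
}

fn main() {
    match add_amounts("0.10", "0.20") {
        Ok(total) => println!("total = {total}"),
        Err(e) => eprintln!("invalid amount: {e}"),
    }
}
```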

Statement of Applicability

The following table summarizes the applicability of ISO 27001:2022 Annex A controls to DataSynth.

Implemented Controls

| Control | Title | Implementation |
|---|---|---|
| A.5.1 | Information security policies | Configuration-as-code with schema validation |
| A.5.12 | Classification of information | Synthetic content marking (EU AI Act Article 50) |
| A.5.23 | Cloud service security | Kubernetes deployment, health probes, TLS support |
| A.8.1 | User endpoint devices | Stateless CLI with no persistent credentials |
| A.8.5 | Secure authentication | Argon2id API keys, JWT/OIDC, RBAC |
| A.8.9 | Configuration management | Typed schema validation, presets, hashing |
| A.8.12 | Data leakage prevention | Synthetic-only generation, content marking, fingerprint privacy |
| A.8.16 | Monitoring activities | Structured audit logs, resource monitors, run manifests |
| A.8.24 | Use of cryptography | ChaCha8 RNG, Argon2id, SHA-256, RS256 JWT |
| A.8.25 | Secure development lifecycle | Clippy, 2,500+ tests, cargo audit, CI/CD |
| A.8.28 | Secure coding | No unwrap, input validation, precise decimals, no unsafe |

Partially Implemented Controls

| Control | Title | Status | Gap |
|---|---|---|---|
| A.5.8 | Information security in project management | Partial | Security considerations are embedded in code (schema validation, quality gates), but formal project management security procedures are organizational |
| A.5.14 | Information transfer | Partial | TLS support for the server API; file-based output transfer policies are organizational |
| A.5.29 | Information security during disruption | Partial | Graceful degradation handles resource pressure; broader business continuity is organizational |
| A.8.8 | Management of technical vulnerabilities | Partial | cargo audit scans dependencies; patch management cadence is organizational |
| A.8.15 | Logging | Partial | Structured JSON audit events with correlation IDs; log retention and SIEM integration are organizational |
| A.8.26 | Application security requirements | Partial | Input validation and schema enforcement are built in; threat modeling documentation is organizational |

Not Applicable Controls

| Control | Title | Rationale |
|---|---|---|
| A.5.19 | Information security in supplier relationships | DataSynth is open-source software; supplier controls apply to the deploying organization |
| A.5.30 | ICT readiness for business continuity | Business continuity planning is an organizational responsibility |
| A.6.1–A.6.8 | People controls | Personnel security controls are organizational |
| A.7.1–A.7.14 | Physical controls | Physical security controls depend on the deployment environment |
| A.8.2 | Privileged access rights | OS-level privilege management is outside DataSynth's scope |
| A.8.7 | Protection against malware | Endpoint protection is an infrastructure concern |
| A.8.20 | Networks security | Network segmentation and firewall rules are infrastructure concerns |
| A.8.23 | Web filtering | Web filtering is an organizational network control |

Continuous Improvement

DataSynth supports ISO 27001’s Plan-Do-Check-Act cycle through:

  • Plan: Configuration-as-code with schema validation enforces security requirements at design time.
  • Do: Automated quality gates and resource guards enforce controls during operation.
  • Check: Evaluation framework produces quantitative metrics (Benford MAD, balance coherence, MIA AUC-ROC) that can be trended over time.
  • Act: The AutoTuner in datasynth-eval generates configuration patches from evaluation gaps, creating a feedback loop for continuous improvement.

See Also