ISO 27001:2022 Alignment
This document maps DataSynth’s technical controls to the ISO/IEC 27001:2022 Annex A controls. DataSynth is a synthetic data generation tool, not a managed service, so this alignment focuses on controls that are directly addressable by the software. Organizational controls (A.5.1 through A.5.37), people controls (A.6), and physical controls (A.7) are primarily the responsibility of the deploying organization and are noted where DataSynth provides supporting capabilities.
Assessment Scope
- System: DataSynth synthetic financial data generator
- Version: 0.5.x
- Standard: ISO/IEC 27001:2022 (Annex A controls from ISO/IEC 27002:2022)
- Assessment Type: Self-assessment of technical control alignment
A.5 Organizational Controls
A.5.1 Policies for Information Security
DataSynth supports policy-as-code through its configuration management approach:
- Configuration-as-code: All generation parameters are defined in version-controllable YAML files with typed schema validation. Invalid configurations are rejected before generation begins.
- Industry presets: Pre-validated configurations for retail, manufacturing, financial services, healthcare, and technology industries reduce misconfiguration risk.
- CLAUDE.md: The project’s development guidelines are codified and version-controlled alongside the source code, establishing security-relevant coding standards (`#[deny(clippy::unwrap_used)]`, input validation requirements).
Organizations should supplement these technical controls with written information security policies governing DataSynth deployment, access, and data handling.
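The configuration-as-code pattern described above can be illustrated with a minimal sketch; the key names below (`industry`, `complexity`, `seed`, `anomaly_rate`) are hypothetical and shown only to convey the shape of a typed, version-controllable configuration.

```yaml
# Hypothetical minimal DataSynth configuration (illustrative keys only).
industry: financial_services   # one of the pre-validated industry presets
complexity: medium             # pre-validated scaling level (~400 accounts)
seed: 42                       # fixed seed for deterministic generation
anomaly_rate: 0.02             # validated as a rate in 0.0–1.0 before generation
```

Because the file is plain YAML, it can be reviewed, diffed, and rolled back like any other source artifact.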
A.5.12 Classification of Information
DataSynth classifies all generated output as synthetic through the content marking system:
- Embedded credentials: CSV headers, JSON metadata objects, and Parquet file metadata contain machine-readable `ContentCredential` records identifying the content as synthetic.
- Human-readable declarations: Each credential includes a `declaration` field: “This content is synthetically generated and does not represent real transactions or entities.”
- Configuration hash: SHA-256 hash of the generation configuration is embedded in output, enabling traceability from any output file back to its generation parameters.
- Sidecar files: Optional `.synthetic-credential.json` sidecar files provide classification metadata alongside each output file.
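A sidecar file combining these elements might look like the following; the exact field names are illustrative assumptions, and the hash value is a truncated placeholder.

```json
{
  "declaration": "This content is synthetically generated and does not represent real transactions or entities.",
  "config_hash": "sha256:3b5d...",
  "generator": "DataSynth 0.5.x"
}
```

Downstream systems can check for this file (or the embedded credential) to classify and route synthetic data programmatically.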
A.5.23 Information Security for Use of Cloud Services
DataSynth supports cloud deployment through:
- Kubernetes support: Helm charts and deployment manifests for containerized deployment with health (`/health`), readiness (`/ready`), and liveness (`/live`) probe endpoints.
- Stateless server: The server component maintains no persistent state beyond in-memory generation jobs. Configuration and output are externalized, supporting cloud-native architectures.
- TLS termination: Integration with Kubernetes ingress controllers, nginx, Caddy, and Envoy for TLS termination.
- Secret management: API keys can be injected via environment variables or mounted secrets rather than hardcoded in configuration files.
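The secret-injection pattern can be sketched as a Kubernetes container spec fragment; the variable name `DATASYNTH_API_KEY` and the Secret name are illustrative assumptions, not documented interfaces.

```yaml
# Hypothetical Kubernetes snippet: inject the server API key from a
# Secret via an environment variable instead of hardcoding it in config.
env:
  - name: DATASYNTH_API_KEY        # variable name is illustrative
    valueFrom:
      secretKeyRef:
        name: datasynth-secrets    # Secret name is illustrative
        key: api-key
```

The same key could instead be mounted as a file via a Secret volume; either way it never appears in the version-controlled configuration.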
A.8 Technological Controls
A.8.1 User Endpoint Devices
The CLI binary (`datasynth-data`) is a stateless executable:
- No persistent credentials: The CLI does not store API keys, tokens, or session data on disk.
- No network access required: The CLI operates entirely offline for generation workflows. Network access is only needed when connecting to a remote DataSynth server.
- Deterministic output: Given the same configuration and seed, the CLI produces identical output, eliminating concerns about endpoint-specific state affecting results.
A.8.5 Secure Authentication
DataSynth implements multiple authentication mechanisms:
API Key Authentication:
- Keys are hashed with Argon2id (memory-hard, timing-attack resistant) at server startup.
- Raw keys are discarded after hashing; only PHC-format hashes are retained in memory.
- Verification iterates all stored hashes without short-circuiting to prevent timing-based key enumeration.
- A 5-second TTL cache using FNV-1a fast hashing reduces repeated Argon2id computation overhead.
JWT/OIDC Integration (optional `jwt` feature):
- RS256 token validation with issuer, audience, and expiration checks.
- Compatible with Keycloak, Auth0, and Microsoft Entra ID.
- Claims extraction provides subject, email, roles, and tenant ID for downstream RBAC and audit.
Authentication Bypass:
- Infrastructure endpoints (`/health`, `/ready`, `/live`, `/metrics`) are exempt from authentication to support load balancer and orchestrator probes.
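The non-short-circuiting verification loop described above can be sketched as follows. This is a minimal illustration, not DataSynth's actual code: `verify_stub` stands in for the real Argon2id PHC-hash verification from the `argon2` crate.

```rust
/// Stand-in for Argon2id PHC-hash verification; plain equality is used
/// here only so the sketch is self-contained.
fn verify_stub(candidate: &str, stored_hash: &str) -> bool {
    candidate == stored_hash
}

/// Check a candidate key against every stored hash without
/// short-circuiting, so the amount of work performed does not depend
/// on where (or whether) a match occurs.
fn check_key(candidate: &str, stored_hashes: &[&str]) -> bool {
    let mut matched = false;
    for hash in stored_hashes {
        // `|=` accumulates the result instead of returning early.
        matched |= verify_stub(candidate, hash);
    }
    matched
}

fn main() {
    let hashes = ["hash-a", "hash-b", "hash-c"];
    assert!(check_key("hash-b", &hashes));
    assert!(!check_key("hash-x", &hashes));
}
```

An early `return` on the first match would let an attacker infer, from response timing, roughly how many hashes were checked; the accumulator keeps the loop's work uniform.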
A.8.9 Configuration Management
DataSynth enforces configuration integrity through:
- Typed schema validation: YAML configuration is deserialized into strongly-typed Rust structs. Type mismatches, missing required fields, and constraint violations (e.g., rates outside 0.0–1.0, non-ascending approval thresholds) produce descriptive error messages before generation begins.
- Complexity presets: Small (~100 accounts), medium (~400), and large (~2500) complexity levels provide pre-validated scaling parameters.
- Template system: YAML/JSON templates with merge strategies enable configuration reuse while maintaining a single source of truth for shared settings.
- Configuration hashing: SHA-256 hash of the resolved configuration is computed before generation and embedded in all output metadata, enabling drift detection.
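The validate-before-generate pattern from the list above can be sketched with a typed constructor that rejects constraint violations up front. The struct and field names are hypothetical; DataSynth's actual configuration types differ.

```rust
// Hypothetical sketch of schema-style validation: a typed config whose
// constructor rejects out-of-range rates and non-ascending thresholds
// before any generation work starts.
#[derive(Debug)]
struct GenerationConfig {
    anomaly_rate: f64,              // must lie in 0.0–1.0
    approval_thresholds: Vec<u64>,  // must be strictly ascending
}

impl GenerationConfig {
    fn new(anomaly_rate: f64, approval_thresholds: Vec<u64>) -> Result<Self, String> {
        if !(0.0..=1.0).contains(&anomaly_rate) {
            return Err(format!("anomaly_rate {anomaly_rate} outside 0.0–1.0"));
        }
        if approval_thresholds.windows(2).any(|w| w[0] >= w[1]) {
            return Err("approval thresholds must be ascending".into());
        }
        Ok(Self { anomaly_rate, approval_thresholds })
    }
}

fn main() {
    assert!(GenerationConfig::new(0.02, vec![1_000, 10_000]).is_ok());
    assert!(GenerationConfig::new(1.5, vec![1_000, 10_000]).is_err());   // rate out of range
    assert!(GenerationConfig::new(0.02, vec![10_000, 1_000]).is_err()); // thresholds descend
}
```

Deserializing YAML directly into such a type means an invalid file never produces a half-configured generator.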
A.8.12 Data Leakage Prevention
DataSynth’s architecture inherently prevents data leakage:
- Synthetic-only generation: The default workflow generates data from statistical distributions and configuration parameters. No real data enters the pipeline.
- Content marking: All output files carry machine-readable synthetic content credentials (EU AI Act Article 50). Third-party systems can detect and flag synthetic content programmatically.
- Fingerprint privacy: When real data is used as input for fingerprint extraction, differential privacy (Laplace mechanism, configurable epsilon/delta) and k-anonymity suppress individual-level information. The resulting `.dsf` file contains only aggregate statistics.
- Quality gate enforcement: The `PrivacyMiaAuc` quality gate validates that generated data does not memorize real data patterns (MIA AUC-ROC threshold).
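The Laplace mechanism mentioned above can be sketched in a few lines; this is an illustrative derivation only, and DataSynth's fingerprint extractor handles epsilon/delta internally rather than through functions like these.

```rust
// Illustrative sketch of the Laplace mechanism for differentially
// private aggregate release. `u` is a uniform sample in (0, 1).
fn laplace_noise(scale: f64, u: f64) -> f64 {
    // Inverse-CDF sampling of Laplace(0, scale).
    let v = u - 0.5;
    -scale * v.signum() * (1.0 - 2.0 * v.abs()).ln()
}

fn private_sum(true_sum: f64, sensitivity: f64, epsilon: f64, u: f64) -> f64 {
    // Smaller epsilon => larger noise scale => stronger privacy.
    true_sum + laplace_noise(sensitivity / epsilon, u)
}

fn main() {
    // The median uniform draw adds zero noise.
    assert!(laplace_noise(1.0, 0.5).abs() < 1e-12);
    // At u = 0.75 the noise equals scale * ln 2.
    assert!((laplace_noise(1.0, 0.75) - 2.0_f64.ln()).abs() < 1e-12);
    assert!((private_sum(100.0, 1.0, 0.5, 0.5) - 100.0).abs() < 1e-12);
}
```

Only noisy aggregates like `private_sum` outputs would land in the `.dsf` file, never individual records.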
A.8.16 Monitoring Activities
DataSynth provides monitoring at multiple layers:
Structured Audit Logging:
The `JsonAuditLogger` emits structured JSON events via the `tracing` crate, recording:
- Timestamp (UTC), request ID, actor identity
- Action attempted, resource accessed, outcome (success/denied/error)
- Tenant ID, source IP, user agent
Events are emitted at INFO level with a dedicated `audit_event` structured field for log aggregation filtering.
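An emitted event might look like the following; the field names are assumptions inferred from the list above, and all values are placeholders.

```json
{
  "audit_event": {
    "timestamp": "2025-06-01T12:00:00Z",
    "request_id": "req-0001",
    "actor": "svc-pipeline",
    "action": "generation.start",
    "resource": "/api/v1/generate",
    "outcome": "success",
    "tenant_id": "tenant-a",
    "source_ip": "10.0.0.12",
    "user_agent": "datasynth-cli/0.5"
  }
}
```

Filtering on the `audit_event` field lets a log aggregator separate audit records from ordinary application logs.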
Resource Monitoring:
- Memory guard reads `/proc/self/statm` (Linux) or `ps` (macOS) for resident set size tracking.
- Disk guard uses `statvfs` (Unix) / `GetDiskFreeSpaceExW` (Windows) for available space monitoring.
- CPU monitor tracks utilization with auto-throttle at a 0.95 threshold.
- The `DegradationController` combines all monitors and emits level-change events when resource pressure triggers degradation.
Generation Monitoring:
- Run manifests capture configuration hash, seed, crate versions, start/end times, record counts, and quality gate results.
- Prometheus-compatible `/metrics` endpoint exposes runtime statistics.
A.8.24 Use of Cryptography
DataSynth uses cryptographic primitives for the following purposes:
| Purpose | Algorithm | Implementation |
|---|---|---|
| Deterministic RNG | ChaCha8 (CSPRNG) | rand_chacha crate, configurable seed |
| API key hashing | Argon2id | argon2 crate, random salt, PHC format |
| Configuration integrity | SHA-256 | Config hash embedded in output metadata |
| JWT verification | RS256 (RSA + SHA-256) | jsonwebtoken crate (optional jwt feature) |
| UUID generation | FNV-1a hash | Deterministic UUIDs with generator-type discriminators (non-cryptographic) |
Cryptographic operations use well-maintained Rust crate implementations. No custom cryptographic algorithms are implemented.
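The deterministic UUID row can be illustrated with a 64-bit FNV-1a sketch. The `discriminator:seed:index` input layout is an assumption for illustration; DataSynth's actual UUID construction may differ.

```rust
// 64-bit FNV-1a over a byte slice.
fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

// Hypothetical ID derivation: the generator-type discriminator keeps
// IDs from different generators in separate namespaces.
fn deterministic_id(generator_type: &str, seed: u64, index: u64) -> u64 {
    fnv1a_64(format!("{generator_type}:{seed}:{index}").as_bytes())
}

fn main() {
    // Same inputs always yield the same ID...
    assert_eq!(deterministic_id("account", 42, 7), deterministic_id("account", 42, 7));
    // ...while a different discriminator yields a different one.
    assert_ne!(deterministic_id("account", 42, 7), deterministic_id("vendor", 42, 7));
}
```

Determinism is the point here, not secrecy: FNV-1a is fast and reproducible but offers no cryptographic guarantees, which is why it is paired with the CSPRNG and SHA-256 entries above for security-sensitive uses.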
A.8.25 Secure Development Lifecycle
DataSynth’s development process includes:
- Static analysis: `cargo clippy` with `#[deny(clippy::unwrap_used)]` enforces safe error handling across the codebase.
- Test coverage: 2,500+ tests across 15 crates covering unit, integration, and property-based scenarios.
- Dependency auditing: `cargo audit` checks for known vulnerabilities in dependencies.
- Type safety: Rust’s ownership model and type system eliminate entire classes of memory safety and concurrency bugs at compile time.
- MSRV policy: Minimum Supported Rust Version (1.88) ensures builds use a recent, well-supported compiler.
- CI/CD: Automated build, test, lint, and audit checks on every commit.
A.8.28 Secure Coding
DataSynth applies secure coding practices:
- No `unwrap()` in library code: `#[deny(clippy::unwrap_used)]` prevents panics from unchecked error handling.
- Input validation: All user-provided configuration values are validated against typed schemas with range constraints before use.
- Precise decimal arithmetic: Financial amounts use `rust_decimal` (serialized as strings) instead of IEEE 754 floating point, preventing rounding errors in financial calculations.
- No unsafe code: The codebase does not use `unsafe` blocks in application logic.
- Timing-safe comparisons: API key verification iterates all stored Argon2id hashes without short-circuiting, so response timing does not reveal which keys exist.
- Memory-safe concurrency: Rust’s ownership model prevents data races at compile time. Shared state uses `Arc<Mutex<T>>` or atomic operations.
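The decimal-arithmetic point is easy to demonstrate: binary floats cannot represent 0.1 or 0.2 exactly, so their sum is not exactly 0.3. The sketch below tracks integer minor units (cents) as a minimal stand-in; DataSynth itself uses the `rust_decimal` crate.

```rust
// Why binary floats are avoided for money: 0.1 + 0.2 is not exactly 0.3
// in IEEE 754 double precision.
fn main() {
    let float_sum = 0.1_f64 + 0.2_f64;
    assert!(float_sum != 0.3); // binary rounding error: 0.30000000000000004

    // Tracking integer minor units (cents) sidesteps the problem.
    let cents_sum = 10_i64 + 20_i64; // 0.10 + 0.20 as cents
    assert_eq!(cents_sum, 30); // exact
}
```

Accumulated over millions of generated transactions, such rounding errors would break balance coherence checks, which is why exact decimal types are used throughout.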
Statement of Applicability
The following table summarizes the applicability of ISO 27001:2022 Annex A controls to DataSynth.
Implemented Controls
| Control | Title | Implementation |
|---|---|---|
| A.5.1 | Information security policies | Configuration-as-code with schema validation |
| A.5.12 | Classification of information | Synthetic content marking (EU AI Act Article 50) |
| A.5.23 | Cloud service security | Kubernetes deployment, health probes, TLS support |
| A.8.1 | User endpoint devices | Stateless CLI with no persistent credentials |
| A.8.5 | Secure authentication | Argon2id API keys, JWT/OIDC, RBAC |
| A.8.9 | Configuration management | Typed schema validation, presets, hashing |
| A.8.12 | Data leakage prevention | Synthetic-only generation, content marking, fingerprint privacy |
| A.8.16 | Monitoring activities | Structured audit logs, resource monitors, run manifests |
| A.8.24 | Use of cryptography | ChaCha8 RNG, Argon2id, SHA-256, RS256 JWT |
| A.8.25 | Secure development lifecycle | Clippy, 2,500+ tests, cargo audit, CI/CD |
| A.8.28 | Secure coding | No unwrap, input validation, precise decimals, no unsafe |
Partially Implemented Controls
| Control | Title | Status | Gap |
|---|---|---|---|
| A.5.8 | Information security in project management | Partial | Security considerations are embedded in code (schema validation, quality gates) but formal project management security procedures are organizational |
| A.5.14 | Information transfer | Partial | TLS support for server API; file-based output transfer policies are organizational |
| A.5.29 | Information security during disruption | Partial | Graceful degradation handles resource pressure; broader business continuity is organizational |
| A.8.8 | Management of technical vulnerabilities | Partial | cargo audit scans dependencies; patch management cadence is organizational |
| A.8.15 | Logging | Partial | Structured JSON audit events with correlation IDs; log retention and SIEM integration are organizational |
| A.8.26 | Application security requirements | Partial | Input validation and schema enforcement are built-in; threat modeling documentation is organizational |
Not Applicable Controls
| Control | Title | Rationale |
|---|---|---|
| A.5.19 | Information security in supplier relationships | DataSynth is open-source software; supplier controls apply to the deploying organization |
| A.5.30 | ICT readiness for business continuity | Business continuity planning is an organizational responsibility |
| A.6.1–A.6.8 | People controls | Personnel security controls are organizational |
| A.7.1–A.7.14 | Physical controls | Physical security controls depend on deployment environment |
| A.8.2 | Privileged access rights | OS-level privilege management is outside DataSynth’s scope |
| A.8.7 | Protection against malware | Endpoint protection is an infrastructure concern |
| A.8.20 | Networks security | Network segmentation and firewall rules are infrastructure concerns |
| A.8.23 | Web filtering | Web filtering is an organizational network control |
Continuous Improvement
DataSynth supports ISO 27001’s Plan-Do-Check-Act cycle through:
- Plan: Configuration-as-code with schema validation enforces security requirements at design time.
- Do: Automated quality gates and resource guards enforce controls during operation.
- Check: Evaluation framework produces quantitative metrics (Benford MAD, balance coherence, MIA AUC-ROC) that can be trended over time.
- Act: The AutoTuner in `datasynth-eval` generates configuration patches from evaluation gaps, creating a feedback loop for continuous improvement.