ISO 27001:2022 Alignment

This document maps DataSynth’s technical controls to the ISO/IEC 27001:2022 Annex A controls. DataSynth is a synthetic data generation tool, not a managed service, so this alignment focuses on controls that are directly addressable by the software. Organizational controls (A.5.1 through A.5.37), people controls (A.6), and physical controls (A.7) are primarily the responsibility of the deploying organization and are noted where DataSynth provides supporting capabilities.

Assessment Scope

  • System: DataSynth synthetic financial data generator
  • Version: 0.5.x
  • Standard: ISO/IEC 27001:2022 (Annex A controls from ISO/IEC 27002:2022)
  • Assessment Type: Self-assessment of technical control alignment

A.5 Organizational Controls

A.5.1 Policies for Information Security

DataSynth supports policy-as-code through its configuration management approach:

  • Configuration-as-code: All generation parameters are defined in version-controllable YAML files with typed schema validation. Invalid configurations are rejected before generation begins.
  • Industry presets: Pre-validated configurations for retail, manufacturing, financial services, healthcare, and technology industries reduce misconfiguration risk.
  • CLAUDE.md: The project’s development guidelines are codified and version-controlled alongside the source code, establishing security-relevant coding standards (#[deny(clippy::unwrap_used)], input validation requirements).

Organizations should supplement these technical controls with written information security policies governing DataSynth deployment, access, and data handling.

A.5.12 Classification of Information

DataSynth classifies all generated output as synthetic through the content marking system:

  • Embedded credentials: CSV headers, JSON metadata objects, and Parquet file metadata contain machine-readable ContentCredential records identifying the content as synthetic.
  • Human-readable declarations: Each credential includes a declaration field: “This content is synthetically generated and does not represent real transactions or entities.”
  • Configuration hash: SHA-256 hash of the generation configuration is embedded in output, enabling traceability from any output file back to its generation parameters.
  • Sidecar files: Optional .synthetic-credential.json sidecar files provide classification metadata alongside each output file.
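
The sidecar format described above can be sketched as a small serde structure. The ContentCredential type and field names shown here are illustrative assumptions, not DataSynth's exact schema; only the declaration text and the SHA-256 config hash come from the documentation above.

```rust
// Hypothetical sketch of writing a .synthetic-credential.json sidecar file.
use serde::Serialize;
use std::fs;

#[derive(Serialize)]
struct ContentCredential {
    classification: &'static str, // fixed marker for downstream classification
    declaration: &'static str,    // human-readable declaration
    config_hash: String,          // hex-encoded SHA-256 of the generation config
}

fn write_sidecar(output_path: &str, config_hash: String) -> std::io::Result<()> {
    let credential = ContentCredential {
        classification: "synthetic",
        declaration: "This content is synthetically generated and does not \
                      represent real transactions or entities.",
        config_hash,
    };
    let json = serde_json::to_string_pretty(&credential).map_err(std::io::Error::other)?;
    fs::write(format!("{output_path}.synthetic-credential.json"), json)
}
```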

A.5.23 Information Security for Use of Cloud Services

DataSynth supports cloud deployment through:

  • Kubernetes support: Helm charts and deployment manifests for containerized deployment with health (/health), readiness (/ready), and liveness (/live) probe endpoints.
  • Stateless server: The server component maintains no persistent state beyond in-memory generation jobs. Configuration and output are externalized, supporting cloud-native architectures.
  • TLS termination: Integration with Kubernetes ingress controllers, nginx, Caddy, and Envoy for TLS termination.
  • Secret management: API keys can be injected via environment variables or mounted secrets rather than hardcoded in configuration files.
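
A minimal sketch of the secret-injection pattern above; the variable name DATASYNTH_API_KEY is an illustrative assumption, not a documented setting.

```rust
// Read the API key from the environment (e.g. a Kubernetes Secret exposed as an
// env var) instead of embedding it in a configuration file.
use std::env;

fn load_api_key() -> Result<String, String> {
    env::var("DATASYNTH_API_KEY")
        .map_err(|_| "DATASYNTH_API_KEY is not set; refusing to start without an API key".to_string())
}
```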

A.8 Technological Controls

A.8.1 User Endpoint Devices

The CLI binary (datasynth-data) is a stateless executable:

  • No persistent credentials: The CLI does not store API keys, tokens, or session data on disk.
  • No network access required: The CLI operates entirely offline for generation workflows. Network access is only needed when connecting to a remote DataSynth server.
  • Deterministic output: Given the same configuration and seed, the CLI produces identical output, eliminating concerns about endpoint-specific state affecting results.
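
The determinism property can be illustrated with a small sketch using the rand_chacha crate; the helper below is illustrative, not DataSynth code.

```rust
// Seed-based determinism: the same seed always yields the same random stream,
// so endpoint-local state cannot influence generated output.
use rand::{RngCore, SeedableRng};
use rand_chacha::ChaCha8Rng;

fn sample(seed: u64) -> Vec<u32> {
    let mut rng = ChaCha8Rng::seed_from_u64(seed);
    (0..4).map(|_| rng.next_u32()).collect()
}

fn main() {
    // Identical seeds produce identical output on any endpoint.
    assert_eq!(sample(42), sample(42));
}
```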

A.8.5 Secure Authentication

DataSynth implements multiple authentication mechanisms:

API Key Authentication:

  • Keys are hashed with Argon2id (memory-hard, timing-attack resistant) at server startup.
  • Raw keys are discarded after hashing; only PHC-format hashes are retained in memory.
  • Verification iterates all stored hashes without short-circuiting to prevent timing-based key enumeration.
  • A 5-second TTL cache using FNV-1a fast hashing reduces repeated Argon2id computation overhead.
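
A minimal sketch of the verification loop described above, assuming the argon2 crate's PHC-format API; function and variable names are illustrative.

```rust
// Check the presented key against every stored PHC hash without returning early,
// so response time does not reveal which stored key (if any) matched.
use argon2::{Argon2, PasswordHash, PasswordVerifier};

fn verify_api_key(presented: &str, stored_phc_hashes: &[String]) -> bool {
    let verifier = Argon2::default();
    let mut matched = false;
    for phc in stored_phc_hashes {
        if let Ok(parsed) = PasswordHash::new(phc) {
            // Accumulate instead of short-circuiting: every hash is always processed.
            matched |= verifier.verify_password(presented.as_bytes(), &parsed).is_ok();
        }
    }
    matched
}
```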

JWT/OIDC Integration (optional jwt feature):

  • RS256 token validation with issuer, audience, and expiration checks.
  • Compatible with Keycloak, Auth0, and Microsoft Entra ID.
  • Claims extraction provides subject, email, roles, and tenant ID for downstream RBAC and audit.
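
A sketch of RS256 validation with the jsonwebtoken crate; the issuer, audience, and claim names below are illustrative assumptions.

```rust
// Validate an RS256-signed token and extract claims for downstream RBAC and audit.
use jsonwebtoken::{decode, Algorithm, DecodingKey, Validation};
use serde::Deserialize;

#[derive(Deserialize)]
struct Claims {
    sub: String,           // subject
    email: Option<String>, // optional email claim
    #[serde(default)]
    roles: Vec<String>,    // roles for downstream RBAC
}

fn validate_token(token: &str, rsa_public_pem: &[u8]) -> Result<Claims, jsonwebtoken::errors::Error> {
    let key = DecodingKey::from_rsa_pem(rsa_public_pem)?;
    let mut validation = Validation::new(Algorithm::RS256);
    validation.set_issuer(&["https://idp.example.com/realms/datasynth"]); // assumed issuer
    validation.set_audience(&["datasynth-api"]);                          // assumed audience
    // Expiration (exp) is checked by default.
    decode::<Claims>(token, &key, &validation).map(|data| data.claims)
}
```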

Authentication Bypass:

  • Infrastructure endpoints (/health, /ready, /live, /metrics) are exempt from authentication to support load balancer and orchestrator probes.

A.8.9 Configuration Management

DataSynth enforces configuration integrity through:

  • Typed schema validation: YAML configuration is deserialized into strongly-typed Rust structs. Type mismatches, missing required fields, and constraint violations (e.g., rates outside 0.0–1.0, non-ascending approval thresholds) produce descriptive error messages before generation begins.
  • Complexity presets: Small (~100 accounts), medium (~400), and large (~2500) complexity levels provide pre-validated scaling parameters.
  • Template system: YAML/JSON templates with merge strategies enable configuration reuse while maintaining a single source of truth for shared settings.
  • Configuration hashing: SHA-256 hash of the resolved configuration is computed before generation and embedded in all output metadata, enabling drift detection.
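
A minimal sketch of the typed-validation pattern described above, assuming serde_yaml; the field names and limits are illustrative, not DataSynth's actual schema.

```rust
// Deserialize YAML into a strongly typed struct, then apply range and ordering
// constraints before any generation starts.
use serde::Deserialize;

#[derive(Deserialize)]
struct GenerationConfig {
    accounts: u32,
    anomaly_rate: f64,             // must lie within 0.0..=1.0
    approval_thresholds: Vec<f64>, // must be strictly ascending
}

fn load_config(yaml: &str) -> Result<GenerationConfig, String> {
    let config: GenerationConfig =
        serde_yaml::from_str(yaml).map_err(|e| format!("schema error: {e}"))?;

    if !(0.0..=1.0).contains(&config.anomaly_rate) {
        return Err(format!("anomaly_rate {} is outside 0.0–1.0", config.anomaly_rate));
    }
    if !config.approval_thresholds.windows(2).all(|w| w[0] < w[1]) {
        return Err("approval_thresholds must be strictly ascending".into());
    }
    Ok(config)
}
```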

A.8.12 Data Leakage Prevention

DataSynth’s architecture inherently prevents data leakage:

  • Synthetic-only generation: The default workflow generates data from statistical distributions and configuration parameters. No real data enters the pipeline.
  • Content marking: All output files carry machine-readable synthetic content credentials (EU AI Act Article 50). Third-party systems can detect and flag synthetic content programmatically.
  • Fingerprint privacy: When real data is used as input for fingerprint extraction, differential privacy (Laplace mechanism, configurable epsilon/delta) and k-anonymity suppress individual-level information. The resulting .dsf file contains only aggregate statistics.
  • Quality gate enforcement: The PrivacyMiaAuc quality gate validates that generated data does not memorize real data patterns (MIA AUC-ROC threshold).
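
As a sketch of the Laplace mechanism mentioned above, where the noise scale is sensitivity/epsilon: this is a generic implementation for illustration, not DataSynth's actual code path.

```rust
// Add Laplace-distributed noise to an aggregate statistic so that any single
// underlying record has a bounded influence on the published value.
use rand::RngCore;

fn laplace_noise<R: RngCore>(rng: &mut R, scale: f64) -> f64 {
    // Uniform draw in the open interval (0, 1), mapped through the Laplace inverse CDF.
    let u = ((rng.next_u64() >> 11) as f64 + 0.5) / (1u64 << 53) as f64;
    if u < 0.5 {
        scale * (2.0 * u).ln()
    } else {
        -scale * (2.0 * (1.0 - u)).ln()
    }
}

fn privatize(aggregate: f64, sensitivity: f64, epsilon: f64, rng: &mut impl RngCore) -> f64 {
    aggregate + laplace_noise(rng, sensitivity / epsilon)
}
```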

A.8.16 Monitoring Activities

DataSynth provides monitoring at multiple layers:

Structured Audit Logging: The JsonAuditLogger emits structured JSON events via the tracing crate, recording:

  • Timestamp (UTC), request ID, actor identity
  • Action attempted, resource accessed, outcome (success/denied/error)
  • Tenant ID, source IP, user agent

Events are emitted at INFO level with a dedicated audit_event structured field for log aggregation filtering.
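
A sketch of emitting such an event with the tracing crate; the field set mirrors the attributes listed above, and the helper itself is illustrative.

```rust
// Emit a structured audit event at INFO level with a dedicated audit_event field
// so log aggregation can filter audit records from ordinary application logs.
fn log_audit_event(request_id: &str, actor: &str, action: &str, resource: &str, outcome: &str) {
    tracing::info!(
        audit_event = true,
        request_id,
        actor,
        action,
        resource,
        outcome,
        "audit"
    );
}
```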

Resource Monitoring:

  • Memory guard reads /proc/self/statm (Linux) or ps (macOS) for resident set size tracking.
  • Disk guard uses statvfs (Unix) / GetDiskFreeSpaceExW (Windows) for available space monitoring.
  • CPU monitor tracks CPU utilization and automatically throttles generation when utilization exceeds the 0.95 (95%) threshold.
  • The DegradationController combines all monitors and emits level-change events when resource pressure triggers degradation.
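
A minimal sketch of the Linux memory check described above; the 4096-byte page size is an assumption (a production guard would query the page size at runtime).

```rust
// Read resident set size from /proc/self/statm; the second whitespace-separated
// field is the resident page count.
use std::fs;

fn resident_bytes() -> Option<u64> {
    let statm = fs::read_to_string("/proc/self/statm").ok()?;
    let resident_pages: u64 = statm.split_whitespace().nth(1)?.parse().ok()?;
    Some(resident_pages * 4096) // assumed 4 KiB pages
}
```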

Generation Monitoring:

  • Run manifests capture configuration hash, seed, crate versions, start/end times, record counts, and quality gate results.
  • Prometheus-compatible /metrics endpoint exposes runtime statistics.

A.8.24 Use of Cryptography

DataSynth uses the following cryptographic and hashing primitives:

| Purpose | Algorithm | Implementation |
|---|---|---|
| Deterministic RNG | ChaCha8 (CSPRNG) | rand_chacha crate, configurable seed |
| API key hashing | Argon2id | argon2 crate, random salt, PHC format |
| Configuration integrity | SHA-256 | Config hash embedded in output metadata |
| JWT verification | RS256 (RSA + SHA-256) | jsonwebtoken crate (optional jwt feature) |
| UUID generation | FNV-1a hash | Deterministic collision-free UUIDs with generator-type discriminators |

Cryptographic operations use well-maintained Rust crate implementations. No custom cryptographic algorithms are implemented.
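
As a sketch of the configuration-integrity hashing listed above, assuming the sha2 crate:

```rust
// Hash the resolved configuration text and hex-encode the digest for embedding
// in output metadata.
use sha2::{Digest, Sha256};

fn config_hash(resolved_config_yaml: &str) -> String {
    let digest = Sha256::digest(resolved_config_yaml.as_bytes());
    digest.iter().map(|b| format!("{b:02x}")).collect()
}
```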

A.8.25 Secure Development Lifecycle

DataSynth’s development process includes:

  • Static analysis: cargo clippy with #[deny(clippy::unwrap_used)] enforces safe error handling across the codebase.
  • Test coverage: 2,500+ tests across 15 crates covering unit, integration, and property-based scenarios.
  • Dependency auditing: cargo audit checks for known vulnerabilities in dependencies.
  • Type safety: Rust’s ownership model and type system eliminate entire classes of memory safety and concurrency bugs at compile time.
  • MSRV policy: Minimum Supported Rust Version (1.88) ensures builds use a recent, well-supported compiler.
  • CI/CD: Automated build, test, lint, and audit checks on every commit.

A.8.28 Secure Coding

DataSynth applies secure coding practices:

  • No unwrap() in library code: #[deny(clippy::unwrap_used)] prevents panics from unchecked error handling.
  • Input validation: All user-provided configuration values are validated against typed schemas with range constraints before use.
  • Precise decimal arithmetic: Financial amounts use rust_decimal (serialized as strings) instead of IEEE 754 floating point, preventing rounding errors in financial calculations.
  • No unsafe code: The codebase does not use unsafe blocks in application logic.
  • Timing-safe comparisons: API key verification uses constant-time Argon2id comparison (iterating all hashes) to prevent side-channel attacks.
  • Memory-safe concurrency: Rust’s ownership model prevents data races at compile time. Shared state uses Arc<Mutex<>> or atomic operations.
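
Two of these practices can be sketched together: Result-based error handling instead of unwrap(), and exact decimal arithmetic with rust_decimal. The helper is illustrative, not DataSynth code.

```rust
// Parse and add monetary amounts exactly; parse failures surface as errors
// rather than panics.
use rust_decimal::Decimal;
use std::str::FromStr;

fn add_amounts(a: &str, b: &str) -> Result<Decimal, rust_decimal::Error> {
    let a = Decimal::from_str(a)?;
    let b = Decimal::from_str(b)?;
    Ok(a + b) // "0.10" + "0.20" is exactly "0.30", unlike IEEE 754 floats
}

fn main() {
    match add_amounts("0.10", "0.20") {
        Ok(total) => println!("total = {total}"),
        Err(e) => eprintln!("invalid amount: {e}"),
    }
}
```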

Statement of Applicability

The following table summarizes the applicability of ISO 27001:2022 Annex A controls to DataSynth.

Implemented Controls

| Control | Title | Implementation |
|---|---|---|
| A.5.1 | Information security policies | Configuration-as-code with schema validation |
| A.5.12 | Classification of information | Synthetic content marking (EU AI Act Article 50) |
| A.5.23 | Cloud service security | Kubernetes deployment, health probes, TLS support |
| A.8.1 | User endpoint devices | Stateless CLI with no persistent credentials |
| A.8.5 | Secure authentication | Argon2id API keys, JWT/OIDC, RBAC |
| A.8.9 | Configuration management | Typed schema validation, presets, hashing |
| A.8.12 | Data leakage prevention | Synthetic-only generation, content marking, fingerprint privacy |
| A.8.16 | Monitoring activities | Structured audit logs, resource monitors, run manifests |
| A.8.24 | Use of cryptography | ChaCha8 RNG, Argon2id, SHA-256, RS256 JWT |
| A.8.25 | Secure development lifecycle | Clippy, 2,500+ tests, cargo audit, CI/CD |
| A.8.28 | Secure coding | No unwrap, input validation, precise decimals, no unsafe |

Partially Implemented Controls

| Control | Title | Status | Gap |
|---|---|---|---|
| A.5.8 | Information security in project management | Partial | Security considerations are embedded in code (schema validation, quality gates), but formal project management security procedures are organizational |
| A.5.14 | Information transfer | Partial | TLS support for the server API; file-based output transfer policies are organizational |
| A.5.29 | Information security during disruption | Partial | Graceful degradation handles resource pressure; broader business continuity is organizational |
| A.8.8 | Management of technical vulnerabilities | Partial | cargo audit scans dependencies; patch management cadence is organizational |
| A.8.15 | Logging | Partial | Structured JSON audit events with correlation IDs; log retention and SIEM integration are organizational |
| A.8.26 | Application security requirements | Partial | Input validation and schema enforcement are built in; threat modeling documentation is organizational |

Not Applicable Controls

| Control | Title | Rationale |
|---|---|---|
| A.5.19 | Information security in supplier relationships | DataSynth is open-source software; supplier controls apply to the deploying organization |
| A.5.30 | ICT readiness for business continuity | Business continuity planning is an organizational responsibility |
| A.6.1–A.6.8 | People controls | Personnel security controls are organizational |
| A.7.1–A.7.14 | Physical controls | Physical security controls depend on the deployment environment |
| A.8.2 | Privileged access rights | OS-level privilege management is outside DataSynth's scope |
| A.8.7 | Protection against malware | Endpoint protection is an infrastructure concern |
| A.8.20 | Networks security | Network segmentation and firewall rules are infrastructure concerns |
| A.8.23 | Web filtering | Web filtering is an organizational network control |

Continuous Improvement

DataSynth supports ISO 27001’s Plan-Do-Check-Act cycle through:

  • Plan: Configuration-as-code with schema validation enforces security requirements at design time.
  • Do: Automated quality gates and resource guards enforce controls during operation.
  • Check: Evaluation framework produces quantitative metrics (Benford MAD, balance coherence, MIA AUC-ROC) that can be trended over time.
  • Act: The AutoTuner in datasynth-eval generates configuration patches from evaluation gaps, creating a feedback loop for continuous improvement.

See Also