Architecture

SyntheticData is designed as a modular, high-performance data generation system.

Overview

┌──────────────────────────────────────────────────────────────────────┐
│                          Application Layer                           │
│   datasynth-cli │ datasynth-server │ datasynth-ui                    │
├──────────────────────────────────────────────────────────────────────┤
│                         Orchestration Layer                          │
│                          datasynth-runtime                           │
├──────────────────────────────────────────────────────────────────────┤
│                           Generation Layer                           │
│   datasynth-generators │ datasynth-graph                             │
├──────────────────────────────────────────────────────────────────────┤
│                           Foundation Layer                           │
│   datasynth-core │ datasynth-config │ datasynth-output               │
└──────────────────────────────────────────────────────────────────────┘

Key Characteristics

| Characteristic | Description |
|---|---|
| Modular | 12 independent crates with clear boundaries |
| Layered | Strict dependency hierarchy prevents cycles |
| High-Performance | Parallel execution, memory-efficient streaming |
| Deterministic | Seeded RNG for reproducible output |
| Type-Safe | Rust’s type system ensures correctness |
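The "Deterministic" row can be illustrated with a small, std-only sketch. It assumes a SplitMix64 generator as a stand-in; the RNG that SyntheticData actually uses is not shown on this page, and `SplitMix64` and `generate_amounts` are hypothetical names. The point is only that the same seed yields the same output on every run.

```rust
/// Minimal SplitMix64 generator, used here only to illustrate seeding.
/// Hypothetical stand-in: the project's real RNG is not shown on this page.
pub struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    pub fn new(seed: u64) -> Self {
        Self { state: seed }
    }

    /// Advances the state and returns the next 64-bit value.
    pub fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Generates `count` pseudo-random amounts in cents, each below 1_000_000.
/// The output depends only on `seed` and `count`.
pub fn generate_amounts(seed: u64, count: usize) -> Vec<u64> {
    let mut rng = SplitMix64::new(seed);
    (0..count).map(|_| rng.next_u64() % 1_000_000).collect()
}
```

Calling `generate_amounts(42, 100)` twice returns identical vectors, while a different seed gives a different sequence.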

Architecture Sections

| Section | Description |
|---|---|
| Workspace Layout | Crate organization and dependencies |
| Domain Models | Core data structures |
| Data Flow | How data moves through the system |
| Generation Pipeline | Step-by-step generation process |
| Memory Management | Memory tracking and limits |
| Design Decisions | Key architectural choices |

Design Principles

Separation of Concerns

Each crate has a single responsibility:

  • datasynth-core: Domain models and distributions
  • datasynth-config: Configuration and validation
  • datasynth-generators: Data generation logic
  • datasynth-output: File writing
  • datasynth-runtime: Orchestration

Dependency Inversion

Core components define traits; higher layers provide the implementations:

// datasynth-core defines the trait
pub trait Generator<T> {
    fn generate_batch(&mut self, count: usize) -> Result<Vec<T>>;
}

// datasynth-generators implements it
impl Generator<JournalEntry> for JournalEntryGenerator {
    fn generate_batch(&mut self, count: usize) -> Result<Vec<JournalEntry>> {
        // Implementation
    }
}
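A self-contained sketch of this pattern is given below. It keeps the trait shape from the snippet above; everything else (`GenError`, `GenResult`, `Entry`, `SequentialGenerator`, `run`) is a hypothetical stand-in, since the real error type and domain models are not shown on this page. It illustrates that orchestration code can be written against the trait alone.

```rust
/// Error type for this sketch only (hypothetical).
#[derive(Debug, PartialEq)]
pub struct GenError(pub String);

/// Result alias for this sketch; the real alias is not shown on this page.
pub type GenResult<T> = std::result::Result<T, GenError>;

/// Same shape as the trait above: owned by the foundation layer.
pub trait Generator<T> {
    fn generate_batch(&mut self, count: usize) -> GenResult<Vec<T>>;
}

/// Toy record standing in for a domain model such as `JournalEntry`.
#[derive(Debug, Clone, PartialEq)]
pub struct Entry {
    pub id: u64,
}

/// Toy implementation, as a higher layer would provide.
pub struct SequentialGenerator {
    next_id: u64,
    max_batch: usize,
}

impl SequentialGenerator {
    pub fn new(max_batch: usize) -> Self {
        Self { next_id: 0, max_batch }
    }
}

impl Generator<Entry> for SequentialGenerator {
    fn generate_batch(&mut self, count: usize) -> GenResult<Vec<Entry>> {
        if count > self.max_batch {
            return Err(GenError(format!(
                "batch of {count} exceeds limit {}",
                self.max_batch
            )));
        }
        let start = self.next_id;
        self.next_id += count as u64;
        Ok((start..self.next_id).map(|id| Entry { id }).collect())
    }
}

/// Orchestration code depends only on the trait, not on a concrete generator.
pub fn run<T, G: Generator<T>>(
    generator: &mut G,
    batches: usize,
    batch_size: usize,
) -> GenResult<Vec<T>> {
    let mut out = Vec::new();
    for _ in 0..batches {
        out.extend(generator.generate_batch(batch_size)?);
    }
    Ok(out)
}
```

Because `run` is generic over `Generator<T>`, swapping in a different generator requires no change to the orchestration code.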

Configuration-Driven

All behavior is controlled by configuration:

transactions:
  target_count: 100000
  benford:
    enabled: true
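As a rough illustration of how such a fragment might map onto typed, validated settings, here is a std-only sketch. The struct names and the validation rule are assumptions made for this example; the actual schema and checks live in datasynth-config and are not shown on this page. Parsing the YAML itself is left out.

```rust
/// Hypothetical typed mirror of the `benford` block above.
#[derive(Debug, Clone, PartialEq)]
pub struct BenfordConfig {
    pub enabled: bool,
}

/// Hypothetical typed mirror of the `transactions` block above.
#[derive(Debug, Clone, PartialEq)]
pub struct TransactionsConfig {
    pub target_count: u64,
    pub benford: BenfordConfig,
}

impl TransactionsConfig {
    /// Rejects configurations that could not produce any output.
    /// (Assumed rule, for illustration only.)
    pub fn validate(&self) -> Result<(), String> {
        if self.target_count == 0 {
            return Err("transactions.target_count must be greater than 0".to_string());
        }
        Ok(())
    }
}

/// Builds the configuration that corresponds to the YAML fragment above.
pub fn example_config() -> TransactionsConfig {
    TransactionsConfig {
        target_count: 100_000,
        benford: BenfordConfig { enabled: true },
    }
}
```

Validating once, before generation starts, lets later stages read the settings without re-checking them.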

Memory Safety

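Rust’s ownership rules and runtime bounds checks guard against:

  • Data races in parallel generation (rejected at compile time in safe code)
  • Most memory leaks (values are freed when their owner goes out of scope, though leaks remain possible, for example through reference cycles)
  • Buffer overflows (out-of-bounds indexing panics instead of corrupting memory)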

Component Interactions

                    ┌─────────────┐
                    │   Config    │
                    └──────┬──────┘
                           │
        ┌──────────────────┼──────────────────┐
        │                  │                  │
        ▼                  ▼                  ▼
┌──────────────┐  ┌──────────────┐  ┌──────────────┐
│  JE Generator│  │ Doc Generator│  │ Master Data  │
└──────┬───────┘  └──────┬───────┘  └──────┬───────┘
       │                 │                 │
       └─────────────────┼─────────────────┘
                         │
                         ▼
                ┌──────────────┐
                │ Orchestrator │
                └──────┬───────┘
                       │
        ┌──────────────┼──────────────┐
        │              │              │
        ▼              ▼              ▼
   ┌─────────┐   ┌─────────┐   ┌─────────┐
   │   CSV   │   │  Graph  │   │  JSON   │
   └─────────┘   └─────────┘   └─────────┘

Performance Architecture

Parallel Execution

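use rayon::prelude::*; // assumes the thread pool is rayon's

// Thread pool distributes work: one generator per worker
let batches: Vec<Vec<JournalEntry>> = (0..num_threads)
    .into_par_iter()
    .map(|thread_id| {
        let mut generator = generator_for_thread(thread_id);
        generator.generate_batch(batch_size)
    })
    // Stop at the first error instead of silently dropping a failed batch
    .collect::<Result<_>>()?;

// Flatten the per-worker batches into a single list
let entries: Vec<JournalEntry> = batches.into_iter().flatten().collect();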
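The same per-worker pattern can be sketched with the standard library alone. This version swaps the thread pool for `std::thread::scope` and uses a tiny linear congruential generator as a hypothetical stand-in for a real generator. Deriving each worker's seed from a base seed and its index is an assumption of this sketch; the page does not say how per-thread seeds are chosen.

```rust
use std::thread;

/// Tiny linear congruential generator; hypothetical stand-in for a
/// per-thread generator.
struct Lcg(u64);

impl Lcg {
    fn next_u64(&mut self) -> u64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        self.0 >> 33
    }
}

/// Runs `num_threads` workers, each producing `batch_size` values.
/// Every worker owns its generator, so no state is shared between threads.
pub fn generate_parallel(base_seed: u64, num_threads: usize, batch_size: usize) -> Vec<u64> {
    thread::scope(|scope| {
        // Spawn all workers first so that they run concurrently.
        let handles: Vec<_> = (0..num_threads)
            .map(|thread_id| {
                scope.spawn(move || {
                    // Simplistic seed derivation, for illustration only.
                    let mut rng = Lcg(base_seed.wrapping_add(thread_id as u64));
                    (0..batch_size).map(|_| rng.next_u64()).collect::<Vec<u64>>()
                })
            })
            .collect();

        // Join in spawn order so the output order does not depend on scheduling.
        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Joining in spawn order keeps the combined output reproducible for a given seed and thread count, whichever worker finishes first.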

Streaming Output

// Memory-efficient streaming
for entry in generator.generate_stream() {
    sink.write(&entry)?;
}
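A minimal sketch of this loop using only the standard library follows. The `Entry` type, the `generate_stream` function, and the CSV layout are hypothetical; the real stream and sink APIs are not shown on this page. Entries are produced lazily and written one at a time, so memory use does not grow with the number of entries.

```rust
use std::io::{self, Write};

/// Hypothetical record type for this sketch.
pub struct Entry {
    pub id: u64,
    pub amount_cents: u64,
}

/// Lazily yields `count` entries; nothing is materialized up front.
pub fn generate_stream(count: u64) -> impl Iterator<Item = Entry> {
    (0..count).map(|id| Entry {
        id,
        amount_cents: (id * 37) % 10_000,
    })
}

/// Writes entries as CSV rows, one at a time, and returns the number of
/// data rows written (the header is not counted).
pub fn write_csv<W: Write>(
    entries: impl Iterator<Item = Entry>,
    mut sink: W,
) -> io::Result<u64> {
    writeln!(sink, "id,amount_cents")?;
    let mut rows = 0;
    for entry in entries {
        writeln!(sink, "{},{}", entry.id, entry.amount_cents)?;
        rows += 1;
    }
    sink.flush()?;
    Ok(rows)
}
```

Any `Write` implementation can serve as the sink here, for example a `Vec<u8>` in tests or a `BufWriter` around a file.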

Memory Guards

// Memory limits enforced
let guard = MemoryGuard::new(config);
while !guard.check().exceeds_hard_limit {
    generate_batch();
}
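The guard loop can be sketched as follows. This is not the project's `MemoryGuard`: the constructor arguments, the `record_allocation` method, and the soft limit are assumptions for this example, and the sketch tracks self-reported byte counts rather than measuring process memory. It also adds a `max_batches` bound so the loop ends even when the limit is never reached.

```rust
/// Result of a limit check (field names other than `exceeds_hard_limit`
/// are hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct MemoryStatus {
    pub exceeds_soft_limit: bool,
    pub exceeds_hard_limit: bool,
}

/// Simplified guard that tracks bytes reported by the caller.
pub struct MemoryGuard {
    soft_limit_bytes: usize,
    hard_limit_bytes: usize,
    used_bytes: usize,
}

impl MemoryGuard {
    pub fn new(soft_limit_bytes: usize, hard_limit_bytes: usize) -> Self {
        Self { soft_limit_bytes, hard_limit_bytes, used_bytes: 0 }
    }

    /// Records that `bytes` more memory is now in use.
    pub fn record_allocation(&mut self, bytes: usize) {
        self.used_bytes = self.used_bytes.saturating_add(bytes);
    }

    /// Compares current usage against both limits.
    pub fn check(&self) -> MemoryStatus {
        MemoryStatus {
            exceeds_soft_limit: self.used_bytes > self.soft_limit_bytes,
            exceeds_hard_limit: self.used_bytes > self.hard_limit_bytes,
        }
    }
}

/// Produces batches of `batch_bytes` each until the hard limit has been
/// exceeded or `max_batches` is reached. Returns the number of batches.
pub fn generate_until_limit(
    guard: &mut MemoryGuard,
    batch_bytes: usize,
    max_batches: usize,
) -> usize {
    let mut batches = 0;
    while batches < max_batches && !guard.check().exceeds_hard_limit {
        guard.record_allocation(batch_bytes);
        batches += 1;
    }
    batches
}
```

Because the check runs before each batch, usage can overshoot the hard limit by up to one batch; a stricter guard would check whether the next batch still fits.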

Extension Points

Custom Generators

Implement the Generator trait:

impl Generator<CustomType> for CustomGenerator {
    fn generate_batch(&mut self, count: usize) -> Result<Vec<CustomType>> {
        // Custom logic
    }
}

Custom Output Sinks

Implement the Sink trait:

impl Sink<JournalEntry> for CustomSink {
    fn write(&mut self, entry: &JournalEntry) -> Result<()> {
        // Custom output logic
    }
}

Custom Distributions

Create specialized samplers:

impl AmountSampler for CustomAmountSampler {
    fn sample(&mut self) -> Decimal {
        // Custom distribution
    }
}
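As one possible custom distribution, the sketch below draws amounts whose leading digits follow Benford's law, which the `benford` configuration option refers to. It rests on several assumptions: the trait here returns integer cents instead of `Decimal` so that the example needs no external crates, the SplitMix64 generator is a stand-in, and `BenfordAmountSampler` is a hypothetical name. The technique is log-uniform sampling over whole decades, which gives leading digit d the probability log10(1 + 1/d).

```rust
/// Simplified version of the sampler trait, returning cents instead of
/// `Decimal` (an assumption made to keep the sketch dependency-free).
pub trait AmountSampler {
    fn sample(&mut self) -> u64;
}

/// Samples amounts log-uniformly from 100 to 999_999 cents.
pub struct BenfordAmountSampler {
    state: u64,
}

impl BenfordAmountSampler {
    pub fn new(seed: u64) -> Self {
        Self { state: seed }
    }

    /// SplitMix64 step mapped to a uniform value in [0, 1).
    fn next_unit(&mut self) -> f64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^= z >> 31;
        (z >> 11) as f64 / (1u64 << 53) as f64
    }
}

impl AmountSampler for BenfordAmountSampler {
    fn sample(&mut self) -> u64 {
        // 10^(2 + 4u) with u uniform in [0, 1) covers four full decades.
        let u = self.next_unit();
        (100.0 * 10f64.powf(4.0 * u)) as u64
    }
}

/// Returns the most significant decimal digit of `n`.
pub fn leading_digit(mut n: u64) -> u64 {
    while n >= 10 {
        n /= 10;
    }
    n
}

/// Benford probability of leading digit `d` (1 through 9).
pub fn benford_probability(d: u32) -> f64 {
    (1.0 + 1.0 / f64::from(d)).log10()
}
```

Over a large sample, the share of amounts with leading digit 1 should be close to `benford_probability(1)`, roughly 30 percent.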

See Also