Interconnectivity and Relationship Modeling

SyntheticData models the relationships between entities so that generated enterprise networks look realistic. It covers multi-tier vendor relationships, customer segmentation, relationship strength calculations, and cross-process linkages.

Overview

Real enterprise data exhibits complex interconnections between entities:

  • Vendors form multi-tier supply chains (supplier-of-supplier)
  • Customers segment by value (Enterprise vs. SMB) with different behaviors
  • Relationships vary in strength based on transaction history
  • Business processes connect (P2P and O2C link through inventory)

SyntheticData models all of these patterns to produce realistic, interconnected data.


Multi-Tier Vendor Networks

Supply Chain Tiers

Vendors are organized into a supply chain hierarchy:

| Tier | Description | Visibility | Typical Count |
|------|-------------|------------|---------------|
| Tier 1 | Direct suppliers | Full financial visibility | 50-100 per company |
| Tier 2 | Supplier’s suppliers | Partial visibility | 4-10 per Tier 1 |
| Tier 3 | Deep supply chain | Minimal visibility | 2-5 per Tier 2 |
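
The per-parent counts compound quickly, so it helps to see how large a three-tier network becomes. The following is a minimal sketch, not SyntheticData's implementation: it assumes counts are drawn uniformly within each range, and the function name `sample_tier_sizes` is hypothetical.

```python
import random

# Ranges taken from the tier table; uniform sampling is an assumption.
TIER1_RANGE = (50, 100)       # Tier 1 vendors per company
TIER2_PER_PARENT = (4, 10)    # Tier 2 vendors per Tier 1 vendor
TIER3_PER_PARENT = (2, 5)     # Tier 3 vendors per Tier 2 vendor


def sample_tier_sizes(rng: random.Random) -> dict[str, int]:
    """Return the number of vendors in each tier for one company."""
    tier1 = rng.randint(*TIER1_RANGE)
    tier2 = sum(rng.randint(*TIER2_PER_PARENT) for _ in range(tier1))
    tier3 = sum(rng.randint(*TIER3_PER_PARENT) for _ in range(tier2))
    return {"tier1": tier1, "tier2": tier2, "tier3": tier3}


sizes = sample_tier_sizes(random.Random(42))
```

With the ranges shown, a single company ends up with between 400 (50 × 4 × 2) and 5,000 (100 × 10 × 5) Tier 3 vendors.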

Vendor Clusters

Vendors are classified into behavioral clusters:

| Cluster | Share | Characteristics |
|---------|-------|-----------------|
| ReliableStrategic | 20% | High delivery scores, low invoice errors, consistent quality |
| StandardOperational | 50% | Average performance, predictable patterns |
| Transactional | 25% | One-off or occasional purchases |
| Problematic | 5% | Quality issues, late deliveries, invoice discrepancies |
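
One way to read the shares is as sampling weights. The sketch below assigns each vendor a cluster by weighted random choice. This is an illustrative assumption about how the shares could be applied, and `assign_clusters` is a hypothetical helper rather than part of the library.

```python
import random
from collections import Counter

# Shares from the cluster table; they sum to 1.0.
CLUSTER_SHARES = {
    "ReliableStrategic": 0.20,
    "StandardOperational": 0.50,
    "Transactional": 0.25,
    "Problematic": 0.05,
}


def assign_clusters(n_vendors: int, rng: random.Random) -> list[str]:
    """Draw one cluster label per vendor, weighted by the configured shares."""
    names = list(CLUSTER_SHARES)
    weights = list(CLUSTER_SHARES.values())
    return rng.choices(names, weights=weights, k=n_vendors)


clusters = assign_clusters(10_000, random.Random(7))
observed = {name: count / len(clusters) for name, count in Counter(clusters).items()}
```

For large vendor counts the observed shares in `observed` converge on the configured ones. For small counts they can deviate noticeably.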

Vendor Lifecycle Stages

Onboarding → RampUp → SteadyState → Decline → Terminated

Each stage has associated behaviors:

  • Onboarding: Initial qualification, small orders
  • RampUp: Increasing order volumes
  • SteadyState: Stable, predictable patterns
  • Decline: Reduced orders, performance issues
  • Terminated: Relationship ended

Vendor Quality Scores

| Metric | Range | Description |
|--------|-------|-------------|
| delivery_on_time | 0.0-1.0 | Fraction of deliveries made on time |
| quality_pass_rate | 0.0-1.0 | Quality inspection pass rate |
| invoice_accuracy | 0.0-1.0 | Invoice matching accuracy |
| responsiveness_score | 0.0-1.0 | Communication responsiveness |

Vendor Concentration Analysis

SyntheticData tracks vendor concentration risks:

dependencies:
  max_single_vendor_concentration: 0.15  # No vendor > 15% of spend
  top_5_concentration: 0.45              # Top 5 vendors < 45% of spend
  single_source_percent: 0.05            # 5% of materials single-sourced
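
To make the two concentration limits concrete, the following sketch computes the top-1 and top-5 spend shares from a spend-by-vendor mapping and checks them against the thresholds. The helper names and the sample spend figures are hypothetical, and treating the limits as inclusive is an assumption.

```python
def concentration_report(spend_by_vendor: dict[str, float]) -> dict[str, float]:
    """Return the spend share of the largest vendor and of the five largest."""
    total = sum(spend_by_vendor.values())
    shares = sorted((v / total for v in spend_by_vendor.values()), reverse=True)
    return {"top_1": shares[0], "top_5": sum(shares[:5])}


def within_limits(report: dict[str, float],
                  max_single: float = 0.15,
                  max_top_5: float = 0.45) -> bool:
    """Check both limits; inclusive comparison is an assumption."""
    return report["top_1"] <= max_single and report["top_5"] <= max_top_5


# Illustrative spend: five larger vendors plus fourteen small ones (total 1000).
spend = {"V001": 120.0, "V002": 100.0, "V003": 90.0, "V004": 70.0, "V005": 60.0}
spend.update({f"V{i:03d}": 40.0 for i in range(6, 20)})

report = concentration_report(spend)
ok = within_limits(report)
```

In this sample the largest vendor holds 12% of spend and the top five hold 44%, so both limits are met.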

Customer Value Segmentation

Value Segments

Customers follow a Pareto-like distribution:

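| Segment | Revenue Share | Customer Share | Typical Order Value |
|---------|---------------|----------------|---------------------|
| Enterprise | 40% | 5% | $50,000+ |
| MidMarket | 35% | 20% | $5,000-$50,000 |
| SMB | 20% | 50% | $500-$5,000 |
| Consumer | 5% | 25% | $50-$500 |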
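
Dividing revenue share by customer share shows how skewed the distribution is. The sketch below splits a customer population by `customer_share` and computes that ratio per segment. The function names are hypothetical, and the rounding strategy is an assumption.

```python
SEGMENTS = {
    "Enterprise": {"revenue_share": 0.40, "customer_share": 0.05},
    "MidMarket": {"revenue_share": 0.35, "customer_share": 0.20},
    "SMB": {"revenue_share": 0.20, "customer_share": 0.50},
    "Consumer": {"revenue_share": 0.05, "customer_share": 0.25},
}


def customers_per_segment(total_customers: int) -> dict[str, int]:
    """Split a population by customer_share; rounding drift goes to the largest segment."""
    counts = {name: round(s["customer_share"] * total_customers)
              for name, s in SEGMENTS.items()}
    largest = max(counts, key=counts.get)
    counts[largest] += total_customers - sum(counts.values())
    return counts


def revenue_index() -> dict[str, float]:
    """Revenue per customer relative to the overall average (1.0 = average)."""
    return {name: s["revenue_share"] / s["customer_share"]
            for name, s in SEGMENTS.items()}


counts = customers_per_segment(1000)
index = revenue_index()
```

With these shares an Enterprise customer generates 8 times the average revenue per customer and a Consumer customer 0.2 times, a 40-fold gap.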

Customer Lifecycle

Prospect → New → Growth → Mature → AtRisk → Churned
                                         ↓
                                      WonBack

Each stage has associated behaviors:

  • Prospect: Potential customer, conversion probability
  • New: First purchase within the last 90 days
  • Growth: Increasing order frequency/value
  • Mature: Stable, loyal customer
  • AtRisk: Declining activity, churn signals
  • Churned: No activity for extended period
  • WonBack: Previously churned, now returned

Customer Engagement Metrics

| Metric | Description |
|--------|-------------|
| order_frequency | Average orders per period |
| recency_days | Days since last order |
| nps_score | Net Promoter Score (-100 to +100) |
| engagement_score | Composite engagement metric (0.0-1.0) |

Customer Networks

  • Referral Networks: Customers refer other customers (configurable rate)
  • Corporate Hierarchies: Parent/child company relationships
  • Industry Clusters: Customers grouped by industry vertical

Relationship Strength Modeling

Composite Strength Calculation

Relationship strength is computed from multiple factors:

| Component | Weight | Scale | Description |
|-----------|--------|-------|-------------|
| Transaction Volume | 30% | Logarithmic | Total monetary value |
| Transaction Count | 25% | Square root | Number of transactions |
| Duration | 20% | Linear | Relationship age in days |
| Recency | 15% | Exponential decay | Days since last transaction |
| Mutual Connections | 10% | Jaccard index | Shared network connections |
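
The table gives weights and scale types but not the normalization constants, so the sketch below fills those in with illustrative caps (`MAX_VOLUME`, `MAX_COUNT`, `MAX_DURATION_DAYS`). These caps and the function names are assumptions, not documented defaults. The weights, the scale types, and the 90-day decay constant come from this page.

```python
import math

WEIGHTS = {"volume": 0.30, "count": 0.25, "duration": 0.20,
           "recency": 0.15, "mutual": 0.10}

# Normalization caps are illustrative assumptions, not documented defaults.
MAX_VOLUME = 1_000_000.0
MAX_COUNT = 100
MAX_DURATION_DAYS = 1825
DECAY_DAYS = 90.0


def jaccard(a: set, b: set) -> float:
    """Jaccard index: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def relationship_strength(volume: float, count: int, duration_days: float,
                          days_since_last: float,
                          neighbors_a: set, neighbors_b: set) -> float:
    """Weighted sum of five component scores, each scaled into [0, 1]."""
    components = {
        "volume": min(1.0, math.log1p(volume) / math.log1p(MAX_VOLUME)),
        "count": min(1.0, math.sqrt(count / MAX_COUNT)),
        "duration": min(1.0, duration_days / MAX_DURATION_DAYS),
        "recency": math.exp(-days_since_last / DECAY_DAYS),
        "mutual": jaccard(neighbors_a, neighbors_b),
    }
    return sum(WEIGHTS[key] * components[key] for key in WEIGHTS)


strength = relationship_strength(
    volume=250_000.0, count=36, duration_days=730, days_since_last=30,
    neighbors_a={"V1", "V2", "V3"}, neighbors_b={"V2", "V3", "V4"},
)
```

Under these assumed caps the sample relationship scores between 0.4 and 0.7, which places it in the Moderate band.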

Strength Categories

| Strength | Threshold | Description |
|----------|-----------|-------------|
| Strong | ≥ 0.7 | Core business relationship |
| Moderate | ≥ 0.4 | Regular, established relationship |
| Weak | ≥ 0.1 | Occasional relationship |
| Dormant | < 0.1 | Inactive relationship |
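
The thresholds are checked from strongest to weakest, so each score falls into exactly one category. A minimal sketch, with a hypothetical function name:

```python
def classify_strength(score: float, strong: float = 0.7,
                      moderate: float = 0.4, weak: float = 0.1) -> str:
    """Map a composite strength score in [0, 1] to its category."""
    if score >= strong:
        return "Strong"
    if score >= moderate:
        return "Moderate"
    if score >= weak:
        return "Weak"
    return "Dormant"


labels = [classify_strength(s) for s in (0.85, 0.7, 0.55, 0.1, 0.05)]
```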

Recency Decay

The recency component uses exponential decay:

recency_score = exp(-days_since_last / half_life)

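The default half_life is 90 days. Because the formula uses base e, the score falls to 1/e (about 0.37) rather than 0.5 after half_life days, so the parameter behaves as a decay time constant.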
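
To make the decay concrete, the sketch below evaluates the documented formula at a few ages. For comparison it also includes a base-2 variant that halves exactly every `half_life` days. The base-2 variant is shown only for contrast and is not the documented behavior.

```python
import math


def recency_score(days_since_last: float, half_life: float = 90.0) -> float:
    """Documented formula: exp(-days_since_last / half_life)."""
    return math.exp(-days_since_last / half_life)


def recency_score_base2(days_since_last: float, half_life: float = 90.0) -> float:
    """Comparison only: halves exactly every `half_life` days."""
    return 0.5 ** (days_since_last / half_life)


# Documented formula at 0, 30, 90, and 180 days, rounded to three decimals.
scores = {days: round(recency_score(days), 3) for days in (0, 30, 90, 180)}
# → {0: 1.0, 30: 0.717, 90: 0.368, 180: 0.135}
```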


Cross-Process Linkages

Inventory naturally connects Procure-to-Pay and Order-to-Cash:

P2P: Purchase Order → Goods Receipt → Vendor Invoice → Payment
                           ↓
                      [Inventory]
                           ↓
O2C: Sales Order → Delivery → Customer Invoice → Receipt

When enabled, SyntheticData generates explicit CrossProcessLink records connecting:

  • GoodsReceipt (P2P) to Delivery (O2C) via inventory item
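
Linking through inventory amounts to a join on the inventory item. The sketch below pairs goods receipts with deliveries that share a material ID. The record types and field names (`doc_id`, `material_id`) are hypothetical stand-ins, not the library's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GoodsReceipt:          # P2P document (hypothetical fields)
    doc_id: str
    material_id: str


@dataclass(frozen=True)
class Delivery:              # O2C document (hypothetical fields)
    doc_id: str
    material_id: str


@dataclass(frozen=True)
class CrossProcessLink:      # stand-in for the generated link record
    source_doc: str
    target_doc: str
    material_id: str


def link_via_inventory(receipts: list[GoodsReceipt],
                       deliveries: list[Delivery]) -> list[CrossProcessLink]:
    """Link every delivery to the goods receipts of the same material."""
    by_material: dict[str, list[GoodsReceipt]] = {}
    for receipt in receipts:
        by_material.setdefault(receipt.material_id, []).append(receipt)
    links = []
    for delivery in deliveries:
        for receipt in by_material.get(delivery.material_id, []):
            links.append(CrossProcessLink(receipt.doc_id, delivery.doc_id,
                                          delivery.material_id))
    return links


receipts = [GoodsReceipt("GR-1", "M-100"), GoodsReceipt("GR-2", "M-200")]
deliveries = [Delivery("DL-1", "M-100"), Delivery("DL-2", "M-300")]
links = link_via_inventory(receipts, deliveries)
```

Here only `GR-1` and `DL-1` share a material, so a single link is produced. A real generator would likely also respect quantities and dates, which this sketch ignores.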

Payment-Bank Reconciliation

Links payment transactions to bank statement entries for reconciliation.
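
As an illustration of what such a link involves, the sketch below pairs each payment with the first unmatched bank statement line carrying the same reference and amount. The matching rule and the dictionary keys (`id`, `reference`, `amount`) are assumptions chosen for the example. The page does not specify how SyntheticData matches them.

```python
from decimal import Decimal


def reconcile(payments: list[dict], bank_lines: list[dict]):
    """Return (matched pairs, payment IDs left without a bank line)."""
    unmatched = list(bank_lines)
    matches, open_payments = [], []
    for payment in payments:
        hit = next(
            (line for line in unmatched
             if line["reference"] == payment["reference"]
             and line["amount"] == payment["amount"]),
            None,
        )
        if hit is None:
            open_payments.append(payment["id"])
        else:
            unmatched.remove(hit)          # each bank line is used at most once
            matches.append((payment["id"], hit["id"]))
    return matches, open_payments


payments = [
    {"id": "PAY-1", "reference": "INV-1001", "amount": Decimal("1250.00")},
    {"id": "PAY-2", "reference": "INV-1002", "amount": Decimal("480.50")},
]
bank_lines = [
    {"id": "BNK-9", "reference": "INV-1001", "amount": Decimal("1250.00")},
]
matches, open_payments = reconcile(payments, bank_lines)
```

`Decimal` is used for amounts so that equality comparison is exact.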

Intercompany linking (the intercompany_bilateral option) ensures that intercompany transactions are properly linked between the sending and receiving entities.


Entity Graph

Graph Structure

The EntityGraph provides a unified view of all entity relationships:

| Component | Description |
|-----------|-------------|
| Nodes | Entities with type, ID, and metadata |
| Edges | Relationships with type and strength |
| Indexes | Fast lookups by entity type and ID |
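
The three components can be pictured with a small in-memory structure. The class below is a simplified stand-in named `SimpleEntityGraph` to avoid confusion with the library's `EntityGraph`. Its fields and methods are assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    entity_type: str
    entity_id: str
    metadata: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: str
    target: str
    relationship_type: str
    strength: float


class SimpleEntityGraph:
    def __init__(self) -> None:
        self.nodes: dict[str, Node] = {}       # index: entity_id -> Node
        self.edges: list[Edge] = []
        self.ids_by_type = defaultdict(list)   # index: entity_type -> [entity_id]
        self.edges_from = defaultdict(list)    # index: entity_id -> [Edge]

    def add_node(self, node: Node) -> None:
        self.nodes[node.entity_id] = node
        self.ids_by_type[node.entity_type].append(node.entity_id)

    def add_edge(self, edge: Edge) -> None:
        if edge.source not in self.nodes or edge.target not in self.nodes:
            raise KeyError("both endpoints must be added before the edge")
        self.edges.append(edge)
        self.edges_from[edge.source].append(edge)

    def neighbors(self, entity_id: str, min_strength: float = 0.0) -> list[str]:
        """Targets of outgoing edges whose strength is at least min_strength."""
        return [e.target for e in self.edges_from[entity_id]
                if e.strength >= min_strength]


graph = SimpleEntityGraph()
graph.add_node(Node("Company", "C001"))
graph.add_node(Node("Vendor", "V001", {"tier": 1}))
graph.add_node(Node("Vendor", "V002", {"tier": 1}))
graph.add_edge(Edge("C001", "V001", "BuysFrom", 0.82))
graph.add_edge(Edge("C001", "V002", "BuysFrom", 0.15))
strong_vendors = graph.neighbors("C001", min_strength=0.7)
```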

Entity Types (16 types)

Company, Vendor, Customer, Employee, Department, CostCenter,
Project, Contract, Asset, BankAccount, Material, Product,
Location, Currency, Account, Entity

Relationship Types (27 types)

// Transactional
BuysFrom, SellsTo, PaysTo, ReceivesFrom, SuppliesTo, OrdersFrom

// Organizational
ReportsTo, Manages, BelongsTo, OwnedBy, PartOf, Contains

// Network
ReferredBy, PartnersWith, AffiliateOf, SubsidiaryOf

// Process
ApprovesFor, AuthorizesFor, ProcessesFor

// Financial
BillsTo, ShipsTo, CollectsFrom, RemitsTo

// Document
ReferencedBy, SupersededBy, AmendedBy, LinkedTo

Configuration

Complete Example

vendor_network:
  enabled: true
  depth: 3
  tiers:
    tier1:
      count_min: 50
      count_max: 100
    tier2:
      count_per_parent_min: 4
      count_per_parent_max: 10
    tier3:
      count_per_parent_min: 2
      count_per_parent_max: 5
  clusters:
    reliable_strategic: 0.20
    standard_operational: 0.50
    transactional: 0.25
    problematic: 0.05
  dependencies:
    max_single_vendor_concentration: 0.15
    top_5_concentration: 0.45
    single_source_percent: 0.05

customer_segmentation:
  enabled: true
  value_segments:
    enterprise:
      revenue_share: 0.40
      customer_share: 0.05
      avg_order_min: 50000.0
    mid_market:
      revenue_share: 0.35
      customer_share: 0.20
      avg_order_min: 5000.0
      avg_order_max: 50000.0
    smb:
      revenue_share: 0.20
      customer_share: 0.50
      avg_order_min: 500.0
      avg_order_max: 5000.0
    consumer:
      revenue_share: 0.05
      customer_share: 0.25
      avg_order_min: 50.0
      avg_order_max: 500.0
  lifecycle:
    prospect_rate: 0.10
    new_rate: 0.15
    growth_rate: 0.20
    mature_rate: 0.35
    at_risk_rate: 0.10
    churned_rate: 0.08
    won_back_rate: 0.02
  networks:
    referrals:
      enabled: true
      referral_rate: 0.15
    corporate_hierarchies:
      enabled: true
      hierarchy_probability: 0.30

relationship_strength:
  enabled: true
  calculation:
    transaction_volume_weight: 0.30
    transaction_count_weight: 0.25
    relationship_duration_weight: 0.20
    recency_weight: 0.15
    mutual_connections_weight: 0.10
    recency_half_life_days: 90
  thresholds:
    strong: 0.7
    moderate: 0.4
    weak: 0.1

cross_process_links:
  enabled: true
  inventory_p2p_o2c: true
  payment_bank_reconciliation: true
  intercompany_bilateral: true
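
Several groups of values in this configuration are shares or weights that should each sum to 1.0: the vendor clusters, the segment revenue and customer shares, the lifecycle rates, and the strength weights. The sketch below checks that on a dictionary mirroring part of the YAML. Whether SyntheticData performs this validation itself is not stated here, and the `validate` helper is hypothetical.

```python
import math

config = {
    "vendor_network": {"clusters": {
        "reliable_strategic": 0.20, "standard_operational": 0.50,
        "transactional": 0.25, "problematic": 0.05}},
    "customer_segmentation": {
        "value_segments": {
            "enterprise": {"revenue_share": 0.40, "customer_share": 0.05},
            "mid_market": {"revenue_share": 0.35, "customer_share": 0.20},
            "smb": {"revenue_share": 0.20, "customer_share": 0.50},
            "consumer": {"revenue_share": 0.05, "customer_share": 0.25}},
        "lifecycle": {
            "prospect_rate": 0.10, "new_rate": 0.15, "growth_rate": 0.20,
            "mature_rate": 0.35, "at_risk_rate": 0.10, "churned_rate": 0.08,
            "won_back_rate": 0.02}},
    "relationship_strength": {
        "calculation": {
            "transaction_volume_weight": 0.30, "transaction_count_weight": 0.25,
            "relationship_duration_weight": 0.20, "recency_weight": 0.15,
            "mutual_connections_weight": 0.10, "recency_half_life_days": 90},
        "thresholds": {"strong": 0.7, "moderate": 0.4, "weak": 0.1}},
}


def validate(cfg: dict) -> list[str]:
    """Return the names of the groups that fail a consistency check."""
    seg = cfg["customer_segmentation"]
    calc = cfg["relationship_strength"]["calculation"]
    groups = {
        "vendor_network.clusters": list(cfg["vendor_network"]["clusters"].values()),
        "value_segments.revenue_share":
            [s["revenue_share"] for s in seg["value_segments"].values()],
        "value_segments.customer_share":
            [s["customer_share"] for s in seg["value_segments"].values()],
        "lifecycle": list(seg["lifecycle"].values()),
        "relationship_strength.weights":
            [v for k, v in calc.items() if k.endswith("_weight")],
    }
    problems = [name for name, values in groups.items()
                if not math.isclose(sum(values), 1.0, abs_tol=1e-6)]
    t = cfg["relationship_strength"]["thresholds"]
    if not t["strong"] > t["moderate"] > t["weak"]:
        problems.append("relationship_strength.thresholds")
    return problems


problems = validate(config)
```

For the complete example above, every group sums to 1.0 and the thresholds are strictly decreasing, so `problems` is empty.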

Network Evaluation

SyntheticData includes network metrics evaluation:

| Metric | Description | Typical Range |
|--------|-------------|---------------|
| Connectivity | Largest connected component ratio | > 0.95 |
| Power Law Alpha | Degree distribution exponent | 2.0-3.0 |
| Clustering Coefficient | Local clustering | 0.10-0.50 |
| Top-1 Concentration | Largest node share | < 0.15 |
| Top-5 Concentration | Top 5 nodes share | < 0.45 |
| HHI | Herfindahl-Hirschman Index | < 0.25 |

These metrics validate that generated networks exhibit realistic properties.
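
Two of these metrics are simple enough to compute by hand, which is useful for spot-checking generated output. The sketch below implements the Herfindahl-Hirschman Index (the sum of squared shares) and the largest connected component ratio on an undirected edge list. The function names are hypothetical, and the quantity the shares are taken over (for example spend or edge weight) is an assumption left to the caller.

```python
from collections import defaultdict, deque


def hhi(values: list[float]) -> float:
    """Herfindahl-Hirschman Index: sum of squared shares, in (0, 1]."""
    total = sum(values)
    return sum((v / total) ** 2 for v in values)


def largest_component_ratio(nodes: list[str],
                            edges: list[tuple[str, str]]) -> float:
    """Fraction of nodes in the largest connected component (undirected)."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen: set[str] = set()
    best = 0
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:                       # breadth-first search from `start`
            current = queue.popleft()
            size += 1
            for neighbor in adjacency[current]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        best = max(best, size)
    return best / len(nodes)


concentration = hhi([50.0, 30.0, 20.0])
connectivity = largest_component_ratio(
    ["A", "B", "C", "D", "E"], [("A", "B"), ("B", "C"), ("D", "E")]
)
```

In this toy input the shares 0.5, 0.3, and 0.2 give an HHI of 0.38, and three of five nodes sit in the largest component, a connectivity of 0.6. Both fall outside the typical ranges, as expected for such a small, concentrated example.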


API Usage

Rust API

use datasynth_core::models::{
    VendorNetwork, VendorCluster, SupplyChainTier,
    SegmentedCustomerPool, CustomerValueSegment,
    EntityGraph, RelationshipStrengthCalculator,
};
use datasynth_generators::relationships::EntityGraphGenerator;

// This snippet assumes `config`, `transactions`, and `document_flows` are
// already in scope, and that `VendorGenerator` and `CustomerGenerator` have
// been imported (their module path is not shown here).
// `config` is passed to two constructors; clone it if they take ownership.

// Generate the multi-tier vendor network for company "C001"
let vendor_generator = VendorGenerator::new(config);
let vendor_network = vendor_generator.generate_vendor_network("C001");

// Generate the value-segmented customer pool for the same company
let customer_generator = CustomerGenerator::new(config);
let customer_pool = customer_generator.generate_segmented_pool("C001");

// Build the entity graph, including cross-process links
let graph_generator = EntityGraphGenerator::with_defaults();
let entity_graph = graph_generator.generate_entity_graph(
    &vendor_network,
    &customer_pool,
    &transactions,
    &document_flows,
);

Python API

from datasynth_py import DataSynth
# Config is used below; importing it from datasynth_py.config is an assumption.
from datasynth_py.config import Config, VendorNetworkConfig, CustomerSegmentationConfig

config = Config(
    vendor_network=VendorNetworkConfig(
        enabled=True,
        depth=3,
        clusters={"reliable_strategic": 0.20, "standard_operational": 0.50},
    ),
    customer_segmentation=CustomerSegmentationConfig(
        enabled=True,
        value_segments={
            "enterprise": {"revenue_share": 0.40, "customer_share": 0.05},
            "mid_market": {"revenue_share": 0.35, "customer_share": 0.20},
        },
    ),
)

result = DataSynth().generate(config=config, output={"format": "csv"})

See Also