Data Residency & Retention

Data storage tiers, retention policies, and archival architecture.

The Sankofa Engine uses a multi-tier storage architecture to balance performance, cost, and regulatory compliance. This page documents where data resides, how long it is retained, and how it moves between tiers.

Storage Tiers

Tier | Technology | Purpose | Default Retention
---- | ---------- | ------- | -----------------
Hot | ScyllaDB 6.2 | Transaction ledger, account state | Configurable (default 2 years)
Projection | PostgreSQL 16 | CQRS read model (balances, query views) | Current state (always up-to-date)
Cold | S3 / Local filesystem | Archived transactions | Long-term (configurable)
Event Log | NATS JetStream | Transaction events, signed receipts | 7 years

Tier Descriptions

Hot Tier (ScyllaDB 6.2)

The hot tier stores the active transaction ledger and account state. ScyllaDB provides low-latency reads and writes for real-time transaction processing. Data remains in the hot tier for the configured retention period (default 2 years) before becoming eligible for archival.

Projection Tier (PostgreSQL 16)

The projection tier maintains CQRS read models: materialized views of account balances and other derived state. This tier always reflects the current state and is continuously updated as transactions are processed. It does not store historical transactions; it stores the computed result of applying all transactions.

Cold Tier (S3 / Local Filesystem)

The cold tier stores archived transactions that have aged out of the hot tier. Cold storage provides cost-effective long-term retention. The storage backend is configurable: S3 for cloud deployments, local filesystem for on-premises deployments.

Event Log (NATS JetStream)

The event log retains all transaction events and signed receipts for 7 years by default. It serves as the system of record for event replay, audit reconstruction, and disaster recovery.

Data Classification

The Sankofa Engine classifies data so that each class is protected with its own data encryption key (DEK) and governed by the appropriate retention policy:

Classification | Examples | Storage Location | Encryption
-------------- | -------- | ---------------- | ----------
Financial | Transaction amounts, balances, ledger entries | Hot, Cold, Event Log | AES-GCM-256 (financial DEK)
Identity | Account identifiers, customer references | Hot, Projection | AES-GCM-256 (identity DEK)
Operational | Health metrics, system events, processing logs | Event Log | AES-GCM-256 (operational DEK)
Audit | Hash chain values, signed receipts | Hot, Event Log | Integrity-protected (signed)
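
The classification table can be read as a lookup from data class to storage tiers and protection scheme. The following is a minimal sketch of that lookup, not the engine's actual data model: the `Classification` enum, the `Handling` record, the tier names, and the `classifications_in_tier` helper are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Classification(Enum):
    FINANCIAL = "financial"
    IDENTITY = "identity"
    OPERATIONAL = "operational"
    AUDIT = "audit"


@dataclass(frozen=True)
class Handling:
    tiers: Tuple[str, ...]   # storage locations for this class
    protection: str          # encryption or integrity scheme
    dek: Optional[str]       # hypothetical DEK label; None when data is signed, not encrypted


# Values mirror the classification table; the identifiers are illustrative.
HANDLING = {
    Classification.FINANCIAL: Handling(("hot", "cold", "event_log"), "AES-GCM-256", "financial"),
    Classification.IDENTITY: Handling(("hot", "projection"), "AES-GCM-256", "identity"),
    Classification.OPERATIONAL: Handling(("event_log",), "AES-GCM-256", "operational"),
    Classification.AUDIT: Handling(("hot", "event_log"), "signed", None),
}


def classifications_in_tier(tier: str) -> list:
    """Return the names of all data classes stored in the given tier."""
    return sorted(c.value for c, h in HANDLING.items() if tier in h.tiers)
```

A lookup like this makes it easy to answer residency questions per tier, for example which data classes ever reach cold storage.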

Data Flow Between Tiers

                    ┌─────────────┐
  Incoming Txn ────▶│  NATS       │  Event published (retained 7 years)
                    │  JetStream  │
                    └──────┬──────┘
                    ┌──────▼──────┐
                    │  Shard      │  Transaction processed
                    │  Worker     │
                    └──┬─────┬───┘
                       │     │
              ┌────────▼┐  ┌─▼──────────┐
              │ ScyllaDB │  │ PostgreSQL │
              │ (Hot)    │  │(Projection)│
              └────┬─────┘  └────────────┘
                   │  After retention period
              ┌────▼─────┐
              │ S3 / FS  │
              │ (Cold)   │
              └──────────┘

Retention Policies

Default Retention Periods

Tier | Default Retention | Configurable
---- | ----------------- | ------------
Hot (ScyllaDB) | 2 years | Yes, per deployment
Projection (PostgreSQL) | Indefinite (current state) | N/A (always reflects current state)
Cold (S3 / Filesystem) | Indefinite | Yes, per deployment
Event Log (NATS JetStream) | 7 years | Yes, via max_message_age_seconds
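
The Event Log limit is expressed in seconds through `max_message_age_seconds`. As a worked example of the arithmetic, and assuming a 365-day year (how the engine accounts for leap days is not specified on this page), the default periods convert as follows. The helper name `years_to_seconds` is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def years_to_seconds(years: int, days_per_year: int = 365) -> int:
    """Convert a retention period in years to seconds (leap days ignored)."""
    return years * days_per_year * SECONDS_PER_DAY


event_log_default = years_to_seconds(7)  # 220752000
hot_tier_default = years_to_seconds(2)   # 63072000
```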

Customizing Retention

Retention periods are configured at deployment time. Customers can adjust retention to meet their specific regulatory requirements:

  • Banking regulations may require 5-7 year retention of financial transaction records.
  • Anti-money laundering (AML) requirements may mandate specific retention windows for transaction data.
  • Tax compliance may require retention of financial records for a defined period after the tax year closes.
  • Data minimization regulations (e.g., GDPR) may require shorter retention for certain data categories.

Retention policies are evaluated nightly by the Archival Service.

Archival Process

Archival Service

The Archival Service runs as a Kubernetes CronJob on a nightly schedule:

Step | Description
---- | -----------
1. Policy evaluation | The service evaluates retention policies for each shard to identify transactions that have exceeded the hot tier retention period
2. Batch selection | Eligible transactions are selected in batches to limit resource consumption
3. Cold storage write | Transactions are written to the cold storage backend (S3 or local filesystem) with their encryption intact
4. Root reference | An archival root reference is stored in the hot tier, enabling cross-tier queries
5. Hot tier cleanup | Archived transactions are removed from ScyllaDB after successful cold storage write is confirmed
6. Verification | The service verifies that archived data is readable from cold storage before completing the cycle
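
The six steps can be sketched with in-memory dictionaries standing in for ScyllaDB and the cold storage backend. This is an illustrative sketch under those assumptions, not the Archival Service's implementation: the function name, the `archive/` key format, and the record layout are all hypothetical.

```python
from datetime import datetime, timedelta, timezone


def run_archival_cycle(hot, cold, roots, now, retention, batch_size=100):
    """Move expired records from `hot` to `cold`; return the archived IDs.

    hot:   dict txn_id -> record (dict with a "ts" datetime), stand-in for ScyllaDB.
    cold:  dict cold_key -> record, stand-in for S3 or the local filesystem.
    roots: dict txn_id -> cold_key, stand-in for archival root references.
    """
    cutoff = now - retention
    # Step 1: policy evaluation. Find records older than the retention period.
    eligible = sorted(txn_id for txn_id, rec in hot.items() if rec["ts"] < cutoff)
    archived = []
    # Step 2: batch selection. Process a bounded number of records at a time.
    for i in range(0, len(eligible), batch_size):
        batch = eligible[i:i + batch_size]
        for txn_id in batch:
            key = f"archive/{txn_id}"  # hypothetical key format
            cold[key] = hot[txn_id]    # Step 3: copy the record unchanged (still encrypted)
            roots[txn_id] = key        # Step 4: record the root reference
        for txn_id in batch:
            if roots[txn_id] in cold:  # Step 5: delete only after the write is confirmed
                del hot[txn_id]
                archived.append(txn_id)
    # Step 6: verification. Every archived record must be readable from cold storage.
    unreadable = [t for t in archived if roots[t] not in cold]
    if unreadable:
        raise RuntimeError(f"archived records not readable: {unreadable}")
    return archived


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
hot = {
    "t1": {"ts": now - timedelta(days=1000), "payload": b"ciphertext-1"},
    "t2": {"ts": now - timedelta(days=800), "payload": b"ciphertext-2"},
    "t3": {"ts": now - timedelta(days=100), "payload": b"ciphertext-3"},
}
cold, roots = {}, {}
archived = run_archival_cycle(hot, cold, roots, now, timedelta(days=730), batch_size=1)
```

With a two-year (730-day) retention period, `t1` and `t2` move to cold storage and `t3` stays in the hot tier.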

Cross-Tier Queries

A single query can span both the hot and cold tiers:

  • The query engine first checks the hot tier (ScyllaDB) for matching records.
  • If the query time range extends beyond the hot tier retention period, the engine follows archival root references to retrieve data from cold storage.
  • Results from both tiers are merged and returned as a unified response.
  • The caller does not need to know which tier the data resides in.
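
The lookup order above can be sketched as follows, again with dictionaries standing in for the two tiers. The function name `query_transactions`, the `hot_cutoff` parameter, and the record layout are assumptions made for illustration; the real query engine's interface is not documented on this page.

```python
from datetime import datetime, timedelta, timezone


def query_transactions(hot, cold, roots, start, end, hot_cutoff):
    """Return records with start <= ts < end from both tiers, oldest first."""
    # Check the hot tier first.
    results = [rec for rec in hot.values() if start <= rec["ts"] < end]
    # Follow archival root references only when the requested range
    # reaches back past the hot tier retention window.
    if start < hot_cutoff:
        for cold_key in roots.values():
            rec = cold[cold_key]
            if start <= rec["ts"] < end:
                results.append(rec)
    # Merge into one response; the caller never sees which tier served a record.
    return sorted(results, key=lambda rec: rec["ts"])


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
hot_cutoff = now - timedelta(days=730)
hot = {"t3": {"id": "t3", "ts": now - timedelta(days=100)}}
cold = {"archive/t1": {"id": "t1", "ts": now - timedelta(days=1000)}}
roots = {"t1": "archive/t1"}

full_history = query_transactions(hot, cold, roots, now - timedelta(days=1200), now, hot_cutoff)
recent_only = query_transactions(hot, cold, roots, now - timedelta(days=200), now, hot_cutoff)
```

In this sketch the second query never touches cold storage, because its time range lies entirely inside the hot tier retention window.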

Data Deletion

Purge Capabilities

The Sankofa Engine supports data deletion for regulatory compliance:

Method | Description | Use Case
------ | ----------- | --------
Retention-based expiry | Data automatically ages out of each tier per configured retention policies | Standard lifecycle management
Cryptographic erasure | Destroy the DEK for a data classification; all data encrypted with that key becomes permanently unrecoverable | Right-to-erasure requests, decommissioning
Explicit purge | API-driven removal of specific records from all tiers | Targeted data disposal

Cryptographic Erasure

Cryptographic erasure is the fastest and most complete method of data disposal:

  1. The KMS destroys the DEK for the target data classification or account.
  2. All data encrypted with that DEK becomes immediately and permanently unrecoverable.
  3. The ciphertext may remain in storage but is indistinguishable from random data without the key.
  4. This approach satisfies data disposal requirements without requiring physical media destruction or individual record deletion.
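
The effect of destroying a key can be modeled without any real cryptography. The sketch below is a simplified model only: `InMemoryKMS`, `KeyDestroyedError`, and `readable_ids` are hypothetical names, actual encryption and decryption are omitted, and a production KMS would also need to guarantee that no copies of the key survive.

```python
import secrets


class KeyDestroyedError(Exception):
    """Raised when a DEK no longer exists, so its data cannot be decrypted."""


class InMemoryKMS:
    """Hypothetical stand-in for a KMS holding one DEK per scope."""

    def __init__(self):
        self._deks = {}

    def create_dek(self, scope):
        self._deks[scope] = secrets.token_bytes(32)  # 256-bit key

    def get_dek(self, scope):
        try:
            return self._deks[scope]
        except KeyError:
            raise KeyDestroyedError(scope) from None

    def destroy_dek(self, scope):
        self._deks.pop(scope, None)


def readable_ids(kms, store):
    """Return the IDs of stored records whose DEK still exists."""
    ids = []
    for rec_id, rec in store.items():
        try:
            kms.get_dek(rec["scope"])
        except KeyDestroyedError:
            continue  # ciphertext is still stored but can no longer be decrypted
        ids.append(rec_id)
    return sorted(ids)


kms = InMemoryKMS()
kms.create_dek("financial")
kms.create_dek("identity")
store = {
    "txn-1": {"scope": "financial", "ciphertext": b"\x8f\x02\x51"},
    "acct-1": {"scope": "identity", "ciphertext": b"\x1c\xe4\x9a"},
}

before = readable_ids(kms, store)
kms.destroy_dek("financial")
after = readable_ids(kms, store)
```

After the financial DEK is destroyed, both records are still present in `store`, but only the identity record remains readable. This also shows why key scope matters: erasure affects every record encrypted under the destroyed key.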

Retention Policy Enforcement

The Archival Service enforces retention policies automatically:

  • Transactions exceeding the hot tier retention period are archived to cold storage.
  • Cold storage data exceeding the cold tier retention period (if configured) is deleted.
  • Event log messages exceeding the NATS JetStream retention period are automatically purged by NATS.
  • All deletion events are logged for audit purposes.

Geographic Considerations

Deployment-Dependent Data Residency

Data residency in the Sankofa Engine is determined by the deployment configuration, not by the application code:

Factor | Configuration
------ | -------------
Compute location | Kubernetes cluster region and zone selection
Storage location | ScyllaDB node placement, S3 bucket region, NATS cluster location
Network boundaries | Kubernetes network policies and cloud provider VPC configuration
Replication | ScyllaDB replication factor and topology-aware placement

Customer-Configurable Residency

Customers can specify data residency requirements during deployment:

  • Region selection: Deploy the entire stack in a specific geographic region (e.g., US-East, EU-West, AP-Southeast).
  • Node affinity: Kubernetes node affinity rules ensure pods run only on nodes in the target geography.
  • Storage class selection: Kubernetes storage classes can be configured to use region-specific storage backends.
  • Cross-region restrictions: Network policies can prevent data from leaving the designated region.
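
As an illustration of the node affinity option, the sketch below builds the `affinity` stanza of a pod spec that restricts scheduling to one region. It relies on the standard Kubernetes node label `topology.kubernetes.io/region`; the function name and the region value `eu-west-1` are assumptions, and the engine's actual deployment manifests may be structured differently.

```python
import json


def region_node_affinity(region: str) -> dict:
    """Build a pod-spec `affinity` stanza that pins pods to nodes in one region."""
    return {
        "nodeAffinity": {
            # "required..." makes this a hard constraint at scheduling time.
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [
                    {
                        "matchExpressions": [
                            {
                                "key": "topology.kubernetes.io/region",
                                "operator": "In",
                                "values": [region],
                            }
                        ]
                    }
                ]
            }
        }
    }


# Hypothetical region name; use the label value your cluster's nodes carry.
affinity = region_node_affinity("eu-west-1")
manifest_fragment = json.dumps({"affinity": affinity}, indent=2)
```

Node affinity constrains where pods run, not where data flows, so it is typically combined with the storage and network controls listed above.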

Regulatory Alignment

Regulation | Residency Requirement | Engine Support
---------- | --------------------- | --------------
GDPR | Data processing within EEA or approved jurisdictions | Region-specific deployment
Data localization laws | Data must remain within national borders | Single-region deployment with node affinity
Banking regulations | Transaction records in regulated jurisdiction | Storage class and node affinity configuration
Cross-border transfer | Restrictions on data leaving the jurisdiction | Network policy enforcement

Sankofa Labs works with customers to configure deployments that meet their specific data residency obligations.