FIH Blackboard: Universal Interface for Multi-Agent Coordination

Author
Affiliation

SSCCS Foundation

Published

June 26, 2026

Abstract

neXus implements the FIH (Fact / Intent / Hint) Blackboard paradigm. The core uses a DualStorage composition of hot (core storage engine, layered coordinate model) and cold (DuckDB or CompositeColdStorage, Parquet-backed) backends, with optional Cypher queries and platform bindings for Cloudflare Workers, native servers, and (as a placeholder) blockchain targets. The Blackboard is a shared multi-modal storage space (three-tier record types across a temporal dimension) that every module reads from and writes to through a single interface. The hot storage is a native Rust coordinate-based index with a temporal ordering layer; PetgraphStorage is an optional 2D projection for legacy graph queries. Storage backends implement fine-grained capability traits instead of a monolithic Storage interface.

Executive Summary

neXus is a modular research infrastructure built on the FIH Blackboard paradigm. Every module reads from and writes to a shared layered coordinate model through exactly three primitives: Fact (validated result), Intent (exploration direction), and Hint (governance rule). There is no fixed pipeline and no direct module-to-module communication. All interaction follows one lifecycle: submit, claim, heartbeat, conclude.
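The lifecycle can be pictured as a small state machine over a single Intent. The sketch below is a minimal illustration under stated assumptions: the IntentRecord type, the state names, and the lease-based heartbeat (a claim expires if no heartbeat arrives within a fixed number of ticks) are hypothetical and are not taken from the neXus code.

```rust
/// Hypothetical lifecycle states of one Intent on the Blackboard.
#[derive(Debug, Clone, PartialEq)]
enum IntentState {
    Submitted,
    Claimed { agent: String, last_heartbeat: u64 },
    Concluded { fact: String },
}

/// Hypothetical Intent record with an assumed heartbeat lease (in ticks).
#[derive(Debug)]
struct IntentRecord {
    state: IntentState,
    lease: u64,
}

impl IntentRecord {
    /// submit: the Intent appears on the Blackboard, unclaimed.
    fn submit(lease: u64) -> Self {
        IntentRecord { state: IntentState::Submitted, lease }
    }

    /// claim: succeeds if the Intent is unclaimed or the previous claim's lease expired.
    fn claim(&mut self, agent: &str, now: u64) -> bool {
        let free = match &self.state {
            IntentState::Submitted => true,
            IntentState::Claimed { last_heartbeat, .. } => {
                now.saturating_sub(*last_heartbeat) > self.lease
            }
            IntentState::Concluded { .. } => false,
        };
        if free {
            self.state = IntentState::Claimed { agent: agent.to_string(), last_heartbeat: now };
        }
        free
    }

    /// heartbeat: only the current claimant can extend its lease.
    fn heartbeat(&mut self, agent: &str, now: u64) -> bool {
        if let IntentState::Claimed { agent: owner, last_heartbeat } = &mut self.state {
            if owner.as_str() == agent {
                *last_heartbeat = now;
                return true;
            }
        }
        false
    }

    /// conclude: the claimant writes its result back as a Fact and closes the Intent.
    fn conclude(&mut self, agent: &str, fact: &str) -> bool {
        let owns = matches!(&self.state,
            IntentState::Claimed { agent: owner, .. } if owner.as_str() == agent);
        if owns {
            self.state = IntentState::Concluded { fact: fact.to_string() };
        }
        owns
    }
}
```

The lease makes abandoned work recoverable without any module talking to another: a stalled agent simply stops heartbeating, and any other agent can claim the Intent once the lease runs out.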

neXus models a self-describing paradigm: the Blackboard is the Scheme, Intents and Hints are the Field, and the act of reading, computing, and writing back is the Observation. The core compiles to WASM (edge) and native (server) from a single codebase. Contract.nex is the single governance surface.

Unified Architecture

The architecture has three layers. The Core Blackboard is the logical layer: a shared multi-modal storage space of Fact, Intent, and Hint primitives that every module reads from and writes to through a single interface. The storage layer beneath it follows the FIH paradigm: the Blackboard holds a DualStorage composition of hot (core storage runtime with coordinate-based indexing and temporal ordering) and cold (DuckDB or CompositeColdStorage, Parquet-backed) backends. PetgraphStorage is an optional 2D projection for legacy graph queries, and Cypher queries against that projection are parsed with cyrs. Capability traits replace the monolithic Storage trait, so each backend implements only the interfaces it provides. The platform bindings layer exposes the core to different deployment targets.

A third storage pattern sits alongside the identity-based hot and cold stores: a plug-in semantic similarity index. Where an identity store provides key-value access by record identifier, a semantic store provides retrieval by meaning. It follows the same USB-hub pattern as every other storage backend: the core defines only a thin trait, and external crates provide implementations. The core itself never references any specific methodology such as “vector”; it provides only a lookup handle that each external implementation uses to retrieve exactly the data it needs (feature vectors, raw text, origin strings, etc.). This lets radically different retrieval strategies (HNSW vector search, BM25 string matching, n-gram fuzzy search, LLM reranking) all plug into the same index slot without any core changes.

Figure 1: Stigmergy communication: detectors observe and record Facts, agents create Intents, the Scheduler orchestrates the OODA cycle

Multi-Dimensional Blackboard Composition

The Blackboard is a layered coordinate model where the three primary storage domains and a temporal dimension together form a multi-dimensional record space. Relations between primitives are not edges but coordinate differences (distances between points in the coordinate space).
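To make "relations as coordinate differences" concrete, here is a minimal sketch under loudly stated assumptions: the Coord type and its axes (tier, origin, topic, time) are hypothetical choices for illustration, not the actual coordinate layout of the core storage runtime. The point it shows is that no edge is stored; a relation is computed on demand from two coordinates.

```rust
/// Hypothetical record tiers (the three primary storage domains).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tier {
    Fact,
    Intent,
    Hint,
}

/// Hypothetical coordinate of a record. The axes are illustrative assumptions.
#[derive(Debug, Clone, Copy)]
struct Coord {
    tier: Tier,
    origin: i64,
    topic: i64,
    t: i64, // temporal axis
}

/// A relation is a coordinate difference, derived when needed, never stored as an edge.
#[derive(Debug, PartialEq)]
struct Delta {
    cross_tier: bool,
    d_origin: i64,
    d_topic: i64,
    d_t: i64,
}

fn relation(a: &Coord, b: &Coord) -> Delta {
    Delta {
        cross_tier: a.tier != b.tier,
        d_origin: b.origin - a.origin,
        d_topic: b.topic - a.topic,
        d_t: b.t - a.t,
    }
}

/// One possible reading of a delta: same tier, same topic, different origin.
fn is_cross_origin_same_topic(d: &Delta) -> bool {
    !d.cross_tier && d.d_topic == 0 && d.d_origin != 0
}
```

In this reading, a query such as "which Facts from other origins discuss the same topic" becomes a predicate over deltas rather than a traversal over stored edges.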

A Blackboard can contain other Blackboards. A Fact at Dimension N becomes a Scheme at Dimension N+1. An Observation at Dimension N+1 becomes a Hint at Dimension N. This mirrors the bootstrap recursion.

| Dimension | Role | Content |
|---|---|---|
| Infrastructure | DualStorage (HotStorage + ColdStorage) + SemanticStore, capability traits | Pluggable storage backends; each dimension can select its own backend composition, plus a semantic similarity index |
| Domain | Fact / Intent / Hint nodes | Research documents, experiment results, governance rules |
| Meta | Blackboard composition rules | Rules for how Facts at Domain become Schemes at Meta |
| Research | Gaps, hypotheses, validations | Domain-specific knowledge exploration |

Storage backends are swappable per dimension via capability traits. A dimension requiring filtered reads uses a ColdStorage backend; a dimension requiring low-latency access uses a HotStorage backend. Each dimension can compose hot and cold backends independently. The Queue layer enables stigmergic visibility across dimensions. A write at any dimension is visible to all dimensions. The type conversion (Fact to Scheme, Observation to Hint) happens at the Queue consumer, not at the write stage.
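The rule that conversion happens at the Queue consumer, not at write time, can be sketched as a pure function over a stored entry and the reader's dimension. This is a minimal sketch under assumptions: the Record and QueueEntry types and the consume function are hypothetical names, and dimensions are modeled as plain integers.

```rust
/// Hypothetical record kinds as they travel through the Queue.
#[derive(Debug, Clone, PartialEq)]
enum Record {
    Fact(String),
    Intent(String),
    Hint(String),
    Scheme(String),
    Observation(String),
}

/// A write is tagged with the dimension it was made at and stored unchanged.
struct QueueEntry {
    dim: u32,
    record: Record,
}

/// Conversion happens here, relative to the reader's dimension:
/// a Fact at N is read as a Scheme at N+1, and an Observation at N+1 is read
/// as a Hint at N. Every other reader still sees the record, unconverted.
fn consume(entry: &QueueEntry, reader_dim: u32) -> Record {
    match &entry.record {
        Record::Fact(p) if reader_dim == entry.dim + 1 => Record::Scheme(p.clone()),
        Record::Observation(p) if reader_dim + 1 == entry.dim => Record::Hint(p.clone()),
        other => other.clone(),
    }
}
```

Because the stored entry is never rewritten, the same write stays visible to every dimension; only the interpretation differs per reader.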

Design Decisions

Why WebAssembly

The decision to compile the core to WASM is not about web browsers. It is about a set of constraints that WASM uniquely satisfies and that together form a non-negotiable foundation.

Figure 2: Why WASM — layered constraints that no other runtime satisfies

Sandbox isolation. Every Agent, every projector, every verification module runs in its own WASM instance. Each instance has its own linear memory, and the runtime bounds-checks every access, so a compromised projector cannot address another projector’s Fact cache. This is not configurable discipline; it is structural impossibility.

Deterministic execution. The WASM specification defines the same output for the same bytecode and the same input on every host and platform, apart from a small, documented set of exceptions (such as NaN bit patterns, threads, and resource exhaustion) that the host must constrain, for example by canonicalizing NaN values. A vIP asset’s value is that its verification result is reproducible by any party. WASM determinism is the mechanism that makes reproducibility a property of the runtime rather than a procedural hope.

Cold start under 1 millisecond. The OODA loop spawns and despawns agents on every tick. An agent that takes 100 ms to start spends much of a sub-second tick budget before doing any work. WASM instances can start in microseconds and consume megabytes, not gigabytes. Docker is not an alternative here; it is a different category of tool for a different problem.

Portable binary. The same compiled WASM module runs on any host that embeds a WASM runtime: Cloudflare Workers (edge), AWS Lambda (Graviton), a researcher’s laptop (ARM), and a future on-chain contract environment (WASM blockchain host). The core never needs recompilation for a new deployment target. Platform bindings are thin adapters, not architectural commitments.

Double sandbox. WASM isolates execution at the machine level. Field isolates operation at the paradigm level. An agent that escapes its WASM sandbox still cannot execute a constraint combination that its Field does not permit. An agent whose Field permits a combination still cannot read host memory. The two sandboxes are orthogonal and cumulative. Forging a verification proof requires breaking both.

Together, these properties are not conveniences. They are the minimal set of constraints that make a cross-reality verification economy possible. No other runtime satisfies all of them.

SemanticStore: The Flashlight Pattern

Alongside identity-based stores (record identifiers to values, temporal ordering), a third storage pattern exists: a plug-in semantic similarity index. Where an identity store maps record identifiers to values, a semantic store maps semantic features to record identifiers for similarity retrieval.

The core design follows the same USB-hub pattern as every other storage backend. The core defines only a thin trait; external crates provide implementations. The trait itself uses a flashlight pattern: instead of receiving fixed-format data (such as numeric vectors), each method receives a lookup handle. The implementation uses this handle to request exactly the data it needs.

The lookup handle exposes accessors for content bytes, decoded text, numeric feature vectors, origin strings, and creator strings. Each implementation calls only the accessors it actually requires. A vector-based store requests feature vectors; a text-similarity store requests raw text; an ngram store requests origin strings. The core never determines which methodology is in use.

The coordinate index exposes a semantic slot alongside its existing indexes (origin, time, status). Each semantic backend plugs into this slot without any changes to the core index logic.
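The flashlight pattern can be sketched as a trait whose accessors all default to None, plus two very different stores that each call only the accessor they need. This is a minimal sketch under assumptions: FihLoad and SemanticStore are names from the text, but every method signature, the RecordId alias, and both example stores are hypothetical. The cosine store is brute force (the document notes that current data volumes do not require HNSW).

```rust
use std::collections::{HashMap, HashSet};

type RecordId = u64;

/// Hypothetical lookup handle ("flashlight"). A store calls only what it needs.
trait FihLoad {
    fn content(&self) -> Option<&[u8]> { None }
    fn text(&self) -> Option<&str> { None }
    fn features(&self) -> Option<&[f32]> { None }
    fn origin(&self) -> Option<&str> { None }
    fn creator(&self) -> Option<&str> { None }
}

/// Hypothetical thin trait owned by the core; implementations live in external crates.
trait SemanticStore {
    fn insert(&mut self, id: RecordId, load: &dyn FihLoad);
    fn search(&self, query: &dyn FihLoad, k: usize) -> Vec<(RecordId, f32)>;
}

/// Sort hits by descending score and keep the top k.
fn top_k(mut hits: Vec<(RecordId, f32)>, k: usize) -> Vec<(RecordId, f32)> {
    hits.sort_by(|a, b| b.1.total_cmp(&a.1));
    hits.truncate(k);
    hits
}

/// Plug-in 1: brute-force cosine similarity; reads only features().
#[derive(Default)]
struct CosineStore { rows: HashMap<RecordId, Vec<f32>> }

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

impl SemanticStore for CosineStore {
    fn insert(&mut self, id: RecordId, load: &dyn FihLoad) {
        if let Some(v) = load.features() { self.rows.insert(id, v.to_vec()); }
    }
    fn search(&self, query: &dyn FihLoad, k: usize) -> Vec<(RecordId, f32)> {
        let Some(q) = query.features() else { return Vec::new() };
        top_k(self.rows.iter().map(|(id, v)| (*id, cosine(q, v))).collect(), k)
    }
}

/// Plug-in 2: character-trigram Jaccard similarity; reads only origin().
#[derive(Default)]
struct OriginNgramStore { rows: HashMap<RecordId, HashSet<String>> }

fn trigrams(s: &str) -> HashSet<String> {
    let chars: Vec<char> = s.chars().collect();
    chars.windows(3).map(|w| w.iter().collect::<String>()).collect()
}

impl SemanticStore for OriginNgramStore {
    fn insert(&mut self, id: RecordId, load: &dyn FihLoad) {
        if let Some(o) = load.origin() { self.rows.insert(id, trigrams(o)); }
    }
    fn search(&self, query: &dyn FihLoad, k: usize) -> Vec<(RecordId, f32)> {
        let Some(o) = query.origin() else { return Vec::new() };
        let q = trigrams(o);
        let score = |g: &HashSet<String>| {
            let union = q.union(g).count();
            if union == 0 { 0.0 } else { q.intersection(g).count() as f32 / union as f32 }
        };
        top_k(self.rows.iter().map(|(id, g)| (*id, score(g))).collect(), k)
    }
}

/// Example record that exposes two of the five accessors.
struct Rec { features: Vec<f32>, origin: String }

impl FihLoad for Rec {
    fn features(&self) -> Option<&[f32]> { Some(self.features.as_slice()) }
    fn origin(&self) -> Option<&str> { Some(self.origin.as_str()) }
}
```

Both stores accept the same &dyn FihLoad, so the coordinate index could hold either one in its semantic slot (for example as a boxed trait object) without knowing which methodology is inside.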

Execution Unit Model: nex Is Not a Library

nex is not a general-purpose library. Each storage instance is an execution unit: a single-threaded, self-contained runtime that owns its memory state and I/O channel exclusively. There is no shared mutable state between instances. No internal thread pool. No locking primitives in the hot path.

This design arises not from any specific platform constraint but from the nature of distributed blackboard coordination. Instances communicate exclusively through the FIH protocol — writing facts, intents, and hints to a shared external storage layer (object store, filesystem, network). They never share internal indices or entity stores with another instance. Coordination is stigmergic, not direct.

Scaling happens through physical instance replication, not internal sharding:

coordinator (process manager)
  ├── storage instance A (single thread, independent I/O channel)
  ├── storage instance B (single thread, independent I/O channel)
  ├── storage instance C (single thread, independent I/O channel)
  └── ...
       │
       └── Shared blackboard (external storage / FIH protocol)

Each instance is an atomic unit. Adding an instance adds capacity linearly. Failure of one instance does not affect others. Every instance can run in a sandboxed environment, a native process, or a lightweight VM — the binary and the execution contract are identical.
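The replication model above can be sketched with OS threads standing in for separate processes or sandboxes. This is a minimal sketch under assumptions: StorageInstance, SharedBlackboard, and run_coordinator are hypothetical names, and the Arc<Mutex<...>> stands in for the external shared storage (object store, filesystem, network), not for a lock inside a storage instance.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Stand-in for the external shared blackboard. The Mutex models the external
/// store's own consistency; no storage instance holds an internal lock.
type SharedBlackboard = Arc<Mutex<Vec<(usize, String)>>>;

/// Hypothetical single-threaded execution unit that owns its state outright.
struct StorageInstance {
    id: usize,
    local_facts: Vec<String>, // private; never shared with another instance
    board: SharedBlackboard,
}

impl StorageInstance {
    /// Record locally, then publish through the shared external layer (stigmergy).
    fn write_fact(&mut self, fact: &str) {
        self.local_facts.push(fact.to_string());
        self.board.lock().unwrap().push((self.id, fact.to_string()));
    }
}

/// Coordinator: scaling means spawning more independent instances.
/// Returns the sum of the instances' local fact counts and the shared board.
fn run_coordinator(n: usize, facts_per_instance: usize) -> (usize, SharedBlackboard) {
    let board: SharedBlackboard = Arc::new(Mutex::new(Vec::new()));
    let handles: Vec<_> = (0..n)
        .map(|id| {
            let board = Arc::clone(&board);
            thread::spawn(move || {
                let mut inst = StorageInstance { id, local_facts: Vec::new(), board };
                for k in 0..facts_per_instance {
                    inst.write_fact(&format!("fact-{id}-{k}"));
                }
                inst.local_facts.len()
            })
        })
        .collect();
    let local_total = handles
        .into_iter()
        .map(|h| h.join().expect("instance panicked"))
        .sum();
    (local_total, board)
}
```

Capacity grows by raising n; each instance's private state is dropped with it, and a panic in one thread surfaces only as that instance's failed join.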

Interior Mutability Without Locking

Internal mutability uses runtime borrow-checked cells, not OS locking primitives. This is not a platform concession. It is the simplest correct implementation for a single-owner execution unit. A borrow check costs only a non-atomic counter check and update: no atomic operations, no OS futex. If an external caller needs thread-safe access, it wraps the instance in a standard synchronization primitive externally; that is an external composition, not an internal requirement.
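In Rust terms this is RefCell inside the instance and, only when a caller needs it, Mutex outside. A minimal sketch, assuming a hypothetical ExecUnit type with a BTreeMap index (not the actual nex types):

```rust
use std::cell::RefCell;
use std::collections::BTreeMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical single-owner execution unit. Methods take &self; mutation goes
/// through RefCell, which checks borrows with a plain non-atomic counter.
struct ExecUnit {
    index: RefCell<BTreeMap<u64, String>>,
}

impl ExecUnit {
    fn new() -> Self {
        ExecUnit { index: RefCell::new(BTreeMap::new()) }
    }
    fn put(&self, key: u64, value: &str) {
        self.index.borrow_mut().insert(key, value.to_string());
    }
    fn get(&self, key: u64) -> Option<String> {
        self.index.borrow().get(&key).cloned()
    }
    fn len(&self) -> usize {
        self.index.borrow().len()
    }
}

// ExecUnit is Send but not Sync (RefCell is !Sync), so the compiler refuses to
// share &ExecUnit across threads. Thread-safe access is an external composition:
fn shared_writes(threads: u64) -> usize {
    let unit = Arc::new(Mutex::new(ExecUnit::new()));
    let handles: Vec<_> = (0..threads)
        .map(|t| {
            let unit = Arc::clone(&unit);
            thread::spawn(move || unit.lock().unwrap().put(t, "fact"))
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let n = unit.lock().unwrap().len();
    n
}
```

The type system enforces the design choice: forgetting the external wrapper is a compile error, not a data race.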

Internal locking would be a design error. It would introduce contention, deadlock risk, and complexity — all for the false promise of internal parallelism. These instances do not parallelize internally; they replicate externally.

Async-Only Storage Interface

Storage instances do not expose synchronous blocking interfaces. All public storage methods are asynchronous. This is not negotiable:

  • Storage is inherently I/O-bound. Blocking on I/O in a single-threaded execution unit stalls all pending operations.
  • A synchronous interface on a single-threaded storage engine would be a lie: the implementation uses runtime borrow-checked cells (not thread-safe) and relies on cooperative multitasking (not preemptive scheduling).
  • Sync callers use an async runner (block_on) externally. They do not require sync trait implementations on the storage engine itself.
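A minimal sketch of this split, assuming a hypothetical AsyncKv trait (not the actual nex trait names) and Rust 1.75+ for async functions in traits. The storage side exposes only async methods; the block_on function plays the role of the external runner a sync caller brings. In practice a runtime crate such as futures::executor::block_on or Tokio fills that role.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Hypothetical async-only storage surface (async fn in trait).
trait AsyncKv {
    async fn get(&self, key: &str) -> Option<Vec<u8>>;
    async fn put(&self, key: &str, value: Vec<u8>);
}

/// Single-threaded in-memory backend; RefCell because there is one owner.
#[derive(Default)]
struct MemKv {
    map: RefCell<HashMap<String, Vec<u8>>>,
}

impl AsyncKv for MemKv {
    async fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.map.borrow().get(key).cloned()
    }
    async fn put(&self, key: &str, value: Vec<u8>) {
        self.map.borrow_mut().insert(key.to_string(), value);
    }
}

/// Waker that unparks the thread running block_on.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Minimal external runner: poll on the current thread, park while pending.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(),
        }
    }
}
```

Note that MemKv never implements a sync trait; the blocking behavior lives entirely in the caller's runner.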

No specific runtime motivates the async-first design. Some edge deployments do require async, but that is a coincidence, not the reason. The reason is that storage is asynchronous, and each storage instance is a single-threaded execution unit.

No Global Mutable State

A storage instance carries no global mutable state except fixed constants. Every resource (I/O handle, index, buffer) is owned by the instance. This guarantees that:

  • Two instances never accidentally share state.
  • Spawning a new instance is purely a construction operation with no global side effects.
  • Each instance can be sandboxed independently (linear memory, container, process).

Storage IO Design

Storage is inherently asynchronous. At the hardware level, every I/O operation involves pipelining, interrupts, or completion queues — whether it is a DRAM read, a DMA transfer, an NVMe submission queue entry, or a network packet. Synchronous I/O is a programmer convenience abstraction layered on top of fundamentally async hardware.

Because of this, nex makes async the design center rather than an adapter bolted on afterward: the IO trait is async at the trait level. Async is the primitive; sync is an extension.

This design aligns naturally with every target platform. Object stores support await directly with no blocking needed. On native runtimes, the pattern is spawn plus await on async filesystem and network operations. Each platform uses the same async trait; only the executor changes.

All writes go through a pending buffer and are committed in a single batch call. Rather than each operation traversing the I/O boundary independently, individual calls are amortized across the flush cycle. The caller controls durability by choosing when to flush.
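The amortization can be sketched with a writer that only appends to memory and a flush that crosses the I/O boundary once. This is a minimal sketch under assumptions: BatchIo, CountingIo, and BufferedWriter are hypothetical names, and the boundary is shown as a sync trait to keep the example dependency-free, whereas the text describes the real IO trait as async.

```rust
/// Hypothetical batch-oriented I/O boundary: one call commits many writes.
trait BatchIo {
    fn commit_batch(&mut self, batch: Vec<(String, Vec<u8>)>);
}

/// Test backend that counts boundary crossings so the amortization is visible.
#[derive(Default)]
struct CountingIo {
    calls: usize,
    stored: Vec<(String, Vec<u8>)>,
}

impl BatchIo for CountingIo {
    fn commit_batch(&mut self, batch: Vec<(String, Vec<u8>)>) {
        self.calls += 1;
        self.stored.extend(batch);
    }
}

struct BufferedWriter<I: BatchIo> {
    io: I,
    pending: Vec<(String, Vec<u8>)>,
}

impl<I: BatchIo> BufferedWriter<I> {
    fn new(io: I) -> Self {
        BufferedWriter { io, pending: Vec::new() }
    }

    /// Writes only touch memory; nothing crosses the I/O boundary yet.
    fn write(&mut self, key: &str, value: &[u8]) {
        self.pending.push((key.to_string(), value.to_vec()));
    }

    /// The caller decides when data becomes durable.
    /// Returns the number of writes committed in this single batch call.
    fn flush(&mut self) -> usize {
        if self.pending.is_empty() {
            return 0;
        }
        let batch = std::mem::take(&mut self.pending);
        let n = batch.len();
        self.io.commit_batch(batch);
        n
    }
}
```

A hundred writes followed by one flush cost one commit_batch call instead of a hundred boundary crossings; durability is exactly as fresh as the last flush.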

Optional Sync Wrapper (Native Only)

For consumers that require a synchronous interface on native platforms, a blocking wrapper exists that uses an async runner internally. It is not the recommended interface for new code.

Why Layered Coordinate Model, Not Docker Containers

The initial neXus architecture used Docker containers (Memgraph + proxy + LightRAG) orchestrated by shell scripts. The transition to a native codebase eliminated runtime dependencies and revealed that the underlying model is not a 2D graph but a multi-dimensional coordinate model:

| Reason | Impact |
|---|---|
| Single codebase | cargo build produces WASM (CF Worker) or native binary (server) from same source |
| Zero runtime deps | No Docker, no Python, no bolt proxy, wrangler deploy only |
| Multi-dimensional model | FIH is not a 2D edge graph. It operates along multiple independent record domains where facts, intents, and hints each occupy their own coordinate set, with temporal ordering as an additional axis |
| Relations as coordinate differences | Relations between primitives are not graph edges but distances between points in the coordinate space |
| 2D projection optional | Traditional knowledge graphs are 2D node-edge models. FIH can project into 2D for interop, but the native representation is multi-dimensional |

This is why the WASM build of petgraph failed: a 2D graph library was being forced to model a multi-dimensional structure. The core storage runtime operates natively in the correct coordinate model directly.

Why Cypher

Among graph query languages, Cypher is the best represented in LLM training data, which makes it the most reliable target for LLM code generation. Our cypher/ crate uses cyrs to parse Cypher into a typed Plan IR and translates that IR into petgraph traversals; other input languages can target the same IR.

Cypher is optional syntactic sugar over the native coordinate API (FilterCapable + temporal ordering layer + from_facts). The layered coordinate model query interface is the primary access method; Cypher translation is a secondary concern. Consumers that only need native FIH queries do not depend on the cypher crate.

Graph Storage Approaches (Reference Implementations)

The FIH Blackboard can be backed by multiple storage approaches. The primary implementation is the native layered coordinate model; other approaches exist for legacy compatibility.

| Approach | Storage | Adoption |
|---|---|---|
| Core storage runtime | Coordinate-based index with temporal ordering (layered coordinate model, native) | Primary (Phase 3, #86) |
| SemanticStore | Plug-in semantic similarity index via FihLoad flashlight pattern | Adopted (trait in core, implementations external) |
| PetgraphStorage | Petgraph (2D graph projection, optional) | Legacy compatibility, optional |
| DuckDB / Parquet | Cold storage for analytical queries | Adopted (cold backend) |
| Memgraph | In-memory LPG + RocksDB WAL | Patterns extracted (supplementary) |

The core storage runtime implements the layered coordinate model directly with coordinate-based indexing for O(1) lookups and a temporal ordering layer for time-range queries. It replaces PetgraphStorage as the default hot storage. PetgraphStorage remains as an optional 2D projection for legacy graph queries (community detection, PageRank, shortest path) that operate on a 2D slice of the coordinate space. DualStorage composes the core storage runtime (hot) with DuckDB (cold) as the default configuration; PetgraphStorage can be composed as an additional 2D view.

The supplementary analysis provided insights into specific capabilities. The table below documents the mapping between Memgraph and our adopted approach for granular traceability.

Memgraph Pattern Mapping

| Capability | Memgraph Approach | Our Adaptation |
|---|---|---|
| Graph storage | In-memory LPG + RocksDB WAL | Core storage runtime (layered coordinate model) + duckdb/Parquet |
| Vector search | USearch (C++), Single Store Vector Index | ndarray cosine (optional, on 2D projection) |
| Community detection | Louvain + Leiden (C++ MAGE) | community-detection crate (on 2D projection) |
| PageRank | Custom C++ | petgraph built-in (optional, on 2D projection) |
| Module isolation | C API (mg_procedure.h) | Rust trait in modules/ crate |
| Atomic GraphRAG | Single query = search + expand + rank + prompt | FilterCapable + temporal ordering layer native query |

Architecture Stack

The Blackboard is assembled from native Rust modules and optional third-party crates. The core storage runtime implements the layered coordinate model using only standard collection types and a custom temporal ordering layer. PetgraphStorage is an optional 2D projection for legacy graph queries.

External Dependencies: Candidates

Where possible, we depend on stable Rust crates rather than implementing from scratch. Each crate dependency is treated as a long-term commitment, so candidates are chosen for stability.

Graph Storage & Query

| Concern | Candidate | Justification | Status |
|---|---|---|---|
| Native layered coordinate model | Core storage runtime (coordinate-based index + temporal ordering layer) | Primary hot storage. No external crate needed for the core data model. | Adopted (primary) |
| 2D graph projection | petgraph (0.6) | Optional. Standard Rust graph lib, StableGraph, NodeIndex, built-in PageRank and Dijkstra. For legacy graph queries on a 2D slice of the coordinate space. | Optional dependency |
| Cold storage / analytical | duckdb (1.105) | Parquet-backed, bundled, vector/JSON/CTE/window. Native only. | Adopted |
| Cypher parsing | cyrs | Parses Cypher to typed Plan IR in one step. A unified query IR decouples input languages from execution. | Optional (for Cypher path) |
| Vector similarity | ndarray (0.15) + cosine | Our data volumes do not require HNSW. When they do, USearch has Rust bindings. Memgraph uses USearch internally. | Adopted |

Graph Algorithms

| Algorithm | Candidate | Status | Provided By |
|---|---|---|---|
| Louvain community | community-detection crate | Adopted | petgraph compatible |
| Leiden community | community-detection crate | Adopted | Same crate, variant feature flag |
| PageRank | petgraph::algo::page_rank | Adopted | Built into petgraph |
| Dijkstra / shortest path | petgraph::algo::dijkstra | Adopted | Built into petgraph |
| Betweenness centrality | petgraph + custom | Candidate | Minimal implementation on top of petgraph |
| Cosine similarity | ndarray | Adopted | Used by vector index; no separate crate needed |

Platform Bindings

| Target | Candidate | Deploy Command |
|---|---|---|
| Cloudflare Worker | worker crate (worker-rs) | wrangler deploy |
| AWS Lambda | lambda_runtime crate | cargo lambda deploy |
| Native server | axum + clap | cargo run |
| On-chain (blockchain) | (placeholder) | Future decision |
| Zed ACP bridge | nex-zed-agent (standalone binary) | cargo build; register in Zed settings |

Deployment Topology: Two Operational Modes

neXus supports multiple deployment modes that share the same F-I-H interface. Two representative modes:

| Mode | Runtime | Storage | Subscribe | Use Case |
|---|---|---|---|---|
| Real-time Blackboard | Persistent daemon | In-memory (+ optional WAL) | Yes (live peer notification) | Agent coordination, interactive sessions (Zed, ev) |
| Storage / Fact Store | Serverless worker | R2 / DuckDB / object store | No (poll or query) | Batch ingestion, archival, cross-session retrieval |

A real-time Blackboard can use a storage variant as its durable backend, flushing accumulated blocks to R2 on graceful shutdown. Both expose the same BlockStore trait — only delivery semantics differ.
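Both modes behind one trait can be sketched as follows. This is a minimal sketch under assumptions: BlockStore is the name used in the text, but its methods here are hypothetical, ObjectStoreMock stands in for R2, and RealtimeBoard models the persistent daemon with in-process subscriber callbacks for live peer notification.

```rust
use std::collections::BTreeMap;

/// Hypothetical shape of the shared trait; only the name comes from the text.
trait BlockStore {
    fn put_block(&mut self, id: u64, bytes: Vec<u8>);
    fn get_block(&self, id: u64) -> Option<Vec<u8>>;
}

/// Storage mode stand-in for R2 / object storage: no subscriptions.
#[derive(Default)]
struct ObjectStoreMock {
    objects: BTreeMap<u64, Vec<u8>>,
}

impl BlockStore for ObjectStoreMock {
    fn put_block(&mut self, id: u64, bytes: Vec<u8>) {
        self.objects.insert(id, bytes);
    }
    fn get_block(&self, id: u64) -> Option<Vec<u8>> {
        self.objects.get(&id).cloned()
    }
}

/// Real-time mode: serves from memory, notifies subscribers on every write,
/// and uses a storage-mode backend for durability.
struct RealtimeBoard<D: BlockStore> {
    memory: BTreeMap<u64, Vec<u8>>,
    unflushed: Vec<u64>,
    durable: D,
    subscribers: Vec<Box<dyn Fn(u64)>>,
}

impl<D: BlockStore> RealtimeBoard<D> {
    fn new(durable: D) -> Self {
        RealtimeBoard { memory: BTreeMap::new(), unflushed: Vec::new(), durable, subscribers: Vec::new() }
    }

    fn subscribe(&mut self, f: impl Fn(u64) + 'static) {
        self.subscribers.push(Box::new(f));
    }

    /// Graceful shutdown: push accumulated blocks to the durable backend and hand it back.
    fn shutdown(mut self) -> D {
        for id in self.unflushed.drain(..) {
            if let Some(bytes) = self.memory.get(&id) {
                self.durable.put_block(id, bytes.clone());
            }
        }
        self.durable
    }
}

impl<D: BlockStore> BlockStore for RealtimeBoard<D> {
    fn put_block(&mut self, id: u64, bytes: Vec<u8>) {
        self.memory.insert(id, bytes);
        self.unflushed.push(id);
        for notify in &self.subscribers {
            notify(id); // live peer notification
        }
    }
    fn get_block(&self, id: u64) -> Option<Vec<u8>> {
        self.memory.get(&id).cloned()
    }
}
```

Callers written against BlockStore work unchanged in either mode; only whether writes are pushed to subscribers or must be polled differs.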

Design Principle: Implement Strategically, Depend Widely

Where differentiation matters (Cypher-to-petgraph translation, gap detection heuristics, platform bindings), we write code. Where stability matters (graph algorithms, persistence, parsing), we depend on verified Rust crates. The Blackboard is the thinnest possible layer that turns a collection of independent modules into a coherent multi-dimensional analysis platform.

Development Model: Three Hard Layers

The codebase stratifies into three layers with different change tolerances. This is not an architectural abstraction but a development workflow enforced by how each layer is validated.

| Layer | Change Tolerance | Validation | Examples |
|---|---|---|---|
| Consumption Scenarios | Immutable | nexus-sim scenario definitions capture the domain invariants; cannot be compromised without invalidating the ecosystem’s purpose (Proof by Structure) | 7 attack scenarios (see Proof by Structure); exchange contract requirements; ev verification flows |
| Orchestration Layer | Flexible | nexus-sim runs scenario definitions against the foundation; failures force orchestration changes | exchange_fact() overlay (defined in Proof by Structure); API gateway handlers; C2PA verification wrapper; session lifecycle |
| Foundation (IO Layer) | Stable | Existing comprehensive test suite; core storage runtime/DuckDB contract; never modified without a nexus-sim scenario proving the capability gap | AsyncStorageRead, AsyncFactCapable, DualStorage; nex core storage runtime / R2 bindings |

Consumption scenarios are the hardest layer. They express what the system must guarantee — “no entity can read a Fact without contributing one” is not negotiable because it derives from the ecosystem’s game-theoretic requirements. The foundation is also hard: the core storage runtime’s indexing and ordering contracts, DuckDB’s ACID guarantees, and the FIH capability trait contracts are fixed by their respective libraries.

The orchestration layer between them is the only soft layer. It evolves continuously as nexus-sim validates scenarios against the foundation. When a scenario fails, the orchestration layer is updated first; only when a scenario demands a capability the foundation cannot provide (e.g., atomic compare-and-swap that the storage backend lacks) is the foundation modified. This is the scenario-driven reverse development pattern: the consumption scenario drives the orchestration, which in turn pressures the foundation only when necessary.

Figure 3: Three hard layers: scenarios are immutable, foundation is stable, orchestration between them is flexible and scenario-driven.

A concrete example: exchange_fact() (defined in Proof by Structure) is an orchestration-layer function. It uses StorageRead and FactCapable from the foundation. If nexus-sim proves that a caller can bypass exchange_fact() by calling StorageRead::get() directly through a public API, the fix is in the orchestration layer — the public API endpoint must only expose exchange_fact(), not the raw storage trait. The foundation remains untouched. Only if nexus-sim proves that StorageRead itself cannot enforce the required ordering (e.g., because it lacks CAS semantics that the exchange contract needs) would the foundation be modified.
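The orchestration-layer fix amounts to module privacy: the public surface owns the storage and exposes only exchange_fact(). This is a minimal sketch under assumptions: StorageRead and FactCapable are names from the text but their signatures are hypothetical, as are ExchangeGateway, ExchangeError, and the rule that a contribution must be non-empty. The real exchange contract is defined in Proof by Structure and is not reproduced here.

```rust
use std::collections::HashMap;

/// Foundation traits (names from the text; signatures assumed).
pub trait StorageRead {
    fn get(&self, id: &str) -> Option<String>;
}
pub trait FactCapable {
    fn put_fact(&mut self, id: &str, body: &str);
}

/// In-memory foundation backend for the sketch.
#[derive(Default)]
struct MemStore {
    facts: HashMap<String, String>,
}

impl StorageRead for MemStore {
    fn get(&self, id: &str) -> Option<String> {
        self.facts.get(id).cloned()
    }
}
impl FactCapable for MemStore {
    fn put_fact(&mut self, id: &str, body: &str) {
        self.facts.insert(id.to_string(), body.to_string());
    }
}

mod api {
    use super::{FactCapable, StorageRead};

    /// The storage field is private: code outside this module cannot reach
    /// StorageRead::get on it, only exchange_fact().
    pub struct ExchangeGateway<S> {
        storage: S,
    }

    #[derive(Debug, PartialEq)]
    pub enum ExchangeError {
        EmptyContribution,
        NotFound,
    }

    impl<S: StorageRead + FactCapable> ExchangeGateway<S> {
        pub fn new(storage: S) -> Self {
            Self { storage }
        }

        /// A read is only possible in the same call that records a contribution.
        pub fn exchange_fact(&mut self, contribution: (&str, &str), wanted: &str) -> Result<String, ExchangeError> {
            let (id, body) = contribution;
            if body.trim().is_empty() {
                return Err(ExchangeError::EmptyContribution);
            }
            self.storage.put_fact(id, body);
            self.storage.get(wanted).ok_or(ExchangeError::NotFound)
        }
    }
}
```

The foundation traits are untouched; the guarantee comes entirely from what the orchestration layer chooses to make public.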

This three-layer validation pattern mirrors the IoBuffer+StoreSession architecture at a different scale: there, the foundation provides sync storage traits, the IoBufferSession overlay orchestrates hydration and flush, and the Worker request lifecycle (the consumption scenario) drives the entire cycle.

Core Storage Engine

The core is a Cargo workspace with clean separation of interfaces and implementations. The data model follows a layered coordinate model (three-tier record types across a temporal dimension), not a 2D edge graph. The monolithic Storage trait has been replaced by fine-grained capability traits. Each backend implements only the capabilities it provides. The core storage runtime (coordinate-based index with temporal ordering) is the primary hot storage; the 2D graph projection is optional.

Capability traits come in pairs: sync variants for multi-thread-safe backends (in-memory graph projection, composite), and async variants for the execution-unit storage engine. Each backend implements only the variant that matches its execution model. Aggregate aliases compose the traits needed for common backend roles.

CypherCapable (QueryCapable) is no longer part of ColdStorage, because the query interface is independent of storage. DuckDbStorage implements QueryCapable directly; CompositeColdStorage does not.

DualStorage composes a hot and cold backend:

  • Writes delegate to both hot and cold (dual-write)
  • Reads go to hot (sub-ms latency for edge computing)
  • Filtered reads delegate to cold (hot has no SQL/filter capability)
  • Commit channel: CompositeColdStorage holds a separate commit_kv/commit_blob pair used exclusively by flush_since(). The commit channel replaces dirty tracking: flush_since writes its output (blob archives, cursor state) through the commit channel and never touches the general read/write path. The flush cursor stored in commit_kv is the deterministic flush boundary, so no per-record dirty flags are needed. Consumers call read_cursor() to learn which data has been flushed.
Figure 4: CQRS commit channel: cursor replaces dirty for flush boundary detection

DualStorage is generic over <H: HotStorage, C: ColdStorage> rather than Box<dyn HotStorage> / Box<dyn ColdStorage>. This preserves AFIT (async fn in trait) compatibility for async trait migration without boxing overhead at the call site.
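The pieces above (fine-grained capability traits, aggregate aliases, a generic DualStorage, and a cursor-based flush) can be sketched together. This is a minimal sketch under assumptions: the trait names come from the text, but all signatures are hypothetical and shown in their sync form for brevity; MemHot and MemCold are illustrative backends; keys are assumed to be monotonically increasing so the cursor is a simple high-water mark; and the archive vector stands in for commit_blob while the cursor field stands in for commit_kv.

```rust
use std::collections::BTreeMap;

type Key = u64;

// Fine-grained capability traits (names from the text; signatures assumed).
trait StorageRead { fn get(&self, k: Key) -> Option<String>; }
trait FactCapable { fn put_fact(&mut self, k: Key, v: &str); }
trait FilterCapable { fn filter(&self, pred: &dyn Fn(&str) -> bool) -> Vec<Key>; }

// Aggregate aliases via blanket impls: any type with the capabilities qualifies.
trait HotStorage: StorageRead + FactCapable {}
impl<T: StorageRead + FactCapable> HotStorage for T {}
trait ColdStorage: StorageRead + FactCapable + FilterCapable {}
impl<T: StorageRead + FactCapable + FilterCapable> ColdStorage for T {}

#[derive(Default)]
struct MemHot { m: BTreeMap<Key, String> }
impl StorageRead for MemHot { fn get(&self, k: Key) -> Option<String> { self.m.get(&k).cloned() } }
impl FactCapable for MemHot { fn put_fact(&mut self, k: Key, v: &str) { self.m.insert(k, v.to_string()); } }

#[derive(Default)]
struct MemCold { m: BTreeMap<Key, String>, cursor: Option<Key> }
impl StorageRead for MemCold { fn get(&self, k: Key) -> Option<String> { self.m.get(&k).cloned() } }
impl FactCapable for MemCold { fn put_fact(&mut self, k: Key, v: &str) { self.m.insert(k, v.to_string()); } }
impl FilterCapable for MemCold {
    fn filter(&self, pred: &dyn Fn(&str) -> bool) -> Vec<Key> {
        self.m.iter().filter(|(_, v)| pred(v.as_str())).map(|(k, _)| *k).collect()
    }
}

impl MemCold {
    /// Commit-channel sketch: archive everything after the cursor, then advance it.
    /// No per-record dirty flags are kept; the cursor alone is the flush boundary.
    fn flush_since(&mut self, archive: &mut Vec<(Key, String)>) -> usize {
        let start = self.cursor.map_or(0, |c| c + 1);
        let batch: Vec<(Key, String)> = self.m.range(start..).map(|(k, v)| (*k, v.clone())).collect();
        if let Some((last, _)) = batch.last() {
            self.cursor = Some(*last);
        }
        let n = batch.len();
        archive.extend(batch);
        n
    }
    fn read_cursor(&self) -> Option<Key> { self.cursor }
}

/// Generic over concrete backends (no Box<dyn ...>), so calls are statically dispatched.
struct DualStorage<H: HotStorage, C: ColdStorage> { hot: H, cold: C }

impl<H: HotStorage, C: ColdStorage> DualStorage<H, C> {
    fn write(&mut self, k: Key, v: &str) { self.hot.put_fact(k, v); self.cold.put_fact(k, v); } // dual-write
    fn read(&self, k: Key) -> Option<String> { self.hot.get(k) } // low-latency path
    fn filtered(&self, pred: &dyn Fn(&str) -> bool) -> Vec<Key> { self.cold.filter(pred) } // hot has no filter
}
```

Because DualStorage names its backends as type parameters, swapping in a different hot or cold backend is a type change at construction, and the trait methods remain eligible for async-fn-in-trait without boxing.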

Crate structure:

  • model/ — Data model crate: FIH lifecycle interface, capability traits (sync and async variants), DualStorage composition, blob/meta/object store traits, clock abstraction, detection trait hierarchy.
  • interface/query/ — Backend-agnostic tabular query specification (filter, ordering, aggregation).
  • interface/cypher/ — Optional Cypher parser and executor. Translates Cypher to native queries for the 2D graph projection.
  • nex/ — Core storage engine: async-only execution unit, coordinate-based index with temporal ordering, optional sync wrapper for native platforms, optional 2D graph projection, durable cold storage, OODA scheduler, all detectors.
  • storage/duckdb/ — Parquet-backed analytical cold storage.
  • storage/sim/ — In-memory I/O backends (test doubles + filesystem) and scenario-driven verification runner.
  • storage/ve-composite/ — HTTP server exposing blob/meta/object store endpoints for session-backed composite storage.
  • gateway/api/ — HTTP REST server exposing the FIH lifecycle.
  • gateway/nex-cf/ — Edge-deployed gateway with object-store-backed storage and semantic search.
  • gateway/nex-cf/mock/ — Local simulation of the edge gateway for offline development.

Cypher Translation

The cypher/ crate translates Cypher to petgraph operations on the optional 2D projection. Our gap-detector needs only three patterns on the 2D view:

| Cypher Pattern | petgraph Translation |
|---|---|
| MATCH (c:Concept) WHERE ... RETURN c | node_indices().filter(label).filter(condition) |
| OPTIONAL MATCH (c)-[r]-() WITH c, count(r) AS rc WHERE rc = 0 | neighbors() + count + filter |
| MATCH (a)-[r1]->(b) MATCH (a)-[r2]->(b) WHERE type(r1) <> type(r2) | edges() cartesian product + filter |

We do not implement full Cypher. We implement the subset our modules actually need. The primary query interface is the native layered coordinate model API (FilterCapable + temporal ordering layer + from_facts).
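The second and third patterns reduce to a degree count and a pairwise scan over edges. The sketch below is a minimal illustration under an explicit substitution: it uses a plain adjacency list (the hypothetical Projection type) instead of petgraph so that it stays dependency-free; the cypher/ crate itself targets petgraph.

```rust
/// Minimal 2D projection stand-in: node labels plus typed, directed edges.
struct Projection {
    labels: Vec<&'static str>,                // node index -> label
    edges: Vec<(usize, usize, &'static str)>, // (from, to, relationship type)
}

impl Projection {
    /// MATCH (c:Concept) OPTIONAL MATCH (c)-[r]-() WITH c, count(r) AS rc WHERE rc = 0 RETURN c
    /// Count incident edges in either direction and keep labelled nodes with none.
    fn isolated(&self, label: &str) -> Vec<usize> {
        let mut degree = vec![0usize; self.labels.len()];
        for &(a, b, _) in &self.edges {
            degree[a] += 1;
            degree[b] += 1;
        }
        (0..self.labels.len())
            .filter(|&n| self.labels[n] == label && degree[n] == 0)
            .collect()
    }

    /// MATCH (a)-[r1]->(b) MATCH (a)-[r2]->(b) WHERE type(r1) <> type(r2) RETURN a, b
    /// Pairwise scan over edges; each unordered pair is reported once
    /// (Cypher itself would return both orderings of r1 and r2).
    fn conflicting_pairs(&self) -> Vec<(usize, usize)> {
        let mut out = Vec::new();
        for (i, &(a1, b1, t1)) in self.edges.iter().enumerate() {
            for &(a2, b2, t2) in &self.edges[i + 1..] {
                if a1 == a2 && b1 == b2 && t1 != t2 {
                    out.push((a1, b1));
                }
            }
        }
        out
    }
}
```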

Graph Algorithms (Optional 2D Projection)

Graph algorithms operate on the optional 2D projection (PetgraphStorage). They follow the MAGE module pattern: each algorithm is a standalone function that takes a graph reference and returns results. The primary layered coordinate model does not require these algorithms.

| Algorithm | Implementation | Notes |
|---|---|---|
| Louvain | community-detection crate | On 2D projection only |
| Leiden | community-detection crate | Same crate, variant feature flag |
| PageRank | petgraph::algo::page_rank | Built into petgraph |
| Dijkstra | petgraph::algo::dijkstra | Built into petgraph |
| Cosine similarity | ndarray | Used by vector index |

Analysis Modules

Modules follow Blackboard semantics: read Facts and Hints, emit Facts (detectors) or Intents (agents). Every module shares the same layered coordinate model interface. Internal implementation (native, LLM-driven, heuristic) is invisible to other modules. Detectors observe patterns and record them as immutable Facts; agents read detector Facts and decide which to act on by creating Intents. This separation — observe as Fact, act as Intent — is enforced architecturally through the DetectionCapable trait hierarchy.

| Module | Input | Output | Status |
|---|---|---|---|
| Gap Detector | Fact graph | Fact (gap pattern) | Rust (origin + cross-origin topic levels) |
| Contradiction Detector | Fact graph | Fact (contradiction) | Rust (same-topic/different-position) |
| State Change Detector | Fact graph | Fact (state transition) | Rust (State ReasonCheckpoint) |
| New Document Analyzer | Fact graph | Fact (+factor/-factor/gap) | Rust (baseline-aware) |
| OODA Loop (in-process) | Fact + Intent + Hint | Scheduler tick | Rust (in-process, detection traits) |
| Hypothesis Generator | Gap Fact + Fact evidence | Intent (hypothesis) | Future |
| Concept Validator | Hypothesis Intent + experiment | Fact (validated/rejected) | Future |
| Entity extraction | Raw document | Fact (entity) | Future |
| Flow-GRPO Planner | Successful Intent histories | Updated Planner weights | Future |

No module calls another module. All communication passes through the Blackboard. The queue layer serializes writes; coordination is emergent from agents observing and responding to Blackboard state. Detectors implement fine-grained capability traits (GapDetection, ContradictionDetection, StateChangeDetection) mirroring the storage trait architecture — each detector provides only the capabilities it supports, and the Scheduler composes them via Vec<Box<dyn DetectionCapable>>.
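The Scheduler composition can be sketched with two toy detectors behind one trait object type. This is a minimal sketch under assumptions: DetectionCapable and the Vec<Box<dyn DetectionCapable>> composition come from the text, but the detect signature, the Fact fields, and both detection heuristics are hypothetical simplifications (the real detectors work on origin and topic levels). Each detector only observes and returns Facts; none of them calls another.

```rust
/// Hypothetical Fact record.
#[derive(Debug, Clone, PartialEq)]
struct Fact {
    kind: String,
    topic: String,
    detail: String,
}

impl Fact {
    fn new(kind: &str, topic: &str, detail: &str) -> Self {
        Fact { kind: kind.to_string(), topic: topic.to_string(), detail: detail.to_string() }
    }
}

/// Base detection trait: observe a snapshot, return new Facts.
trait DetectionCapable {
    fn detect(&self, facts: &[Fact]) -> Vec<Fact>;
}

/// Toy gap rule: a topic with a "question" Fact but no "claim" Fact.
struct GapDetector;
impl DetectionCapable for GapDetector {
    fn detect(&self, facts: &[Fact]) -> Vec<Fact> {
        facts.iter()
            .filter(|q| q.kind == "question")
            .filter(|q| !facts.iter().any(|c| c.kind == "claim" && c.topic == q.topic))
            .map(|q| Fact::new("gap", &q.topic, "unanswered question"))
            .collect()
    }
}

/// Toy contradiction rule: two claims on the same topic with different positions.
struct ContradictionDetector;
impl DetectionCapable for ContradictionDetector {
    fn detect(&self, facts: &[Fact]) -> Vec<Fact> {
        let claims: Vec<&Fact> = facts.iter().filter(|f| f.kind == "claim").collect();
        let mut out = Vec::new();
        for (i, a) in claims.iter().enumerate() {
            for b in &claims[i + 1..] {
                if a.topic == b.topic && a.detail != b.detail {
                    out.push(Fact::new("contradiction", &a.topic, &format!("{} vs {}", a.detail, b.detail)));
                }
            }
        }
        out
    }
}

struct Scheduler {
    detectors: Vec<Box<dyn DetectionCapable>>,
    board: Vec<Fact>,
}

impl Scheduler {
    /// One tick: every detector observes the same snapshot; new Facts are
    /// appended afterwards, and Facts already on the board are not re-recorded.
    fn tick(&mut self) -> usize {
        let snapshot = self.board.clone();
        let mut new_facts: Vec<Fact> = Vec::new();
        for d in &self.detectors {
            for f in d.detect(&snapshot) {
                if !self.board.contains(&f) && !new_facts.contains(&f) {
                    new_facts.push(f);
                }
            }
        }
        let n = new_facts.len();
        self.board.extend(new_facts);
        n
    }
}
```

Adding a detector is one more Box in the vector; the Scheduler never needs to know what each detector looks for.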

Artifact Storage & Ingestion

R2 (bucket: ssccs-nexus-af) is the single source of truth. The af-sync worker performs incremental sync between R2 and LightRAG via Queue-based processing with drift detection. Already deployed; no change required.

Platform Bindings

| Binding | Crate | Compilation Target | Status |
|---|---|---|---|
| Cloudflare Worker | gateway/ (planned: rs-worker) | wasm32-unknown-unknown | Future (SessionExecute ready) |
| Native Server | gateway/api/ | host | Active (axum server) |
| On-chain (blockchain) | Future | wasm32-wasi | Placeholder |

TypeScript Orchestration

| Worker | Role | Protocol |
|---|---|---|
| af-sync | R2 artifact sync → RAG engines | HTTP, writes Facts to RAG Blackboard |

The gap-detector and other analysis modules have been migrated to Rust in the nexus crate, where they run as in-process DetectionCapable implementations composed by the Scheduler. The TypeScript layer retains only the sync worker for R2 artifact ingestion.

Module Communication: Stigmergy Through the Blackboard

Modules do not call each other. Each module reads from the Core Blackboard and writes back to it. Coordination is indirect, inspired by stigmergy patterns: agents leave traces in a shared environment, other agents perceive those traces and adapt their behavior.

The Blackboard stores three primitives. Facts are what the system has learned — including detector observations about the knowledge state. Intents are what agents want to explore. Hints are governance rules and human guidance. Detectors observe the layered coordinate model and record patterns as Facts; agents read detector Facts and decide which to act on by creating Intents. The Scheduler drives the OODA cycle, calling each detector every tick.

Figure 5: Stigmergy communication: detectors observe and record Facts, agents create Intents, the Scheduler orchestrates the OODA cycle

Key principles. Every method converges on the same three primitives regardless of internal complexity. There is no fixed pipeline. The Blackboard’s current state determines which module acts next.

Contract Governance & Future DeSci

Contract.nex defines research rules: evidence thresholds, novelty minimums, report structure, and reward schedules. Every module evaluates contract.nex before writing to the Blackboard. The governance surface is a single file, not distributed across modules.
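Evaluating the contract before every write can be modelled as a single write path that all modules share. The sketch assumes two rules, an evidence threshold and a novelty minimum, already parsed from contract.nex; the file's real syntax and rule set are not specified here, so `Contract`, `Candidate`, `Rejection`, and `guarded_write` are hypothetical.

```rust
/// Hypothetical in-memory view of two contract.nex rules.
#[derive(Debug, Clone, Copy)]
pub struct Contract {
    pub min_evidence: usize, // evidence threshold: supporting Facts required
    pub min_novelty: f64,    // novelty minimum, assumed to lie in [0, 1]
}

#[derive(Debug, Clone, PartialEq)]
pub struct Candidate {
    pub claim: String,
    pub evidence: Vec<u64>, // ids of supporting Facts
    pub novelty: f64,
}

#[derive(Debug, PartialEq)]
pub enum Rejection {
    InsufficientEvidence { have: usize, need: usize },
    NotNovel,
}

impl Contract {
    pub fn evaluate(&self, c: &Candidate) -> Result<(), Rejection> {
        if c.evidence.len() < self.min_evidence {
            return Err(Rejection::InsufficientEvidence {
                have: c.evidence.len(),
                need: self.min_evidence,
            });
        }
        if c.novelty < self.min_novelty {
            return Err(Rejection::NotNovel);
        }
        Ok(())
    }
}

/// Every module writes through this one function, so the contract is checked
/// in exactly one place instead of being re-implemented per module.
pub fn guarded_write(
    contract: &Contract,
    board: &mut Vec<Candidate>,
    c: Candidate,
) -> Result<(), Rejection> {
    contract.evaluate(&c)?;
    board.push(c);
    Ok(())
}
```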

Future: token-incentivized research. Contract.nex could execute on a blockchain, making research contributions automatically verifiable and rewardable:

Contract.nex → Smart contract (blockchain)
    ├── Gap discovery → Token reward
    ├── Hypothesis validation → Staking + reward
    ├── Experiment replication → Replication reward
    └── Concept drift detection → Drift token

Every (origin, intent, result) tuple on the Blackboard has a content-addressable hash. This hash serves as a verifiable proof of contribution, recordable on-chain without storing the full payload.
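A content address is a hash over a canonical encoding of the tuple, so identical contributions always map to the same address. The document does not name a hash function; this sketch uses 64-bit FNV-1a only to stay dependency-free. FNV-1a is not collision-resistant, so a real proof of contribution would need a cryptographic hash such as SHA-256. `content_address` and `canonical` are hypothetical names.

```rust
/// FNV-1a (64-bit). NOT cryptographic: a stand-in for SHA-256 or similar.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325; // FNV offset basis
    for b in bytes {
        h ^= *b as u64;
        h = h.wrapping_mul(0x100000001b3); // FNV prime
    }
    h
}

/// Canonical encoding: each field is length-prefixed so that
/// ("ab", "c") and ("a", "bc") cannot produce the same byte string.
fn canonical(origin: &str, intent: &str, result: &str) -> Vec<u8> {
    let mut out = Vec::new();
    for field in [origin, intent, result] {
        out.extend_from_slice(&(field.len() as u64).to_le_bytes());
        out.extend_from_slice(field.as_bytes());
    }
    out
}

/// Hex content address for an (origin, intent, result) tuple.
pub fn content_address(origin: &str, intent: &str, result: &str) -> String {
    format!("{:016x}", fnv1a64(&canonical(origin, intent, result)))
}
```

Only the short address would go on-chain; the full payload stays on the Blackboard and can be re-hashed by anyone to check the claim.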

Architecture Inspirations

Current Layer (Multi-Dimensional Storage Core)

The storage layer is now a multi-dimensional coordinate model, not a graph core. The core storage runtime (coordinate-based index with temporal ordering) is the primary hot storage. PetgraphStorage is an optional 2D projection for legacy graph queries.

Source Pattern What We Adopted
Memgraph (C++, production) Atomic GraphRAG (single query = search + expand + rank + prompt) Iterator chain translation in cypher/translate.rs
Memgraph MAGE module isolation (C API, standalone algorithms) Rust trait isolation in modules/ crate
Memgraph Single Store Vector Index (embedding as node property) SemanticStore + FihLoad (methodology-agnostic; vector is one implementation)
Memgraph WAL + in-memory dual storage DuckDB/Parquet + core storage runtime memory store
cyrs / cypher-rs ecosystem Cypher → typed Plan IR cyrs as parser dependency (optional, Cypher path)
Core storage runtime Coordinate-based index + temporal ordering Primary hot storage for the layered coordinate model
SemanticStore (nex core) Flashlight pattern (FihLoad) for methodology-agnostic semantic search Plug-in semantic similarity index alongside EntityStore and OrderedIndex
PetgraphStorage (optional) StableGraph, NodeIndex, built-in PageRank/Dijkstra Optional 2D projection for legacy graph algorithms
Capability-based traits Fine-grained Rust traits replacing monolithic interfaces StorageRead, FactCapable, FilterCapable, EvictCapable, etc. — also DetectionCapable, GapDetection, ContradictionDetection, StateChangeDetection for the detection layer
DualStorage pattern Hot + cold composition for edge-cloud routing DualStorage { hot: HotStorage, cold: ColdStorage }
CQRS-inspired commit channel Command/query separation for flush output CompositeColdStorage with commit_kv/commit_blob (dirty tracking OFF), which breaks the self-referential dirty marking in flush_since()
Figure 6: Mapping the commit channel pattern across Git, CQRS, and FIH
Figure 7: USB hub pattern: the core is a thin hub, unlimited semantic backends plug in through the same SemanticStore trait
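The plug-in idea behind SemanticStore can be sketched as a small trait with one interchangeable backend. The trait below is a simplified assumption (the real trait and the FihLoad flashlight pattern are not shown in this document); the backend is textbook Okapi BM25 with k1 = 1.2, b = 0.75, and the Lucene-style IDF ln(1 + (N - n + 0.5) / (n + 0.5)).

```rust
use std::collections::HashMap;

/// Simplified plug-in interface (hypothetical; not the real SemanticStore signature).
pub trait SemanticStore {
    fn index(&mut self, id: u64, text: &str);
    fn search(&self, query: &str, k: usize) -> Vec<(u64, f64)>;
}

/// Lowercased alphanumeric tokens.
fn tokens(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// In-memory Okapi BM25 backend (k1 = 1.2, b = 0.75).
#[derive(Default)]
pub struct Bm25 {
    docs: HashMap<u64, Vec<String>>,
}

impl SemanticStore for Bm25 {
    fn index(&mut self, id: u64, text: &str) {
        // Re-indexing an id replaces its previous tokens.
        self.docs.insert(id, tokens(text));
    }

    fn search(&self, query: &str, k: usize) -> Vec<(u64, f64)> {
        const K1: f64 = 1.2;
        const B: f64 = 0.75;
        let n = self.docs.len() as f64;
        if n == 0.0 {
            return Vec::new();
        }
        let avgdl = self.docs.values().map(|d| d.len()).sum::<usize>() as f64 / n;
        let terms = tokens(query);
        let mut scored: Vec<(u64, f64)> = self
            .docs
            .iter()
            .map(|(&id, doc)| {
                let dl = doc.len() as f64;
                let score: f64 = terms
                    .iter()
                    .map(|term| {
                        let tf = doc.iter().filter(|t| *t == term).count() as f64;
                        if tf == 0.0 {
                            return 0.0;
                        }
                        // Document frequency computed on the fly: O(N) per term, fine for a sketch.
                        let df = self.docs.values().filter(|d| d.contains(term)).count() as f64;
                        let idf = (1.0 + (n - df + 0.5) / (df + 0.5)).ln();
                        idf * tf * (K1 + 1.0) / (tf + K1 * (1.0 - B + B * dl / avgdl))
                    })
                    .sum();
                (id, score)
            })
            .filter(|&(_, s)| s > 0.0)
            .collect();
        // Highest score first; ties broken by id for deterministic output.
        scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then(a.0.cmp(&b.0)));
        scored.truncate(k);
        scored
    }
}
```

An HNSW index or an LLM reranker would be another `impl SemanticStore`; callers holding a `&dyn SemanticStore` would not change.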

A key architectural insight is that a unified query IR decouples input languages from execution. Future agent modules (Planner, Verifier, Hypothesis Generator) will query the layered coordinate model through this same IR. They do not need to know whether the backend is the core storage runtime in memory, DuckDB on disk, or a remote CF Worker. The query interface is the plug; the core is the socket. Adding a new query language (GQL, SPARQL, or a future agent DSL) requires only a new parser adapter, not a core change.
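The plug-and-socket split can be shown with two traits: a frontend that turns one surface language into a shared IR, and a backend that executes that IR. Everything here is hypothetical: on the Cypher path the real IR is cyrs's typed Plan IR, and the toy `kind key=value limit=n` syntax exists only for illustration.

```rust
use std::collections::BTreeMap;

/// Hypothetical backend-agnostic IR: record kind, equality filters, row limit.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct QueryIr {
    pub kind: String,                      // "fact" | "intent" | "hint"
    pub filters: BTreeMap<String, String>, // equality predicates
    pub limit: Option<usize>,
}

/// A parser adapter turns one surface language into the shared IR.
pub trait Frontend {
    fn parse(&self, src: &str) -> Result<QueryIr, String>;
}

/// A backend executes the IR without knowing which language produced it.
pub trait Backend {
    fn execute(&self, q: &QueryIr) -> Vec<BTreeMap<String, String>>;
}

/// Toy surface syntax: `fact origin=paper-a limit=2`.
pub struct KvFrontend;

impl Frontend for KvFrontend {
    fn parse(&self, src: &str) -> Result<QueryIr, String> {
        let mut parts = src.split_whitespace();
        let kind = parts.next().ok_or("empty query")?.to_string();
        let mut ir = QueryIr { kind, ..Default::default() };
        for p in parts {
            let (k, v) = p.split_once('=').ok_or_else(|| format!("bad term: {p}"))?;
            if k == "limit" {
                ir.limit = Some(v.parse::<usize>().map_err(|_| format!("bad limit: {v}"))?);
            } else {
                ir.filters.insert(k.to_string(), v.to_string());
            }
        }
        Ok(ir)
    }
}

/// In-memory backend: each row is a field map with a "kind" column.
pub struct MemBackend(pub Vec<BTreeMap<String, String>>);

impl Backend for MemBackend {
    fn execute(&self, q: &QueryIr) -> Vec<BTreeMap<String, String>> {
        self.0
            .iter()
            .filter(|row| row.get("kind") == Some(&q.kind))
            .filter(|row| q.filters.iter().all(|(k, v)| row.get(k) == Some(v)))
            .take(q.limit.unwrap_or(usize::MAX))
            .cloned()
            .collect()
    }
}
```

A Cypher or GQL frontend would be another `impl Frontend`; `MemBackend`, or a DuckDB-backed equivalent, would not change.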

Stigmergy Layer (Implemented)

The Blackboard architecture is validated by proven stigmergic search: a minimal OODA loop that reads from and writes to a shared blackboard of Fact, Intent, and Hint primitives.

The stigmergy layer is implemented in the nexus crate. The Scheduler drives the OODA loop, calling detectors every tick. Detectors (Gap, Contradiction, State Change, New Document) observe the layered coordinate model and record patterns as Facts. Agents read these detector Facts and create Intents. The ReasonCheckpoint pattern — simple count-based state change detection — is built into StateChangeDetector. All detectors implement the DetectionCapable trait hierarchy, which mirrors the storage capability trait architecture.
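Count-based state change detection, as the ReasonCheckpoint pattern is described here, can be sketched as a detector that remembers per-kind record counts at its last checkpoint and reports kinds whose count has moved by at least a threshold. `StateChangeDetector::observe`, its threshold semantics, and representing the snapshot as a slice of kind labels are assumptions.

```rust
use std::collections::BTreeMap;

/// Hypothetical count-based checkpoint detector.
pub struct StateChangeDetector {
    threshold: usize,
    last: BTreeMap<String, usize>, // counts per record kind at the last checkpoint
}

impl StateChangeDetector {
    pub fn new(threshold: usize) -> Self {
        Self { threshold, last: BTreeMap::new() }
    }

    /// `snapshot` is an immutable view taken at tick start (one kind label per
    /// record), so the counts cannot change while they are being evaluated.
    /// Returns (kind, count at checkpoint, count now) for every change.
    pub fn observe(&mut self, snapshot: &[&str]) -> Vec<(String, usize, usize)> {
        let mut now: BTreeMap<String, usize> = BTreeMap::new();
        for kind in snapshot {
            *now.entry(kind.to_string()).or_insert(0) += 1;
        }
        let mut changes = Vec::new();
        for (kind, &n) in &now {
            let before = self.last.get(kind).copied().unwrap_or(0);
            if n.abs_diff(before) >= self.threshold {
                changes.push((kind.clone(), before, n));
            }
        }
        // Kinds that disappeared entirely also count as a change.
        for (kind, &before) in &self.last {
            if !now.contains_key(kind) && before >= self.threshold {
                changes.push((kind.clone(), before, 0));
            }
        }
        if !changes.is_empty() {
            self.last = now; // advance the whole checkpoint only when something fired
        }
        changes
    }
}
```

Advancing the checkpoint only when something fires lets small changes accumulate until they cross the threshold, instead of being forgotten tick by tick.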

Beyond the current implementation, future research layers will attach to the same Blackboard through the same interface:

Paradigm Source Core Insight Blackboard Role
Stigmergic Search Stigmergic Search (validated) Indirect coordination through shared traces Core coordination mechanism; Queue + Blackboard separation
In-the-Flow Agentic Optimization AgentFlow arXiv:2510.05592 Trainable Planner, Flow-GRPO learning Reads successful Intent histories, updates Planner weights
Hypothesis-Driven Discovery HypoChainer arXiv:2507.17209 LLMs + KG + humans build hypothesis chains Writes Hypothesis Intents, reads the layered coordinate model for evidence
Contract-Governed Generation Story2Proposal arXiv:2601.20833 Shared contract enforces structural obligations contract.nex evaluated before every Blackboard write

Agent Oracle: The Purest Observer

The Agent Oracle is a module that reads from the Blackboard and writes back predictions and simulation results. It has no special privileges. It uses the same Fact / Intent / Hint interface as every other module.

Oracle input: accumulated (origin, intent, result) histories on the Blackboard. Oracle output: new Facts (predicted outcomes) and new Intents (simulation branches to explore). The Oracle does not need direct communication with any other module. It reads the traces left by real-world experiments, market data ingestors, or hardware simulations, and writes its projections back to the same space.

This applies to any domain. Business simulation: Fact = market data, Intent = strategy proposal, Oracle output = projected outcome. Hardware emulation: Fact = RTL trace, Intent = optimization, Oracle output = utilization prediction. The Blackboard does not distinguish between the two. The primitives are the same.

Strategic Value

  • Blackboard is the single interface. Every module reads and writes the same three types in a layered coordinate model. Internal complexity is irrelevant.
  • Stigmergy over orchestration. Modules coordinate indirectly through the Blackboard. No module calls another module. No pipeline dependency chain.
  • LLM-optional. A stigmergic search system proved a full suite of problems without any LLM. LLMs are accelerators, not requirements.
  • Iteration over planning. Interface design improves through repeated use, not upfront specification. Virtual responses enable continuous iteration.
  • DeSci-ready. Contract.nex enables token-incentivized research. Content-addressable hashes provide verifiable proofs of contribution.
  • Portable core. Same codebase compiles for WASM (edge) and native (server). Zero platform lock-in.

Current Status (2026-06)

Phase 3 (native layered coordinate model storage) is complete, with a comprehensive test suite covering the core storage runtime, incremental persistence, the ColdStorage trait, and temporal query validation. The core storage runtime (nex crate, coordinate-based index with temporal ordering) replaces PetgraphStorage as the primary hot storage.
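Incremental persistence with cursor tracking, together with a commit path that keeps flush_since() from re-flushing its own output, can be sketched with a sequence-numbered hot log. `HotLog`, `ColdSink`, and `sync` are hypothetical; the real `flush_since()` signature is not shown in this document.

```rust
/// Hypothetical hot store that tags every write with a monotonically increasing sequence number.
#[derive(Default)]
pub struct HotLog {
    entries: Vec<(u64, String)>, // (sequence, payload)
    next_seq: u64,
}

impl HotLog {
    pub fn write(&mut self, payload: &str) -> u64 {
        let seq = self.next_seq;
        self.next_seq += 1;
        self.entries.push((seq, payload.to_string()));
        seq
    }

    /// Delta after `cursor` (exclusive; None = nothing flushed yet) and the new cursor.
    pub fn flush_since(&self, cursor: Option<u64>) -> (Vec<String>, Option<u64>) {
        let delta: Vec<&(u64, String)> = self
            .entries
            .iter()
            .filter(|(s, _)| cursor.map_or(true, |c| *s > c))
            .collect();
        let new_cursor = delta.last().map(|(s, _)| *s).or(cursor);
        (delta.into_iter().map(|(_, p)| p.clone()).collect(), new_cursor)
    }
}

/// Cold side. Committing appends here and never writes back into the hot log,
/// so a flush cannot generate entries that the next flush would pick up again.
#[derive(Default)]
pub struct ColdSink {
    pub committed: Vec<String>,
    pub cursor: Option<u64>,
}

impl ColdSink {
    pub fn sync(&mut self, hot: &HotLog) -> usize {
        let (delta, cursor) = hot.flush_since(self.cursor);
        let n = delta.len();
        self.committed.extend(delta); // separate commit path: no dirty marking
        self.cursor = cursor;
        n
    }
}
```

A second `sync` with no new writes returns an empty delta, which is the behavior the CQRS-inspired commit channel is meant to guarantee.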

Component Status
Semantic search trait (flashlight pattern) Implemented (core storage module)
Semantic index slot in coordinate index Implemented (plug-in slot alongside origin, time, status indexes)
In-memory BM25 search Implemented with tests
Vectorize cloud backend Implemented: pluggable embedder trait, offline local embedder for development
External semantic backends (HNSW, ngram, LLM reranker) Future (external crates)
af-sync worker (R2 to engines) Deployed
RAG engines (LightRAG) Reference implementation; sync pipeline active
CI/CD Deployed; WASM + native dual-path workflows
Core data model crate Blackboard trait, FIH lifecycle, capability traits (sync + async variants), DualStorage, blob/meta/object store traits
Backend-agnostic query interface Tabular query specification, filter, ordering, aggregation
Cypher parser and executor Optional: translates Cypher to native queries for the 2D graph projection
Core storage engine Async-only execution unit; coordinate-based index with temporal ordering; no sync blocking interfaces
DuckDB cold storage Parquet-backed analytical backend with CTE, window functions, JSON, vector search
In-memory IO backends Test doubles and filesystem IO for development and verification
Cloudflare Worker gateway R2-backed storage with Durable Object, BM25 + Vectorize semantic search
HTTP API gateway Axum REST server exposing the FIH lifecycle
Serialization proxy Serde validation layer over storage traits
Local simulation server In-memory mock of the full Cloudflare pipeline for offline development
Async-only storage execution unit All public methods async; no synchronous trait implementations; runtime borrow-checked cells; single-thread; no global mutable state
In-memory 2D graph projection (optional) Legacy graph queries (community detection, PageRank, shortest path)
Durable cold storage Blob + metadata + object store for long-term persistence
FIH lifecycle submit, claim, heartbeat, release, conclude
Sync storage capability traits Implemented by thread-safe backends (PetgraphStorage, HybridBlackboard); NOT implemented by the async-only storage engine
Async storage capability traits Implemented by the async-only storage engine
Hot + cold composition Dual-write to hot and cold backends; reads from hot; filtered reads delegate to cold
OODA scheduler Tick-based polling, heartbeat TTL, eviction trigger, stale intent cleanup
Gap detection Origin-based + cross-origin topic gap analysis
Contradiction detection Same-topic/different-position analysis
State change detection Pattern-based checkpoint detection, snapshot-safe
New document analysis Plus-factor/minus-factor/gap analysis
Detection capability traits Unified trait hierarchy for all detectors
Cypher translation cyrs pipeline, dual-path executor, cold query routing
Document ingestion pipeline Read markdown files from object store, chunk, submit as facts, auto-index semantically
Deferred-write IO Batch multiple writes into a single storage call
Scenario tests 60+ tests across storage engine, IO backends, and gateway
Core storage tests Comprehensive suite (Phase 3 complete)
Incremental persistence Delta aggregation with cursor tracking
Cold storage interface Unified trait for cold backends
Temporal query validation Replay and consistency checks across time
Agentic loop Full FIH lifecycle (detector to fact to intent to conclusion)
CQRS commit channel Cursor-based flush boundary, separate commit path
Session-backed IO Generic serialized queue for async I/O emulation
Process coordinator Planned: physical instance replication, sandboxed lifecycle
On-chain DeSci Placeholder
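The "Hot + cold composition" row can be sketched as a struct generic over two tiers with fine-grained capability traits. Assumptions: the trait names and the string key-value model are hypothetical, and because the table does not say whether a hot miss falls back to cold, the sketch leaves that out.

```rust
use std::collections::BTreeMap;

/// Point read/write capability (hypothetical; the real crate splits storage
/// into finer traits such as StorageRead, FactCapable, FilterCapable).
pub trait KvStore {
    fn put(&mut self, key: &str, value: &str);
    fn get(&self, key: &str) -> Option<String>;
}

/// Filtered-read capability, required only of the cold tier here.
pub trait Filterable {
    fn filter_prefix(&self, prefix: &str) -> Vec<(String, String)>;
}

/// In-memory tier used for both hot and cold in this sketch.
#[derive(Default)]
pub struct Mem {
    map: BTreeMap<String, String>,
}

impl Mem {
    /// Eviction, as a hot tier under memory pressure might do.
    pub fn evict(&mut self, key: &str) {
        self.map.remove(key);
    }
}

impl KvStore for Mem {
    fn put(&mut self, key: &str, value: &str) {
        self.map.insert(key.to_string(), value.to_string());
    }
    fn get(&self, key: &str) -> Option<String> {
        self.map.get(key).cloned()
    }
}

impl Filterable for Mem {
    fn filter_prefix(&self, prefix: &str) -> Vec<(String, String)> {
        self.map
            .range(prefix.to_string()..)
            .take_while(|(k, _)| k.starts_with(prefix))
            .map(|(k, v)| (k.clone(), v.clone()))
            .collect()
    }
}

/// Hot + cold composition.
pub struct DualStorage<H, C> {
    pub hot: H,
    pub cold: C,
}

impl<H: KvStore, C: KvStore + Filterable> DualStorage<H, C> {
    /// Dual-write: every write goes to both tiers.
    pub fn put(&mut self, key: &str, value: &str) {
        self.hot.put(key, value);
        self.cold.put(key, value);
    }

    /// Point reads are served from hot.
    pub fn get(&self, key: &str) -> Option<String> {
        self.hot.get(key)
    }

    /// Filtered reads delegate to cold, which still holds entries hot has evicted.
    pub fn filter_prefix(&self, prefix: &str) -> Vec<(String, String)> {
        self.cold.filter_prefix(prefix)
    }
}
```

After an eviction, the point read misses while the filtered read still sees every Fact, because cold received each write too.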

Edge-to-Cloud Portability & AWS Integration

Principle: Bindings Are Deployment Targets, Not Architectural Commitments

The platform bindings layer (bindings/) isolates deployment-specific code. Adding a new deployment target requires only a new binding crate. The core (nexus-core/) never changes. Available targets: Cloudflare Workers (wasm32-unknown-unknown), AWS Lambda + Graviton (aarch64-unknown-linux-gnu), Native server (axum, host architecture), On-chain (wasm32-wasi, placeholder).

Environment Binding Hardware Latency Profile Best For
Cloudflare Workers bindings/cf/ Edge (300+ locations) < 50ms cold, sub-ms warm Real-time queries, API gateway, gap detection on document arrival
AWS Lambda (Graviton) bindings/aws/ ARM64 (Graviton3/4) < 100ms cold, ms warm Batch analysis, large-scale community detection, model training
AWS EC2 / HPC bindings/aws/ x86_64 / GPU Sub-ms (warm resident) Hardware simulation, compiler optimization loops, pre-silicon emulation
Local / VPS bindings/server/ Any 0ms Development, offline research, private data

Why This Matters for SSCCS

SSCCS’s research agenda spans multiple compute domains that no single platform can satisfy:

Research Activity Compute Profile Optimal Platform
Document ingestion & entity extraction I/O-bound, bursty CF Workers (edge proximity to R2)
Knowledge graph query & gap detection CPU-bound, graph traversal CF Workers or Lambda
Community detection (Louvain/Leiden) CPU-bound, iterative Lambda (Graviton, longer timeout)
Hypothesis generation (LLM) Memory-bound, GPU-accelerated EC2 with GPU or Workers AI
Compiler optimization (SSCCS ↔︎ llvm-project) CPU-intensive, iterative EC2 / HPC
Hardware emulation (pre-silicon RTL) CPU + memory intensive, long-running EC2 / HPC
Training (Flow-GRPO for Planner) GPU-bound, hours-long SageMaker / EC2 GPU

AWS Integrations: Concrete Paths

AWS Lambda + Graviton (Immediate)

Rust compiles natively to aarch64-unknown-linux-gnu, so the same nexus-gap crate that runs as a CF Worker also runs as a Lambda function on Graviton, with no code changes.

Amazon S3 ↔︎ R2 Bridge (Data Gravity)

SSCCS documents live in R2. AWS compute can access them through R2's S3-compatible API.
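An S3 client running on AWS compute needs little more than an endpoint override to reach R2. The sketch builds those settings: the endpoint shape `https://<ACCOUNT_ID>.r2.cloudflarestorage.com` and the region value `auto` follow Cloudflare's R2 S3-API documentation, while the account id is a placeholder, request signing (AWS SigV4) is omitted, and `R2Endpoint` is a hypothetical helper.

```rust
/// Hypothetical connection settings for R2's S3-compatible API.
pub struct R2Endpoint {
    pub account_id: String, // placeholder: the Cloudflare account id
    pub bucket: String,
}

impl R2Endpoint {
    /// R2 expects the region to be "auto" when a client asks for one.
    pub const REGION: &'static str = "auto";

    pub fn endpoint_url(&self) -> String {
        format!("https://{}.r2.cloudflarestorage.com", self.account_id)
    }

    /// Path-style object URL (endpoint / bucket / key). Keys are not
    /// percent-encoded here; a real client would do that and sign the request.
    pub fn object_url(&self, key: &str) -> String {
        format!("{}/{}/{}", self.endpoint_url(), self.bucket, key)
    }
}
```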

Alternatively, hot data can be migrated to S3 for lower latency from AWS compute, while cold data stays in R2.

AWS Nitro Enclaves for Contract Governance

contract.nex verification can run inside an AWS Nitro Enclave for cryptographic attestation: the verification result comes with a signed proof that the contract was executed faithfully, without relying on blockchain.

SageMaker for Flow-GRPO Training

The Learning Loop (Layer 4) trains the Planner via Flow-GRPO. Training trajectories collected from CF Workers are stored in R2, then training runs on SageMaker with GPU instances:

CF Workers → R2 (trajectory JSONL)
    ↓
SageMaker Training Job (ml.g5.xlarge)
    ↓ policy update
New Planner checkpoint → R2
    ↓
CF Workers pick up new checkpoint

Future: Hardware Simulation Integration

SSCCS’s core research (Segment, Scheme, Field, Observation, Projection) is about rewriting computing’s ontology. This extends naturally to hardware:

Pre-Silicon Emulation Pipeline

R2 (RTL designs, benchmark configs)
    ↓
EC2 HPC (Graviton4, 64 vCPU, 256GB RAM)
    ↓
nexus-core compiled as native binary
    ├── layered coordinate model ingests simulation traces as temporal coordinates
    ├── gap detector finds inefficiencies in pipeline utilization
    ├── community detection groups related hardware modules
    └── contract governs which optimizations are valid
    ↓
Findings → R2 → LLM analysis → Compiler patch proposals → llvm-project PRs

Why Same Core for Hardware Analysis

The gap-detector that today finds orphaned concepts in documentation will tomorrow find underutilized functional units in RTL simulations. Same layered coordinate model structure, different data:

Today (Documents) Tomorrow (Hardware)
Fact = validated research result Fact = RTL simulation trace
Intent = exploration hypothesis Intent = optimization candidate
Coordinate = (three-tier record types, temporal) Coordinate = (three-tier record types, temporal)
Gap = orphaned concept Gap = idle functional unit
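"Same structure, different data" can be made concrete with a gap check that never interprets its inputs: a gap is a known entity that no observation references. `gaps` is a hypothetical helper; the real Gap detection also performs origin-based and cross-origin analysis, which this omits.

```rust
use std::collections::BTreeSet;

/// Domain-agnostic gap check. `known` lists entities (concepts, functional units);
/// `observations` are (source, target) pairs (document mentions, trace events).
/// A gap is a known entity that no observation targets.
pub fn gaps<'a>(known: &[&'a str], observations: &[(&str, &str)]) -> Vec<&'a str> {
    let referenced: BTreeSet<&str> = observations.iter().map(|&(_, target)| target).collect();
    known.iter().copied().filter(|k| !referenced.contains(k)).collect()
}
```

The same function finds an orphaned concept in a document corpus and an idle unit in a simulation trace; only the inputs differ.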

Strategic Position (Example Narrative)

Figure 8: Strategic Position: Edge-to-Cloud Portability

The framing is not “CF or AWS.” It is “CF for what CF does best, AWS for what AWS does best, and the same core either way.” An example meeting-ready narrative: SSCCS is building a portable multi-dimensional analysis platform whose deployment surface spans edge to cloud, with zero architectural commitment to any single vendor.

Nexus-SIM: Virtual Emulation Test Suite

Every async I/O boundary (CF Workers KV, R2, Durable Object; ROS2 topic pub/sub; blockchain validators; distributed transaction coordinators; edge AI inference) can be emulated through the same SessionExecute trait. The virtual emulation test suite is not a convenience. It is the primary development surface. A backend that passes the nexus-sim test suite will work with any real binding that implements the same trait.
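Writing a scenario against the trait, rather than a concrete binding, is what lets a simulated backend stand in for a real one. The sketch is synchronous to stay dependency-free, whereas the real SessionExecute is async and its signature is not given here; `Op`, `Reply`, `SimSession`, and `scenario_write_then_read` are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical operations crossing an I/O boundary (a KV store here).
#[derive(Debug, Clone, PartialEq)]
pub enum Op {
    Put(String, String),
    Get(String),
}

#[derive(Debug, Clone, PartialEq)]
pub enum Reply {
    Ok,
    Value(Option<String>),
}

/// Synchronous stand-in for the SessionExecute trait.
pub trait SessionExecute {
    fn execute(&mut self, op: Op) -> Result<Reply, String>;
}

/// Simulated session: in-memory state plus deterministic fault injection.
#[derive(Default)]
pub struct SimSession {
    kv: HashMap<String, String>,
    fail_every: Option<usize>, // fail every n-th call to exercise retry paths
    calls: usize,
}

impl SimSession {
    pub fn new(fail_every: Option<usize>) -> Self {
        SimSession { fail_every, ..Default::default() }
    }
}

impl SessionExecute for SimSession {
    fn execute(&mut self, op: Op) -> Result<Reply, String> {
        self.calls += 1;
        if let Some(n) = self.fail_every {
            if n > 0 && self.calls % n == 0 {
                return Err(format!("injected fault on call {}", self.calls));
            }
        }
        Ok(match op {
            Op::Put(k, v) => {
                self.kv.insert(k, v);
                Reply::Ok
            }
            Op::Get(k) => Reply::Value(self.kv.get(&k).cloned()),
        })
    }
}

/// A scenario written once against the trait; any binding can be plugged in.
pub fn scenario_write_then_read<S: SessionExecute>(
    s: &mut S,
    retries: usize,
) -> Result<Option<String>, String> {
    // Run an operation, retrying up to `retries` extra times on failure.
    let mut attempt = |op: Op| {
        let mut result = s.execute(op.clone());
        for _ in 0..retries {
            if result.is_ok() {
                break;
            }
            result = s.execute(op.clone());
        }
        result
    };
    attempt(Op::Put("k".into(), "v".into()))?;
    match attempt(Op::Get("k".into()))? {
        Reply::Value(v) => Ok(v),
        other => Err(format!("unexpected reply: {other:?}")),
    }
}
```

Fault injection is what makes a simulated session useful as a development surface: retry paths that real KV or R2 calls exercise only rarely can be hit deterministically.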

nexus-sim sits between consumption scenarios and the foundation, proving that the orchestration layer enforces scenario requirements and that the foundation provides the capabilities the orchestration layer needs. This is the three-layer development model (see Development Model) in action: nexus-sim is the validation engine that closes the loop from scenario to orchestration to foundation.

Scenario-Driven Reverse Development

nexus-sim inverts the conventional development flow. Instead of implementing first and testing afterwards, the consumption scenario drives the entire stack: define the scenario, express it as a test case, run it against the orchestration layer, update the orchestration layer until the test passes, and commit the scenario as a permanent regression test. The full process, including agent models, depths, CI integration, and implementation phases, is described in Simulation Suite.

nexus-sim’s test suite is always ahead of the implementation. See issue #69 for current status and roadmap.