Development Roadmap
Nexus
Boundaryless Autonomous Research Infrastructure. Contract-Governed, Hypothesis-Driven, and Cross-Reality
Abstract
Modern fundamental computing research, including SSCCS's redefinition of the von Neumann paradigm through structural observation, cannot remain confined to textual and code artifacts. The infrastructure that supports such research must evolve from a document‑code knowledge graph into a cross‑reality research manifold: a unified representation where whitepaper insights, compiler traces, physics simulations, robotic validations, sensor streams, and hardware‑in‑the‑loop measurements all coexist in a single, queryable, verifiable structure.
Executive Summary
Nexus defines this infrastructure as a self‑improving, contract‑governed, agentic system that not only ingests and connects heterogeneous knowledge, but actively generates, validates, and learns from hypotheses that span the digital‑physical boundary. It rests on three foundational paradigms:
| Paradigm | Core Insight |
|---|---|
| In‑the‑Flow Agentic Optimization | A trainable Planner inside a multi‑turn loop, optimized via on‑policy reinforcement learning, learns long‑horizon tool‑use strategies from sparse outcome rewards. |
| Hypothesis‑Driven Discovery | LLMs, knowledge graphs, and human experts collaboratively build and validate hypothesis chains grounded in graph‑structured evidence. |
| Contract‑Governed Multi‑Agent Generation | Specialised agents operate under a persistent shared contract that enforces structural obligations, ensuring that generated artifacts meet quality, provenance, and physical‑constraint requirements. |
Nexus assembles a layered architecture that is engine‑agnostic by design, allowing any knowledge‑graph, simulation, or robotic backend to be swapped in without rewriting orchestration logic. This document presents the unified blueprint — from textual document ingestion to embodied experimental validation — as a single cohesive research infrastructure.
Architecture Overview
The system is organized into five integrated layers, extended by cross‑reality capabilities that transcend the original digital scope.
| Layer | Logical Component | Core Responsibility |
|---|---|---|
| 1 | Knowledge Graph Engine | Persist documents, entities, relationships, embeddings, simulation outputs, and sensory traces; provide hybrid retrieval across vector, graph, and temporal spaces. |
| 2 | Artifact Ingestion Pipeline | Decouple CI/CD, robotic workflows, and sensor pipelines from the knowledge graph; guarantee strong consistency and incremental updates via engine‑agnostic sync workers and message queues. |
| 3 | Agentic Research Loop | Decompose research questions, invoke tools (including simulators and physical instruments), ground hypotheses in multi‑modal evidence, and produce contract‑compliant reports. |
| 4 | Learning Loop | Refine the Planner on‑policy using outcome‑based rewards, novelty scores, physical reproducibility metrics, and human feedback. |
| 5 | Contract Governance | Define structural, evidential, and physical‑constraint rules for all generated artifacts; enable evolvable, machine‑readable governance across domains. |
These layers implement the SSCCS Organic Growth model: contract‑governed ingestion feeds a unified knowledge graph, which drives hypothesis generation and validation, with the system continuously learning from its own discoveries — whether those discoveries occur in a document, a simulation, or a physical laboratory.
Foundational Research Paradigms
The integration of three complementary paradigms ensures that Nexus can plan, execute, evaluate, and self‑correct within a structured output framework, regardless of domain.
| Concept | AgentFlow | HypoChainer | Story2Proposal |
|---|---|---|---|
| Planner | Trainable policy via on‑policy RL | – | – |
| Executor | Tool orchestration layer | – | – |
| Verifier | Termination & state evaluation | KG‑grounded validation | Evaluation feedback |
| Generator | – | – | Report rendering under contract |
| Contract | – | – | Persistent structural obligations |
| Memory | Trajectory storage for RL | – | – |
| KG Engine | – | Entity‑aligned retrieval | Visual artifact registry (analogy) |
These paradigms address orthogonal concerns — learning, reasoning, and formatting — and their fusion covers the complete research lifecycle.
Layer 1: Knowledge Graph Engine
The knowledge graph engine is a graph‑native retrieval‑augmented generation system. It decomposes incoming artifacts into entities, typed relationships, and community clusters. All data resides in a single transactional database with two extensions:
- Vector index: for similarity search over document chunks, entity descriptions, and embedded sensor data.
- Graph store: for entities, relationships, temporal sequences, and community structures supporting multi‑hop reasoning.
During ingestion, documents, code artifacts, simulation outputs, or sensor streams pass through chunking, LLM‑based entity/relationship extraction, gleaning, normalisation, embedding, and community detection. The engine exposes multiple retrieval strategies (naive, local, global, hybrid, mix, bypass) that the Planner selects based on question type.
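To illustrate how a Planner might map question types onto these retrieval strategies, the following Rust sketch hard-codes a heuristic policy. The RetrievalMode and QuestionKind names are assumptions introduced for this example; in Nexus the mapping is learned by the Planner (Layer 4) rather than hand-written.

```rust
/// Retrieval strategies exposed by the knowledge graph engine, as listed above.
#[derive(Debug, Clone, Copy, PartialEq)]
#[allow(dead_code)]
enum RetrievalMode {
    Naive,   // plain vector similarity over chunks
    Local,   // entity-centred neighbourhood expansion
    Global,  // community-level summarisation
    Hybrid,  // local + global fusion
    Mix,     // graph traversal combined with vector search
    Bypass,  // direct answer without retrieval
}

/// Coarse question categories a Planner might distinguish (illustrative only).
#[allow(dead_code)]
enum QuestionKind {
    FactLookup,
    EntityRelation,
    CorpusOverview,
    CrossDomainSynthesis,
    SmallTalk,
}

/// A hand-written heuristic standing in for the learned Planner policy.
fn select_mode(kind: &QuestionKind) -> RetrievalMode {
    match kind {
        QuestionKind::FactLookup => RetrievalMode::Naive,
        QuestionKind::EntityRelation => RetrievalMode::Local,
        QuestionKind::CorpusOverview => RetrievalMode::Global,
        QuestionKind::CrossDomainSynthesis => RetrievalMode::Mix,
        QuestionKind::SmallTalk => RetrievalMode::Bypass,
    }
}

fn main() {
    let mode = select_mode(&QuestionKind::EntityRelation);
    assert_eq!(mode, RetrievalMode::Local);
    println!("selected retrieval mode: {:?}", mode);
}
```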
Layer 2: Artifact Ingestion Pipeline
Direct uploads from CI/CD, simulators, or robotic platforms to the knowledge graph create coupling, lack change detection, and complicate multi‑source merging. Nexus decouples the pipeline:
- Object Store: holds the authoritative copy of all artifacts — documentation, code symbols, simulation results, telemetry logs, video streams, hardware‑in‑the‑loop recordings. It provides strong consistency and a standard API.
- Sync Worker: exposes an engine‑agnostic endpoint (/sync/:engine). It compares the current state of the object store with a persistent mapping of previously ingested items, computes a diff, and pushes small task chunks into a message queue.
- Queue Consumers: execute the actual API calls on the target engine (delete, upload) and update the mapping. This avoids platform rate limits and allows auto‑scaling.
The design ensures that every change — whether a commit, a simulation completion, or a robotic demonstration — is reflected in the knowledge graph within seconds.
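A minimal sketch of the diff-and-chunk step performed by the sync worker, assuming the object store listing and the persistent mapping are both available as key-to-content-hash maps. The SyncTask, compute_diff, and chunk_tasks names are hypothetical; real queue consumers would publish the chunks to the message queue rather than print them.

```rust
use std::collections::HashMap;

/// A task destined for the message queue.
#[derive(Debug)]
enum SyncTask {
    Upload { key: String, etag: String },
    Delete { key: String },
}

/// Compare the object store listing (key -> content hash) with the persistent
/// mapping of previously ingested items and return the tasks needed to
/// bring the target engine up to date.
fn compute_diff(
    store_listing: &HashMap<String, String>,
    ingested: &HashMap<String, String>,
) -> Vec<SyncTask> {
    let mut tasks = Vec::new();
    // New or modified artifacts -> upload.
    for (key, etag) in store_listing {
        if ingested.get(key) != Some(etag) {
            tasks.push(SyncTask::Upload { key: key.clone(), etag: etag.clone() });
        }
    }
    // Artifacts removed from the store -> delete from the engine.
    for key in ingested.keys() {
        if !store_listing.contains_key(key) {
            tasks.push(SyncTask::Delete { key: key.clone() });
        }
    }
    tasks
}

/// Split tasks into small chunks so consumers stay within engine rate limits.
fn chunk_tasks(tasks: Vec<SyncTask>, chunk_size: usize) -> Vec<Vec<SyncTask>> {
    let mut chunks = Vec::new();
    let mut current = Vec::new();
    for task in tasks {
        current.push(task);
        if current.len() == chunk_size {
            chunks.push(std::mem::take(&mut current));
        }
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}

fn main() {
    let store = HashMap::from([
        ("whitepaper.md".to_string(), "v2".to_string()),
        ("sim/run-42.json".to_string(), "v1".to_string()),
    ]);
    let ingested = HashMap::from([
        ("whitepaper.md".to_string(), "v1".to_string()),
        ("old_note.md".to_string(), "v1".to_string()),
    ]);
    for chunk in chunk_tasks(compute_diff(&store, &ingested), 10) {
        println!("queue chunk: {:?}", chunk); // in production: publish to the queue
    }
}
```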
Layer 3: Agentic Research Loop
The multi‑turn loop orchestrates research activity:
- Planner (the only trainable component): decomposes a research question, selects appropriate retrieval or simulation modes, decides when to invoke physical instruments, and determines evidence sufficiency.
- Executor: invokes chosen tools and records structured results.
- Verifier: grounds each hypothesis step against the knowledge graph, computes support and novelty scores, checks compliance with the governance contract, and optionally applies physical‑consistency constraints.
- Generator: produces a hypothesis chain diagram, evidence table, gap analysis, and a structured report. A Reporter agent cross‑checks the report against the contract before finalisation.
All actions are recorded in an append‑only Evolving Memory, which serves as the raw material for reinforcement learning.
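The following sketch condenses one pass of the Planner, Executor, and Verifier loop and its append-only memory into plain Rust. All function bodies are placeholders; in Nexus the Planner is a trained model, the Executor calls real tools, and the Verifier queries the knowledge graph engine and the contract.

```rust
/// One turn of the agentic research loop, recorded in the Evolving Memory.
#[derive(Debug)]
#[allow(dead_code)]
struct TurnRecord {
    turn: usize,
    action: String,      // tool or retrieval mode chosen by the Planner
    observation: String, // structured result recorded by the Executor
    support: f64,        // Verifier: knowledge-graph support score
    novelty: f64,        // Verifier: novelty score
    contract_ok: bool,   // Verifier: contract compliance
}

/// Minimal multi-turn loop: plan, execute, verify, and stop when the
/// Verifier judges the evidence sufficient or the turn budget is exhausted.
fn research_loop(question: &str, max_turns: usize) -> Vec<TurnRecord> {
    let mut memory = Vec::new(); // append-only trajectory
    for turn in 0..max_turns {
        let action = plan(question, &memory);
        let observation = execute(&action);
        let (support, novelty, contract_ok) = verify(&observation);
        memory.push(TurnRecord { turn, action, observation, support, novelty, contract_ok });
        if support > 0.8 && contract_ok {
            break; // evidence sufficient: hand the trajectory to the Generator
        }
    }
    memory
}

// Placeholder policies standing in for the real components.
fn plan(_q: &str, memory: &[TurnRecord]) -> String {
    if memory.is_empty() { "kg.local_search".into() } else { "kg.global_search".into() }
}
fn execute(action: &str) -> String { format!("result of {action}") }
fn verify(_obs: &str) -> (f64, f64, bool) { (0.85, 0.4, true) }

fn main() {
    let trajectory = research_loop("How does Field transition affect energy use?", 5);
    println!("{} turn(s) recorded: {:?}", trajectory.len(), trajectory.last());
}
```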
Layer 4: Learning Loop
Collected research trajectories feed an on‑policy reinforcement learning pipeline (Flow‑GRPO):
- Rollout collection: each session is stored as a structured log.
- Reward computation: blends knowledge‑graph support, novelty, contract compliance, physical reproducibility, and optional human feedback.
- Group sampling: trajectories are batched for group‑normalised advantages.
- Policy update: the Planner is updated using a clipped objective with KL penalty toward a reference model; the final reward is broadcast to all steps, converting multi‑turn credit assignment into single‑turn updates.
Over time, the Planner internalises which strategies produce well‑grounded, innovative, and physically reproducible results.
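A minimal numerical sketch of the update described above, assuming one scalar outcome reward per trajectory and token-level importance ratios. The function names and hyperparameter values (clip range eps, KL weight beta) are illustrative and are not taken from the Flow-GRPO reference implementation.

```rust
/// Group-normalised advantage: the final (broadcast) reward of each trajectory
/// is standardised against the other trajectories sampled for the same question.
fn group_advantages(rewards: &[f64]) -> Vec<f64> {
    let n = rewards.len() as f64;
    let mean = rewards.iter().sum::<f64>() / n;
    let var = rewards.iter().map(|r| (r - mean).powi(2)).sum::<f64>() / n;
    let std = var.sqrt().max(1e-8);
    rewards.iter().map(|r| (r - mean) / std).collect()
}

/// Clipped surrogate with a KL penalty toward a reference model.
/// `ratio` is pi_theta / pi_old per token; `kl` approximates KL(pi_theta || pi_ref).
fn clipped_objective(ratios: &[f64], advantage: f64, kl: f64, eps: f64, beta: f64) -> f64 {
    let surrogate: f64 = ratios
        .iter()
        .map(|&r| (r * advantage).min(r.clamp(1.0 - eps, 1.0 + eps) * advantage))
        .sum::<f64>()
        / ratios.len() as f64;
    -(surrogate - beta * kl) // loss to be minimised by the optimiser
}

fn main() {
    // Four trajectories sampled for the same research question (group sampling).
    let rewards = [0.9, 0.2, 0.7, 0.4]; // blended outcome rewards per trajectory
    let advs = group_advantages(&rewards);
    // Token-level importance ratios for the first trajectory (illustrative values).
    let ratios = [1.05, 0.97, 1.10];
    let loss = clipped_objective(&ratios, advs[0], 0.02, 0.2, 0.01);
    println!("advantages = {advs:?}, loss for trajectory 0 = {loss:.4}");
}
```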
Layer 5: Contract Governance
The contract is a machine‑readable specification that defines:
- Required hypothesis steps and evidence thresholds.
- Novelty minimums (previously unseen graph edges a hypothesis must propose).
- Mandatory sections and citation formats for generated reports.
- Physical constraints: measurement precision bounds, reproducibility requirements, safety invariants.
Because the contract is versioned and enforced at both the Verifier and Reporter layers, it provides evolvable governance that can later incorporate cryptographic provenance, code‑documentation‑hardware traceability, and formal verification — all without architectural changes.
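One possible encoding of such a contract as a Rust data structure, with a check function standing in for the enforcement applied by the Verifier and Reporter. The field names and thresholds are illustrative assumptions, not the canonical Nexus schema.

```rust
/// A machine-readable research contract (illustrative schema).
struct Contract {
    version: String,
    min_hypothesis_steps: usize,
    min_support: f64,           // evidence threshold per step
    min_novel_edges: usize,     // previously unseen graph edges the chain must propose
    required_sections: Vec<&'static str>,
    max_measurement_error: f64, // physical constraint: precision bound
    min_replications: usize,    // reproducibility requirement
}

/// Summary of a hypothesis chain and its report, as seen by the Verifier/Reporter.
struct HypothesisChain {
    steps: usize,
    min_step_support: f64,
    novel_edges: usize,
    sections: Vec<String>,
    measurement_error: f64,
    replications: usize,
}

/// Enforcement applied at the Verifier (per step) and the Reporter (per report).
fn check(contract: &Contract, chain: &HypothesisChain) -> Result<(), Vec<String>> {
    let mut violations = Vec::new();
    if chain.steps < contract.min_hypothesis_steps {
        violations.push("too few hypothesis steps".to_string());
    }
    if chain.min_step_support < contract.min_support {
        violations.push("a step falls below the evidence threshold".to_string());
    }
    if chain.novel_edges < contract.min_novel_edges {
        violations.push("novelty minimum not met".to_string());
    }
    for section in &contract.required_sections {
        if !chain.sections.iter().any(|s| s == section) {
            violations.push(format!("missing mandatory section: {section}"));
        }
    }
    if chain.measurement_error > contract.max_measurement_error {
        violations.push("measurement precision bound exceeded".to_string());
    }
    if chain.replications < contract.min_replications {
        violations.push("reproducibility requirement not met".to_string());
    }
    if violations.is_empty() { Ok(()) } else { Err(violations) }
}

fn main() {
    let contract = Contract {
        version: "1.2.0".to_string(),
        min_hypothesis_steps: 3,
        min_support: 0.7,
        min_novel_edges: 1,
        required_sections: vec!["Evidence Table", "Gap Analysis"],
        max_measurement_error: 0.05,
        min_replications: 3,
    };
    let chain = HypothesisChain {
        steps: 4, min_step_support: 0.82, novel_edges: 2,
        sections: vec!["Evidence Table".into(), "Gap Analysis".into()],
        measurement_error: 0.02, replications: 3,
    };
    println!("contract v{}: compliant = {}", contract.version, check(&contract, &chain).is_ok());
}
```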
Extension: Boundaryless Research Infrastructure
The Need for Physical‑Digital Unification
Fundamental computing research on the scale of redefining the von Neumann paradigm cannot remain confined to text and code. Structural observation, as a new computational primitive, must be validated not only through compiler experiments and emulations but also through robotic validation, physics simulation, digital twins, and embodied experimentation. The same Segment, Scheme, Field, and Observation primitives that describe compiler behavior must also describe robot motion trajectories, circuit simulation states, or computational fluid dynamics outputs.
The infrastructure must, therefore, evolve from a document‑code knowledge graph into a cross‑reality research manifold — a unified latent space where theoretical insights, simulation outputs, and physical measurements inhabit the same queryable, verifiable structure.
Mathematical Foundation: Universal Latent Homeomorphic Manifold (ULHM)
Recent work establishes homeomorphism — a continuous bijection with a continuous inverse, preserving topological structure — as the criterion for determining when fundamentally different representation pathways share compatible latent structure. Two modalities that capture the same underlying reality, however differently encoded, can be rigorously unified when their latent manifolds are homeomorphic.
This provides the theoretical backbone for Nexus’s boundaryless extension. The same SSCCS primitives that describe compiler behavior can, through a verified homeomorphic mapping, describe robotic motion or hardware telemetry. The mathematics guarantees that reasoning across these domains is structurally valid, not merely heuristic.
The ULHM framework introduces three canonical loss terms applicable to any homeomorphic mapping task:
- Continuity loss: ensures that small changes in one modality correspond to small changes in the other.
- Trust loss: preserves neighborhood relationships across modalities.
- Wasserstein loss: aligns the global distributions of the latent representations.
These losses can be incorporated into Nexus’s Verifier as contract rules, automatically validating that a physical measurement and a semantic claim share compatible structure before a hypothesis is accepted.
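A deliberately simplified 1-D sketch of the three loss terms, assuming paired scalar latent coordinates for the same observations. Real ULHM latents are high-dimensional and these quantities would be computed over training batches; the function names and sample values below are illustrative only.

```rust
/// Continuity: small moves in modality A should stay small in modality B,
/// measured here as the mean squared difference of pairwise distances.
fn continuity_loss(a: &[f64], b: &[f64]) -> f64 {
    let mut total = 0.0;
    let mut count = 0.0;
    for i in 0..a.len() {
        for j in (i + 1)..a.len() {
            let da = (a[i] - a[j]).abs();
            let db = (b[i] - b[j]).abs();
            total += (da - db).powi(2);
            count += 1.0;
        }
    }
    total / count.max(1.0)
}

/// Trust: the nearest neighbour of each point in A should also be its nearest
/// neighbour in B (neighbourhood preservation), reported as 1 - overlap.
fn trust_loss(a: &[f64], b: &[f64]) -> f64 {
    let nn = |xs: &[f64], i: usize| {
        (0..xs.len())
            .filter(|&j| j != i)
            .min_by(|&j, &k| (xs[i] - xs[j]).abs().partial_cmp(&(xs[i] - xs[k]).abs()).unwrap())
            .unwrap()
    };
    let preserved = (0..a.len()).filter(|&i| nn(a, i) == nn(b, i)).count();
    1.0 - preserved as f64 / a.len() as f64
}

/// Wasserstein-1 distance between the two empirical 1-D distributions:
/// sort both samples and average the absolute differences.
fn wasserstein_loss(a: &[f64], b: &[f64]) -> f64 {
    let mut sa = a.to_vec();
    let mut sb = b.to_vec();
    sa.sort_by(|x, y| x.partial_cmp(y).unwrap());
    sb.sort_by(|x, y| x.partial_cmp(y).unwrap());
    sa.iter().zip(&sb).map(|(x, y)| (x - y).abs()).sum::<f64>() / sa.len() as f64
}

fn main() {
    // Paired latent coordinates for the same underlying observations,
    // one from a semantic encoder and one from a sensor-trace encoder.
    let semantic = [0.1, 0.4, 0.9, 1.3];
    let physical = [0.12, 0.38, 0.95, 1.25];
    println!("continuity  = {:.4}", continuity_loss(&semantic, &physical));
    println!("trust       = {:.4}", trust_loss(&semantic, &physical));
    println!("wasserstein = {:.4}", wasserstein_loss(&semantic, &physical));
}
```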
Existing Cross‑Reality Systems Validate the Approach
Multiple systems have already demonstrated that unified representation across digital and physical domains is deployable:
- FermiLink: operates across approximately fifty scientific software packages spanning nine research domains, using a single agent framework. Its separation of package‑specific knowledge from simulation workflows allows the same reasoning engine to orchestrate full‑paper‑level research across computational domains.
- SCP (Science Context Protocol): bridges computational and physical laboratories through a universal specification for describing and invoking scientific resources — including software tools, models, datasets, and physical instruments. It manages the complete experiment lifecycle.
- MomaGraph: unifies spatial, functional, and task‑oriented relationships into a single scene graph for embodied agents, supporting zero‑shot task planning.
- EmbodiedLGR: demonstrates that hybrid graph‑based memory — combining low‑level spatial‑semantic graphs with high‑level retrieval‑augmented descriptions — can run locally on physical robots.
- PhyGeo‑KG: introduces physics‑regularized knowledge graph construction, where physical laws act as constraints on graph edge formation.
Extended Architecture
Nexus’s existing layered architecture was designed for exactly this extensibility. The extension to physical‑digital domains is not a redesign but a natural expansion:
| Layer | Current Scope | Extended Scope |
|---|---|---|
| Knowledge Graph Engine | Documents, code symbols, external references | Simulation outputs, robot trajectories, sensor streams, digital twin state, experimental measurements |
| Artifact Ingestion Pipeline | Text files (.md, .qmd, .html, .rs) | Binary simulation results, point clouds, telemetry logs, video streams, hardware‑in‑the‑loop data |
| Agentic Research Loop | Hypothesis chains from document‑code gaps | Hypothesis chains spanning simulation predictions, physical measurements, and theoretical claims |
| Learning Loop | Planner optimized on research session outcomes | Planner optimized on experimental validation rates, simulation fidelity, and physical reproducibility |
| Contract Governance | Structural and citation rules | Physical constraints, measurement precision bounds, reproducibility requirements, safety invariants |
The key enablers are the existing /sync/:engine pattern, the EngineHandler interface, and the queue‑based incremental sync. Nothing in the pipeline assumes that artifacts are text. An EngineHandler for a physics simulation backend follows the same interface as one for a document store. The “document” may be a simulation configuration, a robotic demonstration log, or a sensor calibration record — the protocol is identical.
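A sketch of what such an EngineHandler might look like as a Rust trait, with a hypothetical physics-simulation backend implementing it. The method names (engine, upload, delete) are assumptions chosen to mirror the delete/upload tasks described in Layer 2, not the actual interface definition.

```rust
/// Engine-agnostic handler contract used by the queue consumers.
/// The trait shape is illustrative; the real interface lives in the sync worker.
trait EngineHandler {
    /// Engine identifier used in the /sync/:engine route.
    fn engine(&self) -> &'static str;
    /// Push one artifact (text, binary simulation result, telemetry log, ...) to the engine.
    fn upload(&self, key: &str, bytes: &[u8]) -> Result<(), String>;
    /// Remove an artifact that no longer exists in the object store.
    fn delete(&self, key: &str) -> Result<(), String>;
}

/// A handler for a hypothetical physics-simulation backend: same interface,
/// different storage and retrieval protocol behind it.
struct SimulationEngine;

impl EngineHandler for SimulationEngine {
    fn engine(&self) -> &'static str { "physics-sim" }
    fn upload(&self, key: &str, bytes: &[u8]) -> Result<(), String> {
        // In a real handler: parse the simulation output, extract entities
        // (Segment, Scheme, Field, Observation), and ingest them into the KG engine.
        println!("ingesting {key} ({} bytes) into {}", bytes.len(), self.engine());
        Ok(())
    }
    fn delete(&self, key: &str) -> Result<(), String> {
        println!("removing {key} from {}", self.engine());
        Ok(())
    }
}

fn main() {
    let handlers: Vec<Box<dyn EngineHandler>> = vec![Box::new(SimulationEngine)];
    for h in &handlers {
        h.upload("sim/run-42/results.bin", &[0u8; 16]).unwrap();
    }
}
```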
The Episodic Knowledge Graph: Memory Across Realities
The Episodic Knowledge Graph (eKG) acts as a long‑term symbolic memory for embodied agents. An event bus collects multimodal signals (vision, language, sensor readings, action outcomes) and posts interpretations as temporal sequences. The eKG aggregates and connects these interpretations, establishing coherence across interactions that span different modalities, agents, and timescales.
For Nexus, this means the Evolving Memory that currently records Planner‑Executor‑Verifier trajectories evolves into an episodic graph that also records physical experimental outcomes. When a hypothesis about compiler behavior is validated through RISC‑V emulation, and that same hypothesis is later tested on a physical robot, both validations reside in the same eKG, connected by the shared conceptual structure they verify.
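A minimal sketch of how two such validations could be recorded as episode nodes linked in the eKG, assuming a simple node-and-edge representation. The Episode, Modality, and ValidatesSameClaim types and the example values are hypothetical.

```rust
/// An episode node in the Episodic Knowledge Graph: what was observed,
/// in which modality, by which agent, and when.
#[derive(Debug)]
#[allow(dead_code)]
struct Episode {
    id: u64,
    timestamp_ms: u64,
    agent: String,      // provenance: planner, verifier, robot controller, ...
    modality: Modality,
    claim: String,      // the conceptual structure this episode supports
    outcome: f64,       // support / validation score
}

#[derive(Debug)]
#[allow(dead_code)]
enum Modality {
    Document,
    Emulation,
    Simulation,
    PhysicalExperiment,
    SensorStream,
}

/// Edge linking two episodes that validate the same conceptual structure,
/// e.g. a compiler-emulation result and a later robot experiment.
#[derive(Debug)]
struct ValidatesSameClaim {
    from: u64,
    to: u64,
}

fn main() {
    let emulated = Episode {
        id: 1, timestamp_ms: 1_700_000_000_000, agent: "verifier".into(),
        modality: Modality::Emulation,
        claim: "Field transition reduces memory traffic".into(), outcome: 0.88,
    };
    let embodied = Episode {
        id: 2, timestamp_ms: 1_700_086_400_000, agent: "robot-lab".into(),
        modality: Modality::PhysicalExperiment,
        claim: emulated.claim.clone(), outcome: 0.91,
    };
    let link = ValidatesSameClaim { from: emulated.id, to: embodied.id };
    println!("{emulated:?}\n{embodied:?}\n{link:?}");
}
```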
Unified Latent Representation: The Homeomorphic Bridge
When our Observation primitive is described semantically in a whitepaper and simultaneously encoded in a sensor trace from a hardware validation, these two representations induce latent manifolds. If those manifolds are homeomorphic — if they share the same underlying topological structure — then:
- Semantic‑guided recovery is possible: a partial physical observation can be completed using knowledge from the whitepaper’s formal description.
- Cross‑domain transfer is verified: a hypothesis validated in simulation can be rigorously transferred to physical hardware.
- Zero‑shot compositional reasoning becomes possible: new experimental designs can be synthesized by composing semantic descriptions in ways guaranteed to have valid physical realizations.
These capabilities have been empirically validated on cross‑domain classifier transfer and zero‑shot classification tasks. The same principles apply to transferring knowledge between compiler optimization traces and hardware performance measurements, or between formal specification proofs and physical circuit behavior.
Toward Continuous Research Manifolds
The vision is of Nexus as a continuous research manifold: a unified latent space where a theoretical insight about Field transition dynamics, a compiler pass that optimizes for that dynamics, a simulation of the compiler running on RISC‑V emulation, a robot experiment validating the energy efficiency claims, a sensor stream from a hardware implementation, and a maintenance log from a deployed system all inhabit the same queryable structure. A researcher can ask: “Show me all physical validations of hypotheses derived from Whitepaper §3.4, grouped by simulation fidelity and hardware platform.” The system traverses from document entities to simulation outputs to robot logs to sensor traces — because they are all connected in the same graph, grounded by the same primitives, verified by the same contract.
What Must Be Built
Three concrete additions to the existing Nexus architecture realize this boundaryless extension:
- Multi‑Modal Ingestion Handlers. New EngineHandler implementations for physics simulation frameworks, robotic platforms, and sensor pipelines. Each presents the same interface but maps to domain‑specific storage and retrieval protocols.
- Homeomorphic Verification Layer. An extension to the Verifier that applies continuity, trust, and distributional distance metrics to determine when a physical observation and a semantic claim share compatible latent structure. This becomes part of the Contract: a hypothesis step is only “verified” when the homeomorphism criterion is satisfied.
- Episodic Knowledge Graph Integration. The Evolving Memory evolves from append‑only JSONL trajectories to a true eKG that preserves temporal ordering, agent provenance, and cross‑modal coherence. This enables the Planner to reason about when, by whom, and under what conditions a discovery was made — essential for reproducibility in physical experiments.
Component Interaction Matrix
| Component | KG Engine | Object Store | Sync Worker | Planner | Verifier | Generator | Simulation / Hardware |
|---|---|---|---|---|---|---|---|
| KG Engine | – | – | ← synced by | ← queried by | ← grounds | – | ← ingests traces |
| Object Store | – | – | ← read during diff | – | – | – | ← uploaded by sim/robot |
| Sync Worker | → delete/upload | → list/read | – | – | – | – | – |
| Planner | → queries | – | – | – | → delegates | – | → invokes sim/robot |
| Verifier | → hybrid queries + homeomorphic checks | – | – | ← receives | – | → signals | ← validates physical results |
| Generator | – | – | – | – | ← triggered | – | – |
Strategic Alignment
- Engine‑agnosticism: the synchronization endpoint and engine handler interface isolate the rest of the system from any particular backend, enabling future knowledge‑graph, simulation, or robotic algorithms to be adopted without disruption.
- No lock‑in: every component is replaceable with an open equivalent — the object store, the message queue, the key‑value mapping, the knowledge graph database, the simulation engine, and the robotic platform.
- Research‑first design: the entire pipeline is optimized for the academic exploration cycle (hypothesise → validate → publish) across both digital and physical domains.
- Boundaryless by architecture, not by patch: the extension from document‑code to physical‑digital is a natural consequence of the engine‑agnostic patterns already built into the core design. No fundamental rewrite is required.