Why neXus Outscales Centralized Approaches
Linear scaling, blob-native storage, and the LLM-as-magnifying-glass principle
Abstract
neXus is not an LLM orchestration framework. It is an accumulative knowledge architecture where every LLM interaction produces an immutable Fact, every Fact is stored in blob-native Parquet format on distributed object storage, and the OODA loop runs continuously on the accumulated graph. LLMs become a minority component – called only when a new knowledge branch must be opened. The result is a system that scales linearly with node count, whose knowledge base grows monotonically, and whose backup strategy is identical to video file archival.
The architecture is storage-agnostic by design: the same Fact graph, same cursor, and same FIH primitives operate identically across local databases, remote object stores, and blockchain backends. Storage is a pluggable trait implementation – the core never changes.
As knowledge accumulates, its value compounds. Every verified Fact deepens the cross-validation network, making the graph increasingly costly to replicate or forge – analogous to Bitcoin’s increasing difficulty, but driven by knowledge depth rather than hash power. The verified knowledge graph itself becomes the bedrock of a research economy where contributions are provable, auditable, and permanently attributable.
The Centralized Ceiling
Every major AI lab today operates a variant of the same architecture:
Data → Preprocess → Train → Evaluate → Deploy
(serial pipeline, each stage blocks on the previous)
The pipeline is serial and centralized. The training stage in particular cannot scale linearly, because the training loop is itself a sequential optimization process. Other stages (data ingestion, evaluation, deployment) parallelise more readily, but the overall research cycle is gated by the training step.
The industry response to this ceiling has been to build larger models and larger clusters. This approach faces economic pressure: the training cost of a single frontier model is now estimated in the hundreds of millions of dollars, and the marginal return per additional accelerator is declining. The operational cost of the infrastructure required to run such a pipeline — data centres, SRE teams, database licenses — adds another layer of expense that grows with the organisation, not with the research output.
The Accumulative Alternative
neXus inverts the architecture. Instead of moving data through a serial pipeline, it treats research as an ongoing accumulation of structured knowledge.
The key properties:
- Everything is a Fact. Document chunks, experimental results, peer reviews, sensor readings, simulation outputs – all become Facts on the Blackboard. There is no distinction between “data” and “metadata.”
- Facts are immutable. Once committed, a Fact never changes. New evidence adds new Facts; it never modifies old ones. This makes the entire knowledge base auditable, replayable, and forkable.
- Facts are blob-native. Each Fact is serialized as a row in a Parquet file. Parquet files sit on an object store. The entire knowledge base is a directory tree of Parquet files. Backup = incremental sync to a secondary store.
- The OODA loop never stops. Observe (read_state) -> Orient (gap detection) -> Decide (submit Intent) -> Act (conclude Intent). Each cycle produces new Facts. Core operations (lookup, insert, query) are sub-millisecond; graph traversal time depends on graph size. The loop runs continuously across all nodes.
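The properties above can be illustrated with a minimal sketch of one OODA cycle over an append-only Blackboard. It is written in Python for brevity and is not the neXus implementation: the `Fact` fields, the `Blackboard` class, the `ooda_cycle` helper, and the idea of detecting gaps against a list of required topics are all assumptions made for illustration. Only the step names (read_state, submit Intent, conclude Intent) come from the text.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Fact:
    """Immutable record: frozen=True rejects attribute assignment after creation."""
    topic: str
    body: str


class Blackboard:
    """Hypothetical in-memory Blackboard with an append-only Fact log."""

    def __init__(self):
        self.facts = []    # append-only log of Facts
        self.intents = []  # open Intents, identified here by topic

    def read_state(self):
        return tuple(self.facts)

    def submit_intent(self, topic):
        if topic not in self.intents:
            self.intents.append(topic)

    def conclude_intent(self, topic, body):
        # Concluding an Intent removes it and appends a new Fact.
        self.intents.remove(topic)
        self.facts.append(Fact(topic, body))


def ooda_cycle(board, required_topics, act):
    """Run one cycle and return the number of gaps that were detected."""
    state = board.read_state()                             # Observe
    known = {fact.topic for fact in state}
    gaps = [t for t in required_topics if t not in known]  # Orient
    for topic in gaps:                                     # Decide
        board.submit_intent(topic)
    for topic in list(board.intents):                      # Act
        board.conclude_intent(topic, act(topic))
    return len(gaps)


board = Blackboard()
board.facts.append(Fact("melting_point", "412 K"))
first = ooda_cycle(board, ["melting_point", "density"], act=lambda t: f"measured {t}")
second = ooda_cycle(board, ["melting_point", "density"], act=lambda t: f"measured {t}")

try:
    board.facts[0].body = "edited"
except FrozenInstanceError:
    print("Facts cannot be modified after commit")
```

In this sketch the first cycle finds one gap and fills it with a new Fact, while the second cycle finds nothing to do. Existing Facts are never touched.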
The LLM-as-Magnifying-Glass Principle
LLMs are the most expensive component in any AI system, both in latency and in cost. neXus minimises LLM usage to the precise moments when they add value:
LLM called when:
- A new knowledge branch must be opened (gap too large to fill by graph traversal)
- Raw document text must be parsed into structured Facts
- A contradiction between existing Facts must be resolved
LLM NOT called when:
- Querying existing Facts (Cypher, µs)
- Traversing the knowledge graph (petgraph, µs)
- Detecting gaps between connected Facts (graph algorithm, µs)
- Generating routine reports (template + Fact interpolation, µs)
- Cross-referencing Facts across documents (hash lookup, ns)
As the knowledge base grows, an increasing fraction of queries can be answered from existing Facts alone. The LLM is reserved for the minority of cases where accumulated knowledge is insufficient — opening new branches, parsing raw text, or resolving contradictions between established Facts.
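The routing rule can be sketched as a lookup that falls back to the LLM only on a miss. This is a deliberately simplified illustration: the `answer` function, the `stats` counters, and the stand-in `fake_llm` are hypothetical, and an exact-match dictionary lookup stands in for the graph queries described above.

```python
def answer(question, facts, llm, stats):
    """Return an answer, calling the LLM only when no stored Fact covers the question."""
    if question in facts:           # served from accumulated Facts, no LLM call
        stats["graph_hits"] += 1
        return facts[question]
    stats["llm_calls"] += 1         # new branch: the LLM is called once
    result = llm(question)
    facts[question] = result        # the result is kept as a reusable Fact
    return result


def fake_llm(question):
    # Stand-in for a real model call (assumption for illustration only).
    return f"answer to {question}"


facts = {}
stats = {"graph_hits": 0, "llm_calls": 0}
for q in ["a", "b", "a", "a", "b"]:
    answer(q, facts, fake_llm, stats)

print(stats)  # {'graph_hits': 3, 'llm_calls': 2}
```

Five questions trigger only two LLM calls here, because repeated questions are served from stored Facts. The share of LLM calls can only fall as the store grows.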
Linear Scaling by Construction
Because nodes do not communicate directly (Stigmergy: they coordinate through the shared Blackboard), adding nodes does not increase coordination overhead:
1 node: 1 OODA loop, 100 Facts/hour, 1KB/hour to R2
10 nodes: 10 OODA loops, 1000 Facts/hour, 10KB/hour to R2 (no contention)
100 nodes: 100 OODA loops, 10000 Facts/hour, 100KB/hour to R2 (no contention)
The FlushCursor ensures each node writes only its own increment (delta). There is no global lock, no distributed transaction, no consensus protocol. Each node’s cursor is independent.
This is possible because:
- Facts are content-addressed (FihHash). Two nodes that discover the same fact produce the same hash. The fact is stored once. Duplicate writes are idempotent.
- Intents are claimed exclusively (claim_intent). Only one node can work on a given Intent, even though the Intent may have been submitted by a different node (the gap detector). The claim mechanism is the only serialisation point, and it is guarded by a single Mutex per Blackboard. When the Blackboard is distributed (KV-based), the claim becomes an atomic KV operation.
- Hints constrain behaviour without serialising it. A Hint is a global rule that all nodes read. It is never modified by nodes. No contention.
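The first two points can be sketched together: content-addressed writes that are idempotent, and an exclusive claim guarded by a single lock. The code is illustrative only. SHA-256 stands in for FihHash, whose actual hash function the text does not specify, and the `Blackboard` class, `put_fact`, and the thread-based "nodes" are assumptions. Only the name claim_intent comes from the text.

```python
import hashlib
import threading


def fih_hash(content: str) -> str:
    # Stand-in for FihHash (assumption): SHA-256 of the UTF-8 content.
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


class Blackboard:
    def __init__(self):
        self._claim_lock = threading.Lock()  # the only serialisation point
        self.facts = {}                      # hash -> content
        self.claims = {}                     # intent id -> claiming node

    def put_fact(self, content: str) -> str:
        key = fih_hash(content)
        # Same content gives the same key, so a duplicate write changes nothing.
        self.facts[key] = content
        return key

    def claim_intent(self, intent_id: str, node_id: str) -> bool:
        with self._claim_lock:
            if intent_id in self.claims:
                return False                 # another node already holds it
            self.claims[intent_id] = node_id
            return True


board = Blackboard()
h1 = board.put_fact("sample 7 melts at 412 K")  # discovered by node A
h2 = board.put_fact("sample 7 melts at 412 K")  # rediscovered by node B

results = []


def worker(node_id):
    results.append(board.claim_intent("intent-1", node_id))


threads = [threading.Thread(target=worker, args=(n,)) for n in ("node-a", "node-b", "node-c")]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Both writes land on the same key, so the Fact is stored once. Of the three competing nodes, exactly one claim succeeds, whichever thread reaches the lock first.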
Blob-Native Backup
Traditional database backup:
100 GB PostgreSQL → pg_dump → 45 minutes → compressed dump
Restore: create database → pg_restore → 45 minutes → verify indexes → 15 minutes
WAL archive management: continuous, complex retention policies
neXus backup: an object store holding the same Parquet files can be incrementally synced. Restore is a file-level copy operation. There is no WAL, no retention policy management, and no schema migration step between versions.
Every Parquet file is self-describing (schema embedded in the file footer). A Parquet file written in 2026 will be readable in 2036 by any Parquet-compliant reader. There is no version lock-in, no migration path, no vendor dependency.
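The incremental sync described above can be sketched with the standard library alone, using local directories as stand-ins for the primary and secondary object stores. The `incremental_sync` function and the file names are hypothetical, the files contain placeholder bytes rather than real Parquet data, and the sketch assumes files are written once and never modified, so a file already present at the destination can be skipped.

```python
import shutil
import tempfile
from pathlib import Path


def incremental_sync(primary: Path, secondary: Path) -> list:
    """Copy files present in primary but missing from secondary; return their relative paths."""
    copied = []
    for src in sorted(primary.rglob("*.parquet")):
        rel = src.relative_to(primary)
        dst = secondary / rel
        if dst.exists():
            continue  # write-once files never need to be copied again
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        copied.append(rel.as_posix())
    return copied


with tempfile.TemporaryDirectory() as tmp:
    primary, secondary = Path(tmp, "primary"), Path(tmp, "secondary")
    partition = primary / "project-alpha"
    partition.mkdir(parents=True)

    (partition / "0001.parquet").write_bytes(b"placeholder")
    first = incremental_sync(primary, secondary)   # copies 0001 only

    (partition / "0002.parquet").write_bytes(b"placeholder")
    second = incremental_sync(primary, secondary)  # copies 0002 only

    restored = Path(tmp, "restored")
    third = incremental_sync(secondary, restored)  # restore is the same copy, reversed
```

Restore uses the same function with the arguments swapped, which is the sense in which backup and restore are both file-level copies.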
Comparison with Industry Approaches
| Dimension | neXus | Google DeepMind | OpenAI | Microsoft Research |
|---|---|---|---|---|
| Coordination | Stigmergy (indirect) | Central scheduler | Central orchestrator | Central pipeline |
| Storage | Blob-native (R2/S3) | Proprietary DB | Proprietary DB | SQL + Blob hybrid |
| Scaling | Linear (add nodes) | Sub-linear (cluster size) | Sub-linear (cluster size) | Sub-linear (pipeline width) |
| LLM cost | Called only for new branches | Heavily used in pipeline | Heavily used in pipeline | Heavily used in pipeline |
| Backup | Incremental sync to object store | WAL + replication | WAL + replication | WAL + replication |
| Determinism | Full replay from cursor | None | None | Partial |
| WASM target | Yes (primary) | No | No | No |
Heterogeneous Storage: Same Graph, Any Backend
The architecture is storage-agnostic by design. The Storage trait defines the interface; any backend that implements it speaks the same FIH protocol, writes the same Fact format, and produces the same FlushCursor stream.
FlushCursor is the universal cursor. Whether stored in SQLite, KV, R2 metadata, or a blockchain transaction, the format is identical:
{
"last_flushed_at": "1748266190",
"partition": "project-alpha"
}
This means:
- A developer working on a local database can hand their cursor to an edge deployment.
- The edge node reads from fast key-value storage, writes to object storage, advances the cursor.
- A governance contract on-chain verifies the cursor chain without re-executing.
- All three share the same backup: incremental sync to a secondary object store.
The storage choice is a deployment detail, not an architectural decision.
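A minimal sketch of the idea follows, with an abstract base class standing in for the Storage trait. The method names `write_facts` and `read_since`, the `InMemoryStorage` backend, and the `flush` helper are all hypothetical, since the source does not show the trait's definition. The sketch also assumes that `last_flushed_at` is a Unix timestamp in seconds, which the example value suggests but the text does not state.

```python
import json
from abc import ABC, abstractmethod


class Storage(ABC):
    """Hypothetical backend interface; every backend exposes the same two calls."""

    @abstractmethod
    def write_facts(self, partition, facts): ...

    @abstractmethod
    def read_since(self, partition, timestamp): ...


class InMemoryStorage(Storage):
    def __init__(self):
        self._rows = []  # (partition, committed_at, body)

    def write_facts(self, partition, facts):
        self._rows.extend((partition, ts, body) for ts, body in facts)

    def read_since(self, partition, timestamp):
        return [(ts, body) for p, ts, body in self._rows
                if p == partition and ts > timestamp]


def flush(source: Storage, sink: Storage, cursor: dict) -> dict:
    """Copy Facts newer than the cursor to the sink and return the advanced cursor."""
    since = int(cursor["last_flushed_at"])
    delta = source.read_since(cursor["partition"], since)
    if not delta:
        return cursor  # nothing new, cursor unchanged
    sink.write_facts(cursor["partition"], delta)
    newest = max(ts for ts, _ in delta)
    return {"last_flushed_at": str(newest), "partition": cursor["partition"]}


local, remote = InMemoryStorage(), InMemoryStorage()
local.write_facts("project-alpha", [(1748266100, "old"), (1748266200, "new")])

cursor = json.loads('{"last_flushed_at": "1748266190", "partition": "project-alpha"}')
cursor = flush(local, remote, cursor)  # only the Fact after the cursor is copied
again = flush(local, remote, cursor)   # second flush is a no-op
```

Because `flush` depends only on the interface and the cursor, swapping `InMemoryStorage` for another backend would not change it, which is the property the section claims for the real system.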
The Verified Knowledge Graph as Economic Bedrock
The parallel with Bitcoin is structural, not metaphorical.
| Bitcoin | neXus |
|---|---|
| Block = bundle of transactions | FlushResult = bundle of verified Facts |
| Block hash = chain anchor | FihHash = content-addressed Fact ID |
| Difficulty = hash work required | Depth = cross-validation chains required |
| 51% attack = rewrite history | Fact forgery = rebuild all dependent validations |
| Halving = block reward decreases | Accumulation = each Fact increases graph value |
As the Fact graph grows, the cost of forging or reverting a Fact grows proportionally to the number of dependent validations. A Fact that sits at the root of 10,000 subsequent Facts cannot be altered without rebuilding the entire subgraph – a cost that becomes prohibitive long before the graph reaches 1M Facts.
This creates the economic conditions for a research economy:
- Provenance: Every Fact is permanently linked to its origin Intent and creator.
- Attribution: Derivative Facts implicitly cite their ancestors via from_facts.
- Auditability: The entire trajectory from gap detection to validated result is traced through Fact-Intent-Fact chains.
- Scarcity by depth: A Fact deeply embedded in the validation network is more valuable than a surface Fact, because it is harder to replicate and has more supporting evidence.
In this economy, tokens are not mined – they are earned by contributing to the depth and breadth of the verified knowledge graph.
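The "scarcity by depth" argument can be made concrete by counting how many Facts depend, directly or transitively, on a given Fact. The sketch below assumes each Fact lists its ancestors in `from_facts`, as described above. The `build_children` and `dependents` helpers and the four-Fact example graph are hypothetical, and the count is only a rough proxy for forgery cost, not a measure defined by the source.

```python
from collections import deque


def build_children(from_facts: dict) -> dict:
    """Invert the from_facts mapping (fact -> ancestors) into ancestor -> derived facts."""
    children = {}
    for fact, parents in from_facts.items():
        for parent in parents:
            children.setdefault(parent, []).append(fact)
    return children


def dependents(children: dict, root: str) -> set:
    """Breadth-first search for every Fact derived, directly or indirectly, from root."""
    seen, queue = set(), deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


# f1 and f2 are derived from f0; f3 is derived from both f1 and f2.
from_facts = {"f0": [], "f1": ["f0"], "f2": ["f0"], "f3": ["f1", "f2"]}
children = build_children(from_facts)

root_cost = len(dependents(children, "f0"))  # 3 Facts would need rebuilding
leaf_cost = len(dependents(children, "f3"))  # 0: nothing depends on f3 yet
```

Under this proxy, altering the root Fact forces three rebuilds while altering the leaf forces none, which matches the claim that deeply embedded Facts are the hardest to forge.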
Conclusion
neXus competes on cost efficiency per unit of accumulated knowledge. A system that stores every Fact, never forgets, and runs OODA loops in the common case without LLM calls will, over time, accumulate a knowledge graph that grows deeper with each validated result. The only maintenance cost that grows with the graph is the storage cost of additional Parquet files. At $5 per worker per month, a thousand-node research deployment costs about $5,000 per month, less than a single engineer. No existing approach, whether academic, industrial, or governmental, can match this ratio of operational cost to knowledge accumulation rate.
The architecture is simple not because it is naive, but because complexity has been moved from the runtime (coordination protocols, distributed transactions, consensus) into the data model (content-addressed Facts, append-only log, cursor-based replay). This is the same insight that made Git successful: make the data model simple and powerful, and the runtime becomes an implementation detail.