PROTOCOL_ID: RLM_CORE_INGESTION_V1

Self-Governing Codebases

AUTHOR: Peter Hanssens

2 June 2026

INGESTION_ROUTING: ACTIVE

In modern engineering, documentation is a promise that is rarely kept. Architectural models and wiki pages look pristine on Day One, but once developers merge the first pull request, file paths shift, signatures change, and the map decays. Here is how we solved documentation drift by engineering a self-documenting system driven by **Recursive Language Models (RLMs)**.

SYSTEM_PILLARS // CONCEPTUAL_MODEL

The Three Pillars of Self-Governance

PILLAR_01

The Ontology

A strongly-typed, formal model representing the codebase's entities, properties, and relationships. It enforces semantic rules (e.g. which files implement which concepts), serving as the authoritative ruleset of your software architecture.

// Rules & Architecture Map
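
To make the idea of a typed ontology concrete, here is a minimal sketch in Go. Everything in it is an assumption made for illustration: the type names (Entity, Relation, Rule), the "implements" relation, and the two entity kinds are hypothetical and are not taken from the actual system. It only shows how a rule such as "files implement concepts, never the reverse" can be checked mechanically.

```go
package main

import "fmt"

// Kind names an entity type in the ontology (hypothetical set).
type Kind string

const (
	KindConcept Kind = "Concept"
	KindFile    Kind = "File"
)

// Entity is a typed node with free-form properties.
type Entity struct {
	ID    string
	Kind  Kind
	Props map[string]string
}

// Relation is a typed, directed edge between two entities.
type Relation struct {
	Type     string
	From, To string
}

// Rule states which entity kinds a relation type may connect.
type Rule struct {
	Type             string
	FromKind, ToKind Kind
}

// Ontology holds entities, relations, and the rules that govern them.
type Ontology struct {
	Entities  map[string]Entity
	Relations []Relation
	Rules     []Rule
}

// Check returns one message per relation that points at an unknown
// entity, has no rule, or connects the wrong kinds.
func (o Ontology) Check() []string {
	rules := make(map[string]Rule, len(o.Rules))
	for _, r := range o.Rules {
		rules[r.Type] = r
	}
	var problems []string
	for _, rel := range o.Relations {
		from, okFrom := o.Entities[rel.From]
		to, okTo := o.Entities[rel.To]
		rule, okRule := rules[rel.Type]
		switch {
		case !okFrom || !okTo:
			problems = append(problems, fmt.Sprintf("%s: unknown endpoint %s -> %s", rel.Type, rel.From, rel.To))
		case !okRule:
			problems = append(problems, fmt.Sprintf("%s: no rule defined", rel.Type))
		case from.Kind != rule.FromKind || to.Kind != rule.ToKind:
			problems = append(problems, fmt.Sprintf("%s: %s (%s) -> %s (%s) is not allowed", rel.Type, from.ID, from.Kind, to.ID, to.Kind))
		}
	}
	return problems
}

// demoOntology builds a tiny graph with one valid and one reversed edge.
func demoOntology() Ontology {
	return Ontology{
		Entities: map[string]Entity{
			"concept:auth": {ID: "concept:auth", Kind: KindConcept},
			"file:auth.go": {ID: "file:auth.go", Kind: KindFile, Props: map[string]string{"path": "auth.go"}},
		},
		Rules: []Rule{{Type: "implements", FromKind: KindFile, ToKind: KindConcept}},
		Relations: []Relation{
			{Type: "implements", From: "file:auth.go", To: "concept:auth"},
			{Type: "implements", From: "concept:auth", To: "file:auth.go"},
		},
	}
}

func main() {
	for _, p := range demoOntology().Check() {
		fmt.Println(p)
	}
}
```

Keeping the rules as data rather than code means a validator can load them alongside the graph and apply them uniformly.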

PILLAR_02

The RLM

An autonomous, recursive agent loop executing dynamically inside a sandboxed VM. Instead of making blind, one-shot predictions, the RLM writes modular inspection scripts to explore files and auto-correct errors based on compiler feedback.

// Autonomous Inspection Brain

SYNERGY

Combined Synergy

The ontology gives the RLM its structural ground truth, and the RLM acts as the automated curator. Validations check code against active policies, while local AST sync hooks record line movements within files without spending any LLM tokens.

// Self-Governing Codebase

EXPLAINER: TECHNICAL_INGRESS

The Power of Recursive Language Models (RLMs)

Traditional codebase ingestion strategies scale poorly. Loading an entire repository of raw source files into an LLM context window quickly exhausts the context budget, incurs high API costs, and limits structural reasoning.

A Recursive Language Model (RLM) changes this paradigm by running a closed execution loop inside an isolated environment. Instead of predicting everything in a single, blind shot, the RLM harness executes bare Go statements dynamically inside a sandboxed interpreter (Yaegi). It uses system code-analysis helpers (like ListFiles(), ParseFileSymbols(), and FindReferences()) to investigate the filesystem iteratively and construct its database step-by-step.

Closed Ingestion Execution Loop:

The orchestrator calls recursive sub-queries using Query() for complex tasks, executes edits, matches schemas, runs validations (DroverFsck()), and terminates cleanly only when the validation returns zero violations.
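
The control flow of that loop can be sketched in a few lines of Go. This is an illustration under stated assumptions, not the actual harness: Step and Validate are hypothetical stand-ins for the model-driven edit pass and for DroverFsck(), and the iteration budget (maxIters) is an added safeguard that the description above does not mention.

```go
package main

import (
	"errors"
	"fmt"
)

// Violation is one failed check reported by the validator.
type Violation struct {
	Rule, Detail string
}

// Step is one repair pass over the outstanding violations.
// Hypothetical stand-in for the model-driven part of the loop.
type Step func(outstanding []Violation) error

// Validate re-checks the graph. Hypothetical stand-in for DroverFsck().
type Validate func() []Violation

// ErrBudgetExhausted is returned when violations remain after maxIters passes.
var ErrBudgetExhausted = errors.New("iteration budget exhausted with violations remaining")

// RunLoop alternates validation and repair. It returns the number of
// repair passes taken, and stops cleanly only when validation reports
// zero violations.
func RunLoop(step Step, validate Validate, maxIters int) (int, error) {
	for i := 0; i < maxIters; i++ {
		outstanding := validate()
		if len(outstanding) == 0 {
			return i, nil
		}
		if err := step(outstanding); err != nil {
			return i, fmt.Errorf("pass %d: %w", i, err)
		}
	}
	if len(validate()) == 0 {
		return maxIters, nil
	}
	return maxIters, ErrBudgetExhausted
}

// demoRun simulates a graph with `pending` violations where each pass
// repairs exactly one of them.
func demoRun(pending, maxIters int) (int, error) {
	remaining := pending
	validate := func() []Violation {
		out := make([]Violation, remaining)
		for i := range out {
			out[i] = Violation{Rule: "governed_by", Detail: "missing curation role"}
		}
		return out
	}
	step := func(_ []Violation) error {
		remaining-- // pretend one violation was repaired
		return nil
	}
	return RunLoop(step, validate, maxIters)
}

func main() {
	passes, err := demoRun(3, 10)
	fmt.Println(passes, err)
}
```

Making validation the only exit condition is the point of the design: the loop cannot declare success on the model's say-so.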

🔒 Secure Sandboxing & Execution Safety

Executing code generated dynamically by an AI can introduce host vulnerabilities. To make this production-ready, we implemented a layered execution sandbox:

Unsafe Package Stripping: Standard packages that allow direct access to disk or host environments (os, os/exec, syscall, and net/* network calls) are completely removed from the Yaegi standard catalog. The AI can only call sandboxed data mutation helpers.
Deterministic Timeouts: Every VM call and query is capped via strict timeouts to guarantee the execution loop never hangs or blocks indefinitely.
Delta Ingestion Mode: In Delta Mode, the system runs a local git status --porcelain scan and loads *only* the comparison blocks for modified or newly added files. This reduces the prompt footprint by **99%** (down to 61 KB) and cuts API costs accordingly.
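
The three measures above can each be illustrated with a small, self-contained Go function. These are sketches under assumptions rather than the production sandbox: stripUnsafe filters a hypothetical catalog keyed by plain import path (a real interpreter's symbol table may use a different key layout), runWithTimeout is one common way to cap a call with the standard context package, and changedPaths handles only the simple, unquoted form of git status --porcelain output.

```go
package main

import (
	"context"
	"fmt"
	"strings"
	"time"
)

// isDenied reports whether an import path is one of the packages named
// above: os, os/exec, syscall, or anything under net.
func isDenied(importPath string) bool {
	switch importPath {
	case "os", "os/exec", "syscall", "net":
		return true
	}
	return strings.HasPrefix(importPath, "net/")
}

// stripUnsafe copies a catalog keyed by import path (an assumed layout),
// leaving out every denied package.
func stripUnsafe[V any](catalog map[string]V) map[string]V {
	safe := make(map[string]V, len(catalog))
	for path, symbols := range catalog {
		if !isDenied(path) {
			safe[path] = symbols
		}
	}
	return safe
}

// runWithTimeout returns fn's error, or the context's deadline error if
// fn has not finished within d. fn must honour ctx to actually stop;
// otherwise its goroutine keeps running in the background.
func runWithTimeout(parent context.Context, d time.Duration, fn func(context.Context) error) error {
	ctx, cancel := context.WithTimeout(parent, d)
	defer cancel()
	done := make(chan error, 1) // buffered so a late fn never blocks on send
	go func() { done <- fn(ctx) }()
	select {
	case err := <-done:
		return err
	case <-ctx.Done():
		return ctx.Err()
	}
}

// changedPaths extracts loadable paths from `git status --porcelain`
// output: two status characters, a space, then the path. Deleted files
// are skipped and renames resolve to the new path. Quoted paths are not
// handled in this sketch.
func changedPaths(porcelain string) []string {
	var paths []string
	for _, line := range strings.Split(porcelain, "\n") {
		if len(line) < 4 {
			continue
		}
		status, path := line[:2], line[3:]
		if strings.Contains(status, "D") {
			continue
		}
		if i := strings.Index(path, " -> "); i >= 0 && strings.ContainsAny(status, "RC") {
			path = path[i+len(" -> "):]
		}
		paths = append(paths, path)
	}
	return paths
}

// demoTimeout runs a call that only returns once its context expires.
func demoTimeout() error {
	return runWithTimeout(context.Background(), 20*time.Millisecond, func(ctx context.Context) error {
		<-ctx.Done()
		return ctx.Err()
	})
}

func main() {
	catalog := map[string][]string{"fmt": {"Println"}, "os": {"Remove"}, "net/http": {"Get"}}
	fmt.Println(len(stripUnsafe(catalog)))
	fmt.Println(demoTimeout())
	fmt.Println(changedPaths(" M a.go\n?? b.go\nD  c.go\nR  old.go -> new.go\n"))
}
```

Note that the status line must not be trimmed before parsing, because a leading space is itself one of the two status characters.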

💰 Zero-Token AST Synchronization

AI calls are expensive, so we don't query an LLM when developer code simply shifts around. We engineered a lightweight, local JS syncing utility (sync-ast-lineages.js) and a Git pre-commit hook that parse symbols locally, match modified signatures, and synchronize codebase lineages inside graph.jsonl in under 0.2 seconds, without spending a single LLM token.
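
The same idea can be sketched in Go using only the standard library's parser. This is not the sync-ast-lineages.js utility: the Lineage record shape is hypothetical, only top-level Go functions are tracked, and reading and writing graph.jsonl is left out. It shows the core step of re-parsing a file locally and updating the recorded line of every symbol that moved.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// Lineage links a symbol to where it currently lives.
// Hypothetical record shape; the real graph.jsonl schema is not shown here.
type Lineage struct {
	Symbol string
	File   string
	Line   int
}

// symbolLines parses Go source and returns the line of the `func`
// keyword for every top-level function, keyed by name. Methods are
// skipped to keep keys unambiguous in this sketch.
func symbolLines(filename, src string) (map[string]int, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, filename, src, 0)
	if err != nil {
		return nil, err
	}
	lines := make(map[string]int)
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok || fn.Recv != nil {
			continue
		}
		lines[fn.Name.Name] = fset.Position(fn.Pos()).Line
	}
	return lines, nil
}

// syncLineages updates, in place, every record for `file` whose symbol
// now sits on a different line, and returns how many records changed.
func syncLineages(records []Lineage, file string, lines map[string]int) int {
	changed := 0
	for i := range records {
		r := &records[i]
		if r.File != file {
			continue
		}
		if line, ok := lines[r.Symbol]; ok && line != r.Line {
			r.Line = line
			changed++
		}
	}
	return changed
}

// demoSrc is a file in which an added comment pushed Login down by one line.
const demoSrc = `package demo

// A newly added comment.
func Login() {}

func Logout() {}
`

// demoSync re-parses demoSrc and reconciles three stored records with it.
func demoSync() ([]Lineage, int, error) {
	records := []Lineage{
		{Symbol: "Login", File: "auth.go", Line: 3},
		{Symbol: "Logout", File: "auth.go", Line: 6},
		{Symbol: "Login", File: "other.go", Line: 9},
	}
	lines, err := symbolLines("auth.go", demoSrc)
	if err != nil {
		return nil, 0, err
	}
	return records, syncLineages(records, "auth.go", lines), nil
}

func main() {
	records, changed, err := demoSync()
	fmt.Println(records, changed, err)
}
```

Because the work is a single parse plus a map lookup per record, it fits comfortably inside a pre-commit hook.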

🤖 Continuous Integration Policy Gates

Documentation is only valuable if it is strictly enforced. We built a reusable composite GitHub Action CI check. Every time a developer opens a Pull Request, the validator tests the graph constraints:

Are all new Term objects governed by a curation role (governed_by)?
Does the taxonomy hierarchy remain acyclic (broader_term)?
Do approved plans have the required reviews mapped in the SQLite projection?

If any constraint is violated, the check fails, PR annotations are raised, and merging is blocked, preventing architectural decay before code ever reaches production.
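
The first two constraints are simple enough to sketch directly. The Go below is an illustration, not the actual validator: the Term struct is a hypothetical in-memory projection, each term is assumed to have at most one broader_term, and the third check (required reviews in the SQLite projection) is omitted.

```go
package main

import "fmt"

// Term is a hypothetical in-memory view of a Term object.
type Term struct {
	ID          string
	GovernedBy  string // governed_by: the curation role, empty if unset
	BroaderTerm string // broader_term: parent term ID, empty for a root
}

// checkTerms returns one message per violated constraint:
// a missing governed_by, a broader_term that does not exist,
// or a term that sits on a broader_term cycle.
func checkTerms(terms []Term) []string {
	byID := make(map[string]Term, len(terms))
	for _, t := range terms {
		byID[t.ID] = t
	}
	var violations []string
	for _, t := range terms {
		if t.GovernedBy == "" {
			violations = append(violations, fmt.Sprintf("%s: missing governed_by", t.ID))
		}
		// Walk up the broader_term chain until it ends, dangles, or loops.
		seen := map[string]bool{t.ID: true}
		for cur := t; cur.BroaderTerm != ""; {
			next, ok := byID[cur.BroaderTerm]
			if !ok {
				if cur.ID == t.ID { // report a dangling parent once, at its owner
					violations = append(violations, fmt.Sprintf("%s: broader_term %q not found", t.ID, t.BroaderTerm))
				}
				break
			}
			if next.ID == t.ID {
				violations = append(violations, fmt.Sprintf("%s: broader_term cycle", t.ID))
				break
			}
			if seen[next.ID] {
				break // chain leads into a cycle that its own members report
			}
			seen[next.ID] = true
			cur = next
		}
	}
	return violations
}

// demoTerms returns two healthy terms, a two-term cycle, and an
// ungoverned term.
func demoTerms() []Term {
	return []Term{
		{ID: "auth", GovernedBy: "security-team", BroaderTerm: "security"},
		{ID: "security", GovernedBy: "security-team"},
		{ID: "a", GovernedBy: "platform-team", BroaderTerm: "b"},
		{ID: "b", GovernedBy: "platform-team", BroaderTerm: "a"},
		{ID: "orphan"},
	}
}

func main() {
	for _, v := range checkTerms(demoTerms()) {
		fmt.Println(v)
	}
}
```

In a CI job, a non-empty result would translate into a failing exit status and one annotation per message.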

SYSTEM_BOOTSTRAP_ACTION

Deploy Governed Ingestion Loops

Ready to eliminate codebase drift and enforce architectural policies at scale? Deploy the local visualizer and deep-link your design models directly into VS Code or Cursor.
