Self-Governing Codebases: Bridging the Code-Architecture Gap with Governed RLMs

Peter Hanssens
Bridging the Code-Architecture Gap with Governed Recursive Language Models (RLMs)

Introduction: The Tragedy of Document Decay

In software engineering, documentation is a promise that is rarely kept.

We start projects with clean architectures, rigid domain vocabularies (Ubiquitous Language), and elegant boundaries. But code is fluid. In a fast-moving monorepo, files are refactored, modules are renamed, and new routes are deployed daily.

Within weeks, a gap opens. The architectural diagrams say one thing, the ADRs (Architecture Decision Records) say another, and the physical implementation is entirely different. This is documentation decay, and it costs enterprise organizations millions in onboarding friction, duplicate work, and architectural drift.

To solve this, we built the Governed RLM Ontology—a platform that turns your codebase into a self-documenting, self-validating knowledge graph.


🧠 The Core Concepts: RLM meets Ontology

Before exploring the architecture, we must define the two fundamental pillars of this solution and understand why combining them unlocks a self-governing codebase:

1. What is an Ontology?

In computer science and data engineering, an Ontology is a strongly-typed, formal model of a domain's entities, their properties, and the relationships between them. Unlike a simple database schema, an ontology also captures what those entities mean and which relationships between them are permitted. In our system, the ontology is the authoritative map of your software architecture: it defines first-class entities (Bead, Task, Plan, Review, Term) and enforces strict, acyclic relational rules (implemented_in, depends_on, governed_by).

An ontology is the set of rules that defines what makes code valid and how components connect.
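As a rough illustration, the entity and relation names above could be modeled as typed values in Go. The `Entity`, `Edge`, and `Graph` shapes and the `AddEdge` checks are assumptions made for this sketch, not the platform's actual schema.

```go
package main

import "fmt"

// EntityKind enumerates the first-class entity types named in the article.
type EntityKind string

const (
	Bead   EntityKind = "Bead"
	Task   EntityKind = "Task"
	Plan   EntityKind = "Plan"
	Review EntityKind = "Review"
	Term   EntityKind = "Term"
)

// Relation enumerates the relationship names named in the article.
type Relation string

const (
	ImplementedIn Relation = "implemented_in"
	DependsOn     Relation = "depends_on"
	GovernedBy    Relation = "governed_by"
)

// Entity and Edge are hypothetical shapes chosen for this sketch.
type Entity struct {
	ID   string
	Kind EntityKind
}

type Edge struct {
	From string
	Rel  Relation
	To   string
}

// Graph holds entities by ID plus a list of typed edges.
type Graph struct {
	Entities map[string]Entity
	Edges    []Edge
}

func NewGraph() *Graph { return &Graph{Entities: map[string]Entity{}} }

func (g *Graph) AddEntity(e Entity) { g.Entities[e.ID] = e }

// AddEdge rejects edges that use an unknown relation or an unregistered source.
func (g *Graph) AddEdge(e Edge) error {
	switch e.Rel {
	case ImplementedIn, DependsOn, GovernedBy:
		// known relation, continue
	default:
		return fmt.Errorf("unknown relation %q", e.Rel)
	}
	if _, ok := g.Entities[e.From]; !ok {
		return fmt.Errorf("unknown source entity %q", e.From)
	}
	g.Edges = append(g.Edges, e)
	return nil
}

func main() {
	g := NewGraph()
	g.AddEntity(Entity{ID: "Term:post-registry", Kind: Term})
	err := g.AddEdge(Edge{From: "Term:post-registry", Rel: ImplementedIn, To: "generate-posts-data.mjs"})
	fmt.Println(err, len(g.Edges))
}
```

The point of typing the relation names is that a misspelled or invented relationship is rejected at write time instead of silently entering the graph.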

2. What is an RLM (Recursive Language Model)?

A Recursive Language Model (RLM) is an autonomous AI agent loop equipped with sandboxed tools. Rather than attempting to guess or refactor code in a single "one-shot" prediction, an RLM writes small, modular Go snippets and executes them dynamically inside an isolated interpreter. It reviews compiler errors, queries filesystem boundaries, and self-corrects its logic iteratively until its mutations satisfy all compilation and design policies.

An RLM is the autonomous brain that inspects the codebase and executes changes.
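The iterate-until-valid behavior can be sketched in a few lines of Go. This is a minimal sketch under stated assumptions: `Generator` stands in for the model call, `Runner` stands in for the sandboxed interpreter plus policy checks, and `toyGenerate` and `toyRun` are toy stand-ins so the loop can run without a model. None of these names come from the platform itself.

```go
package main

import (
	"errors"
	"fmt"
)

// Generator is a hypothetical stand-in for the model: it receives the last
// error message (empty on the first attempt) and returns a new snippet.
type Generator func(feedback string) string

// Runner is a hypothetical stand-in for the sandbox and validator: it
// returns nil when the snippet compiles and satisfies all policies.
type Runner func(snippet string) error

// RunLoop requests a snippet, runs it, and feeds any error back to the
// generator until a snippet succeeds or the iteration budget is spent.
func RunLoop(gen Generator, run Runner, maxIters int) (string, int, error) {
	feedback := ""
	for i := 1; i <= maxIters; i++ {
		snippet := gen(feedback)
		err := run(snippet)
		if err == nil {
			return snippet, i, nil
		}
		feedback = err.Error()
	}
	return "", maxIters, errors.New("iteration budget exhausted")
}

// toyGenerate makes a mistake first and corrects it once it sees feedback.
func toyGenerate(feedback string) string {
	if feedback == "" {
		return "x := undefinedVar"
	}
	return "x := 1"
}

// toyRun accepts only the corrected snippet.
func toyRun(snippet string) error {
	if snippet != "x := 1" {
		return errors.New("undefined: undefinedVar")
	}
	return nil
}

func main() {
	snippet, iters, err := RunLoop(toyGenerate, toyRun, 5)
	fmt.Println(snippet, iters, err)
}
```

The explicit iteration budget is a design choice of the sketch: a loop that feeds errors back to a model needs a hard stop so a snippet that never converges cannot run indefinitely.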

3. Why Combine the Two? (The Synergistic Solution)

If you only have an Ontology, you have a static map. Humans have to manually write JSON files to keep the map updated, and they will inevitably forget to do so. The result is documentation decay.

If you only have an RLM, you have a blind agent. The AI can write code, but it lacks a structured vocabulary, architectural policies, or relationship maps to understand why it is editing a file or how its changes affect other packages.

By combining them, we create a Self-Governing Codebase:

  • The Ontology provides the RLM with an authoritative, lightweight directory of the codebase's semantic reality (e.g. Term:post-registry is implemented in generate-posts-data.mjs).
  • The RLM acts as the curator that walks the codebase, automatically extracts code lineages, writes changes, and feeds updates back into the ontology's append-only log.
  • The local validator checks the RLM's work against the ontology's active policies (e.g. "every plan must have a pre-landing review"), instantly blocking PRs that drift from the architecture.
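To make the append-only log concrete, here is a minimal Go sketch of writing operations as JSON lines and replaying them into the current set of edges. The `Op` record, its field names, and the `add_edge` / `remove_edge` operation names are assumptions for illustration, and `NewMemoryLog` is an in-memory stand-in for a log file.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"fmt"
	"io"
)

// Op is a hypothetical log record; the field names are assumptions.
type Op struct {
	Op   string `json:"op"` // "add_edge" or "remove_edge"
	From string `json:"from"`
	Rel  string `json:"rel"`
	To   string `json:"to"`
}

// NewMemoryLog returns an in-memory buffer standing in for the log file.
func NewMemoryLog() *bytes.Buffer { return &bytes.Buffer{} }

// Append writes one operation as a single JSON line. Earlier lines are
// never rewritten; corrections are expressed as later operations.
func Append(w io.Writer, op Op) error {
	return json.NewEncoder(w).Encode(op) // Encode adds the trailing newline
}

// Replay folds the log, oldest line first, into the current set of edges.
func Replay(r io.Reader) (map[string]bool, error) {
	edges := map[string]bool{}
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		if len(bytes.TrimSpace(sc.Bytes())) == 0 {
			continue // tolerate blank lines
		}
		var op Op
		if err := json.Unmarshal(sc.Bytes(), &op); err != nil {
			return nil, err
		}
		key := op.From + " " + op.Rel + " " + op.To
		switch op.Op {
		case "add_edge":
			edges[key] = true
		case "remove_edge":
			delete(edges, key)
		default:
			return nil, fmt.Errorf("unknown op %q", op.Op)
		}
	}
	return edges, sc.Err()
}

func main() {
	log := NewMemoryLog()
	op := Op{Op: "add_edge", From: "Term:post-registry", Rel: "implemented_in", To: "generate-posts-data.mjs"}
	if err := Append(log, op); err != nil {
		fmt.Println("append failed:", err)
		return
	}
	edges, err := Replay(log)
	fmt.Println(len(edges), err)
}
```

Because state is derived by replaying the log, every change the curator makes stays auditable: the history of how the graph reached its current shape is the log itself.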

What is a Recursive Language Model (RLM)? (ELI5 Explainer)

👶 Explain It Like I'm 5 (ELI5)

Imagine you have a giant, messy box of LEGOs, and you want to build a beautiful castle.

A normal AI is like a builder who stands far away, looks at the box once, and tries to draw a picture of a castle from memory. Usually, they get some parts right, but they make mistakes because they can't touch the LEGOs or check if they fit together.

A Recursive Language Model (RLM) is like a smart little robot builder who jumps right into the box!

The robot can pick up a brick, try to connect it to another, and see if it snaps together. If a brick doesn't fit, the robot doesn't give up—it throws it away, grabs a different one, and tries again. It talks to itself, checks its instructions, and keeps building piece-by-piece until it has built a perfect castle.

In our system, the "box of LEGOs" is your codebase, the "bricks" are your lines of code, and the "perfect castle" is a beautiful, working map of your software architecture.


The Power of RLM: Moving from "One-Shot" to "Closed-Loop"

Traditional AI code analysis suffers from two fatal flaws:

  1. Context Exhaustion: Massive codebases exceed prompt token windows. Loading 5MB of source code into a prompt leads to high costs and hallucinations.
  2. Open-Loop Execution: The model writes a code snippet or generates an analysis but has no way of knowing if its code compiles, if its assumptions are valid, or if its outputs conform to policies.

An RLM solves this by working in a closed execution loop:

graph TD
    Start[1. Kick-off Loop] --> Ingest[2. Git Delta Ingest: 99% Smaller Context]
    Ingest --> Interpreter[3. Sandboxed Go/Yaegi VM]
    Interpreter -->|Writes & Runs bare Go statements| Mutate[4. Governed Database: JSONL + SQLite]
    Mutate --> Validate[5. Policy Validation & fsck Check]
    Validate -->|Compiler Error / Validation Feedback| Interpreter
    Validate -->|Success & Termination Sentinel| Exit[6. FINAL_ONTOLOGY Exit]

By allowing the AI orchestrator to write Go code executed dynamically inside a safe micro-interpreter, the AI can call helper functions (ListFiles, ParseFileSymbols, FindReferences) to explore the codebase incrementally. It only reads full file contents when absolutely necessary, preserving context space and slashing API costs.
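The incremental exploration pattern can be sketched as follows. The helper names `ListFiles`, `ParseFileSymbols`, and `FindReferences` come from the text above, but their signatures, the in-memory `Repo` type, the toy line-based symbol parser, and `BuildContext` are all assumptions made for this illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Repo is an in-memory stand-in for a codebase: path -> file contents.
type Repo map[string]string

// ListFiles returns paths only, which is cheap compared with file bodies.
func (r Repo) ListFiles() []string {
	paths := make([]string, 0, len(r))
	for p := range r {
		paths = append(paths, p)
	}
	sort.Strings(paths)
	return paths
}

// ParseFileSymbols is a toy parser: it reports names of plain top-level
// functions declared as "func Name(". Methods are ignored.
func (r Repo) ParseFileSymbols(path string) []string {
	var syms []string
	for _, line := range strings.Split(r[path], "\n") {
		if !strings.HasPrefix(line, "func ") {
			continue
		}
		rest := strings.TrimPrefix(line, "func ")
		if i := strings.Index(rest, "("); i > 0 {
			syms = append(syms, rest[:i])
		}
	}
	return syms
}

// FindReferences returns the files that define or call the named function.
func (r Repo) FindReferences(name string) []string {
	var hits []string
	for _, p := range r.ListFiles() {
		if strings.Contains(r[p], name+"(") {
			hits = append(hits, p)
		}
	}
	return hits
}

// Context is what would be handed to the model for one question.
type Context struct {
	FullFiles map[string]string   // full contents, only where needed
	Summaries map[string][]string // symbol names for every other file
}

// BuildContext includes full contents only for files that reference the
// symbol and summarizes all other files by their symbol names.
func BuildContext(r Repo, symbol string) Context {
	ctx := Context{FullFiles: map[string]string{}, Summaries: map[string][]string{}}
	for _, p := range r.FindReferences(symbol) {
		ctx.FullFiles[p] = r[p]
	}
	for _, p := range r.ListFiles() {
		if _, full := ctx.FullFiles[p]; !full {
			ctx.Summaries[p] = r.ParseFileSymbols(p)
		}
	}
	return ctx
}

func main() {
	repo := Repo{
		"a.go": "func Slugify(s string) string {\n\treturn s\n}\n",
		"b.go": "func Render() {\n\tSlugify(\"x\")\n}\n",
		"c.go": "func Unrelated() {}\n",
	}
	ctx := BuildContext(repo, "Slugify")
	fmt.Println(len(ctx.FullFiles), ctx.Summaries)
}
```

In this toy repository, only the two files that mention `Slugify` are read in full; the third contributes a one-word summary instead of its body.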


Production Hardening: Safety at Scale

Running model-generated code dynamically is powerful, but it introduces real risks. Here is how we hardened the Drover ontology loop for production:

1. Unsafe Package Stripping

We decoupled Yaegi execution from the host shell. Standard library packages that permit arbitrary disk or network I/O (os, os/exec, syscall, and net/* subpackages) are stripped from the VM runtime catalog. The model can only interact with mathematical libraries, string parsers, and explicit, bounded Drover* API mutation hooks.
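A denylist filter of this kind might look like the Go sketch below. The `catalog` here is a simplified, hypothetical map from import path to exported symbol names; Yaegi's real symbol table has a different shape, so this shows only the filtering logic. Treating `net` itself as denied alongside its subpackages is also an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// deniedExact lists the exact import paths named in the article.
var deniedExact = []string{"os", "os/exec", "syscall"}

// deniedTrees lists package trees to drop entirely. Denying "net" itself
// in addition to net/* is an assumption of this sketch.
var deniedTrees = []string{"net"}

// IsDenied reports whether an import path must be hidden from the VM.
func IsDenied(importPath string) bool {
	for _, p := range deniedExact {
		if importPath == p {
			return true
		}
	}
	for _, p := range deniedTrees {
		if importPath == p || strings.HasPrefix(importPath, p+"/") {
			return true
		}
	}
	return false
}

// StripUnsafe returns a copy of the catalog without denied packages.
// The catalog type is hypothetical: import path -> exported symbol names.
func StripUnsafe(catalog map[string][]string) map[string][]string {
	safe := make(map[string][]string, len(catalog))
	for path, syms := range catalog {
		if !IsDenied(path) {
			safe[path] = syms
		}
	}
	return safe
}

func main() {
	catalog := map[string][]string{
		"strings":  {"ToUpper"},
		"math":     {"Sqrt"},
		"os/exec":  {"Command"},
		"net/http": {"Get"},
	}
	fmt.Println(len(StripUnsafe(catalog)))
}
```

Matching on `p + "/"` rather than a bare prefix keeps an unrelated package such as a hypothetical `network` from being caught by the `net` rule.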

2. Local Zero-Token AST Syncing

Lineage trackers require recording the exact line numbers (line_start, line_end) of code structures. When developers refactor files locally, these lines shift.

To avoid the token cost of constantly querying an LLM to update simple line ranges, we deployed a local, AST-aware JavaScript script (sync-ast-lineages.js) and a Git pre-commit hook (pre-commit.sh).

The hook intercepts commits, parses physical file contents to detect symbol shifting, and appends update operations to the append-only graph.jsonl log in less than 0.2 seconds for a cost of $0.00.
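The source describes a JavaScript script for this step. To keep one language across the examples, the sketch below shows the same idea using Go's standard `go/parser` package instead: parse a file, record each top-level function's line range, and report the symbols whose recorded range no longer matches. The `Range` and `Update` types and both function names are hypothetical.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"sort"
)

// Range mirrors the line_start / line_end fields mentioned in the article.
type Range struct{ LineStart, LineEnd int }

// Update describes one symbol whose line range moved.
type Update struct {
	Symbol   string
	Old, New Range
}

// SymbolRanges parses Go source locally and returns the line range of each
// top-level function. Methods sharing a name would collide in this toy map.
func SymbolRanges(src string) (map[string]Range, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "src.go", src, 0)
	if err != nil {
		return nil, err
	}
	out := map[string]Range{}
	for _, decl := range file.Decls {
		if fn, ok := decl.(*ast.FuncDecl); ok {
			out[fn.Name.Name] = Range{
				LineStart: fset.Position(fn.Pos()).Line,
				LineEnd:   fset.Position(fn.End()).Line,
			}
		}
	}
	return out, nil
}

// ShiftedSymbols compares recorded ranges with freshly parsed ones and
// returns one update per symbol that moved, sorted by symbol name.
func ShiftedSymbols(recorded, current map[string]Range) []Update {
	var ups []Update
	for name, old := range recorded {
		if cur, ok := current[name]; ok && cur != old {
			ups = append(ups, Update{Symbol: name, Old: old, New: cur})
		}
	}
	sort.Slice(ups, func(i, j int) bool { return ups[i].Symbol < ups[j].Symbol })
	return ups
}

func main() {
	before := "package p\n\nfunc A() {}\n\nfunc B() {\n}\n"
	after := "package p\n\n// new comment\n\nfunc A() {}\n\nfunc B() {\n}\n"
	recorded, err := SymbolRanges(before)
	if err != nil {
		fmt.Println("parse failed:", err)
		return
	}
	current, err := SymbolRanges(after)
	if err != nil {
		fmt.Println("parse failed:", err)
		return
	}
	for _, u := range ShiftedSymbols(recorded, current) {
		fmt.Printf("%s: %d-%d -> %d-%d\n", u.Symbol, u.Old.LineStart, u.Old.LineEnd, u.New.LineStart, u.New.LineEnd)
	}
}
```

Everything here runs locally on parsed syntax trees, which is why no model call is needed to keep line ranges current.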

3. Continuous Integration Policy Gates

Documentation is only valuable if it is enforced. We built a reusable composite GitHub Action that runs as a CI check. Every time a developer opens a Pull Request, the validator tests the graph constraints:

  • Are all new Term objects governed by a curation role (governed_by)?
  • Does the taxonomy hierarchy remain acyclic (broader_term)?
  • Do approved plans have the required reviews?

If any constraint is violated, the check fails, PR annotations are raised, and merging is blocked—preventing architectural decay before code ever reaches production.
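The three checks listed above can be sketched as a single validation pass in Go. The `Term` and `Plan` shapes are assumptions: each term is given at most one broader term, and "required reviews" is simplified to "at least one review". The real validator's data model may differ.

```go
package main

import "fmt"

// Term is a hypothetical shape for a vocabulary entry.
type Term struct {
	ID         string
	GovernedBy string // curation role; empty means ungoverned
	Broader    string // broader_term parent ID; empty for root terms
}

// Plan is a hypothetical shape for a plan and its review count.
type Plan struct {
	ID       string
	Approved bool
	Reviews  int
}

// Validate returns one message per violated constraint.
func Validate(terms []Term, plans []Plan) []string {
	var violations []string
	parent := map[string]string{}
	for _, t := range terms {
		parent[t.ID] = t.Broader
		if t.GovernedBy == "" {
			violations = append(violations, "term "+t.ID+" has no governed_by role")
		}
	}
	for _, t := range terms {
		if onCycle(t.ID, parent) {
			violations = append(violations, "term "+t.ID+" is on a broader_term cycle")
		}
	}
	for _, p := range plans {
		if p.Approved && p.Reviews == 0 {
			violations = append(violations, "approved plan "+p.ID+" has no review")
		}
	}
	return violations
}

// onCycle follows broader_term links upward from start and reports
// whether the walk returns to start.
func onCycle(start string, parent map[string]string) bool {
	seen := map[string]bool{}
	for cur := parent[start]; cur != ""; cur = parent[cur] {
		if cur == start {
			return true
		}
		if seen[cur] {
			return false // a cycle exists further up, but not through start
		}
		seen[cur] = true
	}
	return false
}

func main() {
	terms := []Term{
		{ID: "a", GovernedBy: "", Broader: "b"},
		{ID: "b", GovernedBy: "curator", Broader: "a"},
	}
	plans := []Plan{{ID: "p1", Approved: true, Reviews: 0}}
	for _, v := range Validate(terms, plans) {
		fmt.Println(v)
	}
}
```

In a CI job, a non-empty result would translate into a failing exit status, and each message could be surfaced as a PR annotation.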


Conclusion: Codebases that Document Themselves

By combining Recursive Language Models (RLMs) with governed relational databases, isolated VMs, and zero-token local AST sync hooks, the Drover Ontology provides a blueprint for modern enterprise development.

Developers get to focus entirely on writing high-quality code. The system takes care of mapping symbols, enforcing boundaries, synchronizing lineages, and ensuring that your architecture maps always reflect codebase reality.

The gap between code and documentation is finally closed.

Cloud Shuttle Insights