Toward a New Science of Self-Referential Systems

Civilization is building systems that reason about themselves, audit themselves, and govern themselves — without a formal science of what self-referential systems can and cannot do. That gap is not merely academic. It is costing us clarity about AI safety, interpretability, consciousness, and the foundations of physics. Here is the case for closing it, and what the first results look like.

This article introduces the Reflexive Reality research program.

Key results referenced here: No AI Can Fully Verify Itself · Closure Without Exhaustion · One Theorem Behind Gödel, Turing, Kleene, Tarski, and Löb

The Scientific Gap in Today’s Discourse

There is a class of systems that keeps appearing at the center of the most important questions of our time. These systems have one defining property: they contain models of themselves. They do not just process inputs and produce outputs — they represent their own structure, reason about their own behavior, and in some cases update themselves based on what they find.

The human mind is such a system. The physical universe, if it is genuinely closed — if it has no outside — is such a system. Any AI architecture sophisticated enough to reason about its own reasoning is such a system.

These are called self-referential systems. And here is the gap: despite their centrality to physics, to the study of mind, and now to artificial intelligence, there is no mature formal science of them.

There are fragments — individual results in logic and computability theory, philosophical arguments, engineering intuitions. But no unified framework that says systematically what such systems can and cannot do, what limits apply to all of them, and what capabilities are structurally achievable.

That gap is generating real confusion in real debates right now.

When AI researchers debate whether a model can be “fully interpretable,” they are debating a property of self-referential systems — without the formal tools to say what full interpretability would even require, or whether it is achievable in principle.

When AI safety teams design governance systems in which AI models evaluate other AI models, they are implicitly assuming properties of self-referential systems that have not been established.

When philosophers debate the “hard problem of consciousness,” they are arguing about a self-referential system — the mind — without a formal theory of what such systems can and cannot represent about themselves.

When physicists propose theories of everything, they are implicitly claiming properties about a self-referential system — a universe that contains its own description — that have never been formally analyzed.

The discourse in all of these areas has been dominated by intuition, analogy, and philosophical argument. These are not substitutes for formal results. The field needs theorems.

What a Science of Self-Referential Systems Requires

A formal science of self-referential systems needs to do several things.

It needs to define precisely what such a system is — not just informally, but in a way that connects to existing mathematics. It needs to identify the structural properties that all such systems share, regardless of their substrate. It needs to derive theorems about those properties that are machine-checkable — not just plausible arguments, but proofs that can be audited. And it needs to apply those theorems to the concrete cases we actually care about: minds, universes, AI systems, institutions, physical theories.

This is what I have spent years building. The program is called Reflexive Reality. Its technical spine is a suite of papers and Lean 4 proof libraries under the heading NEMS — No External Model Selection.

The name refers to the foundational constraint the program begins from: any system that is genuinely self-contained cannot import its own selection criteria from outside itself. Everything load-bearing must come from within.

That constraint, taken seriously and developed formally, turns out to have consequences that reach across physics, logic, AI, consciousness, and the foundations of mathematics.

The Classical Results Are Fragments of One Theorem

The first major result is a unification — and it reframes how we should think about everything that came before.

Every educated person in a technical field has encountered some version of the following results:

Gödel’s incompleteness theorems: any consistent, effectively axiomatized formal system strong enough to express basic arithmetic leaves some true arithmetical statements unprovable, and cannot prove its own consistency.
Turing’s halting undecidability: no algorithm can decide, for every program and input, whether that program will halt.
Tarski’s truth undefinability: no sufficiently expressive formal language can define its own truth predicate.
Kleene’s recursion theorem: for every total computable transformation of programs, some program behaves exactly like its own transformed version.
Löb’s theorem: a formal system can prove “if P is provable, then P” only for sentences P it can already prove outright.

These are taught as separate results in separate courses. They are said to be “related” — to share a “diagonalization technique.” But the relationship remains informal. Students learn five separate theorems that happen to rhyme.
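For reference, the shared diagonalization pattern can be written down in standard textbook notation. The block below is a summary under stated assumptions, not a statement taken from the program's papers: T is assumed to be a consistent, effectively axiomatized theory extending basic arithmetic, Prov_T is its standard provability predicate, the corner quotes denote Gödel numbering, and φ_e is the partial computable function with index e. All of these symbols are introduced here for illustration.

```latex
\begin{align*}
&\text{Diagonal lemma:} && \text{for each formula } \theta(x) \text{ there is a sentence } \psi
   \text{ with } T \vdash \psi \leftrightarrow \theta(\ulcorner \psi \urcorner) \\
&\text{Gödel } (\theta = \neg\mathrm{Prov}_T): && T \vdash G \leftrightarrow \neg\mathrm{Prov}_T(\ulcorner G \urcorner) \\
&\text{Tarski } (\theta = \neg\mathrm{True}): && \lambda \leftrightarrow \neg\mathrm{True}(\ulcorner \lambda \urcorner),
   \text{ contradicting } \mathrm{True}(\ulcorner \lambda \urcorner) \leftrightarrow \lambda \\
&\text{Löb:} && T \vdash \mathrm{Prov}_T(\ulcorner P \urcorner) \to P \ \text{ implies } \ T \vdash P \\
&\text{Kleene:} && \text{for each total computable } f \text{ there is an index } e \text{ with } \varphi_e = \varphi_{f(e)}
\end{align*}
```

In each row a self-applied object (a sentence about its own code, or a program run on its own index) is what forces the conclusion. This is the family resemblance that the textbook phrase “diagonalization technique” points to.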

I proved that they are not five separate theorems. They are five instances of one theorem.

The Master Fixed-Point Theorem (machine-checked in Lean 4, zero custom axioms) provides a single abstract interface — a minimal specification of what it means for a system to be self-referential in the relevant sense. Every one of the classical results is a specialization of this interface to a different domain.

Gödel’s incompleteness is what you get when you instantiate the interface with formal provability in arithmetic. Turing’s undecidability is what you get with computable functions. Tarski’s result is what you get with syntactic truth definition. Kleene’s theorem is the constructive half of the same structure. Löb’s theorem falls out of the provability instantiation as a specific constraint.

The classical results are not analogies. They are special cases, with machine-checked derivations showing exactly how each one follows from the common structure.
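A well-known precedent for deriving several diagonal results from one statement is Lawvere's fixed-point theorem. The Lean 4 sketch below gives that classical result in a simplified, function-level form. It is not the Master Fixed-Point Theorem itself, whose exact statement is not given in this article, and the names `lawvere_fixed_point` and `no_enumeration_of_predicates` are illustrative. It is written against core Lean 4 without Mathlib and has not been checked against the program's own libraries.

```lean
-- Lawvere-style fixed point, stated for plain functions (illustrative names).
-- If `e` enumerates every function A → B (each g equals some row `e a`),
-- then every transformation t : B → B has a fixed point.
theorem lawvere_fixed_point {A B : Type} (e : A → A → B)
    (surj : ∀ g : A → B, ∃ a, e a = g) (t : B → B) :
    ∃ b, t b = b := by
  -- Diagonal function: apply t to the diagonal entry e x x.
  cases surj (fun x => t (e x x)) with
  | intro a ha =>
    -- Row a equals the diagonal function; evaluate both sides at a.
    exact ⟨e a a, (congrFun ha a).symm⟩

-- Specialization: negation on Prop has no fixed point,
-- so no `e` can enumerate all predicates on A (Cantor's diagonal argument).
theorem no_enumeration_of_predicates {A : Type} (e : A → A → Prop) :
    ¬ ∀ g : A → Prop, ∃ a, e a = g := by
  intro surj
  cases lawvere_fixed_point e surj Not with
  | intro b hb =>
    have hiff : ¬ b ↔ b := Iff.of_eq hb
    have hn : ¬ b := fun h => hiff.mpr h h
    exact hn (hiff.mp hn)
```

The design point is that the first theorem never mentions arithmetic, programs, or truth. The impossibility result appears only when a specific transformation without fixed points (here `Not`) is plugged in, which is one way to read the phrase “specialization of this interface to a different domain.”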

This matters because it means I can now generalize. The classical results were each proved for a specific domain. The master theorem proves the same structural impossibility for any system meeting the interface conditions — which turns out to be a much broader class than arithmetic or Turing machines. It covers minds. It covers universes. It covers AI systems. And it grounds the flagship theorem of the whole program.

Closure Without Exhaustion: The Flagship Result

Running through almost every domain of modern inquiry is a background assumption about self-referential systems. It goes something like this: a sufficiently powerful self-referential system could, in principle, achieve total self-description.

A theory of everything that fully describes the universe from within. A mind that achieves total self-transparency. A formal system that decides all truths in its domain. A model that can completely characterize its own behavior.

The flagship theorem of the program — Closure Without Exhaustion — proves that this assumption is wrong for any sufficiently expressive self-referential system.

More precisely: every system that is closed (self-contained, no outside) and expressive enough to model itself generates structure that it cannot fully capture in any internal self-representation. There is always an inexhaustible remainder — content that is realized in the system but lies beyond any fixed internal account.

This is not a failure of any particular representation. It is a structural property of the system itself. Adding more representational power does not close the gap — it creates new self-referential facts that the expanded representation also cannot fully capture.
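The claim that a richer representation only relocates the gap has the same shape as Cantor's diagonal argument. The Python sketch below is a finite toy of that shape, not the Closure Without Exhaustion theorem. An “account” is modelled as a list of predicates on integers, and `diagonal`, `is_captured`, `account`, and `history` are hypothetical names chosen for illustration.

```python
from typing import Callable, List

Predicate = Callable[[int], bool]


def diagonal(preds: List[Predicate]) -> Predicate:
    """Return a predicate that disagrees with preds[i] at input i."""
    def d(n: int) -> bool:
        if 0 <= n < len(preds):
            return not preds[n](n)  # flip the diagonal entry
        return False                # arbitrary value off the diagonal
    return d


def is_captured(target: Predicate, preds: List[Predicate], domain) -> bool:
    """True if some listed predicate agrees with target on all of domain."""
    return any(all(p(n) == target(n) for n in domain) for p in preds)


# A toy "internal account": three predicates the system can list.
account: List[Predicate] = [
    lambda n: n % 2 == 0,
    lambda n: n > 3,
    lambda n: True,
]
domain = range(10)

history = []
for _ in range(4):
    remainder = diagonal(list(account))   # snapshot of the current account
    history.append(is_captured(remainder, account, domain))
    account.append(remainder)             # "add more representational power"

print(history)
```

Every entry of `history` is `False`: each time the previous remainder is absorbed into the account, the enlarged account has a new diagonal predicate that it does not contain. The toy only shows this for a growing finite list, under the assumption that the account can be enumerated at all.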

The theorem covers all five classical results as special cases. Gödel’s incompleteness is what the remainder looks like in formal arithmetic. Turing’s undecidability is what it looks like in computation. The new theorem lifts both to the general case of any closed self-referential system — not just formal systems or Turing machines, but any realized system expressive enough to model itself.

The implications are specific and non-obvious.

For physics: A theory of everything — a physical theory that correctly specifies all the laws — is achievable in principle. But a complete internal semantic account of all the universe’s record-truth is structurally forbidden. The universe cannot contain a complete description of itself. This is a theorem, not a philosophical position.

For cognitive science: Self-knowledge is real, valuable, and can go very deep. But total self-transparency — a mind that fully coincides with its own self-representation — is structurally impossible for any mind rich enough to generate self-referential thoughts. The remainder is not ignorance or failure. It is the signature of what it means to be a sufficiently expressive reflexive system.

For AI: Any AI system expressive enough to reason about its own reasoning is subject to the same structural limit. This is not an engineering limitation that better models will overcome. It is a theorem about the class.

The AI Safety Result People Should Know

The most immediately practical consequence concerns AI safety and governance.

Every serious approach to AI safety eventually requires a system to evaluate itself — to check its own alignment, verify its own reasoning, audit its own behavior. As AI systems become more capable, this requirement becomes more pressing. Surely, the thinking goes, a sufficiently advanced AI should be able to give a reliable account of what it is doing and why.

What does “reliable account of its own behavior” actually mean? It means that for any significant property of the system’s behavior, the system can tell us correctly whether it has that property. This is what “full interpretability” means if the words are to mean anything precise. It is also exactly what is required for a system to serve as its own complete auditor.

The formal theorem establishes that no sufficiently expressive AI system can do this completely. Not because today’s models aren’t capable enough. Because the ability to produce a total, correct, complete account of all nontrivial properties of one’s own behavior would require exactly the kind of exhaustive self-model that Closure Without Exhaustion proves is structurally unavailable.

Scaling doesn’t fix it. Better architecture doesn’t fix it. It is a theorem about the class of systems, not a property of any particular implementation.

This has direct consequences for governance design. Any AI safety architecture that converges to a single system auditing itself — or AI systems auditing each other within the same representational class — is claiming a property that is formally impossible.

The architecture needs to be redesigned around this fact: external verification, diverse certification roles, stratified partial auditing. Not because these are nice ideas but because they are the only approaches consistent with what self-referential systems can actually do.
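The difference between self-auditing and external auditing can be sketched with a Turing-style “contrarian” construction. The Python toy below rests on strong simplifying assumptions: behaviour is a symbolic label (`"safe"` or `"unsafe"`), an auditor is any function from programs to verdicts, and `make_contrarian`, `internal_auditor`, and `external_auditor` are hypothetical names, not components of any real governance system or of the program's formal results.

```python
from typing import Callable

Program = Callable[[], str]          # returns "safe" or "unsafe" (symbolic)
Auditor = Callable[[Program], bool]  # True means "I certify this as safe"


def make_contrarian(auditor: Auditor) -> Program:
    """Hypothetical program that asks `auditor` about itself, then defies it."""
    def contrarian() -> str:
        return "unsafe" if auditor(contrarian) else "safe"
    return contrarian


def verdict_is_correct(auditor: Auditor, program: Program) -> bool:
    return auditor(program) == (program() == "safe")


def internal_auditor(program: Program) -> bool:
    # Stand-in for any fixed self-assessment rule the program can consult.
    return True


def external_auditor(program: Program) -> bool:
    # Observes behaviour from outside; the audited program never consults it.
    return program() == "safe"


target = make_contrarian(internal_auditor)
internal_ok = verdict_is_correct(internal_auditor, target)
external_ok = verdict_is_correct(external_auditor, target)

# The external auditor has its own blind spot: a program built against it.
try:
    external_auditor(make_contrarian(external_auditor))
    external_self_case = "answered"
except RecursionError:
    external_self_case = "no answer"

print(internal_ok, external_ok, external_self_case)
```

In this toy the internal auditor is wrong about the program built against it, the external auditor is right about that same program, and the external auditor in turn produces no verdict on a program built against itself. That is the pattern behind stratified partial auditing: each level can check the level below it, and no level serves as its own final judge.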

The full argument: No AI Can Fully Verify Itself — The Formal Proof.

What Else the Program Establishes

The program extends well beyond the incompleteness and self-certification results. Here is a partial map of what has been formally established.

On physics

The Standard Model gauge group — SU(3)×SU(2)×U(1) — is the unique survivor of the closure constraint applied to four-dimensional renormalizable gauge theory. The Born rule is the unique probability assignment consistent with a self-contained universe. The arrow of time follows from the structural requirement that records are stable and cannot be overwritten.

These are derived as theorems from the single premise of perfect self-containment — the first derivations of these specific physical structures from a foundational logical principle, machine-checked in Lean 4.

On consciousness and mind

The program establishes formal necessary conditions for genuine awareness — not as philosophy, but as structural requirements derived from the self-referential systems framework.

It proves that awareness is not an object in the world (the locus of manifestation is structurally different from the objects that appear within it), that qualia are on-ledger content that cannot be explained away, and that a necessary ontological ground — neither nothing nor a personal God, but a pre-categorial condition for actuality — must exist if nontrivial reflexive reality exists.

On AI and agency

The program proves formal necessary conditions for genuine agency — what a system must have to count as a genuine agent in the structural sense, as opposed to a sophisticated input-output mapper. Current AI architectures fall below these conditions.

It also establishes a formal theory of intelligence with a five-level hierarchy, and proves that no institution — including AI governance bodies — can be the universal final judge of anything nontrivial about itself or its domain.

On novelty and explanation

A complementary program — Novelty Theory — proves that even under perfectly fixed deterministic laws, genuine explanatory novelty is structurally unavoidable. The phase tower of a sufficiently expressive generator always outruns any fixed explanatory framework.

This is not Gödelian incompleteness applied to physics — it is a distinct structural result about the relationship between generators and the explanatory frameworks required to account for what they produce.

Why Formal Results Matter Here

It is worth being explicit about why formal, machine-checked results matter for these questions specifically — rather than careful philosophical argument, which has also been applied to all of them.

Philosophical argument is invaluable for framing questions, identifying the relevant considerations, and ruling out confused positions. But it has two weaknesses in this domain.

First, the concepts involved — self-reference, self-description, closure, exhaustion — are precise enough to support formal treatment, and informal argument tends to slide between subtly different readings of them without noticing.

Second, because these questions are contested and touch on things people care about deeply (AI, consciousness, free will, the foundations of reality), informal arguments are easily recast as support for almost any position. A machine-checked proof does not have this weakness. It either goes through or it does not.

The proofs in this program are verified in Lean 4 — a modern interactive theorem prover used for cutting-edge mathematics and computer science. Zero custom axioms on the primary theorem chains. Every logical gap is either closed or explicitly acknowledged. The formal anchors are public and auditable. Anyone with a computer can check them.

This is what moves the discourse from handwaving to results. For questions of this importance, with this much at stake, formal results are the appropriate standard. That standard can now be met.

What This Is Not

A few clarifications about scope and claims, because precision matters here.

This is not a Theory of Everything. The program derives structural constraints that any self-contained universe must satisfy. It does not derive all of physics from pure logic — it derives what the closure constraint forces, which turns out to be more than expected but less than everything.

The AI results are not claims about current systems specifically. The theorem applies to any system expressive enough to model itself in the relevant sense. Whether a given current AI system reaches that threshold is a separate empirical question. The theorem establishes what is impossible for systems that do reach it — not a claim about what today’s LLMs can or cannot do.

The consciousness results are necessary conditions, not sufficient ones. The program establishes what any genuinely aware system must have. Whether any given system has these properties is not settled by the theorems alone.

The results are conditional on premises that are themselves strong. Perfect Self-Containment (PSC) is a powerful premise. The program develops the conditional exhaustively (if PSC holds, then these results follow) and analyzes what it would take to deny PSC. But the premise itself is not derived from nothing.

The Bigger Picture

This is an unusual moment. For the first time in history, civilization is building systems that are arguably self-referential in the relevant sense — systems that reason about their own reasoning, update based on representations of themselves, and will increasingly be called upon to audit and govern themselves and each other.

This is happening without a science of what such systems can and cannot do.

The consequences are already visible. Debates about AI interpretability proceed without a formal account of what complete interpretability would require or whether it is achievable. Governance frameworks are designed around assumptions about AI self-auditing that have not been formally examined. Claims about machine consciousness are evaluated without a formal theory of what consciousness structurally requires. Arguments about the foundations of physics invoke self-referential properties of universes that have never been formally analyzed.

A science of self-referential systems will not resolve all of these debates. But it changes the character of the discourse — from speculation anchored in analogy to argument anchored in proof.

That is not a small shift. Civilizations that build and depend on complex systems need to understand those systems. Understanding requires formal tools. The formal tools are now available.

This program is a beginning, not a completion. But it establishes that the science is possible, demonstrates what it looks like when done rigorously, and proves results already strong enough to change how we should think about AI safety, the foundations of physics, and the structure of mind.

Where to Go Next

The program is large. Here are the best entry points depending on what you care about most:

If you care about AI safety and governance:
→ No AI Can Fully Verify Itself — The Formal Proof
→ Scaling Doesn’t Fix the Self-Model Problem

If you want the flagship mathematical result:
→ Closure Without Exhaustion: Why Every System That Models Itself Has an Irreducible Remainder

If you want to understand how Gödel, Turing, and Tarski are connected:
→ One Theorem Behind Gödel, Turing, Kleene, Tarski, and Löb

If you want the broadest introduction to the program:
→ What Would a Universe With No Outside Look Like? The NEMS Answer

If you want to understand the vocabulary first:
→ The Concepts Behind NEMS: A Reader’s Lexicon

Full research program — all papers and Lean libraries:
→ novaspivack.com/research

Nova Spivack
