The Internalization of Computation – What Percepta’s transformer-computer actually signals — and why the implications are far stranger than the demo


Something happened two weeks ago that most people filed under “interesting research” and moved on from. They shouldn’t have. Christos Tzamos and the Percepta team published a demonstration in which a transformer executes arbitrary C programs — not by calling out to a Python interpreter, not by emitting code for an external sandbox, but inside its own forward pass. A WebAssembly interpreter compiled directly into transformer weights. 33,000 tokens per second on a CPU. Perfect arithmetic, every time. The world’s hardest Sudoku solved in under three minutes. And — the part that should stop you cold — the whole process remains differentiable. You can propagate gradients through the computation itself.

Andrej Karpathy noticed. The Hacker News thread lit up. Then, like clockwork, the discourse collapsed into two camps: the believers who called it the end of prompt engineering, and the skeptics who asked “but why not just call Python?” Both camps missed the point.

The demo isn’t the story. The direction is the story.

We have been building AI systems that talk about computation. This points toward AI systems that are computation. That distinction will reorganize the entire field.

What They Actually Did — and What It Means

Let’s be precise. Percepta didn’t train a model to be good at arithmetic. They compiled a deterministic program interpreter directly into a small transformer’s weights: 7 layers, model dimension 36, 18 heads with a novel 2D attention geometry they call HullKVCache. The key architectural move is that the 2D attention heads reduce decoding cost from the O(n²) of standard attention over the full trace to O(k + log n), a dramatic asymptotic improvement for long execution traces. The model doesn’t approximate what a computer would do. It is the computer, for the duration of the execution.

The critical caveats belong here too: the weights aren’t produced by gradient descent. They’re analytically constructed — compiled, not learned. No training methodology has been demonstrated for integrating this into a large pretrained model. The community skepticism is warranted. This is not a shipping product. It’s a proof of direction.
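
To make the "compiled, not learned" distinction concrete, here is a toy sketch in the same spirit: a single linear layer whose weight matrix is constructed analytically so that it performs exact 2-bit addition, with no gradient descent anywhere. Everything here is illustrative; it is not the Percepta construction, just the principle that a truth table can be written directly into weights.

```python
# Compile the full truth table of (a + b) for a, b in 0..3 into a 16 x 7
# weight matrix: row (4*a + b) routes all its mass to column (a + b).
# The resulting layer is exact by construction, not by training.

def one_hot(index, size):
    """Return a one-hot list of the given size."""
    v = [0.0] * size
    v[index] = 1.0
    return v

def build_adder_weights():
    """Analytically construct the weights -- no optimization involved."""
    W = [[0.0] * 7 for _ in range(16)]
    for a in range(4):
        for b in range(4):
            W[4 * a + b][a + b] = 1.0
    return W

def forward(W, a, b):
    """One linear layer: one-hot input times compiled weights, then argmax."""
    x = one_hot(4 * a + b, 16)
    logits = [sum(x[i] * W[i][j] for i in range(16)) for j in range(7)]
    return max(range(7), key=lambda j: logits[j])

W = build_adder_weights()
assert all(forward(W, a, b) == a + b for a in range(4) for b in range(4))
```

The point of the sketch is that correctness comes from construction, not from data: every input is handled perfectly on the first forward pass, which is exactly the property training-based approaches to arithmetic have struggled to deliver.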

But here’s what the skeptics keep missing when they say “just call Python”: the claim isn’t that internal execution is more convenient than tool-calling right now. The claim is that when computation is internal, it’s differentiable. You can teach through it. That’s a fundamentally different design space. The gradient flows through the execution trace. Tool-calling walls off that territory entirely.

Old architecture:
  LLM → emit tool call → external execution → return result
 
What this opens:
  LLM = learned semantics ⊕ compiled substrate ⊕ internal execution trace

The second form is not incrementally better than the first. It’s categorically different. The learned part and the algorithmic part are coupled inside one artifact. They can co-adapt through training. That is what’s at stake.
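
The "gradients flow through the execution trace" claim can be illustrated with a minimal stand-in: forward-mode dual numbers pushed through a toy iterative program. This is not how the Percepta construction works; it only demonstrates the property at stake, that when every step of an execution is differentiable, the derivative of the final state with respect to a parameter is carried through the whole trace.

```python
# Forward-mode autodiff via dual numbers: a + b*eps, where b carries the
# derivative. Running a "program" on duals yields both its output and the
# exact gradient through every step of the execution.

class Dual:
    """A dual number a + b*eps; b tracks the derivative."""
    def __init__(self, a, b=0.0):
        self.a, self.b = a, b
    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.a + other.a, self.b + other.b)
    __radd__ = __add__
    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.a * other.a, self.a * other.b + self.b * other.a)
    __rmul__ = __mul__

def run_trace(w, steps=5):
    """A toy 'program': repeatedly apply x <- w*x + 1. Each step is
    differentiable, so d(output)/dw accumulates across the whole trace."""
    x = Dual(0.0)
    for _ in range(steps):
        x = w * x + 1
    return x

out = run_trace(Dual(0.5, 1.0))   # seed dw/dw = 1
# After 5 steps: x = 1 + w + w^2 + w^3 + w^4, dx/dw = 1 + 2w + 3w^2 + 4w^3
```

A tool-calling architecture cannot do this: the moment execution crosses the process boundary to an external interpreter, the chain rule stops.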

The Deeper Principle

There’s a structural pattern that recurs across very different domains — physics, cognitive architecture, organizational design — and I see it here again. Call it the internalization principle: a system that carries a function internally is qualitatively more powerful than one that delegates it externally, even if the immediate outputs look identical.

A cell that synthesizes its own ATP is not merely more convenient than one that imports it. It’s more robust, more responsive, more tightly integrated with its own regulatory loops. A reasoner that executes procedures inside its own dynamics is not merely faster than one that calls a calculator. It has a fundamentally different relationship to what it’s computing.

Most AI today is performatively competent. It describes what it would do, generates text that represents computation, and offloads the actual execution elsewhere. The interesting question Percepta is forcing is: what changes when the model stops describing procedures and starts realizing them?

The answer is: quite a lot.


Near Term (12–24 Months): The Scaffolding Starts to Collapse Inward

The immediate consequence of differentiable embedded execution isn’t a new product category — it’s a new research program. The relevant question becomes: which computational structures are worth embedding, and what’s the right interface between the learned semantic layer and the compiled procedural substrate?

Arithmetic and Sudoku are proofs of concept. The interesting domains are formal verification kernels, where exact symbolic manipulation matters and approximation is disqualifying; constraint solvers and dynamic programming primitives, the backbone of planning and scheduling; state machine execution, for agents that need deterministic behavioral contracts rather than probabilistic guesses about what to do next; and symbolic parsing, where structural correctness is required, not just semantic plausibility.
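
One of those domains, state machine execution, shows why this class of computation embeds so naturally: a deterministic finite automaton is just a per-symbol transition matrix applied to a one-hot state vector, i.e. linear algebra all the way down. A sketch (the automaton here, divisibility-by-3 of a binary string, is an illustrative choice, not from the Percepta work):

```python
# States 0, 1, 2 track (value so far) mod 3. Reading bit b maps state s to
# (2*s + b) mod 3, so each input symbol corresponds to a fixed 3x3
# permutation matrix -- exactly the kind of structure weights can encode.

def transition_matrix(bit):
    """Build the 3x3 transition matrix for one input bit."""
    T = [[0] * 3 for _ in range(3)]
    for s in range(3):
        T[s][(2 * s + bit) % 3] = 1
    return T

def run_dfa(bits):
    """Execute the automaton as repeated vector-matrix products."""
    state = [1, 0, 0]  # one-hot encoding of start state 0
    for b in bits:
        T = transition_matrix(b)
        state = [sum(state[s] * T[s][j] for s in range(3)) for j in range(3)]
    return state.index(1)  # deterministic: exactly one entry is 1

assert run_dfa([1, 1, 0]) == 0   # binary 110 = 6, divisible by 3
assert run_dfa([1, 0, 1]) == 2   # binary 101 = 5, remainder 2
```

The output is a deterministic behavioral contract, not a probabilistic guess: the same input always lands in the same state, which is precisely what agents need from their critical-path logic.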

In the near term, expect hybrid architectures to emerge: large pretrained models with specialized embedded execution heads for specific task families. Not a single “computer inside a transformer,” but purpose-built computational substrates that the semantic layers route into when needed. The scaffolding doesn’t disappear overnight; it starts to compress.
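
The hybrid pattern can be sketched at the level of control flow. Everything below is hypothetical: `semantic_model` and `exact_arithmetic_head` are placeholder names, and in a real system the routing would happen inside the forward pass rather than in Python. The sketch only shows the shape: a learned layer that decides when to hand a query to a deterministic substrate.

```python
# A schematic hybrid forward pass: the learned layer classifies the query,
# and queries in a recognized task family are routed to an embedded,
# deterministic execution head instead of the probabilistic path.

def exact_arithmetic_head(expr):
    """Stand-in for a compiled execution head: deterministic and exact."""
    a, op, b = expr
    return {"add": a + b, "mul": a * b}[op]

def semantic_model(query):
    """Stand-in for the learned semantic layer: decides whether this query
    belongs to a task family an embedded head can handle."""
    if isinstance(query, tuple) and len(query) == 3 and query[1] in ("add", "mul"):
        return ("route", "arithmetic")
    return ("answer", f"(free-form response to {query!r})")

def hybrid_forward(query):
    kind, payload = semantic_model(query)
    if kind == "route" and payload == "arithmetic":
        return exact_arithmetic_head(query)   # deterministic path
    return payload                            # probabilistic path

assert hybrid_forward((1234, "mul", 5678)) == 1234 * 5678
```

The design choice worth noticing is that the deterministic head is part of the same artifact as the router, so the boundary between them is a trainable surface rather than a network API.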

One near-term consequence worth naming: the reliable arithmetic problem, which has been an embarrassing limitation of LLMs for years, doesn’t get solved by scale. It gets solved architecturally. That’s a pattern that will repeat across many “LLMs are bad at X” complaints.

Medium Term (3–5 Years): Agents That Contain Their Own Runtime

Today’s agentic systems are orchestration machines. An LLM sits at the center, emitting instructions, and a scaffolding layer handles memory, tool dispatch, state tracking, loop control, error recovery, and execution sequencing. This scaffolding has gotten very sophisticated — and very brittle. The gap between what the model “intends” and what the external runtime actually does is where most agent failures live.
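
That orchestration stack can be drawn in schematic form. The names below are illustrative and real agent frameworks differ, but the skeleton, parse then dispatch then recover then re-prompt, is shared, and every arrow in it is a place where the model’s intent and the runtime’s behavior can diverge.

```python
# One turn of an external agent runtime: the scaffold parses the model's
# emitted tool call, dispatches it, tracks state, and routes errors back
# to the model as text. Each of these seams is a failure surface.

def scaffold_step(model_output, tools, state):
    """Run one scaffold turn; returns (new_state, result_fed_back_as_tokens)."""
    try:
        name, arg = model_output["tool"], model_output["arg"]   # parse
    except (KeyError, TypeError):
        return state, "error: malformed tool call"              # recovery path
    if name not in tools:
        return state, f"error: unknown tool {name!r}"           # recovery path
    result = tools[name](arg)                                   # dispatch
    return state + [(name, result)], result                     # state tracking

tools = {"double": lambda x: 2 * x}
state, out = scaffold_step({"tool": "double", "arg": 21}, tools, [])
assert out == 42
state, out = scaffold_step({"tool": "halve", "arg": 21}, tools, state)
assert out.startswith("error")
```

Embedded execution collapses the parse and dispatch seams by construction: there is no emitted text to misparse when the subroutine lives in the same forward pass that invoked it.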

The medium-term consequence of embedded execution is that more of this stack begins migrating inside the model itself. Not all of it — there will always be reasons to call external APIs and maintain external state — but the critical-path execution logic, the parts that need to be reliable and tightly coupled to reasoning, those can move in.

What this produces is something qualitatively different from today’s agents. Less coordinator, more organism. The model isn’t routing instructions through a scaffolding layer; it’s carrying the execution substrate in its own weights, activating it when the task demands it, and flowing gradients back through it when it needs to improve.

There’s a harder consequence here too. If more of the real work happens inside the model’s forward dynamics rather than in observable external calls, interpretability gets significantly harder. Today’s agent traces are at least visible — you can watch the tool calls. Internal execution traces are another layer of opacity inside an already opaque system. The verification problem gets more serious, not less, as this architecture matures.

  • Agent latency drops — no round-trip overhead for critical-path computation
  • Behavioral stability improves — deterministic subroutines don’t drift with context length
  • Interpretability requirements get harder to satisfy — audit surfaces move inward
  • The semantic/execution split becomes a deliberate design choice, not an artifact of how we happened to build things

Long Term (5–10+ Years): Software Becomes Structure, Not a Service

The deepest implication of the Percepta work isn’t about transformers specifically. It’s about the relationship between learned representations and formal machinery. The transformer is just the current substrate. The principle is that these two things can coexist inside one artifact — and that the coupling between them can itself be learned.

Follow that principle to its conclusion and you get to something strange: foundation models that ship with embedded domain-specific virtual machines. Not as plugins or tools or MCP servers — as structure. A legal reasoning model with a formal logic kernel compiled into its weights. A financial model with an exact arithmetic and simulation substrate. A scientific model with embedded numerical solvers. The “software” is part of the model, not a service the model calls.

At this point, the distinction between a software system and an AI model becomes genuinely unclear. You’re not asking whether the system is a language model or a computer. You’re asking what computational basis has been embedded, what class of tasks that basis supports efficiently, and how the learned semantic layer routes into and out of it. Those are architecture questions, not paradigm questions.

There’s a civilizational-scale version of this shift worth naming: the current model of deploying AI is overwhelmingly one of capability rental. You call an API, you get a result, the machinery lives elsewhere. The long-term direction this points toward is capability ownership — models where the algorithmic machinery is part of the artifact you hold, not a service you access. That changes the economics, the security posture, the regulatory picture, and the competitive dynamics of the field.

We move from AI as intelligent interface to AI as intelligent substrate. That’s not a product update. That’s a structural transformation in what “AI infrastructure” even means.


What This Means for Builders Right Now

The practical question isn’t whether to rebuild your stack today based on a blog post from a 2026 startup; you shouldn’t. The practical question is whether you understand what’s coming well enough to make good architectural decisions in the next 18 months, decisions that either foreclose or preserve optionality in a world where embedded computation becomes real.

  • The “wrap an LLM with tools” pattern has maybe two to three years of architectural centrality left. It won’t disappear, but it won’t be the frontier. The interesting systems will increasingly be hybrid by design.
  • The differentiability claim is the key unlock to watch. If gradient flow through internal execution traces proves trainable at scale, it rewrites what “fine-tuning for reliability” even means. Watch for papers on that specifically.
  • The scaffolding investment you’re making today is a temporary tax, not a permanent architecture. Build it to work now; design it to be replaceable later. The teams that treat today’s agentic infrastructure as permanent will carry technical debt when the substrate shifts.
  • Domain-specific embedded execution is the near-term opportunity. You don’t need to wait for a general theory. Identify the deterministic subroutines in your domain that LLMs handle probabilistically and badly — and watch for architectural solutions that move those inside the model.

The headline everyone ran was “Percepta builds a computer inside a transformer.” That’s a fine headline. But the frame it puts you in is retrospective — as if this is the culmination of something. It isn’t. It’s the beginning of a question: what kinds of computation are worth carrying as structure, and how should learned cognition and formal machinery be coupled?

That question will be one of the defining architectural problems of the next decade. The teams and institutions that develop genuine intuition for it early will have an enormous advantage over those who encounter it late.

The future of AI isn’t bigger models, or smarter prompts, or more sophisticated scaffolding. It’s systems where the line between software and model no longer makes sense — because the computation has moved inside.


About Nova Spivack

A prolific inventor, noted futurist, computer scientist, and technology pioneer, Nova was one of the earliest Web pioneers and helped to build many leading ventures including EarthWeb, The Daily Dot, Klout, and SRI’s venture incubator that launched Siri. Nova flew to the edge of space in 1999 as one of the first space tourists, and was an early space angel-investor. As co-founder and chairman of the nonprofit charity, the Arch Mission Foundation, he leads an international effort to backup planet Earth, with a series of “planetary backup” installations around the solar system. In 2024, he landed his second Lunar Library, on the Moon – comprising a 30 million page archive of human knowledge, including the Wikipedia and a library of books and other cultural archives, etched with nanotechnology into nickel plates that last billions of years. Nova is also highly active on the cutting-edges of AI, consciousness studies, computer science and physics, authoring a number of groundbreaking new theoretical and mathematical frameworks. He has a strong humanitarian focus and works with a wide range of humanitarian projects, NGOs, and teams working to apply technology to improve the human condition.