This is a browser-based cellular automaton experiment that takes a different approach to the same challenge as the trend-aware experiment: how do you keep a self-tuning CA alive and interesting? Here the key innovation is that the rule parameters are not uniform scalars — they are spatial fields. Different parts of the grid run under different rules simultaneously, and the neural controller manipulates those fields rather than single values. When the system finds a rich configuration, it saves a snapshot. When it gets stuck, it restores one. Like the companion post, this build optimizes a composite complexity score, but it also tracks boredom: flat, low-novelty plateaus are gently penalized in the value update, encouraging drift toward livelier behavior without forcing constant instability.
▶ Run the simulation in your browser ↗
(Works best on a modern desktop browser. Click Run / pause to start. The three small panels on the right show the live spatial fields for g, k1, and k2.)
What’s different from the trend-aware experiment
In the trend-aware experiment, the three rule parameters (g, k1, k2) are single global values — the same everywhere on the grid. Here each parameter is a 6×6 control grid, bilinearly interpolated to the full 144×144 simulation. The rule is spatially heterogeneous: one corner of the grid might be in a high-growth regime while another is slow. This enables richer coexistence — multiple pattern types living side by side — and gives the controller a much larger space to explore.
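The interpolation step above can be sketched as follows. This is a minimal illustration, assuming the fields are stored as flat `Float32Array`s; the function and constant names are mine, not taken from the source:

```javascript
// Hypothetical sketch: upsample a CTRL×CTRL control grid to an N×N field
// with bilinear interpolation. Names are illustrative, not from the source.
const CTRL = 6, N = 144;

function bilinearUpsample(ctrl, ctrlSize, n) {
  const out = new Float32Array(n * n);
  const scale = (ctrlSize - 1) / (n - 1); // map full-res coords into control coords
  for (let y = 0; y < n; y++) {
    const cy = y * scale;
    const y0 = Math.min(Math.floor(cy), ctrlSize - 2);
    const fy = cy - y0;
    for (let x = 0; x < n; x++) {
      const cx = x * scale;
      const x0 = Math.min(Math.floor(cx), ctrlSize - 2);
      const fx = cx - x0;
      // four surrounding control points
      const a = ctrl[y0 * ctrlSize + x0];
      const b = ctrl[y0 * ctrlSize + x0 + 1];
      const c = ctrl[(y0 + 1) * ctrlSize + x0];
      const d = ctrl[(y0 + 1) * ctrlSize + x0 + 1];
      out[y * n + x] =
        a * (1 - fx) * (1 - fy) + b * fx * (1 - fy) +
        c * (1 - fx) * fy + d * fx * fy;
    }
  }
  return out;
}
```

Because each full-resolution cell blends the four nearest control points, a single change to one of the 36 control values produces a smooth gradient over roughly a 24×24-cell neighborhood rather than a hard-edged patch.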
Two additional continuous knobs extend the rule further: growthExp (an exponent on the growth term, letting it curve sub- or super-linearly) and diagWeight (how much diagonal neighbors count relative to orthogonal ones). These add texture and anisotropy without changing the basic reaction-diffusion logic.
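One plausible way the two knobs enter the update rule is sketched below. The exact rule form, names, and wrap-around topology are assumptions on my part; only the roles of the knobs (exponent on growth, relative diagonal weight) come from the description above:

```javascript
// Hypothetical sketch of how growthExp and diagWeight could enter the rule.
// The toroidal wrap and function names are assumptions, not from the source.
function neighborhoodSum(grid, n, x, y, diagWeight) {
  const wrap = v => (v + n) % n; // toroidal wrap-around (assumed)
  let s = 0;
  for (let dy = -1; dy <= 1; dy++) {
    for (let dx = -1; dx <= 1; dx++) {
      if (dx === 0 && dy === 0) continue;
      // diagonal neighbors count diagWeight relative to orthogonal ones
      const w = (dx !== 0 && dy !== 0) ? diagWeight : 1;
      s += w * grid[wrap(y + dy) * n + wrap(x + dx)];
    }
  }
  return s;
}

function growthTerm(activation, g, growthExp) {
  // growthExp = 1 is linear; < 1 curves sub-linearly, > 1 super-linearly
  return g * Math.pow(activation, growthExp);
}
```

With `diagWeight` below 1 the effective neighborhood becomes diamond-shaped, which is one way spatial anisotropy shows up in the patterns.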
Gallery
How it works
Spatial parameter fields
Each of g, k1, k2 is stored as a 6×6 grid of floats, bilinearly interpolated to full resolution before each simulation step. The controller can shift all values in a field uniformly, bump a random location with a Gaussian perturbation, or smooth all fields. This means actions have spatially distributed effects — a bump in the g field creates a local high-growth region that can seed new pattern types.
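The three field actions described above can be sketched like this. This is a minimal illustration under my own assumptions about representation (flat `Float32Array`, 3×3 box smoothing); the function names are illustrative:

```javascript
// Sketch of the three field-manipulation actions: uniform shift, a local
// Gaussian bump, and smoothing. Names and details are assumptions.
function shiftField(field, delta) {
  // shift every control value uniformly
  for (let i = 0; i < field.length; i++) field[i] += delta;
}

function bumpField(field, size, cx, cy, amp, sigma) {
  // add a Gaussian perturbation centered at (cx, cy)
  for (let y = 0; y < size; y++) {
    for (let x = 0; x < size; x++) {
      const d2 = (x - cx) ** 2 + (y - cy) ** 2;
      field[y * size + x] += amp * Math.exp(-d2 / (2 * sigma * sigma));
    }
  }
}

function smoothField(field, size) {
  // replace each cell by the mean of its in-bounds 3×3 neighborhood
  const out = new Float32Array(field.length);
  for (let y = 0; y < size; y++) {
    for (let x = 0; x < size; x++) {
      let s = 0, count = 0;
      for (let dy = -1; dy <= 1; dy++) {
        for (let dx = -1; dx <= 1; dx++) {
          const nx = x + dx, ny = y + dy;
          if (nx < 0 || ny < 0 || nx >= size || ny >= size) continue;
          s += field[ny * size + nx];
          count++;
        }
      }
      out[y * size + x] = s / count;
    }
  }
  field.set(out);
}
```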
Beam search lookahead
The action space here is larger (12 actions vs. 6), so the 2-step lookahead of the trend-aware experiment is replaced by beam search (width 3, depth 3). At each controller step the agent maintains a beam of the 3 most promising action sequences, expanding each by the full action set and keeping the top 3 at each depth. This finds better multi-step plans with manageable compute — still running in real time in the browser.
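The search loop can be sketched generically. Here `simulate` and `score` stand in for the learned dynamics model and the value estimate; both signatures are assumptions, not taken from the source:

```javascript
// Minimal beam-search planner sketch (width 3, depth 3).
// `simulate(state, action)` and `score(state)` are assumed stand-ins for the
// learned transition model and value estimate described in the post.
function beamSearch(state, actions, simulate, score, width = 3, depth = 3) {
  let beam = [{ state, seq: [] }];
  for (let d = 0; d < depth; d++) {
    const candidates = [];
    for (const node of beam) {
      for (const a of actions) {
        const next = simulate(node.state, a); // predicted next state
        candidates.push({ state: next, seq: [...node.seq, a], value: score(next) });
      }
    }
    candidates.sort((p, q) => q.value - p.value); // best first
    beam = candidates.slice(0, width);           // keep the top `width` plans
  }
  return beam[0].seq[0]; // execute only the first action of the best plan
}
```

Note that only the first action of the winning sequence is executed; the plan is rebuilt from scratch at the next controller step, so model errors deeper in the rollout matter less.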
Greatest-hits memory
Every 15 ticks, if the current complexity exceeds 0.6 and hasn’t been seen recently, the simulation saves a full snapshot: grid state, all three parameter fields, growthExp, and diagWeight. Up to 6 snapshots are kept, sorted by complexity. When the system gets stuck in the same behavioral bin for too long, rather than randomizing everything (which often destroys good structure), it restores a random snapshot from the greatest-hits library with 70% probability. The system resumes from a previously interesting configuration and continues from there. This is a form of long-term memory that makes exploration more efficient.
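The snapshot library's save-and-restore logic, as I read the description, could look like the sketch below. The object layout and field names are inferred, not taken from the source:

```javascript
// Sketch of the greatest-hits library: save high-complexity snapshots,
// keep the best MAX_HITS by complexity, restore a random one when stuck.
// Structure is inferred from the post; field names are illustrative.
const MAX_HITS = 6;
const hits = [];

function maybeSaveSnapshot(sim, complexity, threshold = 0.6) {
  if (complexity < threshold) return;
  hits.push({
    complexity,
    grid: sim.grid.slice(),                 // deep-copy the CA state
    fields: sim.fields.map(f => f.slice()), // all three parameter fields
    growthExp: sim.growthExp,
    diagWeight: sim.diagWeight,
  });
  hits.sort((a, b) => b.complexity - a.complexity);
  hits.length = Math.min(hits.length, MAX_HITS); // drop the weakest beyond 6
}

function restoreRandomSnapshot(sim) {
  if (hits.length === 0) return false;
  const snap = hits[Math.floor(Math.random() * hits.length)];
  sim.grid = snap.grid.slice();
  sim.fields = snap.fields.map(f => f.slice());
  sim.growthExp = snap.growthExp;
  sim.diagWeight = snap.diagWeight;
  return true;
}
```

Copying on both save and restore matters: restoring a reference instead of a copy would let the resumed run mutate the stored snapshot and silently corrupt the library.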
Interest, boredom, and long-run value
Complexity is the headline objective, but staying in the same narrow basin with flat traces is treated as costly. A boredom signal aggregates how long live, change, and complexity have been unusually stable relative to how alive the pattern looks. That feeds a mild penalty into the tabular TD update so regions that are merely repetitive lose long-term appeal. Beam search scoring picks up the same idea at decision time: when boredom is higher, the controller tilts toward under-explored behavioral bins and away from cycling in place — still bounded so the simulation does not devolve into noise.
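The penalty-in-the-value-update idea can be written as a one-line change to a tabular TD(0) rule. The coefficients and the exact boredom formula below are my assumptions; only the shape of the mechanism comes from the text:

```javascript
// Sketch of a tabular TD(0) update with a mild boredom penalty folded into
// the reward. alpha, gamma, and the penalty weight are assumed values.
const values = new Map(); // behavioral bin -> long-run value estimate

function tdUpdate(bin, nextBin, complexity, boredom,
                  alpha = 0.1, gamma = 0.9, penalty = 0.3) {
  const reward = complexity - penalty * boredom; // flat plateaus earn less
  const v = values.get(bin) ?? 0;
  const vNext = values.get(nextBin) ?? 0;
  // standard TD(0): move v toward reward + discounted next-state value
  values.set(bin, v + alpha * (reward + gamma * vNext - v));
}
```

Because the penalty enters through the reward rather than the policy, a repetitive basin loses value gradually over many visits instead of being banned outright, which matches the "gently penalized" framing above.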
The dynamics MLP includes boredom in its input vector (alongside field means, knobs, and the action), so the learned transition model can distinguish a calm basin from a restless one. Together with greatest-hits restore and plateau-breaking logic, the aim is for the system to wander toward visually rich regimes without camping in any one for too long.
What to watch
- The three field panels on the right show the live g, k1, k2 spatial distributions — watch bumps and gradients appear as the controller experiments
- The hits counter in the status line shows how many snapshots are in the greatest-hits library
- When the status shows RESTORE hit, the system has just jumped back to a saved high-complexity state
- The archive panel (bottom right) shows the learned value function across behavioral space — same as the trend-aware experiment, but shaped differently by the richer action space
- The mean parameter values (g̅, k1̅, k2̅) update in real time as the fields shift
Code
Single self-contained HTML file, no dependencies. View source on GitHub ↗ · neural-ca repo ↗
Related post: Neural CA: Trend-Aware Agents Learn to Keep a Cellular Automaton Alive — global parameters, preemptive flee mode, and 2-step lookahead.