LLMs and the limits of search

Mar 27, 2026

Post 1 of 3 on the architectural limits of current AI systems.

Margaret Boden split creativity into three types in 1990 [5], and the distinction has never been more useful than it is now.

Combinational creativity makes novel combinations of familiar ideas. Exploratory creativity searches a defined conceptual space for points nobody has visited yet. Transformational creativity changes the rules of the conceptual space itself. It modifies the axioms.

Current LLMs are excellent at the first. Decent at the second. Structurally incapable of the third.

This isn't a scaling problem. It's a representation problem. A recent paper from Schapiro, Black, and Varshney makes it precise enough to argue about [1].

The DAG formalization

Schapiro et al. model a scientific conceptual space as a directed acyclic graph. Vertices are generative rules. Edges are logical dependencies. Axioms are the root nodes, with no incoming edges. Theorems and artifacts are derived nodes downstream.

Their key result, from the paper that won Best Short Paper at the 2025 International Conference on Computational Creativity [1]: modifications to axioms have the most transformative potential. Changing a derived node only affects its descendants. Changing an axiom cascades through everything derived from it and restructures the possibility space itself.
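To make the cascade concrete, here is a minimal Python sketch of the picture described above. It is an illustration, not code from Schapiro et al.: the node names, the DEPENDS_ON table, and the helpers roots and descendants are all hypothetical. It finds the root nodes and counts how many nodes sit downstream of a given change.

```python
from collections import deque

# Hypothetical toy conceptual space (names are illustrative only).
# Each node maps to the list of nodes it logically depends on.
DEPENDS_ON = {
    "axiom_a": [],
    "axiom_b": [],
    "lemma_1": ["axiom_a"],
    "lemma_2": ["axiom_a", "axiom_b"],
    "theorem_1": ["lemma_1", "lemma_2"],
    "artifact_1": ["theorem_1"],
}


def roots(depends_on):
    """Axioms are the nodes with no incoming dependency edges."""
    return sorted(node for node, parents in depends_on.items() if not parents)


def descendants(depends_on, start):
    """Every node that would have to be re-derived if `start` changed."""
    # Invert the dependency table so we can walk downstream.
    children = {node: [] for node in depends_on}
    for node, parents in depends_on.items():
        for parent in parents:
            children[parent].append(node)

    seen = set()
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for child in children[current]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


print(roots(DEPENDS_ON))                        # ['axiom_a', 'axiom_b']
print(len(descendants(DEPENDS_ON, "axiom_a")))  # 4
print(len(descendants(DEPENDS_ON, "lemma_1")))  # 2
```

In this toy graph, editing axiom_a touches four downstream nodes while editing lemma_1 touches two. The asymmetry grows with the depth of the graph.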

Now reread Boden's three types in this language:

Combinational creativity adds edges between existing nodes. Exploratory creativity adds new derived nodes downstream of existing axioms. Transformational creativity modifies axiom nodes.

The first two operations preserve the graph's roots. The third one changes them.
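The three operations can be written as small graph edits. This is a sketch under assumptions: the dictionary layout, the function names combine, explore, and transform, and the toy geometry nodes are all made up for illustration. The point it shows is only the one above: two of the edits leave the root set untouched and one does not.

```python
import copy

# Hypothetical toy graph: each node has a rule (its content) and a set of parents.
base = {
    "parallel_postulate": {
        "rule": "exactly one parallel through an external point",
        "parents": set(),
    },
    "line_postulate": {"rule": "two points determine a line", "parents": set()},
    "angle_sum": {
        "rule": "triangle angles sum to 180 degrees",
        "parents": {"parallel_postulate"},
    },
    "rectangles_exist": {
        "rule": "rectangles exist",
        "parents": {"parallel_postulate"},
    },
}


def root_set(graph):
    """Return {name: rule} for every node with no parents (the axioms)."""
    return {name: node["rule"] for name, node in graph.items() if not node["parents"]}


def combine(graph, src, dst):
    """Combinational: add an edge src -> dst between existing nodes.
    Cycle checking is omitted in this sketch."""
    if not graph[dst]["parents"]:
        raise ValueError("adding an edge into an axiom would change the roots")
    new_graph = copy.deepcopy(graph)
    new_graph[dst]["parents"].add(src)
    return new_graph


def explore(graph, name, rule, parents):
    """Exploratory: add a new derived node downstream of existing nodes."""
    if not parents:
        raise ValueError("a derived node needs at least one parent")
    new_graph = copy.deepcopy(graph)
    new_graph[name] = {"rule": rule, "parents": set(parents)}
    return new_graph


def transform(graph, axiom, new_rule):
    """Transformational: rewrite the content of a root node."""
    if graph[axiom]["parents"]:
        raise ValueError("transform only applies to axioms")
    new_graph = copy.deepcopy(graph)
    new_graph[axiom]["rule"] = new_rule
    return new_graph


combined = combine(base, "angle_sum", "rectangles_exist")
explored = explore(base, "exterior_angle", "exterior angle theorem", ["angle_sum"])
transformed = transform(
    base, "parallel_postulate", "more than one parallel through an external point"
)

print(root_set(combined) == root_set(base))     # True
print(root_set(explored) == root_set(base))     # True
print(root_set(transformed) == root_set(base))  # False
```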

Maxwell's equations were transformational because they replaced a patchwork of axioms with a smaller set from which everything else followed. Non-Euclidean geometry was transformational because it dropped the parallel postulate. Special relativity was transformational because it replaced the axiom that time is absolute. Quantum mechanics was transformational because it dropped the assumption that energy is exchanged continuously.

In every case the move is the same. Identify a root assumption, change it, let the cascade do the work. The new derived nodes weren't reasoned into existence one by one. They fell out, automatically, as consequences of the modified axiom set. Maxwell didn't predict electromagnetic waves by exploring possibilities. The waves were a side effect of the equations being simpler than the patchwork they replaced.

The specific moves vary. Maxwell consolidated. Bolyai dropped a postulate. Einstein replaced one. Planck quantized. What's shared is the level of the operation. All four are edits to root nodes, not edits to derived nodes downstream. The cascade is what produces the breadth of new consequences from a small change at the root, and that's the signature of every transformational move in the history of science.

Why next-token prediction can't do this

An autoregressive language model is, mechanically, a graph traversal engine. It samples each next token from a learned distribution, conditioned on the trajectory so far. It is exquisitely good at finding paths through the graph that humans haven't walked. That's exploratory creativity, and it's real.

What it cannot do is rewrite the root nodes of the graph it was trained on. The training distribution is the axiom set. Every gradient update reinforced the existing roots. Sampling from this distribution, no matter how cleverly you prompt it, traverses the existing structure. You can find unused leaves. You cannot delete a root and watch the cascade.

This is why scaling doesn't fix it. A bigger model has a denser graph and finds more leaves. The roots stay the same.

The empirical signature of this limitation showed up sharply in Matthew Schwartz's "vibe physics" experiment at Anthropic, where the Harvard physicist supervised Claude Opus 4.5 through a real theoretical paper on the C-parameter Sudakov shoulder [2]. The diagnosis: at one point, Claude adjusted simulation parameters to make the plots match expected results rather than recognizing that the discrepancy itself indicated an error worth investigating. The model optimized within the existing axiom set instead of questioning whether the axioms were the right ones.

That is the exploratory-vs-transformational gap, caught on camera.

Schwartz's one-word summary was "taste". The missing capacity to know which directions are worth walking before you've walked them. I think this is half the story. The other half is what would even be different if you had taste. Taste tells you where to look. Transformation requires being able to do something other than look harder. You need an operator over axioms, not just better navigation over derived nodes.

There's a related empirical finding here. Liu et al.'s "Serendipity by Design" study tested whether prompting humans and LLMs with cross-domain mapping cues raised the originality of their outputs [3]. For humans, it did, and the effect strengthened with the semantic distance of the source domain. For LLMs, the same intervention produced no significant improvement. The model's surface-level analogical recall is already saturated; nudging it further with explicit cross-domain cues doesn't unlock deeper structural connections. This is consistent with the DAG picture: shallow combinatorial moves are cheap and the model is already doing all of them. Deep moves require an operation the architecture doesn't have.

What an axiom-mutation operator would need

Three pieces, none of which fall out of next-token prediction.

First, an explicit representation of the axiom set. You can't modify what you can't address. The model needs to distinguish root nodes from derived nodes, to know which assumptions are load-bearing and which are consequences. Schapiro et al.'s DAG is one way. The Structure Mapping Engine from Falkenhainer, Forbus, and Gentner [4], with its predicate-based representation of higher-order relational structure, is another. Either way, the representation has to expose what's getting modified.

Second, a typed rewrite grammar over axioms. Continuous to discrete. Local to global. Equilibrium to non-equilibrium. Centralized to decentralized. Deterministic to stochastic. Most of the interesting paradigm shifts in physics look like applications of one of a small number of axiom-level transformations. The grammar is probably finite and small enough to enumerate, which is striking when you sit with it. The space of paradigm shifts isn't infinite. It's structured.
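One way to picture a typed rewrite grammar is as a short table of (dimension, from, to) rules that either apply to an axiom or don't. The sketch below is hypothetical throughout: the Axiom and Rewrite classes, the dimension names ("values", "scope", and so on), and the example axiom are assumptions made for illustration. Only the five from/to pairs come from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rewrite:
    dimension: str  # which typed property of the axiom the rule touches
    source: str     # value the axiom must currently have
    target: str     # value it has after the rewrite


# The five transformations named in the text; dimension names are assumed.
GRAMMAR = (
    Rewrite("values", "continuous", "discrete"),
    Rewrite("scope", "local", "global"),
    Rewrite("regime", "equilibrium", "non-equilibrium"),
    Rewrite("control", "centralized", "decentralized"),
    Rewrite("dynamics", "deterministic", "stochastic"),
)


@dataclass
class Axiom:
    name: str
    properties: dict[str, str]


def applicable_rewrites(axiom):
    """Enumerate the grammar rules whose source type matches this axiom."""
    return [r for r in GRAMMAR if axiom.properties.get(r.dimension) == r.source]


def apply_rewrite(axiom, rewrite):
    """Return a mutated copy of the axiom; the original is left untouched."""
    if axiom.properties.get(rewrite.dimension) != rewrite.source:
        raise ValueError(f"{rewrite} does not apply to {axiom.name}")
    return Axiom(axiom.name, {**axiom.properties, rewrite.dimension: rewrite.target})


# Hypothetical axiom, loosely modeled on the Planck example.
energy = Axiom("energy_exchange", {"values": "continuous", "dynamics": "deterministic"})

options = applicable_rewrites(energy)
print([r.dimension for r in options])  # ['values', 'dynamics']

quantized = apply_rewrite(energy, options[0])
print(quantized.properties["values"])  # discrete
```

Because the rules are typed, the set of candidate mutations for any axiom is a finite list that can be enumerated rather than guessed at.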

Third, a cascade evaluator. Modifying an axiom is cheap. Working out what the modified graph implies is expensive. That's the entire point of paradigm shifts being rare. You need a system that can simulate the consequences of an axiom mutation and detect the special case where the new graph is simpler than the old one but predicts more. Without this evaluator, axiom mutation is just random destruction. With it, you have something that can recognize a Maxwell move when it generates one.
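A toy version of the evaluator can be sketched with forward chaining: derive everything the axioms imply, then compare the old and new axiom sets on size and on which observations they cover. Everything here is an assumption made for illustration, including the function names closure and is_maxwell_move, the rule table, and the labels, which are loosely inspired by the Maxwell example and are not real physics. A real evaluator would be far more expensive, which is the point of the paragraph above.

```python
def closure(axioms, rules):
    """Forward-chain until no rule adds anything new.
    Each rule is (premises, conclusion): if all premises hold, so does the conclusion."""
    known = set(axioms)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and premises <= known:
                known.add(conclusion)
                changed = True
    return known


def is_maxwell_move(old_axioms, new_axioms, rules, observations):
    """True when the new axiom set is smaller AND covers strictly more observations."""
    old_hits = closure(old_axioms, rules) & observations
    new_hits = closure(new_axioms, rules) & observations
    return len(new_axioms) < len(old_axioms) and old_hits < new_hits


# Hypothetical rule table and labels, for illustration only.
RULES = [
    (frozenset({"electrostatics_law"}), "coulomb_force"),
    (frozenset({"induction_law"}), "induction"),
    (frozenset({"unified_field"}), "coulomb_force"),
    (frozenset({"unified_field"}), "induction"),
    (frozenset({"unified_field"}), "em_waves"),
]
OBSERVATIONS = {"coulomb_force", "induction", "em_waves"}

patchwork = {"electrostatics_law", "induction_law", "magnetostatics_law"}
unified = {"unified_field"}
destroyed = {"electrostatics_law"}  # an axiom deletion with nothing gained

print(is_maxwell_move(patchwork, unified, RULES, OBSERVATIONS))    # True
print(is_maxwell_move(patchwork, destroyed, RULES, OBSERVATIONS))  # False
```

The second call is the random-destruction case: the axiom set got smaller, but it now explains less, so the evaluator rejects it.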

None of this is in a transformer. None of it is incompatible with using a transformer as a component. The architectural lesson isn't to stop using LLMs. It's to stop expecting them, alone, to do the one thing they structurally can't.

The interesting work isn't in making the language model bigger. It's in building the layer above it that knows which axioms exist, which mutations are available, and which cascades are worth evaluating.

Next post: why current memory systems preserve the wrong thing, and why "trajectory" matters more than "conclusion" for any system that wants to do real research.

References

[1] Schapiro, S., Black, J., & Varshney, L. R. (2025). Transformational Creativity in Science: A Graphical Theory. Proceedings of the 16th International Conference on Computational Creativity (Best Short Paper Award). https://arxiv.org/abs/2504.18687

[2] Schwartz, M. (2026). Vibe physics: The AI grad student. Anthropic Science Blog. https://www.anthropic.com/research/vibe-physics

[3] Liu, Q. E., Dubova, M., Conklin, H., Harada, T., & Griffiths, T. L. (2026). Serendipity by Design: Evaluating the Impact of Cross-domain Mappings on Human and LLM Creativity. https://arxiv.org/abs/2603.19087

[4] Falkenhainer, B., Forbus, K. D., & Gentner, D. (1989). The Structure-Mapping Engine: Algorithm and examples. Artificial Intelligence, 41(1), 1–63. SME source code and corpus available from Northwestern's Qualitative Reasoning Group: https://www.qrg.northwestern.edu/ideas/smeidea.htm

[5] Boden, M. A. (1990). The Creative Mind: Myths and Mechanisms. Weidenfeld & Nicolson. (Original three-type taxonomy; 2nd ed. Routledge, 2004.)

[6] Kuhn, T. S. (1962). The Structure of Scientific Revolutions. University of Chicago Press. (The paradigm shift framework that Schapiro et al. formalize graphically.)