[H2] Introduction
[H3] The Fixed-Substrate Fallacy in Traditional AI
For decades, the project of artificial intelligence has operated on a foundational, often unstated, assumption: the separation of algorithm from substrate. We design learning rules (the software) that optimize parameters within a fixed computational architecture (the hardware, or its virtual equivalent). A convolutional neural network, for example, refines its weights, but its fundamental structure—the layers, the convolutional filters, the graph of connectivity—remains static. This is an exogenous representation, where the “space” of possible solutions is defined by the human engineer, not by the system itself.
This paradigm has yielded remarkable success in solving well-defined problems. However, it fails to capture the hallmark of true autonomy seen in biological systems: the ability to change not just the content of one’s model, but the structure of the model itself. Biological intelligence is not merely parameter optimization; it is a process of morphogenesis, a continuous, dynamic sculpting of the physical and computational substrate. A developing brain does not just learn; it grows and rewires. This distinction is the critical barrier between engineered tools and autonomous agents.
[H3] Thesis: Computation as Morphogenesis
This post proposes a new theoretical framework for computation that bridges this gap. We posit that computation attains autonomy only when its metric of representation becomes a dynamic state variable.
In this model, the system’s “geometry”—how it measures distance, similarity, and curvature within its own representational space—is not a fixed background. Instead, the geometry itself, mathematically described by a metric tensor $g_{ij}(t)$, becomes endogenous. It is updated by internal learning rules derived from the system’s own predictive errors. This framework unifies morphogenesis (structural change) and learning (parameter change) under a single variational principle, where the system continuously optimizes its own structure to better model its world. This is not just an adaptive system; it is a morphogenetic one.
[H2] The Problem: Limitations of Exogenous Representation
[H3] Defining Key Terms: Metric Tensor ($g_{ij}(t)$) and Representational Geometry
To grasp this thesis, we must first define our terms with precision.
- Representational Geometry: In machine learning, a model’s “knowledge” can be visualized as existing on a high-dimensional manifold (a statistical manifold). The geometry of this manifold defines the relationships between possible states or representations. For example, how “far” is the concept of “dog” from “cat”?
- Metric Tensor ($g_{ij}(t)$): The metric tensor is the mathematical object that defines this geometry. It is a collection of functions that tells us how to measure distances and angles at every point on the manifold. In a static system, $g_{ij}$ is fixed. If we use a standard Euclidean metric, the shortest path between two points is a straight line. But if the manifold is curved, the shortest path (a geodesic) is not.
- Endogenous Metric: Our thesis claims that $g_{ij}$ should not be fixed but should evolve with time $t$. The system learns its own metric. The geometry becomes a dynamic variable, $g_{ij}(t)$, shaped by the system’s experience.
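To make the metric-tensor idea concrete, here is a minimal numerical sketch. The metric values are invented for illustration; the point is only that the same displacement acquires a different length once the metric is no longer Euclidean:

```python
import numpy as np

def squared_length(dx, g):
    """Squared length of a small displacement dx under metric tensor g:
    ds^2 = sum_ij g_ij dx_i dx_j."""
    return float(dx @ g @ dx)

dx = np.array([1.0, 1.0])

# Euclidean metric: the identity matrix, so ds^2 = dx_1^2 + dx_2^2.
g_euclid = np.eye(2)

# An illustrative non-Euclidean metric that stretches the first axis and
# couples the two directions; the same displacement now measures longer.
g_curved = np.array([[4.0, 0.5],
                     [0.5, 1.0]])

print(squared_length(dx, g_euclid))  # 2.0
print(squared_length(dx, g_curved))  # 6.0
```

An endogenous metric amounts to making the entries of `g_curved` time-dependent and learned, rather than fixed by the designer.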
[H3] The Computational Inefficiency of Static Optimization
The static-substrate model is computationally brittle. When a system with a fixed geometry encounters a novel problem space that does not “fit” its pre-defined structure, its learning efficiency collapses. This is the essence of catastrophic forgetting and of the brittleness of transfer learning (French, 1999). The system is forced to traverse vast, inefficient paths across a fixed manifold, rather than re-structuring the manifold itself to make the solution “closer” and more accessible.
Consider the energy landscape of optimization. A traditional system is stuck in a single, static landscape, trying to find the lowest valley (Li et al., 2018). A morphogenetic system can change the landscape itself, bending and warping its representational space to create more efficient paths to solutions (Nielsen, 2015). By treating the metric as exogenous, we are forcing our systems to solve problems with one hand tied behind their back, fundamentally limiting their capacity for autonomous adaptation and robust generalization.
[H2] Theory: Endogenous Metric Learning
[H3] The Metric as a Dynamic State Variable
The core proposal is to treat the metric tensor $g_{ij}(t)$ as a state variable of the system, co-equal with the system’s parameters (e.g., weights). In this view, the “state” of the system is not just its position on the manifold, but the shape of the manifold itself.
This implies that the system’s dynamics must describe two parallel processes:
- Parameter Update (Learning): A fast-timescale process where the system moves on the manifold to a new state that minimizes local error (e.g., standard gradient descent).
- Metric Update (Morphogenesis): A slower-timescale process where the system changes the shape of the manifold (the $g_{ij}$ components) to optimize the global structure of its beliefs.
This dynamic $g_{ij}(t)$ allows the system to actively manage its own representational resources. It can expand the “volume” of its representational space in regions of high uncertainty or novelty, while contracting it in regions that are well-understood.
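A minimal sketch of the two timescales on a toy quadratic error surface. Everything here is illustrative: the objective matrix `A`, the step sizes `eta_fast` and `eta_slow`, the update cadence, and the use of the local curvature of the error as a stand-in for the metric’s free-energy gradient are assumptions for the example, not derived quantities.

```python
import numpy as np

# Toy anisotropic quadratic "prediction error": F(theta) = 0.5 * theta^T A theta.
A = np.array([[10.0, 0.0],
              [0.0, 0.1]])

def grad_F(theta):
    return A @ theta

theta = np.array([1.0, 1.0])
G = np.eye(2)                    # metric starts out Euclidean
eta_fast, eta_slow = 0.05, 0.1   # hypothetical step sizes

for step in range(500):
    # Fast process (learning): move on the manifold, preconditioned by the
    # current metric -- a Riemannian gradient step theta -= eta * G^{-1} grad F.
    theta = theta - eta_fast * np.linalg.solve(G, grad_F(theta))
    # Slow process (morphogenesis): every 10th step, nudge the metric toward
    # the local curvature of F (the Hessian A, standing in for -dF/dg).
    if step % 10 == 0:
        G = (1 - eta_slow) * G + eta_slow * A

print(np.round(theta, 6))  # near [0, 0]: learning converged under the adapted metric
```

Note the design choice: the fast loop runs every step while the metric moves only occasionally and by small increments, so each round of learning sees an approximately fixed geometry.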
[H3] Unifying Learning and Morphogenesis via a Single Variational Principle
How does the system “know” how to update its metric? We propose that both processes—learning and morphogenesis—are governed by a single objective: the minimization of variational free energy (Friston, 2019).
Under the Free Energy Principle (FEP), any self-organizing system must minimize its free energy, which is a proxy for prediction error or “surprise” (Friston, 2010). We extend this principle:
- Learning (parameter updates) minimizes free energy by changing the system’s beliefs to better match the world.
- Morphogenesis (metric updates) minimizes free energy by changing the system’s structure to create a better (more efficient, more expressive) space of possible beliefs.
This unifies the two. The system is not just finding the best model within a fixed hypothesis space; it is finding the best hypothesis space. This is a move from simple adaptation to genuine epistemic self-organization.
[H3] Deriving Update Rules from Prediction-Error Gradients
The update rules for $g_{ij}(t)$ can be derived directly from this principle. The change in the metric tensor, $\dot{g}_{ij}$, should be proportional to the negative gradient of the free energy with respect to the metric itself: $\dot{g}_{ij} \propto -\partial F / \partial g_{ij}$.
While a full mathematical derivation is beyond this post’s scope, the intuition is as follows: If a particular region of the representational space consistently produces high prediction errors (high surprise), the free energy gradient will “push” the metric to change. This change could manifest as expanding the space (increasing the “distance” between representations to allow for finer-grained distinctions) or changing its curvature to create new, more efficient inferential pathways (Friston et al., 2017).
The system’s own prediction errors literally sculpt its computational body. This is the essence of morphogenetic computation.
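The expand/contract intuition can be sketched numerically. The error values are invented, the metric is restricted to a diagonal, and the multiplicative exponential update is our simplification of $\dot{g}_{ij} \propto -\partial F / \partial g_{ij}$, not a derived rule:

```python
import numpy as np

# Hypothetical squared prediction errors attributed to two representational
# dimensions: dimension 0 is poorly modelled, dimension 1 is well understood.
sq_errors = np.array([4.0, 0.01])

# Diagonal metric g_ii(t), initially Euclidean.
g = np.ones(2)

# Multiplicative sketch of the metric update: expand dimensions with
# above-average error (allowing finer-grained distinctions), contract the rest.
eta = 0.5
g = g * np.exp(eta * (sq_errors - sq_errors.mean()))

# The same pair of representations is now "further apart" along the
# poorly-modelled dimension and "closer" along the well-understood one.
dx = np.array([1.0, 1.0])
print(np.sqrt(g) * np.abs(dx))  # per-dimension contributions to distance
```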
[H2] Evidence and Examples
[H3] Conceptual Framework: Structure↔Function Isomorphism (Savva, forthcoming)
This theory builds directly on the Isomorphic Systems Lab’s (ISL) forthcoming work on Structure$\leftrightarrow$Function Isomorphism (Savva, forthcoming). This concept posits that in truly advanced autonomous systems, the distinction between computational structure (hardware, architecture) and computational function (algorithm, learning) dissolves.
The “function” (e.g., a learning rule) is encoded in the “structure” (the geometry), and the “structure” is dynamically updated by the “function” (the learning process). Our endogenous metric model provides a formal mechanism for this isomorphism. The metric $g_{ij}(t)$ is the structure, and its dynamics, driven by free energy minimization, are the function.
[H3] Simulation: Morphological Computation in Soft Robotics (Bongard & Pfeifer, 2011)
We find strong circumstantial evidence for this theory in the field of morphological computation. Bongard and Pfeifer (2011) demonstrated that a robot’s physical morphology is not a passive constraint but an active part of its computational process. By evolving their physical forms, their “starfish” robots co-evolved new and more effective control strategies.
Their work explicitly shows the co-evolution of physical structure and computational function. Our model provides the formal generalization: the robot’s physical body is a physical instantiation of its representational metric. The “morphogenesis” of the robot’s body is a literal, physical change in its $g_{ij}(t)$, which in turn changes the space of possible actions and sensations.
[H3] Theoretical Parallel: The Free Energy Principle and Active Inference (Friston, 2019)
Our framework can be seen as a physicalist and architectural extension of Karl Friston’s Active Inference (Friston, 2019). Active Inference describes how agents minimize free energy by updating beliefs (perception) and acting on the world (action) to make sensations match predictions.
However, Active Inference typically assumes a fixed generative model structure. Our model proposes a “deeper” form of Active Inference, where the agent can also act upon itself. The morphogenetic update of gij(t) is an epistemic action directed at the system’s own internal structure. This structural change is a third way to minimize free energy, alongside perception and physical action. This allows a system to not just infer the state of the world, but to infer the optimal structure for inference itself.
[H2] Potential Objections and Counterarguments
[H3] Computational Cost and Physical Realizability
A primary objection is computational cost. Calculating and updating a full metric tensor $g_{ij}$—which has $N^2$ components for an $N$-dimensional space ($N(N+1)/2$ independent ones, given symmetry)—is astronomically expensive, far exceeding the cost of standard gradient descent (Martens, 2020).
This is a valid engineering concern, but it does not invalidate the theory.
- Sparsity and Locality: It is likely that $g_{ij}(t)$ does not need to be dense. In physical systems, interactions are local. The metric can be sparse or approximated, with updates confined to local neighborhoods (Zhang & Sra, 2018).
- Physical Instantiation: This cost may be why this computation cannot be simulated efficiently on exogenous substrates (like our current silicon chips). The theory implies that a new class of morphogenetic hardware is required, where the metric update is a physical process, not a simulated one. The cost is paid in physical energy for reconfiguration, not in clock cycles.
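The sparsity point can be made quantitative. If metric interactions are confined to local blocks of size $b$, storage falls from $N^2$ entries to $Nb$, and metric-preconditioned updates cost $O(Nb^2)$ rather than $O(N^3)$. A sketch with arbitrary block sizes and random positive-definite blocks:

```python
import numpy as np

N, b = 100, 5                      # dimensions and (assumed) local block size
rng = np.random.default_rng(0)

# A block-diagonal metric: interactions confined to local neighbourhoods.
blocks = []
for _ in range(N // b):
    M = rng.standard_normal((b, b))
    blocks.append(M @ M.T + b * np.eye(b))   # symmetric positive definite

grad = rng.standard_normal(N)

# Metric-preconditioned gradient solved block by block: O(N*b) storage and
# O(N*b^2) arithmetic, instead of O(N^2) / O(N^3) for a dense metric.
precond = np.concatenate([
    np.linalg.solve(B, grad[i * b:(i + 1) * b])
    for i, B in enumerate(blocks)
])

# Sanity check: identical to assembling the full dense block-diagonal metric.
dense = np.zeros((N, N))
for i, B in enumerate(blocks):
    dense[i * b:(i + 1) * b, i * b:(i + 1) * b] = B
assert np.allclose(precond, np.linalg.solve(dense, grad))
```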
[H3] The Problem of Stability and Convergence in Dynamic-Metric Systems
A second major objection is stability. If the system’s own “ruler” ($g_{ij}(t)$) is constantly changing, how can it ever converge on a stable belief? How does it avoid “metric collapse” or chaotic oscillations (Doya, 2002)?
This is the problem of “learning to learn” taken to its physical extreme. The solution likely lies in the timescale separation proposed earlier.
- The fast dynamics of learning (parameter updates) should converge to a temporary stable point within a given metric.
- The slow dynamics of morphogenesis (metric updates) then shift the entire landscape, providing a new basis for the next round of fast learning.
This separation prevents chaotic interference. Furthermore, the free energy principle itself provides a normative function for stability. The system will only change its metric if the change is expected to reduce long-term, time-averaged free energy (Friston et al., 2017). The metric will stabilize when it has found an optimal, self-consistent geometry for modeling its environment.
[H2] Synthesis: The Morphogenetic Compute Architecture
[H3] Hardware as a Function of Epistemic Curvature
This leads to the original contribution of this theory: the concept of endogenous metric learning hardware. This is a computing system whose architecture is itself a function of its own internal epistemic state.
We can reframe the update rule $\dot{g}_{ij} \propto -\partial F / \partial g_{ij}$ in informational terms. The free energy gradient $\partial F / \partial g_{ij}$ is a measure of epistemic curvature. It quantifies how sharply the system’s predictive accuracy changes as its internal geometry is altered.
In a morphogenetic computer, the hardware’s physical configuration (e.g., connectivity, material properties) would be directly coupled to this curvature. High curvature—regions of high model uncertainty and potential learning—would trigger physical reconfiguration, sculpting the hardware to better fit the information. The architecture becomes a dynamic imprint of the system’s own uncertainty.
[H3] The Role of the Fisher–Rao Metric in Representational Change
The most natural candidate for a system’s internal metric $g_{ij}$ is the Fisher–Rao Information Metric (Amari, 2016). This metric is not arbitrary; it is the only metric that is invariant to re-parameterization of a statistical model (Cencov, 1982). It defines the “distance” between two probability distributions in a way that reflects their functional, statistical dissimilarity.
In our system, $g_{ij}(t)$ would be the system’s estimate of its own Fisher Information manifold. The system’s learning process is thus a “Riemannian gradient descent,” which is known to be more efficient than standard gradient descent because it follows the natural gradient defined by the information geometry (Pascanu & Bengio, 2013).
Our thesis goes one step further: the system is not just using a (static) Fisher metric; it is learning the Fisher metric by updating $g_{ij}(t)$ to match the true statistical structure of its sensorium. The morphogenetic computer is a machine that physically builds its own optimal statistical ruler.
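A one-parameter illustration of why the Fisher metric is the natural “ruler”: for a Bernoulli model, preconditioning the likelihood gradient by the inverse Fisher information rescales a poorly-conditioned update into an exact one. The starting point, data mean, and unit step size are illustrative choices:

```python
# Fisher information of a Bernoulli(p): the 1x1 Fisher-Rao metric g(p).
def fisher_bernoulli(p):
    return 1.0 / (p * (1.0 - p))

# Gradient of the average negative log-likelihood for observed mean x_bar:
# dNLL/dp = (p - x_bar) / (p * (1 - p)).
def grad_nll(p, x_bar):
    return (p - x_bar) / (p * (1.0 - p))

p, x_bar, eta = 0.2, 0.7, 1.0

# Natural-gradient step: precondition by the inverse Fisher metric. For the
# Bernoulli this simplifies to p -> p - eta * (p - x_bar), so a single step
# with eta = 1 lands on the maximum-likelihood estimate x_bar.
p_new = p - eta * grad_nll(p, x_bar) / fisher_bernoulli(p)
print(p_new)  # ≈ 0.7, the maximum-likelihood estimate
```

This is the static version of the story; the endogenous-metric claim is that the system estimates and updates this metric itself as its statistics drift.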
[H2] Implications for Autonomous Systems
[H3] Beyond Adaptation: True Cognitive Autonomy
The implications of this framework are profound. Systems built on this principle would move beyond simple adaptation to exhibit genuine cognitive autonomy.
- Robust Generalization: They would handle novel, out-of-distribution data not by failing, but by reconfiguring their representational geometry to accommodate the new information.
- Self-Healing: A damaged system could, in principle, “regrow” its computational structure by following the free energy gradients back to a stable, low-error configuration.
- True Creativity: By being able to change its own hypothesis space, the system could generate genuinely new conceptual structures, not just recombinations of existing ones.
[H3] Applications in Self-Healing Materials and Novel AI Architectures
This is not just a theory of abstract computation; it is a theory of physical computation. The immediate applications lie at the intersection of materials science and AI.
- Self-Configuring Hardware: We can envision substrates (e.g., neuromorphic chips, FPGAs, or biological computers) that physically re-wire themselves based on information-theoretic curvature (Schuman et al., 2017).
- Smart Materials: A material whose crystalline or polymeric structure (its physical $g_{ij}$) reconfigures to better predict and dissipate stress (its “prediction error”).
- Autonomous Robotics: Robots that do not just learn to control a fixed body, but grow or reconfigure their bodies (like the Bongard & Pfeifer example) to create new, more efficient solutions to tasks.
[H2] Conclusion
The dominant paradigm in AI treats computation as an abstract process imposed on a static substrate. This has created powerful, but brittle, tools. True autonomy, as seen in life, does not separate the algorithm from the architecture.
We have proposed a new model of morphogenetic computation, where the system’s representational geometry, defined by the metric tensor $g_{ij}(t)$, is an endogenous state variable. This metric co-evolves with the system’s parameters, driven by a single variational principle to minimize free energy. This process unifies learning and morphogenesis, allowing a system to sculpt its own computational body in response to predictive error.
This framework re-conceptualizes hardware as a dynamic function of epistemic curvature, with the Fisher–Rao metric as its natural language. The machine that learns its own geometry is no longer just a model of the world; it becomes a self-organizing, isomorphic model of itself-in-the-world. This is, we believe, the necessary next step toward autonomous, general, and living intelligence (Hassabis et al., 2017; Stepney, 2012).
[H2] End Matter
[H3] Assumptions
- Continuity and Differentiability: This framework assumes that the system’s state space can be modeled as a continuous and differentiable manifold, allowing for the use of Riemannian geometry and gradient-based updates.
- Physical Realizability: We assume that a physical substrate can be engineered (or discovered) that is capable of (a) dynamically reconfiguring its structure at a relevant timescale and (b) coupling this reconfiguration to its internal information-processing (i.e., prediction-error signals).
- Timescale Separation: The model’s stability relies on the assumption that “learning” (parameter updates) occurs on a faster timescale than “morphogenesis” (metric updates), allowing the system to find local optima before the landscape shifts.
[H3] Limits
- Substrate Specification: This theory does not specify the exact physical substrate required for morphogenetic compute. It defines the formal dynamics, but not whether the hardware should be silicon-based, biological, or quantum.
- Scalability: The computational complexity of calculating and updating a dense metric tensor is prohibitive. While sparse or local approximations are assumed, the scaling properties of such a system in high-dimensional, real-world problems are unknown.
- Ontological Status: This model does not resolve the “hard problem” of consciousness, but rather provides a physicalist framework for autonomy and cognition. It describes the mechanics of a self-organizing system, not the phenomenology.
[H3] Testable Predictions
- Covariance of Curvature and Entropy: In a system implementing endogenous metric learning, the information-theoretic curvature (e.g., the scalar curvature of the Fisher–Rao metric) will covary with the system’s circuit reconfiguration entropy. That is, periods of high structural change (high entropy) will correlate with periods of high epistemic uncertainty (high curvature).
- Invariance vs. Adaptation: A morphogenetic system, when compared to a static-substrate control with identical parameter counts, will show superior transfer learning and robustness to out-of-distribution data, as it can change its metric to fit the new data structure.
- Energy-Efficiency: For a specific class of problems (e.g., non-stationary environments), the morphogenetic system will achieve a lower time-averaged free energy (prediction error) per unit of energy (metabolic/computational cost) than a static system.
[H2] References
Amari, S. (2016). Information Geometry and Its Applications. Springer. https://doi.org/10.1007/978-3-319-42502-3
Bongard, J., & Pfeifer, R. (2011). Morphological computation: Connecting body, brain and environment. Artificial Life, 17(2), 201–220. https://doi.org/10.1162/artl_a_00028
Cencov, N. N. (1982). Statistical Decision Rules and Optimal Inference. American Mathematical Society.
Doya, K. (2002). Metalearning and neuromodulation. Neural Networks, 15(4-6), 495–506. https://doi.org/10.1016/s0893-6080(02)00044-8
French, R. M. (1999). Catastrophic forgetting in connectionist networks. Trends in Cognitive Sciences, 3(4), 128–135. https://doi.org/10.1016/s1364-6613(99)01294-2
Friston, K. (2010). The free-energy principle: a unified brain theory? Nature Reviews Neuroscience, 11(2), 127–138. https://doi.org/10.1038/nrn2787
Friston, K. (2019). A free energy principle for a particular physics. Entropy, 21(9), 776. https://doi.org/10.3390/e21090776
Friston, K., FitzGerald, T., Rigoli, F., Schwartenbeck, P., & Pezzulo, G. (2017). Active inference: A process theory. Neural Computation, 29(1), 1–49. https://doi.org/10.1162/NECO_a_00912
Hassabis, D., Kumaran, D., Summerfield, C., & Botvinick, M. (2017). Neuroscience-inspired artificial intelligence. Neuron, 95(2), 245–258. https://doi.org/10.1016/j.neuron.2017.06.011
Li, H., Xu, Z., Taylor, G., Studer, C., & Goldstein, T. (2018). Visualizing the loss landscape of neural nets. In Advances in Neural Information Processing Systems 31 (NeurIPS 2018). https://proceedings.neurips.cc/paper/2018/hash/a501be403c474d4c61587a8f6f84d8f0-Abstract.html
Martens, J. (2020). New insights on the difficulty of training deep neural networks. arXiv preprint arXiv:2003.04837. https://doi.org/10.48550/arXiv.2003.04837
Nielsen, M. A. (2015). Neural Networks and Deep Learning. Determination Press. http://neuralnetworksanddeeplearning.com/
Pascanu, R., & Bengio, Y. (2013). Revisiting natural gradient for deep networks. arXiv preprint arXiv:1301.3584. https://doi.org/10.48550/arXiv.1301.3584
Savva, A. S. (forthcoming). Structure$\leftrightarrow$Function Isomorphism and Morphogenetic Compute.
Schuman, C. D., Potok, T. E., Patton, R. M., Birdwell, J. D., Dean, M. E., Rose, G. S., & Plank, J. S. (2017). A survey of neuromorphic computing and neural networks in hardware. arXiv preprint arXiv:1705.06963. https://doi.org/10.48550/arXiv.1705.06963
Stepney, S. (2012). Non-classical computation. In G. Rozenberg, T. Bäck, & J. N. Kok (Eds.), Handbook of Natural Computing (pp. 1979–2025). Springer. https://doi.org/10.1007/978-3-540-92910-9_59
Zhang, Y., & Sra, S. (2018). Approximate Riemannian optimization. In Proceedings of the 35th International Conference on Machine Learning (ICML 2018). http://proceedings.mlr.press/v80/zhang18g.html
