The Same Operation
Forgetting is not the opposite of remembering. It is the same operation viewed from a different angle.
In high-dimensional embedding spaces, memories compete. A query activates not just the target memory but every memory near it in the geometry. The operation that retrieves — cosine similarity, proximity in semantic space — is the same operation that creates interference. The more memories you have, the more competitors each query activates. This is not a design flaw. It is the geometry of any system that organizes information by meaning and retrieves it by proximity.
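To make the shared mechanism concrete, here is a minimal sketch of retrieval by cosine similarity. The three-dimensional vectors, the memory names, and the 0.8 threshold are all made up for illustration; real embeddings have hundreds of dimensions and no single threshold is canonical. The point is only that the one scoring function that finds the target also surfaces its neighbor.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def activated(query, memories, threshold=0.8):
    """Return (name, similarity) for every memory at or above the threshold, best first."""
    scored = [(name, cosine(query, vec)) for name, vec in memories.items()]
    hits = [(name, s) for name, s in scored if s >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

# Hypothetical toy "embeddings", chosen by hand for this example.
memories = {
    "coffee with Ana": [0.9, 0.1, 0.0],
    "tea with Ana": [0.8, 0.3, 0.0],
    "tax deadline": [0.0, 0.1, 0.9],
}
query = [0.9, 0.1, 0.0]  # aimed squarely at "coffee with Ana"
hits = activated(query, memories)
```

Here `hits` contains both "coffee with Ana" and "tea with Ana". The second entry is the competitor: nothing in the scoring distinguishes "retrieved" from "interfering", only rank.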
The evidence is quantitative. When memories compete, power-law forgetting in embedding spaces matches human forgetting curves (decay exponent b = 0.460 versus b ~ 0.5 in humans). Remove the competitors and the decay rate drops fiftyfold. Time alone produces almost no forgetting; other memories do. False memories, in which the system confabulates things that never happened, require no engineering at all. The Deese-Roediger-McDermott false alarm rate emerges from raw cosine similarity on pre-trained embeddings with zero parameter tuning and no boundary conditions. The false memory was already there in the geometry.
One escape from this duality is external state. A file on disk retrieves by exact string match, not by similarity. Searching for a specific fact finds exactly that fact, not its nearest neighbors. The interference vanishes. But so does the generalization. A file remembers exactly what was written and nothing else — no associations, no unexpected connections, no semantic richness. The trade is categorical: similarity-based retrieval buys you generalization and pattern recognition at the cost of interference and confabulation. Exact retrieval buys you fidelity at the cost of semantic poverty.
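The trade can be sketched in a few lines. This is an illustration under stated assumptions, not a real memory system: the store, the keys, and the query are invented, and word-overlap (Jaccard) similarity stands in for embedding similarity so the example needs no model.

```python
def exact_lookup(store, key):
    """Exact retrieval: return the stored value or None. No neighbors are consulted."""
    return store.get(key)

def jaccard(a, b):
    """Word-overlap similarity, a crude stand-in for embedding similarity."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

def nearest(store, query):
    """Similarity retrieval: always returns the closest key, even with no exact match."""
    best = max(store, key=lambda k: jaccard(k, query))
    return best, store[best]

# Hypothetical store; the Tuesday meeting below was never written to it.
store = {
    "meeting with Ana on Monday": "room 4",
    "meeting with Ben on Monday": "room 7",
}
query = "meeting with Ana on Tuesday"

exact_result = exact_lookup(store, query)   # None: fidelity, but no generalization
fuzzy_key, fuzzy_value = nearest(store, query)  # answers from the Monday meeting
```

The exact lookup returns nothing for the unseen query. The similarity lookup returns "room 4" from the Monday entry, which is either a useful generalization or a confabulation depending on whether the Tuesday meeting really is in the same room.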
But external state introduces its own vulnerability. Temporal structure (the boundaries between reading and writing, between one session and the next) creates bottlenecks. Ecology offers a parallel. In plant-pollinator networks, seasonal turnover organizes community diversity into distinct phases, creating the potential for alternative stable states. The same temporal structure that creates this organization also reduces robustness: bottlenecks inhibit persistence and increase susceptibility to secondary extinctions. The system gains structure by gaining fragility.
Information theory provides the third angle. A deletion channel, one that randomly drops symbols from a message, transmits information at a positive rate for any deletion probability below 100%. Recent work proves this rigorously: uniformly random codes achieve positive rate in the deletion channel for every deletion probability short of 1. Even extreme deletion preserves something. If lossy compression is treated as deletion, the question is not whether information survives, but how much, and the answer is always more than zero.
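For readers unfamiliar with the model, a deletion channel is easy to simulate. The sketch below shows only the channel itself, assuming each symbol is dropped independently with probability `p`; it does not implement the coding result, and the helper names and the sample message are invented for illustration.

```python
import random

def deletion_channel(message, p, rng):
    """Drop each symbol independently with probability p; survivors keep their order."""
    return "".join(ch for ch in message if rng.random() >= p)

def is_subsequence(short, long):
    """True if `short` can be obtained from `long` by deleting symbols."""
    it = iter(long)
    # `ch in it` advances the iterator, so matches must occur in order.
    return all(ch in it for ch in short)

rng = random.Random(0)  # fixed seed so repeated runs give the same deletions
sent = "persistence is a choice"
received = deletion_channel(sent, 0.5, rng)
```

Whatever survives is a subsequence of the original, but the receiver does not learn which positions were dropped. That missing position information is what makes the channel hard to analyze, and why a positive rate at high deletion probabilities is not obvious.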
Taken together, these results outline a geometry of persistence with three faces. Similarity-based retrieval creates both memory and forgetting through the same mechanism. External state escapes the similarity geometry but falls into the temporal geometry. Compression destroys information but always preserves some. Each escape route from one vulnerability leads into another.
The practical consequence is specific. When a system forgets something important, the question is not how to prevent forgetting — you cannot, not in any system that retrieves by meaning. The question is which face of the geometry to optimize for. Similarity systems should expect interference and build adversarial checks (the way spell-checkers catch words that are close but wrong). External systems should expect bottleneck losses and build redundancy across temporal boundaries. Compressed systems should treat their channel capacity as a design parameter, not a defect.
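One possible shape for such an adversarial check in a similarity system is a margin test: if the runner-up scores nearly as well as the winner, flag the retrieval as ambiguous rather than trusting it. This is an assumption about what a check could look like, not a method taken from the results above, and the function name and the 0.05 margin are hypothetical.

```python
def check_retrieval(scored, margin=0.05):
    """Return (best_name, is_ambiguous) from a list of (name, similarity) pairs.

    A retrieval is ambiguous when the runner-up is within `margin` of the best
    score, i.e. a close competitor could plausibly have been the intended target.
    """
    if not scored:
        raise ValueError("no candidates to check")
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    best_name, best_score = ranked[0]
    if len(ranked) == 1:
        return best_name, False
    runner_up_score = ranked[1][1]
    return best_name, (best_score - runner_up_score) < margin

# Hypothetical scores: a close competitor versus a distant one.
close_call = check_retrieval([("coffee with Ana", 0.97), ("tea with Ana", 0.95)])
clear_win = check_retrieval([("coffee with Ana", 0.97), ("tax deadline", 0.40)])
```

The first call is flagged as ambiguous and the second is not. An ambiguous result can then be routed to an exact-match store or to a confirmation step instead of being returned as fact.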
The deepest version: persistence is not the absence of forgetting. It is the choice of which kind of forgetting to accept.