we've been trying to solve memory in AI systems since the 50s: first symbolic memory (cognitive architectures like SOAR), then RNNs and LSTMs with gates to control information flow, then the legendary "Attention is all you need" paper in 2017, which made the context window itself the memory (-> context rot, lost in the middle), and now external memory banks (RAG, graphs, scratchpads). catastrophic forgetting persists, and labs are racing to solve it.

  • memory is not a bag of facts, it's layered control: the point isn't just storing information, it's being selective about what to remember and what to forget. pruning and forgetting matter just as much as retention (see the first sketch after this list)
  • memory != continual learning: learning isn't just remembering, and it's never static. LLM weights are frozen after pretraining, so the model is limited to in-context learning and lacks real adaptation or "neuroplasticity"
  • formation of long-term memory involves two consolidation processes: online (synaptic) consolidation soon after learning, where the new trace is stabilised at the synapse level, then offline (systems) consolidation, where recently encoded patterns are replayed via sharp-wave ripples in the hippocampus and gradually transferred from short-term to long-term (neocortical) storage (second sketch below)
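
a minimal sketch of the "layered control" idea, not any real library: a made-up `SelectiveMemory` store with invented knobs (`capacity`, `write_threshold`, `half_life`) where writes are gated by salience, stored items decay with age, and pruning evicts whatever is no longer worth keeping.

```python
import time
from dataclasses import dataclass


@dataclass
class MemoryItem:
    text: str
    salience: float      # how much we care about keeping this
    created: float       # unix timestamp of the write


class SelectiveMemory:
    """toy layered memory: gate on write, decay with age, prune on overflow.
    all names and knobs here are illustrative, not from any framework."""

    def __init__(self, capacity: int = 128, write_threshold: float = 0.3,
                 half_life: float = 3600.0):
        self.capacity = capacity
        self.write_threshold = write_threshold
        self.half_life = half_life
        self.items: list[MemoryItem] = []

    def write(self, text: str, salience: float) -> bool:
        # gate: don't store everything, only what clears the salience bar
        if salience < self.write_threshold:
            return False
        self.items.append(MemoryItem(text, salience, time.time()))
        self._prune()
        return True

    def _decayed(self, item: MemoryItem) -> float:
        # forgetting: salience decays exponentially with age
        age = time.time() - item.created
        return item.salience * 0.5 ** (age / self.half_life)

    def _prune(self) -> None:
        # pruning: over capacity -> keep only the currently most salient items
        if len(self.items) > self.capacity:
            self.items.sort(key=self._decayed, reverse=True)
            del self.items[self.capacity:]

    def recall(self, k: int = 5) -> list[MemoryItem]:
        # read path: the k memories still worth surfacing
        return sorted(self.items, key=self._decayed, reverse=True)[:k]
```

exponential decay is just one forgetting rule; recency-weighted LRU or a learned relevance score would slot into `_decayed` the same way.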
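and a loose ML analogy for the two consolidation phases, not a model of the biology: fresh episodes land in a small fast buffer ("online"), and a periodic replay pass samples from it into a durable long-term store ("offline"). the names (`TwoStageMemory`, `encode`, `consolidate`) are made up for illustration.

```python
import random


class TwoStageMemory:
    """toy two-stage store: a fast volatile buffer plus a slow durable one,
    with a replay step standing in for offline (systems) consolidation."""

    def __init__(self, buffer_size: int = 32, replay_samples: int = 8):
        self.buffer_size = buffer_size
        self.replay_samples = replay_samples
        self.short_term: list[str] = []   # fast, volatile
        self.long_term: list[str] = []    # slow, durable

    def encode(self, episode: str) -> None:
        # online (synaptic) phase: stabilise the fresh trace in the fast buffer
        self.short_term.append(episode)
        if len(self.short_term) > self.buffer_size:
            self.short_term.pop(0)        # oldest un-consolidated trace is lost

    def consolidate(self) -> None:
        # offline (systems) phase: replay a sample of recent traces,
        # commit them to long-term storage, then clear the buffer
        replayed = random.sample(
            self.short_term, min(self.replay_samples, len(self.short_term)))
        self.long_term.extend(replayed)
        self.short_term.clear()


mem = TwoStageMemory()
for i in range(50):
    mem.encode(f"episode {i}")
mem.consolidate()                         # e.g. run during "sleep"/idle time
print(len(mem.long_term), "episodes survived consolidation")
```

the point of the toy: whatever never gets replayed before the buffer overflows is simply gone, which is the closest software analogue to why interrupted consolidation looks like forgetting.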