Existing large language models (LLMs) can only process inputs of limited length, preventing them from utilizing rich long-context information from past inputs. To address this issue, we propose Language Models Augmented with Long-Term Memory (LONGMEM), a framework that enables LLMs to memorize long history. It adopts a novel decoupled network architecture in which the frozen backbone LLM serves as a memory encoder and an adaptive residual side-network acts as a memory retriever and reader. With memory-augmented adaptation training, LONGMEM can memorize long past context and utilize long-term memory for language modeling. The proposed memory retrieval module can handle unlimited-length context in its memory bank to benefit various downstream tasks, including in-context learning. Our method outperforms strong long-context models on ChapterBreak, a benchmark for long-context modeling, and achieves remarkable improvements on memory-augmented in-context learning over LLMs. Our code is open-sourced at https://aka.ms/LongMem.
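To make the decoupled memory design concrete, below is a minimal conceptual sketch, not the authors' implementation: a memory bank caches key/value states produced by the frozen backbone, a top-k lookup retrieves the most relevant cached states for the current queries, and a side-network attention layer jointly attends over local context and retrieved memory. All names (MemoryBank, retrieve_topk, fused_memory_attention) and parameters (capacity, k) are illustrative assumptions.

```python
# Hypothetical sketch of long-term memory retrieval and fused attention;
# names and sizes are assumptions, not the LONGMEM codebase.
import torch
import torch.nn.functional as F


class MemoryBank:
    """Caches attention key/value states emitted by the frozen backbone LLM."""

    def __init__(self, dim: int, max_size: int = 65536):
        self.keys = torch.empty(0, dim)
        self.values = torch.empty(0, dim)
        self.max_size = max_size

    def add(self, keys: torch.Tensor, values: torch.Tensor) -> None:
        # Append new states and evict the oldest entries beyond capacity.
        self.keys = torch.cat([self.keys, keys])[-self.max_size:]
        self.values = torch.cat([self.values, values])[-self.max_size:]

    def retrieve_topk(self, queries: torch.Tensor, k: int = 64):
        # Dot-product similarity between current queries and cached keys,
        # then gather the k most similar cached key/value pairs per query.
        scores = queries @ self.keys.T                   # (q_len, mem_len)
        idx = scores.topk(k, dim=-1).indices             # (q_len, k)
        return self.keys[idx], self.values[idx]          # (q_len, k, dim)


def fused_memory_attention(q, local_k, local_v, mem_k, mem_v):
    """Joint attention over local context and retrieved long-term memory."""
    d = q.size(-1)
    # Scores against the local context: (q_len, ctx_len)
    local_scores = (q @ local_k.T) / d ** 0.5
    # Scores against each query's retrieved memories: (q_len, k)
    mem_scores = torch.einsum("qd,qkd->qk", q, mem_k) / d ** 0.5
    # One softmax over both sources, so the model weighs memory against context.
    weights = F.softmax(torch.cat([local_scores, mem_scores], dim=-1), dim=-1)
    ctx_len = local_k.size(0)
    out = weights[:, :ctx_len] @ local_v
    out = out + torch.einsum("qk,qkd->qd", weights[:, ctx_len:], mem_v)
    return out
```

In this sketch, only the side-network parameters would be trained during memory-augmented adaptation, while the backbone that populates the memory bank stays frozen, which is what allows cached states to remain valid as training proceeds.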