This technical report introduces a new architecture, the Recurrent Memory Transformer (RMT), that extends the context length of BERT, one of the most effective Transformer-based models in natural language processing, to an unprecedented two million tokens. By leveraging simple token-based memory storage and segment-level recurrence, the RMT stores and processes both local and global information and allows information to flow between segments of the input sequence. Our experiments demonstrate the effectiveness of the approach, which holds significant potential to enhance long-term dependency handling and enable large-scale context processing for memory-intensive applications.
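To make the idea of token-based memory and segment-level recurrence concrete, the following is a minimal sketch, not the authors' implementation: a generic PyTorch Transformer encoder stands in for BERT, and all names (`RecurrentMemorySketch`, `num_mem_tokens`, `segment_embeddings`) are illustrative assumptions. The input is split into segments, learnable memory tokens are prepended to each segment, and the updated memory states are carried forward to the next segment so information can flow across the full sequence.

```python
import torch
import torch.nn as nn


class RecurrentMemorySketch(nn.Module):
    """Illustrative sketch of segment-level recurrence with memory tokens.

    A plain nn.TransformerEncoder stands in for BERT; hyperparameters are
    arbitrary and chosen only to keep the example small.
    """

    def __init__(self, d_model=768, n_heads=12, n_layers=2, num_mem_tokens=10):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        # Learnable initial memory tokens, shared across sequences.
        self.memory = nn.Parameter(0.02 * torch.randn(num_mem_tokens, d_model))
        self.num_mem_tokens = num_mem_tokens

    def forward(self, segment_embeddings):
        """segment_embeddings: list of (batch, seg_len, d_model) tensors,
        one entry per segment of the long input sequence."""
        batch = segment_embeddings[0].size(0)
        mem = self.memory.unsqueeze(0).expand(batch, -1, -1)
        outputs = []
        for seg in segment_embeddings:
            # Prepend the current memory state to the segment tokens.
            x = torch.cat([mem, seg], dim=1)
            h = self.encoder(x)
            # The updated memory slots carry information to the next segment.
            mem = h[:, : self.num_mem_tokens]
            outputs.append(h[:, self.num_mem_tokens:])
        return outputs, mem


# Usage: a long sequence split into 4 segments of 128 token embeddings each.
model = RecurrentMemorySketch()
segments = [torch.randn(2, 128, 768) for _ in range(4)]
outputs, final_memory = model(segments)
print(final_memory.shape)  # torch.Size([2, 10, 768])
```

Because each segment is processed with full attention only over its own tokens plus a small, fixed number of memory tokens, the per-segment cost stays constant while the recurrent memory propagates global context across an arbitrarily long input.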