This technical report introduces a new architecture, the Recurrent Memory Transformer (RMT), that extends the context length of BERT, one of the most effective Transformer-based models in natural language processing, to an unprecedented two million tokens. By leveraging simple token-based memory storage and segment-level recurrence, the RMT stores and processes both local and global information and allows information to flow between segments of the input sequence. Our experiments demonstrate the effectiveness of the approach, which holds significant potential to enhance long-term dependency handling and enable large-scale context processing for memory-intensive applications.
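To make the idea of token-based memory and segment-level recurrence concrete, the following is a minimal sketch, not the authors' implementation: a generic PyTorch Transformer encoder stands in for BERT, and all names (`RecurrentMemorySketch`, `num_mem_tokens`, `segment_embeddings`) are illustrative assumptions. The input is split into segments, learnable memory tokens are prepended to each segment, and the updated memory states are carried forward to the next segment so information can flow across the full sequence.

```python
import torch
import torch.nn as nn


class RecurrentMemorySketch(nn.Module):
    """Illustrative sketch of segment-level recurrence with memory tokens.

    A plain nn.TransformerEncoder stands in for BERT; hyperparameters are
    arbitrary and chosen only to keep the example small.
    """

    def __init__(self, d_model=768, n_heads=12, n_layers=2, num_mem_tokens=10):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        # Learnable initial memory tokens, shared across sequences.
        self.memory = nn.Parameter(0.02 * torch.randn(num_mem_tokens, d_model))
        self.num_mem_tokens = num_mem_tokens

    def forward(self, segment_embeddings):
        """segment_embeddings: list of (batch, seg_len, d_model) tensors,
        one entry per segment of the long input sequence."""
        batch = segment_embeddings[0].size(0)
        mem = self.memory.unsqueeze(0).expand(batch, -1, -1)
        outputs = []
        for seg in segment_embeddings:
            # Prepend the current memory state to the segment tokens.
            x = torch.cat([mem, seg], dim=1)
            h = self.encoder(x)
            # The updated memory slots carry information to the next segment.
            mem = h[:, : self.num_mem_tokens]
            outputs.append(h[:, self.num_mem_tokens:])
        return outputs, mem


# Usage: a long sequence split into 4 segments of 128 token embeddings each.
model = RecurrentMemorySketch()
segments = [torch.randn(2, 128, 768) for _ in range(4)]
outputs, final_memory = model(segments)
print(final_memory.shape)  # torch.Size([2, 10, 768])
```

Because each segment is processed with full attention only over its own tokens plus a small, fixed number of memory tokens, the per-segment cost stays constant while the recurrent memory propagates global context across an arbitrarily long input.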