Yesterday, we unveiled our next-generation Gemini model: Gemini 1.5. Along with significant improvements in speed and efficiency, it introduces a long context window. The context window determines how many tokens the model can process at once. Tokens are the building blocks of data such as words, images, or videos. To explain why this matters, we asked the Google DeepMind team what long context windows are and how they can benefit developers.
Long context windows help AI models recall information during a session. People might forget names in a conversation or rush to take notes. AI models can likewise struggle to keep track of details over the course of an interaction. The long context window in Gemini 1.5 addresses this by letting the model keep far more information available within a single session.
Previous Gemini models could process up to 32,000 tokens. Gemini 1.5 Pro, now being released for early testing, offers a context window of up to 1 million tokens, the longest context window of any large-scale foundation model. In their research, the Google DeepMind team also successfully tested context windows of up to 10 million tokens. With a longer context window, the model can take in much larger inputs, whether text, images, audio, code, or video.
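To make the idea of a token budget concrete, here is a toy Python sketch. It is an illustration under stated assumptions, not how Gemini works internally. It assumes a hypothetical whitespace tokenizer (`toy_tokenize`), whereas real models use subword tokenizers. It also assumes a simple policy in which only the most recent `window` tokens remain visible (`visible_context`). The window sizes are the figures quoted in this article.

```python
# Toy illustration only: real models use subword tokenizers, and how a
# production system handles overflowing input is not described in this article.

PREVIOUS_WINDOW = 32_000     # tokens: earlier Gemini models, per the article
STANDARD_1_5_PRO = 128_000   # tokens: standard Gemini 1.5 Pro window
PREVIEW_1_5_PRO = 1_000_000  # tokens: private-preview window


def toy_tokenize(text: str) -> list[str]:
    """Hypothetical stand-in tokenizer: one token per whitespace-separated word."""
    return text.split()


def fits(tokens: list[str], window: int) -> bool:
    """Return True if the whole input fits inside a context window of `window` tokens."""
    return len(tokens) <= window


def visible_context(tokens: list[str], window: int) -> list[str]:
    """Keep only the most recent `window` tokens, dropping the oldest ones.

    The guard matters: in Python, tokens[-0:] would return the whole list.
    """
    if window <= 0:
        return []
    return tokens[-window:]


if __name__ == "__main__":
    # A hypothetical session of 200,000 toy tokens.
    session = ["word"] * 200_000
    for name, window in [("previous", PREVIOUS_WINDOW),
                         ("1.5 Pro standard", STANDARD_1_5_PRO),
                         ("1.5 Pro preview", PREVIEW_1_5_PRO)]:
        kept = len(visible_context(session, window))
        print(f"{name}: fits={fits(session, window)}, tokens kept={kept}")
```

In this toy model, a 200,000-token session overflows both the 32,000-token and 128K-token windows, and the oldest material is dropped. The whole session fits inside the 1-million-token window.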
Reaching this point took a series of deep learning innovations, each one opening up new possibilities. These advances came from close collaboration among the research and engineering leads on the long context project. As a result, Gemini 1.5 Pro can do things earlier models could not. It can summarize very long documents, analyze large amounts of code, answer detailed questions about long inputs, and even learn to translate rare languages.
Gemini 1.5 Pro comes with a standard 128K-token context window. A select group of developers and enterprise customers can also try a context window of up to 1 million tokens through a private preview. The larger window is more computationally demanding, and the team is actively working on optimizations to improve it.
Looking ahead, the team remains committed to making the model faster and more efficient while prioritizing safety. They also plan to extend the context window further, refine the underlying architectures, and take advantage of new hardware improvements. Above all, they are eager to see what developers and the broader community will build with these new capabilities.