This week’s AI tip is about: How to imbue large language models (LLMs) like GPT-4 with memory for meaningful conversations
Stateless Nature of LLMs
The first thing to remember is that LLMs are inherently stateless: they have no built-in capability to remember past interactions or the context in which they occurred. Each query to the model is independent, which limits continuity and cohesiveness in ongoing conversations.
The Need for Conversation Memory
The stateless architecture of LLMs raises a fundamental question: How do we make these models remember? Implementing conversation memory is vital for an enhanced user experience, personalized interactions, and effective problem-solving.
Techniques for Enabling Memory
There are several techniques to facilitate conversation memory in LLMs. Each comes with its advantages and trade-offs. Here are some popular methods:
- Conversation Buffer Memory
In this approach, the entire conversation is stored, including system messages, user prompts and AI responses. This creates a robust context and allows the AI to make sense of the ongoing dialogue effectively.
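A buffer memory can be sketched in a few lines of plain Python. This is a minimal, illustrative example (the class and method names are my own, not from any particular library): every message is kept and the full history is replayed to the model on each turn.

```python
# Minimal sketch of conversation buffer memory: the whole conversation
# (system messages, user prompts, AI responses) is stored and sent back
# to the model on every turn. Names here are illustrative.

class ConversationBufferMemory:
    def __init__(self):
        self.messages = []  # full history, never trimmed

    def add(self, role, content):
        self.messages.append({"role": role, "content": content})

    def context(self):
        # The entire conversation is returned as the model's context.
        return list(self.messages)

memory = ConversationBufferMemory()
memory.add("system", "You are a helpful assistant.")
memory.add("user", "My name is Ada.")
memory.add("assistant", "Nice to meet you, Ada!")
memory.add("user", "What is my name?")
print(len(memory.context()))  # 4 -- nothing is ever dropped
```

The simplicity is the appeal; the cost is that the context grows with every exchange.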
- Conversation Window Memory
Rather than storing the entire conversation, this technique saves only the last 'x' messages. This is more efficient in terms of computational resources but may lose essential context in lengthy conversations.
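A sliding-window variant only needs one change: return the last 'x' messages instead of all of them. Again, this is a hand-rolled sketch with illustrative names, not a specific library's API.

```python
# Sketch of conversation window memory: only the last k messages
# are kept as context; older messages fall out of the window.

class ConversationWindowMemory:
    def __init__(self, k):
        self.k = k  # number of most recent messages to retain
        self.messages = []

    def add(self, role, content):
        self.messages.append({"role": role, "content": content})

    def context(self):
        # Only the most recent k messages are sent to the model.
        return self.messages[-self.k:]

memory = ConversationWindowMemory(k=2)
for i in range(5):
    memory.add("user", f"message {i}")
print([m["content"] for m in memory.context()])  # ['message 3', 'message 4']
```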
- Conversation Token Buffer Memory
In this method, only the last 'x' tokens are stored. Tokens are the basic building blocks of language that models read, such as words or subwords. This approach makes it possible to precisely control how much context the model retains, but it can also be restrictive.
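A token-budget buffer can be sketched like this. The tokenizer here is a crude stand-in (one token per whitespace-separated word); a real system would use the model's actual tokenizer. Class and parameter names are illustrative.

```python
# Sketch of conversation token buffer memory: messages are kept only
# while their combined token count fits within a fixed budget.

class ConversationTokenBufferMemory:
    def __init__(self, max_tokens, count_tokens):
        self.max_tokens = max_tokens
        self.count_tokens = count_tokens  # tokenizer function (assumed)
        self.messages = []

    def add(self, role, content):
        self.messages.append({"role": role, "content": content})
        # Drop the oldest messages until the total fits the token budget.
        while self._total() > self.max_tokens and len(self.messages) > 1:
            self.messages.pop(0)

    def _total(self):
        return sum(self.count_tokens(m["content"]) for m in self.messages)

# Crude stand-in tokenizer: one token per word.
memory = ConversationTokenBufferMemory(
    max_tokens=6, count_tokens=lambda s: len(s.split())
)
memory.add("user", "hello there friend")   # 3 tokens, fits
memory.add("assistant", "hi how are you")  # total 7 > 6, oldest dropped
print([m["content"] for m in memory.messages])  # ['hi how are you']
```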
- Conversation Summary Memory
Instead of saving the entire conversation, a summary is created and stored. This is useful for encapsulating the crux of lengthy discussions and can be incredibly efficient, although there is a risk of losing nuance.
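Summary memory can be sketched by folding each new message into a running summary. In practice the summarizer would be another LLM call; here a toy truncating function stands in for it, and all names are illustrative assumptions.

```python
# Sketch of conversation summary memory: instead of storing messages,
# each new message is folded into a running summary of the dialogue.

class ConversationSummaryMemory:
    def __init__(self, summarize):
        self.summarize = summarize  # in practice, an LLM call
        self.summary = ""

    def add(self, role, content):
        # Fold the new message into the summary rather than storing it.
        self.summary = self.summarize(self.summary, f"{role}: {content}")

    def context(self):
        return self.summary

# Toy summarizer that just keeps the tail of the running text;
# a real system would ask an LLM for a genuine abstractive summary.
def naive_summarize(prev, new, limit=80):
    combined = (prev + " " + new).strip()
    return combined[-limit:]

memory = ConversationSummaryMemory(naive_summarize)
memory.add("user", "Let's plan a trip to Kyoto in April.")
memory.add("assistant", "Great, cherry blossom season!")
print(len(memory.context()) <= 80)  # True -- context stays bounded
```

The context stays a fixed size no matter how long the conversation runs, which is exactly the efficiency win described above.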
Conclusion
The types of memory mechanisms outlined above are among the most popular, but there are certainly more to explore. Choosing the right type of memory is crucial for a couple of reasons. First, most LLMs have limitations on the number of input tokens they can process. Second, many large language models operate on a pricing model that accounts for both input and output tokens, making efficiency a key consideration.