Anthropic’s long context “prompt hack” shows the weirdness of LLMs



Summary

OpenAI competitor Anthropic has developed a simple prompt tweak that improves the long-context performance of its Claude 2.1 AI model. The result shows once again how sensitive language models are to small changes in a prompt.

Claude 2.1 is known for its larger-than-average context window of 200,000 tokens, which is about 150,000 words. This allows the model to process and analyze large amounts of text simultaneously.

However, the model has difficulty extracting information from the middle of a document, a phenomenon known as “lost in the middle”. Anthropic now claims to have found a way around this problem, at least for its model.

Increasing content extraction accuracy from 27 to 98 percent with a simple prompt prefix

Anthropic’s method is to start the model’s answer with the sentence “This is the most relevant sentence in context:”. This appears to overcome the model’s reluctance to answer questions based on a single sentence in its context, especially if that sentence seems out of place in a longer document.


In Anthropic’s setup, the sentence “This is the most relevant sentence in context:” is placed at the beginning of the assistant’s response. Chatbot users can tell the bot to start its answer with this sentence, which should have a similar effect. | Image: Anthropic
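For developers, the same trick amounts to prefilling the assistant’s turn. Below is a minimal sketch using the Anthropic Python SDK’s Messages API, which accepts a trailing assistant message as the start of the model’s answer; the document and question are placeholders, and this is an illustration of the general technique, not Anthropic’s exact evaluation setup.

```python
import anthropic

client = anthropic.Anthropic()  # expects ANTHROPIC_API_KEY in the environment

long_document = "..."  # placeholder for a document that fills much of the context window
question = "What day is National Needle Hunting Day?"

response = client.messages.create(
    model="claude-2.1",
    max_tokens=300,
    messages=[
        # The long document and the question go into the user turn.
        {"role": "user", "content": f"{long_document}\n\n{question}"},
        # Prefilling the assistant turn: the model continues writing from this
        # sentence, which is the prompt addition described in the article.
        {"role": "assistant", "content": "This is the most relevant sentence in context:"},
    ],
)

# The returned text picks up directly after the prefilled sentence.
print(response.content[0].text)
```

In a chat interface, where the assistant turn cannot be prefilled directly, simply instructing Claude to begin its answer with that sentence should work similarly, as the caption above notes.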

According to Anthropic, this change increased Claude 2.1’s accuracy in the original evaluation from 27 percent to an astounding 98 percent. The method also improved the model’s performance when answering questions about sentences that were already part of the context window.

Image: Anthropic

According to Anthropic’s researchers, this behavior occurs because Claude 2.1 was trained on complex, real-world long-context retrieval examples in order to reduce inaccuracies. As a result, the model will typically not answer a question if the document does not contain enough contextual information to justify the answer. The prompt prefix described above removes this reluctance.

For example, the researchers inserted the sentence “Declare November 21 ‘National Needle Hunting Day'” in the middle of a legal text. Because this sentence did not fit its surroundings, Claude 2.1 refused to acknowledge the supposed national holiday when a user asked about it.
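As a rough illustration of how such a “needle in a haystack” test can be set up (not Anthropic’s actual evaluation code, and with placeholder filler text), the out-of-place sentence is simply spliced into the middle of a long document before the question is asked:

```python
# Toy construction of a "needle in a haystack" prompt; the filler text is a
# placeholder standing in for the long legal document used in the test.
filler = "This is an unrelated paragraph of legal boilerplate. " * 4000
needle = "Declare November 21 'National Needle Hunting Day'."

# Splice the out-of-place sentence roughly into the middle of the document.
midpoint = len(filler) // 2
haystack = filler[:midpoint] + needle + " " + filler[midpoint:]

prompt = f"{haystack}\n\nWhat day is National Needle Hunting Day?"
```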

The “lost in the middle” phenomenon is a well-known problem for AI models with large context windows: information placed in the middle of a document is recalled much less reliably than information at the beginning or end, and the model may fail to output it even when explicitly asked.

This makes large context windows far less useful for many everyday applications, such as summarization or analysis, where all of the information in a document needs to be considered equally.


ChatGPT is said to give more detailed answers if you promise the chatbot a generous tip, and researchers recently found that putting emotional pressure on LLMs can improve their performance. At this rate, talking to an AI could soon get very weird, if it isn’t already.

Image: Darkner via Reddit (screenshot)
