Overview
Use context compression to manage long conversation histories by summarizing older messages while preserving recent context.This helps maintain performance and stay within model token limits during multi-turn interactions.
When to use context compression
Use context compression when:- Conversation history exceeds token limits
- Building agents with extended multi-turn conversations
- Using models with strict context window limits
Initialize the SDK
Initialize the SDK client before using context compression. For more information, see Client.Usage
PYTHON
API reference
client.context.compress()
Compress a conversation history into a summary.
Parameters
-
messages (
List[Dict[str, Any]]) Required. List of message dictionaries in OpenAI format. Minimum 2 messages. -
model (
str) Optional. Model name used for summarization. Uses the default model if not provided.
Returns
A SummarizationResponse object:- summary (str)Generated summary of the conversation
- original_message_count (int)Number of messages summarized
- model_used (str)Model used for summarization
Raises
- ValueError if fewer than 2 messages are provided
- ClientAPIException if the API request fails
Message format
Messages must follow OpenAI format:JSON
Supported roles
userassistantsystemtool
Usage guidance
- Pass full conversation history for best summarization
- Use summaries to reduce prompt size before LLM calls
- Combine summaries with recent messages to maintain context continuity
What to test
- Compression works with valid message lists
- Fails when fewer than 2 messages are provided
- Summary output is usable in downstream prompts
- Model selection behaves as expected
Mental model
- context compression reduces conversation size
- summaries replace older message windows
- recent messages should remain uncompressed
- use compression before LLM calls to control token usage

