Overview

Use context compression to manage long conversation histories by summarizing older messages while preserving recent context.
This helps maintain performance and stay within model token limits during multi-turn interactions.

When to use context compression

Use context compression when:
  • The conversation history exceeds the model's token limit
  • You are building agents that hold extended multi-turn conversations
  • You are using models with strict context window limits
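
One way to detect the first condition is to estimate the size of the history before each model call. The sketch below is a rough heuristic and not part of the SDK: `estimate_tokens`, `should_compress`, the four-characters-per-token ratio, and the 80% threshold are all assumptions chosen for illustration. Where an accurate count matters, use the tokenizer for your model instead.

```python
from typing import Any, Dict, List

# Assumption: roughly 4 characters per token. This is a coarse rule of
# thumb for English text, not a property of any specific model.
CHARS_PER_TOKEN = 4


def estimate_tokens(messages: List[Dict[str, Any]]) -> int:
    """Return a rough token estimate based on message content length."""
    total_chars = sum(len(str(m.get("content") or "")) for m in messages)
    return total_chars // CHARS_PER_TOKEN


def should_compress(
    messages: List[Dict[str, Any]],
    context_window: int,
    threshold: float = 0.8,
) -> bool:
    """Return True when the estimate reaches `threshold` of the window.

    Histories with fewer than 2 messages are never flagged, because
    compression requires at least 2 messages.
    """
    if len(messages) < 2:
        return False
    return estimate_tokens(messages) >= context_window * threshold
```

A caller might run `should_compress(history, context_window=8192)` before each model call and compress only when it returns `True`; the window size here is a placeholder.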

Initialize the SDK

Initialize the SDK client before using context compression. For more information, see Client.

Usage

PYTHON
from ibm_watsonx_orchestrate_sdk import Client

# Initialize the client
client = Client(
    api_key="your_api_key",
    instance_url="your_instance_url"
)

# Prepare conversation messages
messages = [
    {"role": "user", "content": "What hotels are available in Paris?"},
    {"role": "assistant", "content": "I found 50 hotels in Paris..."},
    {"role": "user", "content": "What about London?"},
    {"role": "assistant", "content": "Here are hotels in London..."}
]

# Compress the conversation
response = client.context.compress(messages=messages)

print(f"Summary: {response.summary}")
print(f"Compressed {response.original_message_count} messages")
print(f"Model used: {response.model_used}")

API reference

client.context.compress()

Compresses a conversation history into a summary.

Parameters

  • messages (List[Dict[str, Any]]): Required. List of message dictionaries in OpenAI format. Minimum 2 messages.
  • model (str): Optional. Model name used for summarization. Uses the default model if not provided.

Returns

A SummarizationResponse object:
  • summary (str): Generated summary of the conversation
  • original_message_count (int): Number of messages summarized
  • model_used (str): Model used for summarization

Raises

  • ValueError if fewer than 2 messages are provided
  • ClientAPIException if the API request fails
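
Callers will usually want to handle both failure modes. The following is a minimal sketch of one approach, in which the caller keeps the uncompressed history when compression fails. `compress_or_fallback` is a hypothetical helper, not an SDK function. It receives the compress callable as an argument (for example `client.context.compress`) and takes the API exception type as a parameter, because this page does not show the import path for `ClientAPIException`.

```python
import logging
from typing import Any, Callable, Dict, List, Optional, Tuple, Type

logger = logging.getLogger(__name__)

Message = Dict[str, Any]


def compress_or_fallback(
    compress: Callable[..., Any],
    messages: List[Message],
    api_errors: Tuple[Type[BaseException], ...] = (Exception,),
) -> Optional[str]:
    """Return the summary text, or None if compression was not possible.

    `api_errors` should name the SDK's API exception type(s); the default
    of `Exception` is a broad placeholder for this sketch.
    """
    try:
        response = compress(messages=messages)
    except ValueError as exc:
        # Raised when fewer than 2 messages are provided.
        logger.warning("History too short to compress: %s", exc)
        return None
    except api_errors as exc:
        # Raised when the API request fails.
        logger.warning("Compression request failed: %s", exc)
        return None
    return response.summary
```

When the helper returns `None`, the caller can continue with the original messages rather than abort the conversation.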

Message format

Messages must follow the OpenAI chat message format. Fields shown with the value "optional" can be omitted:
JSON
{
  "role": "user",
  "content": "Message text",
  "name": "optional",
  "tool_calls": [],
  "tool_call_id": "optional",
  "reasoning": "optional"
}

Supported roles

  • user
  • assistant
  • system
  • tool
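
Checking messages locally before sending them can surface format problems early. The sketch below checks only the two constraints stated on this page, the minimum message count and the supported roles. `validate_messages` and `SUPPORTED_ROLES` are hypothetical names used for illustration; the service may enforce further rules not covered here.

```python
from typing import Any, Dict, List

# Roles listed as supported on this page.
SUPPORTED_ROLES = {"user", "assistant", "system", "tool"}


def validate_messages(messages: List[Dict[str, Any]]) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if len(messages) < 2:
        problems.append("at least 2 messages are required")
    for index, message in enumerate(messages):
        role = message.get("role")
        if role not in SUPPORTED_ROLES:
            problems.append(f"message {index}: unsupported role {role!r}")
    return problems
```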

Usage guidance

  • Pass the full conversation history for the best summarization results
  • Use summaries to reduce prompt size before LLM calls
  • Combine summaries with recent messages to maintain context continuity
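
The last point can be sketched as follows: summarize the older part of the history and keep the most recent messages verbatim. `compact_history`, the `keep_recent` default, and the choice to carry the summary as a `system` message are assumptions made for illustration, not documented SDK behavior. The summarizer is passed in as a callable so the logic can be exercised without a live client.

```python
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]


def compact_history(
    messages: List[Message],
    summarize: Callable[[List[Message]], str],
    keep_recent: int = 4,
) -> List[Message]:
    """Replace older messages with one summary message; keep recent ones as-is."""
    if keep_recent > 0:
        older, recent = messages[:-keep_recent], messages[-keep_recent:]
    else:
        older, recent = list(messages), []
    # Compression needs at least 2 messages, so leave short histories alone.
    if len(older) < 2:
        return list(messages)
    summary_message = {
        "role": "system",  # Assumption: the summary is carried as a system message.
        "content": f"Summary of earlier conversation: {summarize(older)}",
    }
    return [summary_message] + recent
```

With the SDK, the summarizer could be `lambda older: client.context.compress(messages=older).summary`. Note that a fixed split point can separate a `tool` message from the assistant message that requested it, so a production version may need to adjust the boundary.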

What to test

  • Compression succeeds with valid message lists
  • Compression raises ValueError when fewer than 2 messages are provided
  • The summary output is usable in downstream prompts
  • The model parameter selects the summarization model, and the default model is used when it is omitted

Mental model

  • Context compression reduces conversation size
  • Summaries replace older message windows
  • Recent messages should remain uncompressed
  • Use compression before LLM calls to control token usage
