Context

Overview

Use context compression to manage long conversation histories by summarizing older messages while preserving recent context.
This helps maintain performance and stay within model token limits during multi-turn interactions.

When to use context compression

Use context compression when:

The conversation history exceeds the model's token limit
You are building agents with extended multi-turn conversations
You are using models with strict context window limits

Initialize the SDK

Initialize the SDK client before using context compression. For more information, see Client.

Usage

PYTHON

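from typing import Annotated, TypedDict

from ibm_watsonx_orchestrate_sdk import Client
from langchain_core.messages.utils import count_tokens_approximately
from langchain_core.runnables import RunnableConfig
from langgraph.graph import END, StateGraph
from langgraph.graph.message import add_messages


class AgentState(TypedDict):
    # Minimal state schema for this example; replace with your own
    messages: Annotated[list, add_messages]


def create_agent(config: RunnableConfig):
    # Initialize the client
    execution_context = config.get("configurable", {}).get("execution_context")
    client = Client(execution_context=execution_context)

    def agent_node(state: AgentState):
        messages = state.get("messages", [])

        if count_tokens_approximately(messages) > 30000:
            # compress() returns a SummarizationResponse, not a message list
            result = client.context.compress(messages=messages)
            # Carry the summary forward in place of the full history
            # (using the system role here is a choice made for this example)
            messages = [{"role": "system", "content": result.summary}]

        # -----------------------------------
        # Your business logic goes here
        #
        # - Validate inputs
        # - Call tools/models with `messages`
        # - Process responses
        # - Update state
        # -----------------------------------

        # Placeholder: replace with the message your business logic produces
        response = {"role": "assistant", "content": "Replace with your model output"}

        return {"messages": [response]}

    # Build the graph
    builder = StateGraph(AgentState)
    builder.add_node("agent", agent_node)
    builder.set_entry_point("agent")
    builder.add_edge("agent", END)

    return builder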

API reference

client.context.compress()

Compresses a conversation history into a summary.

Parameters

messages (List[Dict[str, Any]]) Required. List of message dictionaries in OpenAI format. Minimum 2 messages.
model (str) Optional. Model name used for summarization. Uses the default model if not provided.

Returns

A SummarizationResponse object:

summary (str) Generated summary of the conversation.
original_message_count (int) Number of messages summarized.
model_used (str) Model used for summarization.

Raises

ValueError if fewer than 2 messages are provided
ClientAPIException if the API request fails
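
Both failure modes can be handled in one place so that a failed compression never blocks the agent. The following is a minimal sketch under stated assumptions, not SDK code: compress_or_passthrough is a hypothetical helper, and because the import path of ClientAPIException is not given on this page, the exception type to catch is passed in as a parameter. The _StubClient and _StubContext classes are stand-ins that only mimic the documented behavior so the example runs without the SDK.

```python
from types import SimpleNamespace
from typing import Any, Dict, List, Optional, Type


def compress_or_passthrough(
    client: Any,
    messages: List[Dict[str, Any]],
    api_error: Type[Exception] = Exception,
) -> Optional[str]:
    """Return a summary string, or None when compression is skipped or fails.

    Hypothetical helper. Pass the SDK's ClientAPIException as `api_error`;
    it defaults to Exception because the import path is an assumption.
    """
    # compress() raises ValueError for fewer than 2 messages, so skip early.
    if len(messages) < 2:
        return None
    try:
        result = client.context.compress(messages=messages)
    except api_error:
        # The API request failed: the caller keeps the uncompressed history.
        return None
    return result.summary


# --- Stand-in client so the sketch runs without the SDK (not the real API) ---
class _StubContext:
    def __init__(self, fail: bool = False) -> None:
        self.fail = fail

    def compress(self, messages: List[Dict[str, Any]]) -> SimpleNamespace:
        if len(messages) < 2:
            raise ValueError("At least 2 messages are required")
        if self.fail:
            raise RuntimeError("simulated API failure")
        return SimpleNamespace(
            summary=f"Summary of {len(messages)} messages",
            original_message_count=len(messages),
            model_used="stub-model",
        )


class _StubClient:
    def __init__(self, fail: bool = False) -> None:
        self.context = _StubContext(fail)


history = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello, how can I help?"},
]
ok_summary = compress_or_passthrough(_StubClient(), history)
failed_summary = compress_or_passthrough(_StubClient(fail=True), history, RuntimeError)
too_short = compress_or_passthrough(_StubClient(), history[:1])
```

Returning None instead of re-raising lets the calling node continue with the original messages. Whether that fallback is acceptable depends on how close the history already is to the model's token limit.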

Message format

Messages must follow OpenAI format:

JSON

{
  "role": "user",
  "content": "Message text",
  "name": "optional",
  "tool_calls": [],
  "tool_call_id": "optional",
  "reasoning": "optional"
}

Supported roles

user
assistant
system
tool
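
A quick pre-flight check can catch malformed messages before they reach the API. The sketch below is illustrative only: find_invalid_messages is a hypothetical helper, and the rule that a message needs either content or tool_calls is an assumption based on common OpenAI-style usage, not a documented SDK requirement.

```python
from typing import Any, Dict, List

# Roles listed as supported on this page
SUPPORTED_ROLES = {"user", "assistant", "system", "tool"}


def find_invalid_messages(messages: List[Dict[str, Any]]) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    for index, message in enumerate(messages):
        role = message.get("role")
        if role not in SUPPORTED_ROLES:
            problems.append(f"message {index}: unsupported role {role!r}")
        # Assumption: a message carries either content or tool calls.
        if "content" not in message and not message.get("tool_calls"):
            problems.append(f"message {index}: missing content")
    return problems


sample = [
    {"role": "user", "content": "What is my order status?"},
    {"role": "bot", "content": "Checking"},
    {"role": "tool", "tool_call_id": "call_1"},
]
problems = find_invalid_messages(sample)
```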

Usage guidance

Pass the full conversation history for the best summarization results
Use summaries to reduce prompt size before LLM calls
Combine summaries with recent messages to maintain context continuity
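
One way to combine a summary with recent messages is to place the summary in a single leading message and append the newest turns unchanged. This is a minimal sketch, not part of the SDK: build_compressed_history and its keep_recent parameter are hypothetical, and carrying the summary in a system message is an assumption that may need adjusting for your model.

```python
from typing import Any, Dict, List


def build_compressed_history(
    summary: str,
    messages: List[Dict[str, Any]],
    keep_recent: int = 4,
) -> List[Dict[str, Any]]:
    """Replace older messages with one summary message and keep the newest ones."""
    # Guard against keep_recent <= 0, where messages[-0:] would return everything.
    recent = messages[-keep_recent:] if keep_recent > 0 else []
    summary_message = {
        "role": "system",
        "content": f"Summary of the earlier conversation: {summary}",
    }
    return [summary_message, *recent]


history = [{"role": "user", "content": f"message {i}"} for i in range(10)]
compact = build_compressed_history("User asked ten questions.", history, keep_recent=3)
```

When the history contains tool calls, choose keep_recent so that a tool message is not separated from the assistant message that requested it.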

What to test

Compression succeeds with valid message lists
Compression raises ValueError when fewer than 2 messages are provided
Summary output is usable in downstream prompts
Model selection behaves as expected when the model parameter is set or omitted

Mental model

Context compression reduces conversation size
Summaries replace older message windows
Recent messages should remain uncompressed
Use compression before LLM calls to control token usage
