Overview

Chat models are part of the Agentic SDK LangChain integration. They provide a chat model interface that routes chat completion requests through watsonx Orchestrate, following the SDK runtime model for authentication, context, and API routing. They support chat-based interactions, structured outputs, tool calling, and streaming responses. Because they expose the same API as LangChain's chat model abstractions, they can be used as a drop-in replacement when running inside Orchestrate.

Initialization patterns

From Instance Credentials (Standalone/Runs-Elsewhere Mode)

For standalone scripts or applications running outside the watsonx Orchestrate runtime:
PYTHON
from ibm_watsonx_orchestrate_sdk.langchain import ChatWxO

llm = ChatWxO.from_instance_credentials(
    instance_url="https://your-instance.cloud.ibm.com",
    api_key="your-wxo-api-key",
    model="watsonx/meta-llama/llama-3-2-90b-vision-instruct",
    temperature=0.7,
    max_tokens=1000
)

response = llm.invoke("Tell me a joke about programming")
print(response.content)
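
The example above hard-codes the API key, which is convenient for a first test but awkward for anything that gets committed. A minimal sketch of reading the two credentials from the environment instead, assuming the variable names WXO_INSTANCE_URL and WXO_API_KEY (hypothetical names chosen for illustration, not defined by the SDK):

```python
import os

# Hypothetical environment variable names; the SDK does not define them.
URL_VAR = "WXO_INSTANCE_URL"
KEY_VAR = "WXO_API_KEY"


def load_wxo_credentials(env=None):
    """Return keyword arguments suitable for from_instance_credentials.

    Raises RuntimeError naming every variable that is missing or empty.
    """
    env = os.environ if env is None else env
    missing = [name for name in (URL_VAR, KEY_VAR) if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {"instance_url": env[URL_VAR], "api_key": env[KEY_VAR]}
```

The returned dictionary could then be unpacked into the constructor, for example ChatWxO.from_instance_credentials(**load_wxo_credentials(), model="watsonx/ibm/granite-3-8b-instruct").
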
For LangGraph agents with RunnableConfig:
PYTHON
from ibm_watsonx_orchestrate_sdk.langchain import ChatWxO
from langchain_core.runnables import RunnableConfig

def create_agent(config: RunnableConfig):
    llm = ChatWxO.from_runnable_config(
        config=config,
        model="watsonx/meta-llama/llama-3-2-90b-vision-instruct"
    )
    
    response = llm.invoke("Tell me a joke about programming")
    print(response.content)

From Execution Context (Runtime/Runs-On Mode)

When running inside a watsonx Orchestrate runtime with execution context:
PYTHON
from ibm_watsonx_orchestrate_sdk.langchain import ChatWxO

# Execution context provided by the WxO runtime.
# runnable_config is the RunnableConfig passed to your agent by the runtime.
execution_context = runnable_config.get("configurable", {}).get("execution_context")

llm = ChatWxO.from_execution_context(
    execution_context=execution_context,
    model="watsonx/ibm/granite-3-8b-instruct",
    temperature=0.2
)

response = llm.invoke("What is the capital of France?")
print(response.content)
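
The chained .get() lookup above silently yields None when the code runs outside the runtime, and the failure then surfaces later inside the constructor. A small helper can make that case explicit. This is a sketch only: get_execution_context is a hypothetical name, and it assumes the config is a plain dict-like object, which holds for LangChain's RunnableConfig.

```python
def get_execution_context(runnable_config):
    """Return the execution context stored under config["configurable"].

    Raises KeyError with a readable message when the context is absent,
    for example when the code runs outside the Orchestrate runtime.
    """
    configurable = (runnable_config or {}).get("configurable") or {}
    execution_context = configurable.get("execution_context")
    if execution_context is None:
        raise KeyError(
            "No 'execution_context' under config['configurable']; "
            "is this running inside the watsonx Orchestrate runtime?"
        )
    return execution_context
```
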

watsonx Orchestrate Agentic Session

For advanced use cases with a pre-configured AgenticSession:
PYTHON
from ibm_watsonx_orchestrate_sdk.langchain import ChatWxO
from ibm_watsonx_orchestrate_sdk.client import Client

# Create client and get session
client = Client.from_instance_credentials(
    instance_url="https://your-instance.cloud.ibm.com",
    api_key="your-wxo-api-key"
)

llm = ChatWxO.from_session(
    session=client.session,
    model="watsonx/ibm/granite-3-8b-instruct"
)

response = llm.invoke("Hello!")
print(response.content)

Direct initialization (Advanced)

Direct initialization with all parameters:
PYTHON
from ibm_watsonx_orchestrate_sdk.langchain import ChatWxO

llm = ChatWxO(
    instance_url="https://your-instance.cloud.ibm.com",
    api_key="your-wxo-api-key",
    model="watsonx/meta-llama/llama-3-2-90b-vision-instruct",
    temperature=0.7,
    max_tokens=1000
)

response = llm.invoke("Tell me a joke about programming")
print(response.content)

Usage examples

Basic chat completion

PYTHON
from ibm_watsonx_orchestrate_sdk.langchain import ChatWxO

llm = ChatWxO.from_instance_credentials(
    instance_url="https://your-instance.cloud.ibm.com",
    api_key="your-api-key",
    model="watsonx/ibm/granite-3-8b-instruct"
)

# Simple string input
response = llm.invoke("What is machine learning?")
print(response.content)

# Message format
from langchain_core.messages import HumanMessage, SystemMessage

messages = [
    SystemMessage(content="You are a helpful AI assistant."),
    HumanMessage(content="Explain quantum computing in simple terms.")
]

response = llm.invoke(messages)
print(response.content)

Streaming Responses

PYTHON
# Synchronous streaming
for chunk in llm.stream("Write a short story about a robot"):
    print(chunk.content, end="", flush=True)

# Async streaming
import asyncio

async def stream_example():
    async for chunk in llm.astream("Explain photosynthesis"):
        print(chunk.content, end="", flush=True)

asyncio.run(stream_example())
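
Streaming yields the reply piece by piece, so code that also needs the complete text has to accumulate the chunks itself. A minimal sketch, assuming each chunk's content is a plain string (LangChain also allows list-valued content, which this helper does not handle); the stand-in chunks let it run without a model:

```python
from types import SimpleNamespace


def collect_stream(chunks, echo=False):
    """Concatenate the .content of each chunk and return the full text."""
    parts = []
    for chunk in chunks:
        text = chunk.content
        # Some chunks (for example tool-call or final metadata chunks) carry no text.
        if not text:
            continue
        if echo:
            print(text, end="", flush=True)
        parts.append(text)
    return "".join(parts)


# Stand-in chunks so the sketch runs offline; with a real model,
# pass llm.stream("...") instead.
fake_chunks = [SimpleNamespace(content=c) for c in ("Once ", "", "upon a time")]
full_text = collect_stream(fake_chunks)
```
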

Tool calling

PYTHON
from typing import Literal

from pydantic import BaseModel, Field

class GetWeather(BaseModel):
    """Get the current weather for a location"""
    location: str = Field(description="City and state, e.g. San Francisco, CA")
    # Literal restricts the allowed values and appears as an enum in the tool schema
    unit: Literal["celsius", "fahrenheit"] = Field(description="Temperature unit")

class GetPopulation(BaseModel):
    """Get the population of a city"""
    location: str = Field(description="City and state, e.g. San Francisco, CA")

# Bind tools to the model
llm_with_tools = llm.bind_tools([GetWeather, GetPopulation])

response = llm_with_tools.invoke("What's the weather and population in NYC?")

# Access tool calls
for tool_call in response.tool_calls:
    print(f"Tool: {tool_call['name']}")
    print(f"Args: {tool_call['args']}")
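
The model only requests tools; running them is up to the caller. The sketch below dispatches each entry of response.tool_calls to a local function. It assumes the tool name equals the schema class name and that each call is a dict with name, args, and id keys, as in LangChain's tool call format. The handler functions are hypothetical stubs, and the calls are hand-written stand-ins so the code runs without a model.

```python
def get_weather(location, unit="celsius"):
    # Hypothetical stub; a real implementation would call a weather service.
    return f"Weather lookup for {location} in {unit}"


def get_population(location):
    # Hypothetical stub; a real implementation would query a data source.
    return f"Population lookup for {location}"


# Map tool names (the schema class names) to local handlers.
HANDLERS = {"GetWeather": get_weather, "GetPopulation": get_population}


def run_tool_calls(tool_calls):
    """Run each requested tool and return (tool_call_id, result) pairs."""
    results = []
    for call in tool_calls:
        handler = HANDLERS.get(call["name"])
        if handler is None:
            result = f"Unknown tool: {call['name']}"
        else:
            result = handler(**call["args"])
        results.append((call.get("id"), result))
    return results


# Hand-written stand-in for response.tool_calls.
example_calls = [
    {"name": "GetWeather", "args": {"location": "New York, NY", "unit": "celsius"}, "id": "call_1"},
    {"name": "GetPopulation", "args": {"location": "New York, NY"}, "id": "call_2"},
]
results = run_tool_calls(example_calls)
```

In a LangChain conversation, each result would typically be sent back to the model as a ToolMessage whose tool_call_id matches the id of the originating call.
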

Structured output

PYTHON
from pydantic import BaseModel, Field

class Person(BaseModel):
    """Information about a person"""
    name: str = Field(description="Person's full name")
    age: int = Field(description="Person's age in years")
    occupation: str = Field(description="Person's job or profession")
    hobbies: list[str] = Field(description="List of hobbies")

# Create structured output model
structured_llm = llm.with_structured_output(Person)

# Get structured response
person = structured_llm.invoke(
    "Tell me about a software engineer named Alice who is 28 years old "
    "and enjoys hiking, reading, and photography."
)

print(f"Name: {person.name}")
print(f"Age: {person.age}")
print(f"Occupation: {person.occupation}")
print(f"Hobbies: {', '.join(person.hobbies)}")

Batch processing

PYTHON
# Process multiple inputs in parallel
messages_batch = [
    "What is Python?",
    "What is JavaScript?",
    "What is Rust?"
]

responses = llm.batch(messages_batch)

for i, response in enumerate(responses):
    print(f"Q{i+1}: {messages_batch[i]}")
    print(f"A{i+1}: {response.content}\n")

# Async batch processing
import asyncio

async def batch_example():
    responses = await llm.abatch(messages_batch)
    return responses

asyncio.run(batch_example())
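
By default, one failing request makes the whole batch call raise. LangChain's Runnable.batch accepts return_exceptions=True, which returns the exception object in that input's position instead. Assuming ChatWxO inherits this behavior unchanged, the results can be split as sketched here. partition_batch is a hypothetical helper, and the results list is a stand-in so the code runs without a model.

```python
def partition_batch(inputs, results):
    """Split batch results into (successes, failures), keeping each input."""
    successes, failures = [], []
    # batch() returns results in the same order as the inputs.
    for question, result in zip(inputs, results):
        if isinstance(result, Exception):
            failures.append((question, result))
        else:
            successes.append((question, result))
    return successes, failures


# Stand-in results; with a model this would be
# llm.batch(messages_batch, return_exceptions=True).
inputs = ["What is Python?", "What is Rust?"]
results = ["A programming language.", TimeoutError("request timed out")]
ok, failed = partition_batch(inputs, results)
```
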

Advanced Configuration

PYTHON
llm = ChatWxO.from_instance_credentials(
    instance_url="https://your-instance.cloud.ibm.com",
    api_key="your-api-key",
    model="watsonx/meta-llama/llama-3-2-90b-vision-instruct",
    
    # Model parameters
    temperature=0.7,
    max_tokens=2000,
    top_p=0.9,
    frequency_penalty=0.0,
    presence_penalty=0.0,
    
    # Streaming configuration
    streaming=True,
    
    # Request configuration
    timeout=60.0,
    max_retries=3
)

Supported Methods

ChatWxO supports the following methods:
  • invoke(messages) - Synchronous chat completion
  • ainvoke(messages) - Async chat completion
  • stream(messages) - Synchronous streaming
  • astream(messages) - Async streaming
  • batch(messages_list) - Batch processing
  • abatch(messages_list) - Async batch processing
  • bind_tools(tools) - Bind tools/functions
  • with_structured_output(schema) - Structured output

Class Methods

  • from_instance_credentials(instance_url, api_key, model, **kwargs) - Create from instance credentials (standalone/runs-elsewhere)
  • from_execution_context(execution_context, model, **kwargs) - Create from execution context (runtime/runs-on)
  • from_session(session, model, **kwargs) - Create from AgenticSession (runtime/runs-on)
  • from_runnable_config(config, model, **kwargs) - Create from RunnableConfig (LangGraph)

Chat model IDs

Use the chat model ID format returned by the watsonx Orchestrate /models endpoint:
PYTHON
provider/model-name
Examples:
  • watsonx/meta-llama/llama-3-2-90b-vision-instruct
  • watsonx/ibm/granite-3-8b-instruct
Notes:
  • Chat models provide a drop-in replacement for chat model usage in LangChain-based agents.
  • The model ID must follow the format returned by the platform.
  • Authentication and request routing are handled through the SDK interface.
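
The example IDs in this section contain more than one slash, so the provider is the part before the first slash and the rest is the model name. A small sketch of splitting and sanity-checking an ID on the client side; split_model_id is a hypothetical helper, and the platform remains the authority on which IDs are valid:

```python
def split_model_id(model_id):
    """Split 'provider/model-name' at the first slash.

    The model name may itself contain slashes, as in
    'watsonx/meta-llama/llama-3-2-90b-vision-instruct'.
    """
    provider, sep, name = model_id.partition("/")
    if not sep or not provider or not name:
        raise ValueError(f"Expected 'provider/model-name', got {model_id!r}")
    return provider, name
```
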
