Overview

Embeddings turn text into vectors that capture semantic meaning, so that similar texts end up close together in vector space. This makes them useful for search, clustering, recommendations, classification, anomaly detection, and semantic similarity checks such as finding duplicate or related content. In practice, they are often used in retrieval-augmented generation (RAG): embed your documents, embed the user’s question, compare the vectors, and pass the most relevant passages to a model. They are also commonly used for text search and “find things like this” workflows in place of simple keyword matching. WxOEmbeddings supports the same API as LangChain’s embeddings abstractions and can be used as a drop-in replacement for them when running inside Orchestrate.
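
The retrieval flow described above (embed the documents, embed the question, compare the vectors, return the closest passages) can be sketched without any network calls. This is a minimal illustration only: `toy_embed` is a hypothetical stand-in that counts words from a tiny fixed vocabulary, and it is not part of the SDK. A real application would call `embed_documents` and `embed_query` on a WxOEmbeddings instance instead.

```python
import math
import re
from typing import List

# Hypothetical stand-in for an embedding model: one dimension per vocabulary
# word, valued by how often that word occurs. Real embeddings are dense vectors
# produced by a model; this only mimics "similar text, nearby vector".
VOCAB = ["paris", "london", "berlin", "capital", "france", "england", "germany"]

def toy_embed(text: str) -> List[float]:
    words = re.findall(r"[a-z]+", text.lower())
    return [float(words.count(term)) for term in VOCAB]

def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(question: str, documents: List[str], k: int = 1) -> List[str]:
    # 1. Embed the documents, 2. embed the question, 3. rank by similarity.
    doc_vectors = [toy_embed(doc) for doc in documents]
    query_vector = toy_embed(question)
    ranked = sorted(
        zip(documents, doc_vectors),
        key=lambda pair: cosine_similarity(query_vector, pair[1]),
        reverse=True,
    )
    return [doc for doc, _ in ranked[:k]]

documents = [
    "Paris is the capital of France.",
    "London is the capital of England.",
    "Berlin is the capital of Germany.",
]
best = top_k("What is the capital of France?", documents, k=1)
print(best)  # ['Paris is the capital of France.']
```

In a RAG setup, the passages returned by a step like `top_k` are what gets placed into the model prompt as context.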

Initialization patterns

From Instance Credentials (Standalone/Runs-Elsewhere Mode)

PYTHON
from ibm_watsonx_orchestrate_sdk.langchain import WxOEmbeddings

embeddings = WxOEmbeddings.from_instance_credentials(
    instance_url="https://your-instance.cloud.ibm.com",
    api_key="your-wxo-api-key",
    model="openai/text-embedding-3-small"
)

# Embed a single query
query_embedding = embeddings.embed_query("What is machine learning?")
print(f"Embedding dimension: {len(query_embedding)}")

Direct initialization (Standalone/Runs-Elsewhere Mode) (Advanced)

See ‘Advanced Configuration’ section for all available parameters.
PYTHON
from ibm_watsonx_orchestrate_sdk.langchain import WxOEmbeddings

embeddings = WxOEmbeddings(
    instance_url="https://your-instance.cloud.ibm.com",
    api_key="your-wxo-api-key",
    model="openai/text-embedding-3-small",
    dimensions=1024,
    skip_empty=True
)

# Embed a single query
query_embedding = embeddings.embed_query("What is machine learning?")
print(f"Embedding dimension: {len(query_embedding)}")

From RunnableConfig (Runtime/Runs-On Mode)

PYTHON
from typing import Annotated, List, TypedDict
from langchain_core.messages import AIMessage, BaseMessage
from langgraph.graph import END, START, StateGraph
from langgraph.graph.state import RunnableConfig
from ibm_watsonx_orchestrate_sdk.langchain import WxOEmbeddings

class AgentState(TypedDict):
    """Simple state with conversation history."""
    messages: Annotated[List[BaseMessage], "conversation history"]

def create_agent(config: RunnableConfig):
    # NOTE: RunnableConfig is passed by WxO Runtime directly to agent's `create_agent()` function.

    def embed_string(state: AgentState):
        embeddings = WxOEmbeddings.from_runnable_config(
            config=config,
            model="openai/text-embedding-3-small"
        )

        # Embed a single query
        query_embedding = embeddings.embed_query("What is machine learning?")
        response = AIMessage(content=f"Embedding dimension: {len(query_embedding)}")
        return {"messages": state["messages"] + [response]}

    # Wire the node into a single-step graph
    builder = StateGraph(AgentState)
    builder.add_node("embed_string", embed_string)
    builder.add_edge(START, "embed_string")
    builder.add_edge("embed_string", END)
    return builder

From Execution Context (Runtime/Runs-On Mode)

PYTHON
from typing import Annotated, List, TypedDict
from langchain_core.messages import AIMessage, BaseMessage
from langgraph.graph import END, START, StateGraph
from langgraph.graph.state import RunnableConfig
from ibm_watsonx_orchestrate_sdk.langchain import WxOEmbeddings

class AgentState(TypedDict):
    """Simple state with conversation history."""
    messages: Annotated[List[BaseMessage], "conversation history"]

def create_agent(config: RunnableConfig):
    # NOTE: RunnableConfig is passed by WxO Runtime directly to agent's `create_agent()` function.

    execution_context = config.get("configurable", {}).get("execution_context")

    def embed_string(state: AgentState):
        embeddings = WxOEmbeddings.from_execution_context(
            execution_context=execution_context,
            model="openai/text-embedding-3-small"
        )
        
        # Embed a single query
        query_embedding = embeddings.embed_query("The quick brown fox jumped over the lazy dog.")
        response = AIMessage(content=f"Embedding dimension: {len(query_embedding)}")
        return {"messages": state["messages"] + [response]}
    
    builder = StateGraph(AgentState)
    builder.add_node("embed_string", embed_string)
    builder.add_edge(START, "embed_string")
    builder.add_edge("embed_string", END)
    return builder

Usage examples

Basic Embeddings

PYTHON
from ibm_watsonx_orchestrate_sdk.langchain import WxOEmbeddings

embeddings = WxOEmbeddings.from_instance_credentials(
    instance_url="https://your-instance.cloud.ibm.com",
    api_key="your-api-key",
    model="openai/text-embedding-3-small"
)

# Embed a single query
query = "What is the capital of France?"
query_embedding = embeddings.embed_query(query)
print(f"Query embedding: {len(query_embedding)} dimensions")

# Embed multiple documents
documents = [
    "Paris is the capital of France.",
    "London is the capital of England.",
    "Berlin is the capital of Germany."
]
doc_embeddings = embeddings.embed_documents(documents)
print(f"Embedded {len(doc_embeddings)} documents")

Async Embeddings

PYTHON
import asyncio
from ibm_watsonx_orchestrate_sdk.langchain import WxOEmbeddings

async def embed_async():
    embeddings = WxOEmbeddings.from_instance_credentials(
        instance_url="https://your-instance.cloud.ibm.com",
        api_key="your-api-key",
        model="openai/text-embedding-3-small"
    )
    
    # Async single query
    query_embedding = await embeddings.aembed_query("What is AI?")
    print(f"Query embedding: {len(query_embedding)} dimensions")
    
    # Async multiple documents
    documents = ["Document 1", "Document 2", "Document 3"]
    doc_embeddings = await embeddings.aembed_documents(documents)
    print(f"Embedded {len(doc_embeddings)} documents")

asyncio.run(embed_async())
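
The async methods are most useful when several calls run concurrently instead of one after another. The sketch below shows that pattern with `asyncio.gather`. `SlowFakeEmbeddings` is a hypothetical stand-in that sleeps briefly and returns made-up vectors, so the example runs offline; with the SDK, a WxOEmbeddings instance would take its place.

```python
import asyncio
from typing import List

class SlowFakeEmbeddings:
    """Hypothetical stand-in: simulates latency, returns made-up vectors."""

    async def aembed_query(self, text: str) -> List[float]:
        await asyncio.sleep(0.01)  # pretend to wait on a network request
        return [float(len(text)), 1.0]

async def embed_many(embeddings, queries: List[str]) -> List[List[float]]:
    # gather() runs the coroutines concurrently and returns the results
    # in the same order as the input queries.
    return await asyncio.gather(*(embeddings.aembed_query(q) for q in queries))

queries = ["What is AI?", "What is RAG?"]
results = asyncio.run(embed_many(SlowFakeEmbeddings(), queries))
print(results)  # [[11.0, 1.0], [12.0, 1.0]]
```

For many texts at once, a single `aembed_documents` call is usually simpler. Gathering individual calls is mainly helpful when the queries originate in different places.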

Semantic Search with Vector Store

PYTHON
from ibm_watsonx_orchestrate_sdk.langchain import WxOEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.documents import Document

# Initialize embeddings
embeddings = WxOEmbeddings.from_instance_credentials(
    instance_url="https://your-instance.cloud.ibm.com",
    api_key="your-api-key",
    model="openai/text-embedding-3-small"
)

# Create documents
documents = [
    Document(page_content="Paris is the capital of France.", metadata={"country": "France"}),
    Document(page_content="London is the capital of England.", metadata={"country": "England"}),
    Document(page_content="Berlin is the capital of Germany.", metadata={"country": "Germany"}),
    Document(page_content="Madrid is the capital of Spain.", metadata={"country": "Spain"}),
]

# Create vector store
vectorstore = FAISS.from_documents(documents, embeddings)

# Perform similarity search
query = "What is the capital of France?"
results = vectorstore.similarity_search(query, k=2)

for doc in results:
    print(f"Content: {doc.page_content}")
    print(f"Metadata: {doc.metadata}\n")

RAG (Retrieval-Augmented Generation)

PYTHON
from ibm_watsonx_orchestrate_sdk.langchain import ChatWxO, WxOEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.documents import Document
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

# Initialize embeddings and LLM
embeddings = WxOEmbeddings.from_instance_credentials(
    instance_url="https://your-instance.cloud.ibm.com",
    api_key="your-api-key",
    model="openai/text-embedding-3-small"
)

llm = ChatWxO.from_instance_credentials(
    instance_url="https://your-instance.cloud.ibm.com",
    api_key="your-api-key",
    model="watsonx/ibm/granite-3-8b-instruct"
)

# Create knowledge base
documents = [
    Document(page_content="Python is a high-level programming language."),
    Document(page_content="JavaScript is used for web development."),
    Document(page_content="Java is an object-oriented programming language."),
]

vectorstore = FAISS.from_documents(documents, embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 2})

# Create RAG chain
template = """Answer the question based on the following context:

Context: {context}

Question: {question}

Answer:"""

prompt = ChatPromptTemplate.from_template(template)

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
)

# Ask a question
response = rag_chain.invoke("What is Python?")
print(response.content)

Similarity Calculation

PYTHON
import numpy as np
from ibm_watsonx_orchestrate_sdk.langchain import WxOEmbeddings

embeddings = WxOEmbeddings.from_instance_credentials(
    instance_url="https://your-instance.cloud.ibm.com",
    api_key="your-api-key",
    model="openai/text-embedding-3-small"
)

# Embed texts
text1 = "Machine learning is a subset of artificial intelligence"
text2 = "AI includes machine learning and deep learning"
text3 = "The weather is nice today"

embedding1 = embeddings.embed_query(text1)
embedding2 = embeddings.embed_query(text2)
embedding3 = embeddings.embed_query(text3)

# Calculate cosine similarity
def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

sim_1_2 = cosine_similarity(embedding1, embedding2)
sim_1_3 = cosine_similarity(embedding1, embedding3)

print(f"Similarity between text1 and text2: {sim_1_2:.4f}")
print(f"Similarity between text1 and text3: {sim_1_3:.4f}")
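
When the same vectors are compared many times, a common optimization is to scale each one to unit length once, after which cosine similarity reduces to a plain dot product. The sketch below demonstrates the equivalence on made-up three-dimensional vectors using only the standard library. It is an illustration of the arithmetic, not something the SDK is known to do internally.

```python
import math
from typing import List

def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def normalize(vector: List[float]) -> List[float]:
    """Scale a vector to unit length (an all-zero vector is returned as is)."""
    norm = math.sqrt(dot(vector, vector))
    return [x / norm for x in vector] if norm else list(vector)

def cosine_similarity(a: List[float], b: List[float]) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Made-up vectors standing in for real embeddings.
a = [1.0, 2.0, 2.0]
b = [2.0, 0.0, 1.0]

unit_a, unit_b = normalize(a), normalize(b)
print(f"cosine similarity:   {cosine_similarity(a, b):.4f}")
print(f"dot of unit vectors: {dot(unit_a, unit_b):.4f}")
```

Both lines print the same value, because dividing by the norms up front is the same division that the cosine formula performs on every call.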

Advanced Configuration

Note: Additional parameters can be passed through direct initialization (WxOEmbeddings.__init__()) or through any of the helper class methods (from_instance_credentials, from_runnable_config, from_execution_context, from_session).
PYTHON
embeddings = WxOEmbeddings.from_instance_credentials(
    instance_url="https://your-instance.cloud.ibm.com",
    api_key="your-api-key",
    model="openai/text-embedding-3-small",
    
    # Embedding parameters
    dimensions=1024,  # Number of dimensions for output embeddings (text-embedding-3 models only)
    
    # Batch processing
    chunk_size=1000,  # Maximum number of texts to embed in each batch
    
    # Request configuration
    max_retries=3,  # Maximum number of retries to make when generating embeddings
    request_timeout=60.0,  # Timeout for requests to embedding API
    
    # Text handling
    skip_empty=True,  # Whether to skip empty strings when embedding
    show_progress_bar=False,  # Whether to show a progress bar when embedding
    
    # Tokenization (for non-OpenAI providers)
    check_embedding_ctx_length=False  # Set to False to send raw text instead of tokens
)
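
The `chunk_size` comment above describes batching: a long list of texts is split into groups of at most `chunk_size` items, and each group is embedded in its own request. The helper below is a hypothetical illustration of that splitting. It is not the SDK's internal code, and how WxOEmbeddings batches requests internally is an assumption here.

```python
from typing import Iterator, List

def batch_texts(texts: List[str], chunk_size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `chunk_size` texts.

    Hypothetical helper for illustration; not part of the SDK.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(texts), chunk_size):
        yield texts[start:start + chunk_size]

texts = [f"Document {i}" for i in range(1, 8)]  # 7 texts
batches = list(batch_texts(texts, chunk_size=3))
print([len(batch) for batch in batches])  # [3, 3, 1]
```

A smaller `chunk_size` means more, lighter requests. A larger one means fewer requests, each carrying more text.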

Supported methods

WxOEmbeddings supports the following methods:
  • embed_query(text) - Embed a single text query
  • embed_documents(texts) - Embed multiple documents
  • aembed_query(text) - Async embed a single text query
  • aembed_documents(texts) - Async embed multiple documents
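
Because the interface consists of these four methods, code that calls them can be exercised offline with a small stand-in. The class below is a hypothetical test double and not part of the SDK. It derives short deterministic vectors from a hash of the text, so the vectors carry no semantic meaning and are only suitable for checking plumbing such as shapes and call order.

```python
import asyncio
import hashlib
from typing import List

class FakeEmbeddings:
    """Hypothetical test double exposing the four methods listed above."""

    def __init__(self, size: int = 8):
        self.size = size  # vector length; a SHA-256 digest allows up to 32

    def embed_query(self, text: str) -> List[float]:
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        # Map the first `size` bytes of the digest to floats in [0, 1].
        return [byte / 255 for byte in digest[: self.size]]

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        return [self.embed_query(text) for text in texts]

    async def aembed_query(self, text: str) -> List[float]:
        return self.embed_query(text)

    async def aembed_documents(self, texts: List[str]) -> List[List[float]]:
        return self.embed_documents(texts)

fake = FakeEmbeddings(size=4)
vectors = fake.embed_documents(["alpha", "beta"])
async_vector = asyncio.run(fake.aembed_query("alpha"))
print(len(vectors), len(vectors[0]))  # 2 4
```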

Class methods

WxOEmbeddings supports the following class methods:
  • from_instance_credentials(instance_url, api_key, model, **kwargs) - Create from instance credentials (standalone/runs-elsewhere)
  • from_execution_context(execution_context, model, **kwargs) - Create from execution context (runtime/runs-on)
  • from_session(session, model, **kwargs) - Create from AgenticSession (runtime/runs-on)
  • from_runnable_config(config, model, **kwargs) - Create from RunnableConfig (runtime/runs-on)

Embedding model IDs

Use the model ID format returned by the watsonx Orchestrate /models endpoint:
  provider/model-name
Examples:
  • openai/text-embedding-3-small
  • openai/text-embedding-3-large
  • openai/text-embedding-ada-002
  • watsonx/ibm/slate-30m-english-rtrvr

Notes:
  • WxOEmbeddings provides a drop-in replacement for embeddings usage in LangChain-based agents.
  • The model ID must follow the format returned by the platform.
  • Authentication and request routing are handled through the SDK interface.
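
When model IDs come from configuration, a small check can catch malformed values before a request is made. The sketch below assumes that the provider is everything before the first slash and that the remainder, which may itself contain slashes as in the watsonx example, is the model name. `split_model_id` is a hypothetical helper, not an SDK function.

```python
from typing import Tuple

def split_model_id(model_id: str) -> Tuple[str, str]:
    """Split 'provider/model-name' at the first slash (assumed convention)."""
    provider, sep, model_name = model_id.partition("/")
    if not sep or not provider or not model_name:
        raise ValueError(f"Expected 'provider/model-name', got {model_id!r}")
    return provider, model_name

print(split_model_id("openai/text-embedding-3-small"))
# ('openai', 'text-embedding-3-small')
print(split_model_id("watsonx/ibm/slate-30m-english-rtrvr"))
# ('watsonx', 'ibm/slate-30m-english-rtrvr')
```

A check like this validates only the shape of the ID. Whether a given model is actually available still depends on what the /models endpoint returns for the instance.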
