Performance Testing Approach: This guide focuses on how to measure and optimize Tool performance. Performance varies significantly based on workload, configuration, system load, and network conditions. Always measure in your own environment.
Note: “Flow” in this document refers to wxO Agentic Workflow.

Overview

Tools are executable units in watsonx Orchestrate that perform specific tasks (API calls, data processing, calculations) within isolated, secure environments. They are invoked by Agents or orchestrated by Flows.

Tool Types in wxO

Choosing the Right Tool Type

  • wxO Flow (first choice): Native support for connections/security, Python code snippets, orchestration of wxO agents, tools, and people (e.g. confirmation before a transaction, custom forms), built-in document extraction and processing, LLM support.
  • Python Tools: For custom libraries not available in wxO Flow code blocks.
  • Langflow Tools: For Langflow-specific AI components or existing Langflow flows.

Tool Type Comparison

| Tool Type | Runtime/Base | Timeout | Key Features | Best For |
| --- | --- | --- | --- | --- |
| wxO Flow | Visual workflow | None | Stateful, resumable, orchestrates Agents/Tools/People, LLM support | Multi-step workflows, API integrations, human-in-loop, simple document processing |
| Python | Python 3.12 | 2 min | Stateless, read-only FS, outbound network only | Data processing, business logic, calculations |
| Langflow | Langflow 1.7.1 | 2 min | Stateless, read-only FS, outbound network only, 2+ sec initialization | LLM processing, RAG, complex document analysis |
| API | OpenAPI spec | 2 min (sync) / None (async) | External REST APIs, network-dependent | Third-party integrations, external services |
| MCP | MCP protocol | 2 min | Extended capabilities, varies by implementation | Custom integrations, protocol-based interactions |

Understanding Tool Performance

The Granularity Principle

Key Insight: Tool performance has two distinct components:
  1. Tool Call Overhead: The cost of invoking a tool
    • Includes initialization, context setup, and result handling
    • Exists for every tool invocation
    • Relatively consistent per tool type
  2. Execution Inside the Tool: The actual work being done
    • Runs fast once inside the tool
    • Varies based on the logic and operations
    • Where optimization efforts should focus
Implication:
  • Multiple small tool calls = Multiple overhead costs
  • Single larger tool call = One overhead cost, fast internal execution
  • Recommendation: Combine related operations into single tools when possible
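The implication above can be sketched as a simple cost model. The overhead and work durations are hypothetical placeholders, not measured wxO figures; substitute numbers from your own tests.

```python
def total_latency_ms(calls: int, overhead_ms: float, work_ms: float) -> float:
    """Latency when a fixed amount of work is spread over `calls` tool invocations.

    Each invocation pays the per-call overhead once; the work itself costs the
    same no matter how it is split.
    """
    return calls * overhead_ms + work_ms


# Hypothetical values for illustration only.
OVERHEAD_MS = 200.0  # assumed per-call overhead
WORK_MS = 90.0       # assumed total execution time inside the tool(s)

three_small_calls = total_latency_ms(3, OVERHEAD_MS, WORK_MS)
one_combined_call = total_latency_ms(1, OVERHEAD_MS, WORK_MS)
print(three_small_calls, one_combined_call)  # 690.0 290.0
```

In this model the work term is identical in both cases, so the difference comes entirely from how many times the overhead is paid.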

Performance Characteristics by Tool Type

Python Tools:
  • Call overhead: Present but minimal
  • Internal execution: Very fast for most operations
  • Overall: Fast to very fast
  • Best for: Deterministic logic, data processing, API calls
Langflow Tools:
  • Call overhead: Includes Langflow initialization (2+ seconds per run)
  • Internal execution: Depends on LLM operations
  • Overall: Slower due to LLM inference (if using LLM)
  • Best for: LLM reasoning, NLP tasks, document analysis

Python & Langflow Tool Performance

Shared Technical Constraints

Both Python and Langflow tools operate in secure sandboxes with identical constraints:
| Constraint | Impact | Best Practice |
| --- | --- | --- |
| Isolated pod execution | Each tool instance runs in a separate pod for tenant isolation and security | Design for stateless, independent execution |
| 2 CPU cores maximum | Limited computational resources per tool instance | Optimize algorithms and avoid CPU-intensive operations |
| 2GB memory maximum | Limited memory per tool instance | Manage memory efficiently, avoid large data structures |
| 2-minute timeout | Maximum execution time per call | Design for timeout awareness, break long operations into chunks |
| No GPU access | GPU operations will fail | Use CPU-optimized algorithms and libraries only |
| Cold start penalty | First run takes longer; subsequent runs within 72 hours are faster | Expect initial latency; warm pods improve performance |
| Stateless execution | No state persists between calls | Use external storage (Redis, S3, database) |
| Read-only filesystem | Cannot write files locally | Use in-memory buffers or external storage |
| Network isolation | Outbound requests only | Design for outbound-only patterns |
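One way to act on the "break long operations into chunks" advice is to give each call a time budget and return a resume position when the budget runs out. The following is a minimal sketch under stated assumptions: `process_with_deadline`, the chunk size, and the injectable `clock` are hypothetical names, and the budget you pass should sit below the 2-minute timeout by a margin you choose.

```python
import time
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def process_with_deadline(
    items: Sequence[T],
    handler: Callable[[T], R],
    budget_s: float,
    chunk_size: int = 100,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[List[R], int]:
    """Process items chunk by chunk until done or until the time budget is spent.

    Returns the results so far and the index of the first unprocessed item, so
    a later call can resume from that position.
    """
    deadline = clock() + budget_s
    results: List[R] = []
    position = 0
    while position < len(items):
        if clock() >= deadline:
            break  # stop before starting a chunk we may not finish
        chunk = items[position:position + chunk_size]
        results.extend(handler(item) for item in chunk)
        position += len(chunk)
    return results, position


# Usage: a generous budget lets the whole input finish in one call.
done, next_index = process_with_deadline(list(range(5)), str, budget_s=60.0)
print(done, next_index)  # ['0', '1', '2', '3', '4'] 5
```

Because tools are stateless, the caller (for example, a Flow) would need to pass the returned index back in on the next invocation.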
Security Note: Due to tenant isolation requirements and the potentially unsecured nature of user-authored code, wxO runs each Python and Langflow tool instance as a separate pod with strict resource limits. This ensures security and prevents resource contention between tenants.
Performance Note: Tool pods experience cold start latency on first invocation. Once warmed up, the pod remains available for approximately 6 hours, with the time extended upon continuous use of the tool. This provides faster execution for subsequent calls. Plan for initial latency in performance testing and user experience design.
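To plan for cold start latency, it helps to record the first invocation separately from the warm ones. The harness below is a minimal sketch: `invoke` is a hypothetical zero-argument callable that wraps however you trigger the tool in your environment, and `FakeTool` only simulates a slow first call so the example runs on its own.

```python
import statistics
import time
from typing import Callable, Dict


def measure_cold_and_warm(invoke: Callable[[], object], warm_runs: int = 5) -> Dict[str, float]:
    """Time the first call on its own, then summarize the following calls."""

    def timed_ms() -> float:
        start = time.perf_counter()
        invoke()
        return (time.perf_counter() - start) * 1000.0

    cold_ms = timed_ms()
    warm_ms = [timed_ms() for _ in range(warm_runs)]
    return {
        "cold_ms": cold_ms,
        "warm_median_ms": statistics.median(warm_ms),
        "warm_max_ms": max(warm_ms),
    }


class FakeTool:
    """Stand-in for a real tool call: slow on first use, faster afterwards."""

    def __init__(self) -> None:
        self.warmed = False

    def __call__(self) -> None:
        if not self.warmed:
            time.sleep(0.05)  # simulated cold start
            self.warmed = True
        time.sleep(0.005)     # simulated steady-state work


report = measure_cold_and_warm(FakeTool())
print(report)
```

Reporting the cold figure separately keeps it from skewing the median that users see on a warm pod.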

Python Tools

Execution Speed: Very fast (simple operations), fast (data processing), variable (external API calls)
Optimization Strategies:
  • Minimize and batch external API calls
  • Use efficient algorithms (O(n) vs O(n²))
  • Choose performant libraries
  • Implement external caching (Redis) for expensive operations
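As an illustration of the algorithm point, the sketch below contrasts a quadratic membership check with a linear one. The function names and sample data are made up for the example.

```python
from typing import List


def common_ids_quadratic(a: List[str], b: List[str]) -> List[str]:
    """O(len(a) * len(b)): every `in` test scans the whole list b."""
    return [x for x in a if x in b]


def common_ids_linear(a: List[str], b: List[str]) -> List[str]:
    """O(len(a) + len(b)): build a set once, then each lookup is O(1) on average."""
    b_lookup = set(b)
    return [x for x in a if x in b_lookup]


orders = ["a1", "b2", "c3", "d4"]
shipped = ["d4", "zz", "b2"]
print(common_ids_linear(orders, shipped))  # ['b2', 'd4']
```

Both functions return the same result; only the cost of each lookup changes, which matters as input sizes grow within the 2 CPU core limit.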

Overview

Tool builders in IBM watsonx Orchestrate can build their tools by using Python and deploy them as toolkits on the platform. The platform now provides a toolkit runtime architecture that offers significant improvements over the individual tool import architecture. This document provides guidelines for designing, developing, and deploying Python tools for watsonx Orchestrate.
Key changes in Python toolkits
  • New toolkit runtime: Overcomes individual tool processing overheads.
  • Improved performance: Improves resource utilization and reduces cold start latency.

New toolkit runtime architecture

Architecture overview
Both individual Python tools and Python toolkits run in isolated containers, but with fundamentally different execution models:
Individual Python tool approach:
  • Each tool has its own requirements.txt with specific dependencies
  • Process-level isolation: Each tool invocation creates a new process, albeit a lightweight one.
  • Thread-safety: Tools need not be thread-safe.
  • Performance overhead: Creates a lightweight process for each execution and a virtual environment for tools that are not yet loaded in the worker.
  • Resource allocation: 2 vCPU and 2 GB RAM per replica (2 replicas provided)
  • Cold-start delays: Significant latency from process creation and dependency loading.
Python toolkit runtime approach:
  • Multiple related tools packaged together in a single toolkit.
  • Single shared requirements.txt for all tools in the toolkit.
  • Thread-level isolation: Tools run in persistent worker threads.
  • Preinstalled dependencies: All dependencies are loaded once at container startup.
  • Thread-safety required: All tools must be thread-safe and reentrant.
  • High performance: No process creation or dependency loading overhead.
  • Resource allocation: 2 vCPU and 2 GiB RAM per replica (2 replicas provided).
  • FastAPI/Gunicorn workers: 5 persistent workers per replica handle concurrent requests.
Deployment architecture
Each toolkit deployment consists of:
  • Kubernetes deployment: 2 replicas
  • Per-Replica resources:
    • 2 vCPU cores
    • 2 GiB memory
    • 5 FastAPI/Gunicorn workers per replica
  • Load balancing: Automatic distribution across replicas and workers
  • Health monitoring: Built-in health checks and automatic recovery
Benefits of toolkit approach
  1. Higher performance:
    • No process creation overhead per execution.
    • No dynamic dependency loading.
    • Persistent workers eliminate cold start delays.
    • Thread-based execution is orders of magnitude faster than process-based.
  2. Shared dependencies:
    • Single requirements.txt for all tools in the toolkit
    • Dependencies loaded once at container startup
    • Reduced memory footprint through shared libraries
  3. Better resource utilization:
    • Multiple tools share container resources
    • Efficient thread-based concurrency
    • Lower per-request overhead
  4. Improved scalability:
    • Each replica handles multiple concurrent requests through workers
    • Better throughput per unit of compute
  5. Enhanced reliability:
    • Built-in health checks and automatic failover
    • Persistent workers reduce failure points
    • No dependency loading failures during execution
  6. Cost efficiency:
    • More tools per unit of compute resource
    • Reduced infrastructure overhead
    • Better resource density
Thread safety requirements
Critical: All tools in a toolkit must be thread-safe because:
  • Multiple worker threads run tools concurrently
  • No process-level isolation between tool invocations
  • Shared memory space within the container
Thread-safe practices:
PYTHON
# ✅ CORRECT: Thread-safe tool (no shared mutable state)
@tool
async def thread_safe_tool(user_id: str) -> Dict:
    """Each invocation uses local variables only"""
    async with httpx.AsyncClient() as client:
        response = await client.get(f"https://api.example.com/users/{user_id}")
        return response.json()

# ❌ INCORRECT: Not thread-safe (shared mutable state)
_cache = {}  # Shared across all threads!

@tool
async def not_thread_safe(key: str) -> str:
    """Dangerous: modifying shared state without locks"""
    if key not in _cache:
        _cache[key] = await fetch_data(key)  # Race condition!
    return _cache[key]

  • In the draft environment, a toolkit runs by using the individual Python tool approach, which accommodates the large number of tools and toolkits that a tenant might have in draft.
  • In the live environment, each toolkit runs in its own deployment, so its tools must be thread-safe.
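If a shared in-process cache is unavoidable, one way to protect it is to guard every read and write with a `threading.Lock`. This is a minimal sketch, not the platform's mechanism: `LockedCache` and `load_profile` are hypothetical names, and the loader is assumed to be synchronous and cheap, because a `threading.Lock` should not be held across an `await`.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


class LockedCache:
    """Dictionary cache whose check-then-set step is protected by a lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._data: Dict[str, str] = {}
        self.load_count = 0  # how many times the loader actually ran

    def get_or_load(self, key: str, loader: Callable[[str], str]) -> str:
        # Holding the lock makes "is it cached? if not, load and store" atomic.
        with self._lock:
            if key not in self._data:
                self._data[key] = loader(key)
                self.load_count += 1
            return self._data[key]


def load_profile(key: str) -> str:
    """Hypothetical loader standing in for an expensive lookup."""
    return key.upper()


cache = LockedCache()
with ThreadPoolExecutor(max_workers=8) as pool:
    values = list(pool.map(lambda _: cache.get_or_load("user-1", load_profile), range(32)))

print(cache.load_count)  # 1
```

The trade-off is that the lock serializes callers while a value loads. An external cache avoids shared in-process state altogether.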

Resource allocation and performance requirements

CPU computation limits
Critical requirement: Python tools must limit CPU-bound computation to approximately 50 milliseconds and use async I/O for all I/O operations to ensure performance scaling. This constraint helps ensure:
  • Nonblocking execution in the event loop
  • Fair resource sharing across concurrent requests
  • Predictable response times
  • Prevention of worker starvation
What counts as CPU time
CPU time includes:
  • Data parsing and transformation logic
  • JSON serialization/deserialization
  • String manipulation and formatting
  • Mathematical computations
  • In-memory data filtering and sorting
  • Object instantiation and manipulation
CPU time does not include:
  • Network I/O wait time (HTTP requests, database queries)
  • File I/O wait time
  • Sleep/delay operations
  • External API call latency
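The distinction above can be checked locally with the standard library: `time.process_time()` counts CPU time and excludes sleep, while `time.perf_counter()` measures wall-clock time. The helper below is a sketch for local profiling, not a platform API; `mostly_waiting` is a made-up function that uses `time.sleep` to stand in for I/O wait.

```python
import time
from typing import Any, Callable, Tuple

CPU_BUDGET_MS = 50.0  # the approximate CPU limit described above


def measure_cpu_and_wall_ms(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float, float]:
    """Run func once and return (result, cpu_ms, wall_ms)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = func(*args, **kwargs)
    cpu_ms = (time.process_time() - cpu_start) * 1000.0
    wall_ms = (time.perf_counter() - wall_start) * 1000.0
    return result, cpu_ms, wall_ms


def mostly_waiting() -> int:
    time.sleep(0.1)          # waiting: counts toward wall time, not CPU time
    return sum(range(1000))  # a little real CPU work


result, cpu_ms, wall_ms = measure_cpu_and_wall_ms(mostly_waiting)
print(f"cpu={cpu_ms:.1f} ms, wall={wall_ms:.1f} ms, within budget: {cpu_ms <= CPU_BUDGET_MS}")
```

`time.process_time()` covers the whole process, so run the measurement in isolation. `time.thread_time()` is an alternative where per-thread figures are needed.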
Memory constraints
Each replica has 2 GiB of memory that is shared across 5 workers:
  • Per-worker budget: ~400 MB (accounting for overhead).
  • Tool memory usage: Must not exceed 100-200 MB per concurrent request.
  • Memory leaks: Avoid memory leaks to prevent pod crashes.
  • Throttling: Implement request throttling if memory usage approaches limits.
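To compare a tool's data handling against the per-request budget during local testing, the standard `tracemalloc` module can report peak allocation. This is a minimal sketch; `peak_memory_mib` is a hypothetical helper, and `tracemalloc` only sees memory allocated through Python's allocators, so native libraries may use more than it reports.

```python
import tracemalloc
from typing import Any, Callable, Tuple


def peak_memory_mib(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Run func once and return (result, peak traced memory in MiB)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _, peak_bytes = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak_bytes / (1024 * 1024)


def build_buffer() -> int:
    buffer = bytearray(10 * 1024 * 1024)  # allocate 10 MiB for the demo
    return len(buffer)


size, peak = peak_memory_mib(build_buffer)
print(f"returned {size} bytes, peak about {peak:.1f} MiB")
```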

Python tool design principles

1. Tools as API wrappers
Python tools must be lightweight wrappers around existing API implementations:
PYTHON
from ibm_watsonx_orchestrate.agent_builder.tools import tool
import httpx
from typing import Dict, List

@tool
async def get_customer_orders(customer_id: str) -> Dict:
    """
    Fetch customer orders from multiple services and format the response.
    
    This tool demonstrates the correct pattern:
    - Heavy lifting done by external APIs
    - Async I/O for non-blocking execution
    - Minimal CPU time for response formatting
    """
    async with httpx.AsyncClient() as client:
        # API call 1: Get customer profile (async, non-blocking)
        profile_response = await client.get(
            f"https://api.example.com/customers/{customer_id}"
        )
        profile = profile_response.json()
        
        # API call 2: Get order history (async, non-blocking)
        orders_response = await client.get(
            f"https://api.example.com/orders?customer_id={customer_id}"
        )
        orders = orders_response.json()
        
        # API call 3: Get loyalty points (async, non-blocking)
        loyalty_response = await client.get(
            f"https://api.example.com/loyalty/{customer_id}"
        )
        loyalty = loyalty_response.json()
        
        # API call 4: Get recommendations (async, non-blocking)
        recs_response = await client.get(
            f"https://api.example.com/recommendations/{customer_id}"
        )
        recommendations = recs_response.json()
        
        # CPU work: Format and combine responses (~10-20ms)
        result = {
            "customer_name": profile.get("name"),
            "email": profile.get("email"),
            "total_orders": len(orders.get("items", [])),
            "recent_orders": orders.get("items", [])[:5],
            "loyalty_points": loyalty.get("points", 0),
            "tier": loyalty.get("tier", "standard"),
            "recommended_products": [
                {
                    "id": rec["product_id"],
                    "name": rec["product_name"],
                    "score": rec["relevance_score"]
                }
                for rec in recommendations.get("items", [])[:3]
            ]
        }
        
        return result
2. Mandatory async I/O
All I/O operations must use async/await patterns to prevent blocking the event loop:
PYTHON
# ✅ CORRECT: Async I/O
@tool
async def fetch_data(url: str) -> dict:
    async with httpx.AsyncClient() as client:
        response = await client.get(url)
        return response.json()

# ❌ INCORRECT: Blocking I/O
@tool
def fetch_data_blocking(url: str) -> dict:
    import requests
    response = requests.get(url)  # Blocks the event loop!
    return response.json()
Why async matters:
  • Allows the worker to handle other requests while it waits for the I/O
  • Enables high concurrency with limited resources
  • Prevents worker starvation
  • Maximizes throughput
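The effect of a blocking call on other requests can be observed without any network access. In this sketch a background coroutine counts how often it gets scheduled while a simulated request waits, first with `time.sleep` (blocking) and then with `asyncio.sleep` (non-blocking). All function names are made up for the demonstration.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def count_ticks(stop: asyncio.Event, interval_s: float = 0.01) -> int:
    """Count how many times this coroutine is scheduled before stop is set."""
    ticks = 0
    while not stop.is_set():
        await asyncio.sleep(interval_s)
        ticks += 1
    return ticks


async def blocking_wait(duration_s: float) -> None:
    time.sleep(duration_s)  # holds the event loop; nothing else can run


async def async_wait(duration_s: float) -> None:
    await asyncio.sleep(duration_s)  # yields so other coroutines can run


async def ticks_during(wait: Callable[[float], Awaitable[None]], duration_s: float = 0.2) -> int:
    """Return how many ticks the background counter managed during the wait."""
    stop = asyncio.Event()
    ticker = asyncio.create_task(count_ticks(stop))
    await asyncio.sleep(0)  # let the ticker start
    await wait(duration_s)
    stop.set()
    return await ticker


blocked_ticks = asyncio.run(ticks_during(blocking_wait))
async_ticks = asyncio.run(ticks_during(async_wait))
print(f"ticks while blocked: {blocked_ticks}, ticks while awaiting: {async_ticks}")
```

The counter stands in for other requests on the same worker: it barely advances during the blocking wait and advances freely during the awaited one.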
3. Response formatting guidelines
Keep CPU-intensive formatting minimal (target: 10-50ms):
PYTHON
# ✅ CORRECT: Minimal formatting
@tool
async def process_api_response(data: dict) -> dict:
    """Simple field extraction and renaming"""
    return {
        "id": data.get("customer_id"),
        "name": data.get("full_name"),
        "status": data.get("account_status", "active")
    }

# ⚠️ CAUTION: More complex but still acceptable
@tool
async def aggregate_metrics(data: List[dict]) -> dict:
    """Light aggregation - keep under 50ms"""
    total = sum(item.get("value", 0) for item in data)
    avg = total / len(data) if data else 0
    return {
        "total": total,
        "average": avg,
        "count": len(data)
    }

# ❌ INCORRECT: Heavy computation
@tool
async def complex_analysis(data: List[dict]) -> dict:
    """This will exceed 50ms CPU time"""
    # Heavy statistical analysis
    # Machine learning inference
    # Complex data transformations
    # These should be done by external APIs!
    pass

Performance analysis and capacity planning

Example scenario: high-throughput tool
The following example shows a tool that makes 4 asynchronous API calls:
Tool characteristics:
  • CPU time per request: 50 ms (formatting/parsing)
  • API call latency: 15 seconds each (4 calls in parallel via async)
  • Total wall-clock time: ~15 seconds (parallel execution)
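The wall-clock figure relies on the four calls running concurrently, for example through `asyncio.gather`, instead of being awaited one after another. The sketch below simulates this with `asyncio.sleep` and a much shorter latency; `fake_api_call` is a stand-in, and the call names are borrowed from the customer orders example earlier in this guide.

```python
import asyncio
import time
from typing import List, Tuple

CALL_NAMES = ["profile", "orders", "loyalty", "recommendations"]


async def fake_api_call(name: str, latency_s: float) -> str:
    """Stand-in for an HTTP request that takes latency_s to respond."""
    await asyncio.sleep(latency_s)
    return name


async def call_in_parallel(latency_s: float) -> Tuple[List[str], float]:
    """Start all calls together and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = await asyncio.gather(*(fake_api_call(n, latency_s) for n in CALL_NAMES))
    return list(results), time.perf_counter() - start


results, elapsed_s = asyncio.run(call_in_parallel(0.05))
print(results, f"elapsed about {elapsed_s:.2f} s versus {4 * 0.05:.2f} s if run sequentially")
```

With concurrent execution the elapsed time tracks the slowest call, not the sum of all four.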
Deployment configuration:
  • 2 Kubernetes replicas
  • 2 vCPU per replica
  • 5 workers per replica
  • Total workers: 10 (2 replicas × 5 workers)
Throughput calculation
CPU capacity & practical throughput:
  • Total vCPUs: 4 (2 replicas × 2 vCPU)
  • CPU time per request: 50 ms = 0.05 seconds
  • Typical CPU-based throughput: 50 requests/second
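The figures above can be related with some back-of-the-envelope arithmetic. The CPU ceiling below assumes perfect utilization of all vCPUs, which is an idealization; the 50 requests/second figure is taken from this guide as given, and the in-flight estimate applies Little's law (in-flight requests = arrival rate × time in system). Treat it as a sketch for reasoning, not a capacity guarantee.

```python
REPLICAS = 2
VCPUS_PER_REPLICA = 2
WORKERS_PER_REPLICA = 5
CPU_MS_PER_REQUEST = 50
WALL_CLOCK_S = 15            # per-request wall-clock time from the scenario
QUOTED_THROUGHPUT_RPS = 50   # "typical" figure quoted in this guide

total_vcpus = REPLICAS * VCPUS_PER_REPLICA        # 4
total_workers = REPLICAS * WORKERS_PER_REPLICA    # 10

# Idealized ceiling: every vCPU spends all of its time on request CPU work.
cpu_ceiling_rps = total_vcpus * 1000 / CPU_MS_PER_REQUEST       # 80.0

# Little's law: requests in flight at the quoted rate and wall-clock time.
in_flight_requests = QUOTED_THROUGHPUT_RPS * WALL_CLOCK_S       # 750
in_flight_per_worker = in_flight_requests / total_workers       # 75.0

print(cpu_ceiling_rps, in_flight_requests, in_flight_per_worker)
```

The quoted typical throughput sits below the idealized ceiling, which is what you would expect once any runtime overhead is included. The in-flight figure shows why async I/O is required: each worker has to hold many waiting requests at once.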

When to use platform-hosted Python toolkits

Platform-hosted Python toolkits are good for:
✅ Suitable use cases
  1. API orchestration tools
    • Calling 2-5 external APIs and combining results
    • Response formatting and field mapping
    • Simple data aggregation from multiple sources
  2. Data transformation tools
    • JSON/XML format conversions
    • Field extraction and renaming
    • Simple filtering and sorting
    • Data validation and sanitization
  3. Integration wrappers
    • Wrapping REST APIs with simplified interfaces
    • Authentication token management
    • Request/response normalization
  4. Lightweight processing
    • Text formatting and templating
    • Date/time conversions
    • Unit conversions
    • Simple calculations (< 50 ms CPU)
Requirements checklist
Use platform-hosted toolkits when ALL of the following are true:
  • CPU computation ≤ 50 ms per request
  • All I/O operations use async/await
  • Memory usage is less than 200 MB per request
  • Heavy processing is delegated to external APIs
  • Response time is acceptable at 5-30 seconds
  • The tool is stateless or uses platform context management
  • No special runtime dependencies (such as custom C libraries)

When to use self-hosted solutions

Consider self-hosting tools through Remote MCP Servers or OpenAPI Tools when:
❌ Not suitable for platform toolkits
  1. CPU-intensive operations
    • Complex statistical analysis
    • Cryptographic operations
    • Large-scale data transformations
  2. High memory requirements
    • Loading large models (> 500 MB)
    • Processing large datasets in memory
    • Caching extensive reference data
    • In-memory databases
  3. Special runtime requirements
    • Custom system libraries
    • Specific Python versions
    • Native code dependencies
  4. Long-running operations
    • Batch processing jobs
    • Report generation (> 60 seconds)
    • Data migration tasks
    • Background processing
  5. Stateful operations
    • Maintaining persistent connections
    • Session management
    • Transaction coordination
    • Workflow orchestration
Self-hosting options
Option 1: Remote MCP server
  • Best for: Custom tool collections with complex logic
  • Deployment: Your infrastructure
  • Protocol: Model Context Protocol over HTTP/SSE
  • Benefits: Full control, custom resources, flexible scaling
Option 2: OpenAPI tools
  • Best for: Existing REST APIs
  • Deployment: Your infrastructure or third-party services
  • Protocol: Standard HTTP/REST
  • Benefits: Use existing APIs, no code changes needed
Option 3: Hybrid approach
  • Platform toolkits for orchestration and formatting
  • Self-hosted services for heavy computation
  • Example: Platform tool calls your ML inference API

Best practices

1. Async I/O patterns
PYTHON
from ibm_watsonx_orchestrate.agent_builder.tools import tool
import httpx
import asyncio
from typing import List, Dict

@tool
async def parallel_api_calls(ids: List[str]) -> List[Dict]:
    """Efficiently fetch data for multiple IDs in parallel"""
    async with httpx.AsyncClient() as client:
        # Create one request per ID (item_id avoids shadowing the built-in id)
        tasks = [
            client.get(f"https://api.example.com/items/{item_id}")
            for item_id in ids
        ]
        # Run all requests concurrently and wait for all to complete
        responses = await asyncio.gather(*tasks)
        # Format results (minimal CPU time)
        return [r.json() for r in responses]
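When the list of IDs can be long, an unbounded `asyncio.gather` starts every request at once, which can strain memory and the downstream API. One possible refinement is to cap concurrency with `asyncio.Semaphore`. In this sketch `fetch_item` is a stand-in that sleeps in place of making an HTTP request, the `stats` dictionary exists only to show the cap at work, and the limit of 5 is an arbitrary assumption.

```python
import asyncio
from typing import Dict, List

stats = {"in_flight": 0, "peak": 0}  # demo instrumentation only


async def fetch_item(item_id: str) -> Dict[str, str]:
    """Stand-in for an HTTP call; records how many calls overlap."""
    stats["in_flight"] += 1
    stats["peak"] = max(stats["peak"], stats["in_flight"])
    try:
        await asyncio.sleep(0.01)
        return {"id": item_id}
    finally:
        stats["in_flight"] -= 1


async def fetch_all_bounded(ids: List[str], max_concurrency: int = 5) -> List[Dict[str, str]]:
    """Fetch every ID, allowing at most max_concurrency calls at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def fetch_one(item_id: str) -> Dict[str, str]:
        async with semaphore:  # wait here while the limit is reached
            return await fetch_item(item_id)

    # gather returns results in the same order as the input IDs
    return list(await asyncio.gather(*(fetch_one(i) for i in ids)))


items = asyncio.run(fetch_all_bounded([str(n) for n in range(20)], max_concurrency=5))
print(len(items), stats["peak"])
```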
2. Error handling
PYTHON
@tool
async def robust_api_call(endpoint: str) -> Dict:
    """Proper error handling for external API calls"""
    try:
        async with httpx.AsyncClient(timeout=30.0) as client:
            response = await client.get(endpoint)
            response.raise_for_status()
            return response.json()
    except httpx.TimeoutException:
        return {"error": "API timeout", "status": "timeout"}
    except httpx.HTTPStatusError as e:
        return {"error": f"API error: {e.response.status_code}", "status": "error"}
    except Exception as e:
        return {"error": str(e), "status": "error"}
3. Connection pooling
PYTHON
# Module-level client for connection reuse
_http_client = None

async def get_http_client():
    """Reuse HTTP client across requests"""
    global _http_client
    if _http_client is None:
        _http_client = httpx.AsyncClient(
            timeout=30.0,
            limits=httpx.Limits(max_keepalive_connections=20)
        )
    return _http_client

@tool
async def efficient_api_call(url: str) -> Dict:
    """Use connection pooling for better performance"""
    client = await get_http_client()
    response = await client.get(url)
    return response.json()
4. Memory management
PYTHON
@tool
async def memory_efficient_processing(data_url: str) -> Dict:
    """Stream large responses instead of loading into memory"""
    async with httpx.AsyncClient() as client:
        async with client.stream('GET', data_url) as response:
            # Process in chunks
            total_size = 0
            async for chunk in response.aiter_bytes(chunk_size=8192):
                total_size += len(chunk)
                # Process chunk without loading entire response
            
            return {"processed_bytes": total_size}
5. Context management
PYTHON
from ibm_watsonx_orchestrate.agent_builder.tools import tool
from ibm_watsonx_orchestrate.utils import update_context
from ibm_watsonx_orchestrate.run.context import AgentRun

@tool
async def stateful_tool(user_input: str, context: AgentRun) -> str:
    """Use platform context for state management"""
    # Read previous state
    previous_step = context.request_context.get('workflow_step', 'start')
    
    # Perform operation
    result = await process_step(user_input, previous_step)
    
    # Update context for next tool
    update_context('workflow_step', result['next_step'])
    update_context('last_result', result['data'])
    
    return result['message']

Migration from individual tools to toolkits

If you have existing individual tools, consider migrating to the toolkit approach:
Migration checklist
  1. Group related tools:
    • Identify tools that share similar dependencies
    • Group tools by functional domain or use case
    • Aim for 10-15 tools per toolkit for optimal resource utilization
  2. Consolidate dependencies:
    • Merge all requirements.txt files into a single file
    • Resolve version conflicts between tools
    • Remove duplicate dependencies
    • Test compatibility of consolidated dependencies
  3. Ensure thread safety:
    • Critical: Review all tools for thread-safety issues
    • Remove or protect shared mutable state
    • Use thread-safe data structures (queue.Queue, threading.Lock)
    • Avoid global variables that are modified during execution
    • Test tools under concurrent load
  4. Review CPU usage:
    • Profile each tool to help ensure CPU time < 50 ms
    • Identify and optimize CPU-intensive operations
    • Consider moving heavy computation to external APIs
  5. Convert to async:
    • Replace all blocking I/O with async/await
    • Use async-compatible libraries (httpx, aiohttp, asyncpg)
    • Ensure all I/O operations are nonblocking
  6. Test concurrency:
    • Load test with multiple concurrent requests
    • Verify no race conditions or deadlocks
    • Monitor memory usage under load
    • Validate thread-safety under stress
  7. Package as toolkit:
    • Follow toolkit packaging format
    • Create single requirements.txt
    • Document thread-safety guarantees
    • Include toolkit metadata

Summary

The new toolkit runtime architecture provides a high-performance, scalable platform for Python tools that:
  • Eliminates cold start delays through persistent workers
  • Maximizes throughput through async I/O and efficient resource sharing
  • Ensures predictability through strict CPU time limits
Key takeaway: Design tools as lightweight, async wrappers around external APIs. Keep CPU computation under 50 ms, use async I/O for all network operations, and let external services handle heavy processing. This approach enables high concurrency, predictable performance, and efficient resource utilization. For tools that don’t fit these constraints, use self-hosted solutions through Remote MCP Servers or OpenAPI Tools to maintain full control over resources and runtime environment.

Langflow Tools

Execution Speed: Minimum 2+ seconds (initialization overhead per run), then variable depending on workflow complexity
Key Consideration: Due to the initialization overhead on every run, Langflow tools are most effective for operations that take longer to execute (but remain under the 2-minute timeout). The initialization penalty becomes less significant when the actual workflow processing time is substantial.
Note: LLM operations in Langflow are optional. Use Langflow when you need its specific AI components or have existing Langflow workflows, not solely for LLM capabilities.
Optimization Strategies:
  • Use for longer-running operations where initialization overhead is proportionally smaller
  • Minimize and combine LLM calls into single prompts (when using LLMs)
  • Use concise, focused prompts with minimal context (when using LLMs)
  • Choose smaller models for simple tasks, larger for complex reasoning (when using LLMs)
  • Cache expensive operation results externally (Redis) for repeated queries

API & MCP Tool Performance

API Tools (OpenAPI-based)

Performance: Depends entirely on the external service (network latency, service speed, payload size, rate limiting, authentication)
Timeout:
  • Synchronous calls: 2 minutes (same as Python/Langflow/MCP)
  • Asynchronous calls: None (via OpenAPI callback syntax)

MCP Tools (Model Context Protocol)

Performance: Varies by implementation (tool design, external dependencies, protocol overhead, resource requirements)
Timeout: 2 minutes (same as Python, Langflow, and synchronous API tools; the MCP protocol does not support async)

Summary

Key Points:
  • wxO Flow: First choice for most scenarios - supports connections, security, custom code, no timeout limit
  • Python Tools: For custom libraries not supported in wxO Flow code blocks
  • Langflow Tools: For Langflow-specific AI components
  • API Tools: For OpenAPI-based external service integrations - performance depends on external service
  • MCP Tools: For Model Context Protocol capabilities - performance varies by implementation
  • Tool call overhead exists: Combine operations when possible
  • 2-minute timeout: Applies to synchronous tools such as Python, Langflow, OpenAPI sync, and MCP tools

Related Guides: