Tool Performance Guide - IBM watsonx Orchestrate ADK

Part of: watsonx Orchestrate Performance Guide

Performance Testing Approach

This guide focuses on how to measure and optimize Tool performance. Performance varies significantly based on workload, configuration, system load, and network conditions. Always measure in your own environment.

Note: “Flow” in this document refers to a wxO Agentic Workflow.

Overview

Tools are executable units in watsonx Orchestrate that perform specific tasks (API calls, data processing, calculations) within isolated, secure environments. They are invoked by Agents or orchestrated by Flows.

Tool Types in wxO

Choosing the Right Tool Type

wxO Flow (first choice): Native support for connections/security, Python code snippets, orchestration of wxO agents, tools and people (e.g. confirmation before transaction, custom forms), built-in document extraction and processing, LLM support.
Python Tools: For custom libraries not available in wxO Flow code blocks.
Langflow Tools: For Langflow-specific AI components or existing Langflow flows.

Tool Type Comparison

Tool Type	Runtime/Base	Timeout	Key Features	Best For
wxO Flow	Visual workflow	None	Stateful, resumable, orchestrates Agents/Tools/People, LLM support	Multi-step workflows, API integrations, human-in-loop, simple document processing
Python	Python 3.12	2 min	Stateless, read-only FS, outbound network only	Data processing, business logic, calculations
Langflow	Langflow 1.7.1	2 min	Stateless, read-only FS, outbound network only, 2+ sec initialization	LLM processing, RAG, complex document analysis
API	OpenAPI spec	2 min (sync) / None (async)	External REST APIs, network-dependent	Third-party integrations, external services
MCP	MCP protocol	2 min	Extended capabilities, varies by implementation	Custom integrations, protocol-based interactions

Understanding Tool Performance

The Granularity Principle

Key Insight: Tool performance has two distinct components:

Tool Call Overhead: The cost of invoking a tool
- Includes initialization, context setup, and result handling
- Exists for every tool invocation
- Relatively consistent per tool type
Execution Inside the Tool: The actual work being done
- Runs fast once inside the tool
- Varies based on the logic and operations
- Where optimization efforts should focus

Implication:

Multiple small tool calls = Multiple overhead costs
Single larger tool call = One overhead cost, fast internal execution
Recommendation: Combine related operations into single tools when possible
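
The trade-off can be put in rough numbers. The sketch below assumes a fixed overhead per call and a fixed amount of internal work; the function name and both values are illustrative placeholders, not measured wxO figures.

```python
def total_latency(num_calls: int, overhead_s: float, work_s: float) -> float:
    """Estimate latency when `work_s` seconds of work is spread over `num_calls` tool calls."""
    # Every call pays the overhead once; the work itself is the same either way.
    return num_calls * overhead_s + work_s


# Hypothetical numbers: 0.5 s overhead per call, 0.25 s of real work in total.
split_into_four = total_latency(4, overhead_s=0.5, work_s=0.25)
combined = total_latency(1, overhead_s=0.5, work_s=0.25)
print(f"four small calls: {split_into_four:.2f} s, one combined call: {combined:.2f} s")
```

Under these assumptions the overhead, not the work, dominates the split version, which is why combining related operations pays off.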

Performance Characteristics by Tool Type

Python Tools:

Call overhead: Present but minimal
Internal execution: Very fast for most operations
Overall: Fast to very fast
Best for: Deterministic logic, data processing, API calls

Langflow Tools:

Call overhead: Includes Langflow initialization (2+ seconds)
Internal execution: Depends on LLM operations
Overall: Slower due to LLM inference (if using LLM)
Best for: LLM reasoning, NLP tasks, document analysis

Python & Langflow Tool Performance

Shared Technical Constraints

Both Python and Langflow tools operate in secure sandboxes with identical constraints:

Constraint	Impact	Best Practice
Isolated pod execution	Each tool instance runs in a separate pod for tenant isolation and security	Design for stateless, independent execution
2 CPU cores maximum	Limited computational resources per tool instance	Optimize algorithms and avoid CPU-intensive operations
2GB memory maximum	Limited memory per tool instance	Manage memory efficiently, avoid large data structures
2-minute timeout	Maximum execution time per call	Design for timeout awareness, break long operations into chunks
No GPU access	GPU operations will fail	Use CPU-optimized algorithms and libraries only
Cold start penalty	First run takes longer; subsequent runs within 72 hours are faster	Expect initial latency; warm pods improve performance
Stateless execution	No state persists between calls	Use external storage (Redis, S3, database)
Read-only filesystem	Cannot write files locally	Use in-memory buffers or external storage
Network isolation	Outbound requests only	Design for outbound-only patterns

Security Note: Due to tenant isolation requirements and the potentially unsecured nature of user-authored code, wxO runs each Python and Langflow tool instance as a separate pod with strict resource limits. This ensures security and prevents resource contention between tenants.

Performance Note: Tool pods experience cold start latency on first invocation. Once warmed up, the pod remains available for approximately 6 hours, with the time extended upon continuous use of the tool. This provides faster execution for subsequent calls. Plan for initial latency in performance testing and user experience design.
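
As an illustration of the read-only filesystem constraint, output that would normally be written to a temporary file can be built in an in-memory buffer instead. This is a minimal sketch using only the standard library; the function name and sample rows are hypothetical.

```python
import csv
import io
from typing import Dict, List


def rows_to_csv_bytes(rows: List[Dict[str, object]]) -> bytes:
    """Serialize rows to CSV entirely in memory, without touching the filesystem."""
    if not rows:
        return b""
    buffer = io.StringIO(newline="")  # newline="" lets the csv module control line endings
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue().encode("utf-8")


payload = rows_to_csv_bytes([{"id": 1, "name": "a"}, {"id": 2, "name": "b"}])
print(len(payload), "bytes built in memory")
```

The resulting bytes can be returned from the tool or uploaded to external storage, depending on what the caller expects.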

Python Tools

Execution Speed: Very fast (simple operations), fast (data processing), variable (external API calls)

Optimization Strategies:

Minimize and batch external API calls
Use efficient algorithms (O(n) vs O(n²))
Choose performant libraries
Implement external caching (Redis) for expensive operations
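
To make the algorithmic point concrete, here is a small sketch that finds duplicate IDs two ways. Both function names are made up for illustration; the first compares every pair of elements, the second makes a single pass using set lookups.

```python
from typing import List


def find_duplicates_quadratic(ids: List[str]) -> List[str]:
    """O(n^2): compares every pair of elements."""
    dupes: List[str] = []
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if a == b and a not in dupes:
                dupes.append(a)
    return dupes


def find_duplicates_linear(ids: List[str]) -> List[str]:
    """O(n): one pass, with constant-time set membership checks."""
    seen, dupes = set(), set()
    for item in ids:
        if item in seen:
            dupes.add(item)
        seen.add(item)
    return sorted(dupes)
```

Both return the same duplicates, but with the limited CPU available to a tool instance the linear version leaves far more headroom as inputs grow.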

Overview

Tool builders in IBM watsonx Orchestrate can build their tools by using Python and deploy them as toolkits on the platform. The platform now provides a toolkit runtime architecture that offers significant improvements over the individual tool import architecture. This document provides guidelines for designing, developing, and deploying Python tools for watsonx Orchestrate.

Key changes in Python toolkits

New toolkit runtime: Overcomes individual tool processing overheads.
Improved performance: Improves resource utilization and reduces cold start latency.

New toolkit runtime architecture

Architecture overview

Both individual Python tools and Python toolkits run in isolated containers, but with fundamentally different execution models.

Individual Python tool approach:

Each tool has its own requirements.txt with specific dependencies
Process-level isolation: Each tool invocation creates a new process, albeit a lightweight one.
Thread-safety: Tools need not be thread-safe.
Performance overhead: Creates a lightweight process for each execution and a virtual environment for tools that are not yet loaded in the worker.
Resource allocation: 2 vCPU and 2 GB RAM per replica (2 replicas provided)
Cold-start delays: Significant latency from process creation and dependency loading.

Python toolkit runtime approach:

Multiple related tools packaged together in a single toolkit.
Single shared requirements.txt for all tools in the toolkit.
Thread-level isolation: Tools run in persistent worker threads.
Preinstalled dependencies: All dependencies are loaded once at container startup.
Thread-safety required: All tools must be thread-safe and reentrant.
High performance: No process creation or dependency loading overhead.
Resource allocation: 2 vCPU and 2 GiB RAM per replica (2 replicas provided).
FastAPI/Gunicorn workers: 5 persistent workers per replica handle concurrent requests.

Deployment architecture

Each toolkit deployment consists of:

Kubernetes deployment: 2 replicas
Per-Replica resources:
- 2 vCPU cores
- 2 GiB memory
- 5 FastAPI/Gunicorn workers per replica
Load balancing: Automatic distribution across replicas and workers
Health monitoring: Built-in health checks and automatic recovery

Benefits of toolkit approach

Higher performance:
- No process creation overhead per execution.
- No dynamic dependency loading.
- Persistent workers eliminate cold start delays.
- Thread-based execution is orders of magnitude faster than process-based.
Shared dependencies:
- Single requirements.txt for all tools in the toolkit
- Dependencies loaded once at container startup
- Reduced memory footprint through shared libraries
Better resource utilization:
- Multiple tools share container resources
- Efficient thread-based concurrency
- Lower per-request overhead
Improved scalability:
- Each replica handles multiple concurrent requests through workers
- Better throughput per unit of compute
Enhanced reliability:
- Built-in health checks and automatic failover
- Persistent workers reduce failure points
- No dependency loading failures during execution
Cost efficiency:
- More tools per unit of compute resource
- Reduced infrastructure overhead
- Better resource density

Thread safety requirements

Critical: All tools in a toolkit must be thread-safe because:

Multiple worker threads run tools concurrently
No process-level isolation between tool invocations
Shared memory space within the container

Thread-safe practices:

PYTHON

import httpx
from typing import Dict
from ibm_watsonx_orchestrate.agent_builder.tools import tool

# ✅ CORRECT: Thread-safe tool (no shared mutable state)
@tool
async def thread_safe_tool(user_id: str) -> Dict:
    """Each invocation uses local variables only"""
    async with httpx.AsyncClient() as client:
        response = await client.get(f"https://api.example.com/users/{user_id}")
        return response.json()

# ❌ INCORRECT: Not thread-safe (shared mutable state)
_cache = {}  # Shared across all threads!

@tool
async def not_thread_safe(key: str) -> str:
    """Dangerous: modifying shared state without locks"""
    if key not in _cache:
        _cache[key] = await fetch_data(key)  # Race condition!
    return _cache[key]

In the draft environment, a toolkit runs using the individual Python tool approach, which accommodates the large number of tools and toolkits that a tenant might have in draft.
In the live environment, each toolkit gets its own deployment and runs on the thread-based toolkit runtime, so thread safety is required.

Resource allocation and performance requirements

CPU computation limits

Critical requirement: Python tools must limit CPU-bound computation to approximately 50 milliseconds and use async I/O for all I/O operations to ensure performance scaling. This constraint helps ensure:

Nonblocking execution in the event loop
Fair resource sharing across concurrent requests
Predictable response times
Prevention of worker starvation

What counts as CPU time

CPU time includes:

Data parsing and transformation logic
JSON serialization/deserialization
String manipulation and formatting
Mathematical computations
In-memory data filtering and sorting
Object instantiation and manipulation

CPU time does not include:

Network I/O wait time (HTTP requests, database queries)
File I/O wait time
Sleep/delay operations
External API call latency
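
The distinction can be observed directly with the standard library: `time.process_time()` counts CPU time only, while `time.perf_counter()` counts elapsed wall-clock time. The helper below is a rough sketch for local profiling (the `measure` name is an assumption, and `time.sleep` merely stands in for I/O wait).

```python
import time
from typing import Callable, Tuple


def measure(fn: Callable[[], object]) -> Tuple[float, float]:
    """Return (cpu_seconds, wall_seconds) spent running `fn` once."""
    cpu_start = time.process_time()   # CPU time of this process; excludes sleep and I/O wait
    wall_start = time.perf_counter()  # elapsed real time
    fn()
    return time.process_time() - cpu_start, time.perf_counter() - wall_start


# Waiting: lots of wall-clock time, almost no CPU time.
cpu_wait, wall_wait = measure(lambda: time.sleep(0.2))
# Sorting: CPU-bound work, so CPU time and wall-clock time are close.
cpu_work, wall_work = measure(lambda: sorted(range(200_000), key=lambda x: -x))

print(f"waiting: cpu={cpu_wait * 1000:.1f} ms, wall={wall_wait * 1000:.1f} ms")
print(f"sorting: cpu={cpu_work * 1000:.1f} ms, wall={wall_work * 1000:.1f} ms")
```

Note that `process_time()` is process-wide; in a multi-threaded worker, `time.thread_time()` gives a per-thread figure where the platform supports it.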

Memory constraints

Each replica has 2 GiB of memory that is shared across 5 workers:

Per-worker budget: ~400 MB (accounting for overhead).
Tool memory usage: Must not exceed 100-200 MB per concurrent request.
Memory leaks: Avoid memory leaks to prevent pod crashes.
Throttling: Implement request throttling if memory usage approaches limits.
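
One possible way to implement throttling inside a tool is an `asyncio.Semaphore` that caps how many memory-heavy operations run at once. This is a sketch under assumptions: the cap of 2 is derived from the ~400 MB budget divided by ~200 MB per request, and `run_throttled` and `fake_request` are illustrative names.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

stats: Dict[str, int] = {"current": 0, "peak": 0}  # tracks concurrency for the demo only


async def run_throttled(jobs: List[Callable[[], Awaitable[int]]], max_in_flight: int) -> List[int]:
    """Run all jobs, but never more than `max_in_flight` at the same time."""
    slots = asyncio.Semaphore(max_in_flight)

    async def guarded(job: Callable[[], Awaitable[int]]) -> int:
        async with slots:  # waits here when all slots are taken
            return await job()

    return await asyncio.gather(*(guarded(job) for job in jobs))


async def fake_request() -> int:
    """Stand-in for a memory-heavy operation; records how many run concurrently."""
    stats["current"] += 1
    stats["peak"] = max(stats["peak"], stats["current"])
    await asyncio.sleep(0.01)
    stats["current"] -= 1
    return 1


results = asyncio.run(run_throttled([fake_request] * 10, max_in_flight=2))
print(f"completed={sum(results)}, peak concurrency={stats['peak']}")
```

All ten jobs complete, but at most two are ever in progress, which bounds peak memory at roughly two requests' worth.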

Python tool design principles

1. Tools as API wrappers

Python tools must be lightweight wrappers around existing API implementations:

PYTHON

from ibm_watsonx_orchestrate.agent_builder.tools import tool
import httpx
from typing import Dict, List

@tool
async def get_customer_orders(customer_id: str) -> Dict:
    """
    Fetch customer orders from multiple services and format the response.
    
    This tool demonstrates the correct pattern:
    - Heavy lifting done by external APIs
    - Async I/O for non-blocking execution
    - Minimal CPU time for response formatting
    """
    async with httpx.AsyncClient() as client:
        # API call 1: Get customer profile (async, non-blocking)
        profile_response = await client.get(
            f"https://api.example.com/customers/{customer_id}"
        )
        profile = profile_response.json()
        
        # API call 2: Get order history (async, non-blocking)
        orders_response = await client.get(
            f"https://api.example.com/orders?customer_id={customer_id}"
        )
        orders = orders_response.json()
        
        # API call 3: Get loyalty points (async, non-blocking)
        loyalty_response = await client.get(
            f"https://api.example.com/loyalty/{customer_id}"
        )
        loyalty = loyalty_response.json()
        
        # API call 4: Get recommendations (async, non-blocking)
        recs_response = await client.get(
            f"https://api.example.com/recommendations/{customer_id}"
        )
        recommendations = recs_response.json()
        
        # CPU work: Format and combine responses (~10-20ms)
        result = {
            "customer_name": profile.get("name"),
            "email": profile.get("email"),
            "total_orders": len(orders.get("items", [])),
            "recent_orders": orders.get("items", [])[:5],
            "loyalty_points": loyalty.get("points", 0),
            "tier": loyalty.get("tier", "standard"),
            "recommended_products": [
                {
                    "id": rec["product_id"],
                    "name": rec["product_name"],
                    "score": rec["relevance_score"]
                }
                for rec in recommendations.get("items", [])[:3]
            ]
        }
        
        return result
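
The four calls in the example above are independent of each other but are awaited one after another, so their latencies add up. A variant that issues them concurrently with `asyncio.gather` could look like the sketch below. To keep it runnable without network access, `fake_get` is a hypothetical stand-in for `client.get(...)` followed by `.json()`.

```python
import asyncio
import time
from typing import Dict

ENDPOINTS = ("profile", "orders", "loyalty", "recommendations")


async def fake_get(name: str, delay_s: float = 0.05) -> Dict[str, str]:
    """Hypothetical stand-in for an HTTP call; sleeps instead of contacting a real API."""
    await asyncio.sleep(delay_s)
    return {"source": name}


async def fetch_sequential() -> float:
    """Await each call in turn; total time is the sum of the latencies."""
    start = time.perf_counter()
    for name in ENDPOINTS:
        await fake_get(name)
    return time.perf_counter() - start


async def fetch_parallel() -> float:
    """Start all calls together; total time is close to the slowest single call."""
    start = time.perf_counter()
    await asyncio.gather(*(fake_get(name) for name in ENDPOINTS))
    return time.perf_counter() - start


sequential_s = asyncio.run(fetch_sequential())
parallel_s = asyncio.run(fetch_parallel())
print(f"sequential: {sequential_s:.2f} s, parallel: {parallel_s:.2f} s")
```

With a real `httpx.AsyncClient`, the same idea applies by passing the `client.get(...)` coroutines to `asyncio.gather` and reading `.json()` from each response afterwards.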

2. Mandatory async I/O

All I/O operations must use async/await patterns to prevent blocking the event loop:

PYTHON

import httpx
from ibm_watsonx_orchestrate.agent_builder.tools import tool

# ✅ CORRECT: Async I/O
@tool
async def fetch_data(url: str) -> dict:
    async with httpx.AsyncClient() as client:
        response = await client.get(url)
        return response.json()

# ❌ INCORRECT: Blocking I/O
@tool
def fetch_data_blocking(url: str) -> dict:
    import requests
    response = requests.get(url)  # Blocks the event loop!
    return response.json()

Why async matters:

Allows the worker to handle other requests while it waits for the I/O
Enables high concurrency with limited resources
Prevents worker starvation
Maximizes throughput
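
The effect of a blocking call on a shared event loop can be demonstrated without any network access. In this sketch, `time.sleep` plays the role of a blocking client such as `requests.get`, and `asyncio.sleep` plays the role of an awaited async call; the handler names are illustrative.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def blocking_handler() -> None:
    time.sleep(0.05)           # blocks the whole event loop while it waits


async def async_handler() -> None:
    await asyncio.sleep(0.05)  # yields to the event loop while it waits


async def timed(handler: Callable[[], Awaitable[None]], n: int = 4) -> float:
    """Run `n` handlers concurrently on one event loop and return the elapsed seconds."""
    start = time.perf_counter()
    await asyncio.gather(*(handler() for _ in range(n)))
    return time.perf_counter() - start


blocking_s = asyncio.run(timed(blocking_handler))
async_s = asyncio.run(timed(async_handler))
print(f"blocking: {blocking_s:.2f} s, async: {async_s:.2f} s")
```

The blocking handlers run strictly one after another even though they were started together, while the async handlers overlap their waits.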

3. Response formatting guidelines

Keep CPU-intensive formatting minimal (target: 10-50ms):

PYTHON

from typing import List
from ibm_watsonx_orchestrate.agent_builder.tools import tool

# ✅ CORRECT: Minimal formatting
@tool
async def process_api_response(data: dict) -> dict:
    """Simple field extraction and renaming"""
    return {
        "id": data.get("customer_id"),
        "name": data.get("full_name"),
        "status": data.get("account_status", "active")
    }

# ⚠️ CAUTION: More complex but still acceptable
@tool
async def aggregate_metrics(data: List[dict]) -> dict:
    """Light aggregation - keep under 50ms"""
    total = sum(item.get("value", 0) for item in data)
    avg = total / len(data) if data else 0
    return {
        "total": total,
        "average": avg,
        "count": len(data)
    }

# ❌ INCORRECT: Heavy computation
@tool
async def complex_analysis(data: List[dict]) -> dict:
    """This will exceed 50ms CPU time"""
    # Heavy statistical analysis
    # Machine learning inference
    # Complex data transformations
    # These should be done by external APIs!
    pass

Performance analysis and capacity planning

Example scenario: high-throughput tool

The following example shows a tool that makes 4 asynchronous API calls.

Tool characteristics:

CPU time per request: 50 ms (formatting/parsing)
API call latency: 15 seconds each (4 calls in parallel via async)
Total wall-clock time: ~15 seconds (parallel execution)

Deployment configuration:

2 Kubernetes replicas
2 vCPU per replica
5 workers per replica
Total workers: 10 (2 replicas × 5 workers)

Throughput calculation

CPU capacity & practical throughput:

Total vCPUs: 4 (2 replicas × 2 vCPU)
CPU time per request: 50 ms = 0.05 seconds
Typical CPU-based throughput: 50 requests/second
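
The arithmetic behind these figures can be worked through explicitly. Dividing the available vCPUs by the CPU time per request gives a theoretical ceiling of about 80 requests/second; the 50 requests/second quoted above sits below that ceiling, presumably to leave headroom for runtime overhead (that interpretation is an assumption, not something the figures state). Little's law then estimates how many requests are in flight at that rate. The function names below are illustrative.

```python
def cpu_bound_ceiling(total_vcpus: int, cpu_s_per_request: float) -> float:
    """Upper bound on requests/second if CPU time were the only limit."""
    return total_vcpus / cpu_s_per_request


def in_flight_requests(throughput_rps: float, latency_s: float) -> float:
    """Little's law: average concurrency = arrival rate x time spent in the system."""
    return throughput_rps * latency_s


ceiling = cpu_bound_ceiling(total_vcpus=4, cpu_s_per_request=0.05)  # 2 replicas x 2 vCPU
concurrent = in_flight_requests(throughput_rps=50, latency_s=15)    # ~15 s wall-clock per request
per_worker = concurrent / 10                                         # 10 workers in total

print(f"ceiling={ceiling:.0f} req/s, in flight={concurrent:.0f}, per worker={per_worker:.0f}")
```

At 50 requests/second and 15 seconds per request, roughly 750 requests are in flight at any moment, or about 75 per worker, which is only sustainable because the workers spend that time waiting on async I/O rather than computing.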

When to use platform-hosted Python toolkits

Platform-hosted Python toolkits are good for:

✅ Suitable use cases

API orchestration tools
- Calling 2-5 external APIs and combining results
- Response formatting and field mapping
- Simple data aggregation from multiple sources
Data transformation tools
- JSON/XML format conversions
- Field extraction and renaming
- Simple filtering and sorting
- Data validation and sanitization
Integration wrappers
- Wrapping REST APIs with simplified interfaces
- Authentication token management
- Request/response normalization
Lightweight processing
- Text formatting and templating
- Date/time conversions
- Unit conversions
- Simple calculations (< 50 ms CPU)

Requirements checklist

Use platform-hosted toolkits when ALL of the following are true:

CPU computation ≤ 50 ms per request
All I/O operations use async/await
Memory usage is less than 200 MB per request
Heavy processing is delegated to external APIs
Response time is acceptable at 5-30 seconds
The tool is stateless or uses platform context management
No special runtime dependencies (such as custom C libraries)
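
For teams that review many tools, the checklist can be encoded as a simple predicate. This is only a sketch: `ToolProfile` and its field names are invented here, and the values would come from your own profiling.

```python
from dataclasses import dataclass


@dataclass
class ToolProfile:
    cpu_ms_per_request: float
    uses_async_io: bool
    memory_mb_per_request: float
    heavy_work_delegated: bool
    latency_5_to_30s_acceptable: bool
    stateless_or_platform_context: bool
    needs_special_runtime: bool


def fits_platform_toolkit(p: ToolProfile) -> bool:
    """True only when every item on the checklist holds."""
    return (
        p.cpu_ms_per_request <= 50
        and p.uses_async_io
        and p.memory_mb_per_request < 200
        and p.heavy_work_delegated
        and p.latency_5_to_30s_acceptable
        and p.stateless_or_platform_context
        and not p.needs_special_runtime
    )


api_wrapper = ToolProfile(20, True, 50, True, True, True, False)
ml_inference = ToolProfile(900, True, 800, False, True, True, True)
print(fits_platform_toolkit(api_wrapper), fits_platform_toolkit(ml_inference))
```

A tool that fails any single check is a candidate for the self-hosted options instead.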

When to use self-hosted solutions

Consider self-hosting tools through Remote MCP Servers or OpenAPI Tools when:

❌ Not suitable for platform toolkits

CPU-intensive operations
- Complex statistical analysis
- Cryptographic operations
- Large-scale data transformations
High memory requirements
- Loading large models (> 500 MB)
- Processing large datasets in memory
- Caching extensive reference data
- In-memory databases
Special runtime requirements
- Custom system libraries
- Specific Python versions
- Native code dependencies
Long-running operations
- Batch processing jobs
- Report generation (> 60 seconds)
- Data migration tasks
- Background processing
Stateful operations
- Maintaining persistent connections
- Session management
- Transaction coordination
- Workflow orchestration

Self-hosting options

Option 1: Remote MCP server

Best for: Custom tool collections with complex logic
Deployment: Your infrastructure
Protocol: Model Context Protocol over HTTP/SSE
Benefits: Full control, custom resources, flexible scaling

Option 2: OpenAPI tools

Best for: Existing REST APIs
Deployment: Your infrastructure or third-party services
Protocol: Standard HTTP/REST
Benefits: Use existing APIs, no code changes needed

Option 3: Hybrid approach

Platform toolkits for orchestration and formatting
Self-hosted services for heavy computation
Example: Platform tool calls your ML inference API

Best practices

1. Async I/O patterns

PYTHON

import asyncio
from typing import Dict, List

import httpx
from ibm_watsonx_orchestrate.agent_builder.tools import tool

@tool
async def parallel_api_calls(ids: List[str]) -> List[Dict]:
    """Efficiently fetch data for multiple IDs in parallel"""
    async with httpx.AsyncClient() as client:
        # Create one request per ID; gather() below runs them concurrently
        tasks = [
            client.get(f"https://api.example.com/items/{item_id}")
            for item_id in ids
        ]
        # Wait for all to complete
        responses = await asyncio.gather(*tasks)
        # Format results (minimal CPU time)
        return [r.json() for r in responses]

2. Error handling

PYTHON

import httpx
from typing import Dict
from ibm_watsonx_orchestrate.agent_builder.tools import tool

@tool
async def robust_api_call(endpoint: str) -> Dict:
    """Proper error handling for external API calls"""
    try:
        async with httpx.AsyncClient(timeout=30.0) as client:
            response = await client.get(endpoint)
            response.raise_for_status()
            return response.json()
    except httpx.TimeoutException:
        return {"error": "API timeout", "status": "timeout"}
    except httpx.HTTPStatusError as e:
        return {"error": f"API error: {e.response.status_code}", "status": "error"}
    except Exception as e:
        return {"error": str(e), "status": "error"}

3. Connection pooling

PYTHON

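import httpx
from typing import Dict, Optional
from ibm_watsonx_orchestrate.agent_builder.tools import tool

# Module-level client for connection reuse (created lazily, once per worker)
_http_client: Optional[httpx.AsyncClient] = None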

async def get_http_client():
    """Reuse HTTP client across requests"""
    global _http_client
    if _http_client is None:
        _http_client = httpx.AsyncClient(
            timeout=30.0,
            limits=httpx.Limits(max_keepalive_connections=20)
        )
    return _http_client

@tool
async def efficient_api_call(url: str) -> Dict:
    """Use connection pooling for better performance"""
    client = await get_http_client()
    response = await client.get(url)
    return response.json()

4. Memory management

PYTHON

import httpx
from typing import Dict
from ibm_watsonx_orchestrate.agent_builder.tools import tool

@tool
async def memory_efficient_processing(data_url: str) -> Dict:
    """Stream large responses instead of loading into memory"""
    async with httpx.AsyncClient() as client:
        async with client.stream('GET', data_url) as response:
            # Process in chunks
            total_size = 0
            async for chunk in response.aiter_bytes(chunk_size=8192):
                total_size += len(chunk)
                # Process chunk without loading entire response
            
            return {"processed_bytes": total_size}

5. Context management

PYTHON

from ibm_watsonx_orchestrate.agent_builder.tools import tool
from ibm_watsonx_orchestrate.utils import update_context
from ibm_watsonx_orchestrate.run.context import AgentRun

@tool
async def stateful_tool(user_input: str, context: AgentRun) -> str:
    """Use platform context for state management"""
    # Read previous state
    previous_step = context.request_context.get('workflow_step', 'start')
    
    # Perform operation (process_step is a placeholder for your own async helper)
    result = await process_step(user_input, previous_step)
    
    # Update context for next tool
    update_context('workflow_step', result['next_step'])
    update_context('last_result', result['data'])
    
    return result['message']

Migration from individual tools to toolkits

If you have existing individual tools, consider migrating to the toolkit approach:

Migration checklist

Group related tools:
- Identify tools that share similar dependencies
- Group tools by functional domain or use case
- Aim for 10-15 tools per toolkit for optimal resource utilization
Consolidate dependencies:
- Merge all requirements.txt files into a single file
- Resolve version conflicts between tools
- Remove duplicate dependencies
- Test compatibility of consolidated dependencies
Ensure thread safety:
- Critical: Review all tools for thread-safety issues
- Remove or protect shared mutable state
- Use thread-safe data structures (queue.Queue, threading.Lock)
- Avoid global variables that are modified during execution
- Test tools under concurrent load
Review CPU usage:
- Profile each tool to help ensure CPU time < 50 ms
- Identify and optimize CPU-intensive operations
- Consider moving heavy computation to external APIs
Convert to async:
- Replace all blocking I/O with async/await
- Use async-compatible libraries (httpx, aiohttp, asyncpg)
- Ensure all I/O operations are nonblocking
Test concurrency:
- Load test with multiple concurrent requests
- Verify no race conditions or deadlocks
- Monitor memory usage under load
- Validate thread-safety under stress
Package as toolkit:
- Follow toolkit packaging format
- Create single requirements.txt
- Document thread-safety guarantees
- Include toolkit metadata
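
For the concurrency testing step, a small harness that calls the code under test from many threads at once can surface race conditions before deployment. The sketch below uses only the standard library; `Counter` stands in for whatever state a tool shares, and `hammer` is a hypothetical helper name.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


class Counter:
    """Shared state protected by a lock, standing in for any state a tool shares."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.value = 0

    def increment(self) -> None:
        with self._lock:
            self.value += 1


def hammer(fn: Callable[[], None], calls: int = 2000, threads: int = 16) -> None:
    """Invoke `fn` `calls` times from `threads` worker threads and wait for completion."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [pool.submit(fn) for _ in range(calls)]
        for future in futures:
            future.result()  # re-raises any exception from the worker thread


counter = Counter()
hammer(counter.increment)
print("final count:", counter.value)
```

If the final count ever differs from the number of calls, or a call raises, the code under test is not safe for the toolkit runtime.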

Summary

The new toolkit runtime architecture provides a high-performance, scalable platform for Python tools that:

Eliminates cold start delays through persistent workers
Maximizes throughput through async I/O and efficient resource sharing
Ensures predictability through strict CPU time limits

Key takeaway: Design tools as lightweight, async wrappers around external APIs. Keep CPU computation under 50 ms, use async I/O for all network operations, and let external services handle heavy processing. This approach enables high concurrency, predictable performance, and efficient resource utilization. For tools that don’t fit these constraints, use self-hosted solutions through Remote MCP Servers or OpenAPI Tools to maintain full control over resources and runtime environment.

Langflow Tools

Execution Speed: Minimum 2+ seconds (initialization overhead per run), then variable depending on workflow complexity.

Key Consideration: Because every run pays the initialization overhead, Langflow tools are most effective for operations that take longer to execute (but remain under the 2-minute timeout). The initialization penalty becomes less significant when the actual workflow processing time is substantial.

Note: LLM operations in Langflow are optional. Use Langflow when you need its specific AI components or have existing Langflow workflows, not solely for LLM capabilities.

Optimization Strategies:

Use for longer-running operations where initialization overhead is proportionally smaller
Minimize and combine LLM calls into single prompts (when using LLMs)
Use concise, focused prompts with minimal context (when using LLMs)
Choose smaller models for simple tasks, larger for complex reasoning (when using LLMs)
Cache expensive operation results externally (Redis) for repeated queries
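
The first strategy is easy to quantify. Taking the 2-second initialization as a lower bound, the sketch below computes what share of a run is spent on initialization; the function name and the two workload durations are illustrative.

```python
def overhead_share(init_s: float, work_s: float) -> float:
    """Fraction of the total run time that is spent on initialization."""
    return init_s / (init_s + work_s)


short_run = overhead_share(init_s=2.0, work_s=0.5)   # quick lookup
long_run = overhead_share(init_s=2.0, work_s=38.0)   # longer document analysis

print(f"short run: {short_run:.0%} overhead, long run: {long_run:.0%} overhead")
```

For the half-second workload, initialization accounts for 80% of the run; for the 38-second workload it drops to 5%.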

API & MCP Tool Performance

API Tools (OpenAPI-based)

Performance: Depends entirely on the external service (network latency, service speed, payload size, rate limiting, authentication)

Timeout:

Synchronous calls: 2 minutes (same as Python/Langflow/MCP)
Asynchronous calls: None (via OpenAPI callback syntax)

MCP Tools (Model Context Protocol)

Performance: Varies by implementation (tool design, external dependencies, protocol overhead, resource requirements)

Timeout: 2 minutes (same as Python/Langflow/synchronous API tools; MCP protocol does not support async)

Summary

Key Points:

wxO Flow: First choice for most scenarios - supports connections, security, custom code, no timeout limit
Python Tools: For custom libraries not supported in wxO Flow code blocks
Langflow Tools: For Langflow-specific AI components
API Tools: For OpenAPI-based external service integrations - performance depends on external service
MCP Tools: For Model Context Protocol capabilities - performance varies by implementation
Tool call overhead exists: Combine operations when possible
2-minute timeout: Applies to synchronous tools such as Python, Langflow, OpenAPI sync, and MCP tools
