This guide helps you migrate existing agents from llama-3.2-90B-vision-instruct to gpt-oss-120b. In extensive testing across 256 production agents, GPT-OSS-120B delivered superior response times, error handling, conversational quality, and user experience when properly configured.

Why migrate to GPT-OSS-120B

GPT-OSS-120B offers significant advantages over Llama models:
  • Faster response times across all scenarios
  • Superior error handling with intelligent recovery and detailed explanations
  • Enhanced conversational efficiency through single-turn parameter collection
  • Better memory and context retention across multi-turn dialogues
  • Improved safety with robust out-of-scope request handling
  • More natural interactions that feel less robotic

Before you begin

Prerequisites

  • Access to watsonx Orchestrate (SaaS or Developer Edition)
  • Existing agents built with Llama models
  • Familiarity with agent configuration and prompt engineering

Understanding the differences

GPT-OSS-120B behaves differently from Llama models in several key ways:
| Aspect | Llama 3.2 90B | GPT-OSS-120B |
| --- | --- | --- |
| System prompt | Uses the standard watsonx Orchestrate system prompt | Does not use the standard system prompt |
| Knowledge preference | Balanced between internal and external knowledge | Prefers internal knowledge unless instructed otherwise |
| Parameter collection | Turn-by-turn questioning | Single-turn collection of multiple parameters |
| Error communication | Generic error messages | Context-aware, detailed explanations |
| Instruction following | Flexible interpretation | Literal, precise following of examples |
| Reasoning approach | Concise execution | More exploratory, with additional tool calls |

Migration process

Step 1: Review your agent configuration

Export your existing Llama-based agent to review its configuration:
orchestrate agents export -n <agent-name> -k native -o agent-backup.zip
Document the following elements:
  • Agent instructions and tone
  • Tool usage patterns
  • Knowledge base dependencies
  • Expected user interaction flows

Step 2: Update the LLM configuration

Modify your agent configuration to use GPT-OSS-120B:
llm: groq/openai/gpt-oss-120b
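In a full agent definition, the `llm` field sits alongside the agent's other metadata. The following is a minimal sketch of a native agent YAML; the field layout follows the native agent spec, and every value other than the `llm` line is an illustrative assumption, not taken from your configuration:

```yaml
# Hypothetical native agent definition; only the llm value is prescribed by this guide.
spec_version: v1
kind: native
name: hr_assistant                    # illustrative agent name
description: Answers HR questions using connected tools and knowledge bases.
llm: groq/openai/gpt-oss-120b         # updated from the previous Llama model
instructions: |
  # Add the GPT-OSS-120B-specific instruction blocks from Step 3 here.
```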

Step 3: Optimize agent instructions

GPT-OSS-120B requires explicit, model-specific instructions. Add the following blocks to your agent’s instructions based on your needs:

Essential instruction template

Use this comprehensive template as a starting point:
Behavior and sources:
- Identify as an agent operating within watsonx Orchestrate.
- Always use available tools to retrieve information before relying on your internal knowledge.
- If tools don't contain the answer, state "The available tools don't contain the answer" and ask one clarifying question.

Reasoning and brevity controls:
- Use concise reasoning with at most 3 reasoning steps before responding.
- Keep chit-chat to one sentence, then proceed.
- Limit final answers to ≤150 words unless the user requests detail.

Error handling:
- If required inputs are missing, ask for the minimum missing fields in a single question.
- When errors occur, explain what went wrong and suggest next steps.

Formatting:
- Use proper Markdown syntax for hyperlinks: [link text](url)
- Format responses clearly with appropriate structure.

Prioritize knowledge bases (if applicable)

If your agent uses knowledge bases, add this instruction:
Always check the connected knowledge base(s) first.
Prefer information retrieved from knowledge over your
own internal knowledge. If relevant content is found,
summarize it faithfully and cite the source title or document
name. If the knowledge base does not contain the answer, say
"I don't know based on the provided knowledge" and ask
a clarifying question.

Optimize tool usage

For agents with multiple tools, provide clear guidance:
Tool usage rules:
- Use the [tool_name] tool for [specific purpose].
- Call tools with all available information; don't ask for parameters you can infer.
- If a tool returns an error, analyze the error message and retry with corrections.
- If a tool fails after 2 attempts, inform the user and suggest alternatives.

Control agent routing (for supervisor agents)

If your agent delegates to other agents, use explicit action verbs:
Agent delegation:
- Call the [agent_name] agent when [specific condition].
- Execute the agent call immediately with available information.
- Do NOT ask for additional parameters before calling the agent unless absolutely required.

Step 4: Remove problematic patterns

GPT-OSS-120B can be over-constrained by overly specific examples. Review your instructions and update patterns like the following.
❌ Avoid:
When updating employee data, ask for:
1. Employee name
2. Department
3. Location
Then call the update_employee tool.
✅ Prefer:
When updating employee data, collect all required parameters
(name, department, location) in a single question, then call
the update_employee tool.
❌ Avoid:
If the user asks about weather, say "I cannot help with that."
If they ask about sports, say "That's outside my scope."
If they ask about news, say "I don't have access to that."
✅ Prefer:
If the user asks about topics outside your capabilities,
politely acknowledge the request and redirect to your
primary function.

Step 5: Test and validate

After updating your agent configuration:
  1. Test basic interactions:
    orchestrate chat ask --agent-name <agent-name> "Hello"
    
  2. Test tool calling: Verify that tools are called correctly with appropriate parameters.
  3. Test error scenarios: Ensure error messages are clear and recovery is intelligent.
  4. Test multi-turn conversations: Confirm context retention across multiple exchanges.
  5. Test edge cases: Validate behavior with incomplete information, out-of-scope requests, and ambiguous queries.

Step 6: Deploy and monitor

Import your updated agent:
orchestrate agents import -f updated-agent.yaml
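The `updated-agent.yaml` referenced above is whatever file holds your revised agent definition. A hedged sketch of what it might contain, assuming the native agent spec and reusing the instruction blocks from Step 3 (the name, description, and tool name are illustrative):

```yaml
spec_version: v1
kind: native
name: hr_assistant                    # illustrative
llm: groq/openai/gpt-oss-120b
description: Answers HR questions via connected tools and knowledge bases.
instructions: |
  Behavior and sources:
  - Identify as an agent operating within watsonx Orchestrate.
  - Always use available tools to retrieve information before relying on your internal knowledge.

  Reasoning and brevity controls:
  - Use concise reasoning with at most 3 reasoning steps before responding.
  - Limit final answers to ≤150 words unless the user requests detail.
tools:
  - update_employee                   # illustrative tool name
```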
Monitor initial usage for:
  • Response quality and accuracy
  • Tool call precision
  • User satisfaction
  • Error rates and recovery success

Common migration challenges

Challenge 1: Over-reliance on internal knowledge

Symptom: Agent provides answers from its training data instead of using tools or knowledge bases.
Solution: Add explicit knowledge prioritization instructions (see Step 3).

Challenge 2: Excessive parameter collection

Symptom: Agent asks for parameters before routing to specialized agents or calling tools.
Solution: Use strong negations in instructions:
Do NOT ask for [parameter_name] before calling [tool/agent].
Call immediately with available information.

Challenge 3: Literal example following

Symptom: Agent handles only the scenarios exactly as shown in examples.
Solution: Remove specific examples and use generic patterns instead.

Challenge 4: Tool call precision issues

Symptom: Agent makes irrelevant tool calls or misses required calls.
Solution:
  • Improve tool descriptions with clear use cases
  • Add explicit tool usage rules in instructions
  • Test iteratively and refine based on results

Challenge 5: Agent routing confusion

Symptom: Agent returns JSON instead of executing agent calls.
Solution: Change from implicit to explicit instructions:
❌ "Delegate to the appropriate agent"
✅ "Call the [agent_name] agent with the information you have"

Prompt engineering best practices

DO:

  • ✅ Use explicit action verbs (“Call”, “Execute”, “Use”)
  • ✅ Provide strong negations for unwanted behaviors
  • ✅ Keep examples generic and minimal
  • ✅ Trust the model’s reasoning capabilities
  • ✅ Leverage single-turn parameter collection
  • ✅ Test iteratively with real scenarios

DON’T:

  • ❌ Rely on implicit instructions (“delegate”, “route”)
  • ❌ Provide overly specific examples that constrain behavior
  • ❌ List exhaustive value options (the agent may limit itself)
  • ❌ Use weak negations for critical constraints
  • ❌ Force turn-by-turn parameter collection
  • ❌ Over-constrain conversational patterns

Performance optimization tips

Reduce response latency

Reasoning efficiency:
- Limit yourself to 3 reasoning steps maximum.
- Do not re-plan unless the last tool result contradicts prior assumptions.
- If you cannot progress after 3 steps, ask one focused question.

Improve conversational flow

Conversation style:
- Collect multiple related parameters in a single question.
- Avoid asking for information you can reasonably infer.
- Keep responses concise and action-oriented.

Enhance error recovery

Error handling strategy:
- When a tool fails, analyze the error message carefully.
- Retry with intelligent corrections based on the error.
- After 2 failed attempts, explain the issue and suggest alternatives.
- Extract useful information from partial failures when possible.

Validation checklist

Before considering your migration complete, verify:
  • Agent uses groq/openai/gpt-oss-120b as the LLM
  • Instructions include model-specific optimizations
  • Knowledge base prioritization is configured (if applicable)
  • Tool usage rules are explicit and clear
  • Agent routing uses explicit action verbs (if applicable)
  • Overly specific examples have been removed
  • Basic interactions work as expected
  • Tool calling is accurate and efficient
  • Error handling is clear and helpful
  • Multi-turn conversations maintain context
  • Edge cases are handled gracefully

Troubleshooting

If responses are too long or verbose, add output constraints to the instructions:
Target an answer length of 4-6 sentences (or ≤150 words).
Use bullet points only when they increase clarity.
Avoid repeating the prompt or restating obvious context.
If the agent relies on internal knowledge instead of retrieved content, strengthen knowledge prioritization:
CRITICAL: Always search the knowledge base first.
Never rely on internal knowledge when the knowledge base
might contain the answer. If you use internal knowledge
when the knowledge base was available, this is a failure.
If the agent makes excessive tool calls or responds slowly, add reasoning constraints:
Tool efficiency:
- Call only the minimum tools needed to answer the question.
- Do not explore alternative approaches unless the first fails.
- Stop after getting a successful result.
If problems persist, ensure your instructions are:
  • Explicit and unambiguous
  • Free from conflicting directives
  • Written in strong, clear language
  • Tested with multiple scenarios
