You can use third-party models from a wide range of supported providers through the AI gateway system. The gateway also lets you establish policies that route requests between multiple models for use cases such as load-balancing and fallback.

Supported providers

Provider      Provider ID
OpenAI        openai
Anthropic     anthropic
Google        google
watsonx.ai    watsonx
Mistral       mistral
OpenRouter    openrouter
Ollama        ollama

Configuring custom LLMs

Provider configuration

To add a custom LLM, you must provide a JSON string with the provider configuration, like the following example:

[JSON]
{
  "custom_host": "https://example.com/v1/api",
  "request_timeout": 500,
}

Each provider supports different JSON string schemas. The following sections detail the supported schema for each provider.

Most providers require an api_key value to authenticate with the LLM service. Although you can include this value directly in the JSON string configuration, the safest way to store secret values, such as API keys, is in a connection:

[BASH]
orchestrate connections add -a my_creds
orchestrate connections configure -a my_creds --env draft -k key_value -t team
orchestrate connections set-credentials -a my_creds --env draft -e "api_key=my_api_key"

The following sections include the supported values for each provider in the provider configuration.

OpenAI

  • api_key (Required)
  • custom_host
  • url_to_fetch
  • forward_headers
  • request_timeout
  • transform_to_form_data

Anthropic

  • api_key (Required)
  • anthropic_beta
  • anthropic_version
  • custom_host
  • url_to_fetch
  • forward_headers
  • request_timeout
  • transform_to_form_data
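
For example, a hypothetical Anthropic provider configuration that pins an API version and raises the request timeout might look like the following (the values are illustrative, and the api_key itself is best stored in a connection as described above):

[JSON]
{
  "anthropic_version": "2023-06-01",
  "request_timeout": 500
}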

Google

  • api_key (Required)
  • custom_host
  • url_to_fetch
  • forward_headers
  • request_timeout
  • transform_to_form_data

watsonx.ai

  • api_key (Required)
  • You must provide one of the following IDs; you don’t need all three (see the example after this list):
    • watsonx_space_id
    • watsonx_project_id
    • watsonx_deployment_id
  • watsonx_cpd_url (Required in on-premises environments)
  • watsonx_cpd_username (Required in on-premises environments)
  • watsonx_cpd_password (Required in on-premises environments)
  • watsonx_version
  • custom_host
  • url_to_fetch
  • forward_headers
  • request_timeout
  • transform_to_form_data
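
For example, a SaaS configuration that targets a space needs only the Space ID (the value below is a placeholder); in an on-premises environment you would also supply the watsonx_cpd_* values, ideally through a connection rather than in plain text:

[JSON]
{
  "watsonx_space_id": "my_wxai_space_id",
  "request_timeout": 500
}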

Mistral

  • api_key (Required)
  • mistral_fim_completion
  • custom_host
  • url_to_fetch
  • forward_headers
  • request_timeout
  • transform_to_form_data

OpenRouter

  • api_key (Required)
  • custom_host
  • url_to_fetch
  • forward_headers
  • request_timeout
  • transform_to_form_data

Ollama

  • api_key
  • custom_host
  • url_to_fetch
  • forward_headers
  • request_timeout
  • transform_to_form_data
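
Because Ollama typically runs as a locally hosted server, the api_key is optional. A minimal configuration might only point custom_host at your Ollama instance; the URL below assumes a default local install:

[JSON]
{
  "custom_host": "http://localhost:11434"
}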

Adding custom LLM

Run the orchestrate models add command to add a custom LLM to your active environment.

[BASH]
orchestrate models add --name watsonx/meta-llama/llama-3-2-90b-vision-instruct --app-id watsonx_ai_creds

Arguments:

  • --name (-n): The name of the model you want to add. This name must follow the pattern <provider>/<model_name>. The provider must match one of the IDs listed in the Supported providers section, and the model_name must exactly match the name that appears in the provider’s API documentation.
  • --description (-d): An optional description to appear alongside the model in the list view.
  • --display-name: An optional display name for the model in the UI.
  • --provider-config: A JSON string of configuration options. These values can also be provided through the connection referenced in --app-id, which is the recommended place for secret values. You can use --provider-config alongside --app-id to supply the remaining, non-secret values, as shown in the example after this list.
  • --type: The type of model that is being created. The supported types are:
    • chat: Model that supports chat capabilities.
    • chat_vision: Model that supports chat and image capabilities.
    • completion: Model used for completion engines.
    • embedding: Embedding model used for transforming data.
  • --app-id (-a): The app ID of a key_value connection containing provider configuration details. These will be merged with the values provided in --provider-config.
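
For example, a hypothetical command that keeps the API key in a connection and passes the non-secret options through --provider-config might look like this:

[BASH]
orchestrate models add \
  --name watsonx/meta-llama/llama-3-2-90b-vision-instruct \
  --app-id watsonx_ai_creds \
  --provider-config '{"watsonx_space_id": "my_wxai_space_id"}' \
  --type chat_vision \
  --description "Llama 3.2 Vision Instruct running on watsonx.ai"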

Importing models

If you want more control over your models and the ability to version control the model configuration, consider using the orchestrate models import command.

[BASH]
orchestrate models import --file path_to_my_spec --app-id watsonx_ai_creds

Where the spec file follows this structure:

[YAML]
spec_version: v1
kind: model
name: watsonx/meta-llama/llama-3-2-90b-vision-instruct
display_name: Llama 3.2 Vision Instruct #Optional
description: Meta's Llama 3.2 Vision Instruct with 90b parameters running on WatsonX AI #Optional
tags: #Optional
  - meta
  - llama
model_type: chat # Optional. Defaults to "chat"
provider_config:
  watsonx_space_id: my_wxai_space_id

Arguments:

  • --file (-f): The file path of the spec file containing the model configuration.
  • --app-id (-a): The app ID of a key_value connection containing provider configuration details. These are merged with the values provided in the provider_config section of the spec.

List all LLMs

Run the orchestrate models list command to see all available LLMs in your active environment.

[BASH]
orchestrate models list

Note: By default, you’ll see a table of available models. If you prefer raw output, add the --raw (-r) argument.

Removing custom LLMs

Run the orchestrate models remove command and use the --name (-n) argument to specify the LLM you want to remove.

[BASH]
orchestrate models remove -n <model-name-unique-identifier-to-delete>

Updating custom LLM

To update a custom LLM, first remove it, then add it again. For more information, see Removing custom LLMs and Adding custom LLM.
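
For example, using the model added earlier in this section, the update amounts to a remove followed by an add with the new configuration:

[BASH]
orchestrate models remove -n watsonx/meta-llama/llama-3-2-90b-vision-instruct
orchestrate models add --name watsonx/meta-llama/llama-3-2-90b-vision-instruct --app-id watsonx_ai_creds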

Configuring model policies

Model policies allow for the coordination of multiple models to accomplish tasks like load-balancing and fallback.

Adding model policies

[BASH]
orchestrate models policy add --name <policy_name> --model <provider1>/<model_id1> --model <provider2>/<model_id2> --strategy <strategy_type> --strategy-on-code 500 --retry-on-code 503 --retry-attempts 3

Arguments:

  • --name (-n): The name of the policy you want to add.
  • --description (-d): An optional description to appear alongside the policy in the list view.
  • --display-name: An optional display name for the policy in the UI.
  • --strategy (-s): The policy mode you want to use.
    • loadbalance: Distributes requests across the listed models according to their weight values. By default, every weight is 1, so the load is balanced evenly between the models. If you want to customize the weight values, see Importing model policies.
    • fallback: If one of the models is unavailable, the agent will try to use the other one as a fallback alternative.
    • single: Uses only one model, but still allows --retry-on-code and --retry-attempts.
  • --strategy-on-code: A list of HTTP error codes that trigger the strategy. Used for the fallback strategy (see the example after this list).
  • --retry-on-code: A list of HTTP error codes for which the model should retry the request.
  • --retry-attempts: The number of retry attempts to make before stopping.
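
For example, a hypothetical fallback policy that switches to a second model when the first returns a 500 error, and retries up to 3 times on a 503, might look like this (the model names are placeholders):

[BASH]
orchestrate models policy add \
  --name gemini-fallback \
  --model google/gemini-2.0-flash \
  --model google/gemini-2.0-flash-lite \
  --strategy fallback \
  --strategy-on-code 500 \
  --retry-on-code 503 \
  --retry-attempts 3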

Importing model policies

[BASH]
orchestrate models policy import --file my_spec.yaml

Where the my_spec.yaml file follows this structure:

my_spec.yaml
[YAML]
spec_version: v1
kind: model
name: anygem
description: Balances requests between 2 Gemini models
display_name: Any Gem
policy:
  strategy:
    mode: loadbalance
  retry:
    attempts: 1
    on_status_codes: [503]
  targets:
    - model_name: virtual-model/google/gemini-2.0-flash
      weight: 0.75   # Weights must be greater than 0 and less than or equal to 1  
    - model_name: virtual-model/google/gemini-2.0-flash-lite
      weight: 0.25

Arguments:

  • --file (-f): File path of the spec file containing the model policy configuration.

Update model policy

To update a model policy, run either the add or import command with the name of the existing policy that you want to update.
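
For example, after you change the weights in my_spec.yaml, re-running the import for the same policy name applies the update:

[BASH]
orchestrate models policy import --file my_spec.yaml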

Removing model policies

[BASH]
orchestrate models policy remove -n <name of policy>

Arguments:

  • --name (-n): The name of the model policy that you want to remove.