You can use third-party models from a wide range of supported providers through the AI gateway system. The gateway also lets you establish policies that route requests between multiple models for use cases such as load-balancing and fallback.

Supported providers

Provider      Provider ID
OpenAI        openai
Anthropic     anthropic
Google        google
watsonx.ai    watsonx
Mistral       mistral
OpenRouter    openrouter
Ollama        ollama

Configuring custom LLMs

Provider configuration

To add a custom LLM, you must provide a JSON string with the provider configuration, like the following example:

[JSON]
{
  "custom_host": "https://example.com/v1/api",
  "request_timeout": 500,
}

Each provider supports different JSON string schemas. The following sections detail the supported schema for each provider.

Most providers require an api_key value to authenticate with the LLM service. Although you can include this value directly in the JSON string configuration, the safest way to store secret values, such as API keys, is in a connection:

[BASH]
orchestrate connections add -a my_creds
orchestrate connections configure -a my_creds --env draft -k key_value -t team
orchestrate connections set-credentials -a my_creds --env draft -e "api_key=my_api_key"

The following sections include the supported values for each provider in the provider configuration.

OpenAI

  • api_key (Required)
  • custom_host
  • url_to_fetch
  • forward_headers
  • request_timeout
  • transform_to_form_data

Anthropic

  • api_key (Required)
  • anthropic_beta
  • anthropic_version
  • custom_host
  • url_to_fetch
  • forward_headers
  • request_timeout
  • transform_to_form_data
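
For example, a hypothetical Anthropic provider configuration that pins an API version and raises the request timeout might look like the following (the values are illustrative, and the api_key itself is best stored in a connection as described above):

[JSON]
{
  "anthropic_version": "2023-06-01",
  "request_timeout": 500
}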

Google

  • api_key (Required)
  • custom_host
  • url_to_fetch
  • forward_headers
  • request_timeout
  • transform_to_form_data

watsonx.ai

  • api_key (Required)
  • You must provide one of the following IDs; you don’t need all three (see the example after this list):
    • watsonx_space_id
    • watsonx_project_id
    • watsonx_deployment_id
  • watsonx_cpd_url (Required in on-premises environments)
  • watsonx_cpd_username (Required in on-premises environments)
  • watsonx_cpd_password (Required in on-premises environments)
  • watsonx_version
  • custom_host
  • url_to_fetch
  • forward_headers
  • request_timeout
  • transform_to_form_data
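
For example, a SaaS configuration that targets a space needs only the Space ID (the value below is a placeholder); in an on-premises environment you would also supply the watsonx_cpd_* values, ideally through a connection rather than in plain text:

[JSON]
{
  "watsonx_space_id": "my_wxai_space_id",
  "request_timeout": 500
}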

Mistral

  • api_key (Required)
  • mistral_fim_completion
  • custom_host
  • url_to_fetch
  • forward_headers
  • request_timeout
  • transform_to_form_data

OpenRouter

  • api_key (Required)
  • custom_host
  • url_to_fetch
  • forward_headers
  • request_timeout
  • transform_to_form_data

Ollama

  • api_key
  • custom_host
  • url_to_fetch
  • forward_headers
  • request_timeout
  • transform_to_form_data
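
Because Ollama typically runs as a locally hosted server, the api_key is optional. A minimal configuration might only point custom_host at your Ollama instance; the URL below assumes a default local install:

[JSON]
{
  "custom_host": "http://localhost:11434"
}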

Adding custom LLM

Run the orchestrate models add command to add a custom LLM to your active environment.

[BASH]
orchestrate models add --name watsonx/meta-llama/llama-3-2-90b-vision-instruct --app-id watsonx_ai_creds

Arguments:

  • --name (-n): The name of the model you want to add. This name must follow the pattern <provider>/<model_name>. The provider must match one of the IDs listed in the Supported providers section, and the model_name must exactly match the name that appears in the provider’s API documentation.
  • --description (-d): An optional description to appear alongside the model in the list view.
  • --display-name: An optional display name for the model in the UI.
  • --provider-config: A JSON string of configuration options. These values can also be provided through the connection referenced in --app-id, which is the recommended place for secret values. You can use --provider-config alongside --app-id to supply the remaining, non-secret values, as shown in the example after this list.
  • --type: The type of model that is being created. The supported types are:
    • chat: Model that supports chat capabilities.
    • chat_vision: Model that supports chat and image capabilities.
    • completion: Model used for completion engines.
    • embedding: Embedding model used for transforming data.
  • --app-id (-a): The app ID of a key_value connection containing provider configuration details. These will be merged with the values provided in --provider-config.
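
For example, a hypothetical command that keeps the API key in a connection and passes the non-secret options through --provider-config might look like this:

[BASH]
orchestrate models add \
  --name watsonx/meta-llama/llama-3-2-90b-vision-instruct \
  --app-id watsonx_ai_creds \
  --provider-config '{"watsonx_space_id": "my_wxai_space_id"}' \
  --type chat_vision \
  --description "Llama 3.2 Vision Instruct running on watsonx.ai"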

Importing models

If you want more control over your models and the ability to version control the model configuration, consider using the orchestrate models import command.

[BASH]
orchestrate models import --file path_to_my_spec --app-id watsonx_ai_creds

Where the spec file follows this structure:

[YAML]
spec_version: v1
kind: model
name: watsonx/meta-llama/llama-3-2-90b-vision-instruct
display_name: Llama 3.2 Vision Instruct #Optional
description: Meta's Llama 3.2 Vision Instruct with 90b parameters running on WatsonX AI #Optional
tags: #Optional
  - meta
  - llama
model_type: chat # Optional. Defaults to "chat"
provider_config:
  watsonx_space_id: my_wxai_space_id

Arguments:

  • --file (-f): The file path of the spec file containing the model configuration.
  • --app-id (-a): The app ID of a key_value connection containing provider configuration details. These are merged with the values provided in the provider_config section of the spec.

List all LLMs

Run the orchestrate models list command to see all available LLMs in your active environment.

[BASH]
orchestrate models list

Note: By default, you’ll see a table of available models. If you prefer raw output, add the --raw (-r) argument.

Removing custom LLMs

Run the orchestrate models remove command and use the --name (-n) argument to specify the LLM you want to remove.

[BASH]
orchestrate models remove -n <model-name-unique-identifier-to-delete>

Updating custom LLM

To update a custom LLM, first remove it, then add it again. For more information, see Removing custom LLMs and Adding custom LLM.
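
For example, using the model added earlier in this section, the update amounts to a remove followed by an add with the new configuration:

[BASH]
orchestrate models remove -n watsonx/meta-llama/llama-3-2-90b-vision-instruct
orchestrate models add --name watsonx/meta-llama/llama-3-2-90b-vision-instruct --app-id watsonx_ai_creds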

Configuring model policies

Model policies allow for the coordination of multiple models to accomplish tasks like load-balancing and fallback.

Adding model policies

[BASH]
orchestrate models policy add --name <policy_name> --model <provider1>/<model_id1> --model <provider2>/<model_id2> --strategy <strategy_type> --strategy-on-code 500 --retry-on-code 503 --retry-attempts 3

Arguments:

  • --name (-n): The name of the policy you want to add.
  • --description (-d): An optional description to appear alongside the policy in the list view.
  • --display-name: An optional display name for the policy in the UI.
  • --strategy (-s): The policy mode you want to use.
    • loadbalance: Distributes requests across the listed models according to their weight values. By default, every weight is 1, so the load is balanced evenly between the models. If you want to customize the weight values, see Importing model policies.
    • fallback: If one of the models is unavailable, the agent will try to use the other one as a fallback alternative.
    • single: Uses only one model, but still allows --retry-on-code and --retry-attempts.
  • --strategy-on-code: A list of HTTP error codes that trigger the strategy. Used for the fallback strategy (see the example after this list).
  • --retry-on-code: A list of HTTP error codes for which the model should retry the request.
  • --retry-attempts: The number of retry attempts to make before stopping.
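
For example, a hypothetical fallback policy that switches to a second model when the first returns a 500 error, and retries up to 3 times on a 503, might look like this (the model names are placeholders):

[BASH]
orchestrate models policy add \
  --name gemini-fallback \
  --model google/gemini-2.0-flash \
  --model google/gemini-2.0-flash-lite \
  --strategy fallback \
  --strategy-on-code 500 \
  --retry-on-code 503 \
  --retry-attempts 3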

Importing model policies

[BASH]
orchestrate models policy import --file my_spec.yaml

Where the my_spec.yaml file follows this structure:

my_spec.yaml
[YAML]
spec_version: v1
kind: model
name: anygem
description: Balances requests between 2 Gemini models
display_name: Any Gem
policy:
  strategy:
    mode: loadbalance
  retry:
    attempts: 1
    on_status_codes: [503]
  targets:
    - model_name: virtual-model/google/gemini-2.0-flash
      weight: 0.75   # Weights must be greater than 0 and less than or equal to 1  
    - model_name: virtual-model/google/gemini-2.0-flash-lite
      weight: 0.25

Arguments:

  • --file (-f): File path of the spec file containing the model policy configuration.

Update model policy

To update a model policy, run either the add or import command with the name of the existing policy that you want to update.
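
For example, after you change the weights in my_spec.yaml, re-running the import for the same policy name applies the update:

[BASH]
orchestrate models policy import --file my_spec.yaml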

Removing model policies

[BASH]
orchestrate models policy remove -n <name of policy>

Arguments:

  • --name (-n): The name of the model policy that you want to remove.