The policy mode to use:
loadbalance: Distributes requests across models according to their weight values (each weight defaults to 1).
fallback: Uses another model if one becomes unavailable.
single: Uses only one model, but still allows --retry-on-code and --retry-attempts.
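As a rough illustration of how weights shape a loadbalance policy, the Python sketch below picks a target with probability proportional to its weight. It is a conceptual model only, not the service's actual routing code: `pick_target` and the `TARGETS` list are hypothetical names, and the model names and weights are taken from the load balancing example spec in this section.

```python
import random
from collections import Counter

# Hypothetical in-memory copy of a policy's targets (illustration only).
TARGETS = [
    {"model_name": "virtual-model/google/gemini-2.0-flash", "weight": 0.75},
    {"model_name": "virtual-model/google/gemini-2.0-flash-lite", "weight": 0.25},
]

def pick_target(targets, rng):
    """Return one model name, chosen with probability proportional to its weight."""
    # A target without an explicit weight is treated as weight 1 (the stated default).
    weights = [t.get("weight", 1) for t in targets]
    return rng.choices(targets, weights=weights, k=1)[0]["model_name"]

rng = random.Random(0)  # fixed seed so repeated runs give the same counts
counts = Counter(pick_target(TARGETS, rng) for _ in range(10_000))
for name, n in counts.items():
    print(f"{name}: {n / 10_000:.2%}")
```

Over many requests, roughly three quarters of the picks land on the first model and one quarter on the second, which is the proportion the 0.75 and 0.25 weights describe.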
After you create a model policy, assign the policy to your agent by using the ADK and the agent YAML file. Assign the policy the same way that you assign a regular model. You cannot assign a model policy to a Generative Prompt Activity in a flow.
Where the my_spec.yaml file follows this structure:
Load balancing

[my_spec.yaml]
spec_version: v1
kind: model
name: anygem
description: Balances requests between 2 Gemini models
display_name: Any Gem
policy:
  strategy:
    mode: loadbalance
    on_status_codes: [503, 504]
  retry:
    attempts: 1
  targets:
    - model_name: virtual-model/google/gemini-2.0-flash
      weight: 0.75 # Weights must be greater than 0 and less than or equal to 1
    - model_name: virtual-model/google/gemini-2.0-flash-lite
      weight: 0.25

Fallback

[my_spec.yaml]
spec_version: v1
kind: model
name: firstgem
description: Use the first Gemini model that doesn't return 503
display_name: First Gem
policy:
  strategy:
    mode: fallback
  retry:
    attempts: 1
    on_status_codes: [503]
  targets:
    - model_name: virtual-model/google/gemini-2.0-flash
    - model_name: virtual-model/google/gemini-2.0-flash-lite

Single

[my_spec.yaml]
spec_version: v1
kind: model
name: retrygem
description: Gemini model that retries up to 3 times on 503
display_name: Retry Gem
policy:
  strategy:
    mode: single
  retry:
    attempts: 3
    on_status_codes: [503]
  targets:
    - model_name: virtual-model/google/gemini-2.0-flash
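Before importing a spec, it can help to sanity-check it against the constraints stated in this section. The Python sketch below does that for a spec already loaded into a dictionary. It is not part of the ADK: `check_policy` is a hypothetical helper, the key layout (`policy`, `strategy`, `targets`) follows the examples in this section, and the rule that single mode expects exactly one target is an assumption drawn from the mode description.

```python
VALID_MODES = {"loadbalance", "fallback", "single"}

def check_policy(spec):
    """Return a list of problems found in a model policy spec given as a dict."""
    problems = []
    policy = spec.get("policy", {})
    mode = policy.get("strategy", {}).get("mode")
    targets = policy.get("targets", [])

    if mode not in VALID_MODES:
        problems.append(f"unknown mode: {mode!r}")
    if not targets:
        problems.append("policy has no targets")
    # Assumption: 'single' uses only one model, so more than one target is flagged.
    if mode == "single" and len(targets) > 1:
        problems.append("single mode expects one target")
    for target in targets:
        weight = target.get("weight", 1)  # weight defaults to 1 when omitted
        if not 0 < weight <= 1:
            problems.append(
                f"{target.get('model_name')}: weight {weight} is outside (0, 1]"
            )
    return problems

spec = {
    "policy": {
        "strategy": {"mode": "loadbalance"},
        "targets": [
            {"model_name": "virtual-model/google/gemini-2.0-flash", "weight": 0.75},
            {"model_name": "virtual-model/google/gemini-2.0-flash-lite", "weight": 0.25},
        ],
    }
}
print(check_policy(spec))  # prints []
```

The function returns a list rather than raising on the first problem, so every issue in a spec can be reported in one pass.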
Flags:
--file (-f): File path of the spec file containing the model policy configuration.