Create Voice Configuration

Authorizations

Authorization

string

header

required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
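As a minimal sketch of the header format described above, the helper below builds the request headers from a token. The function name `auth_headers` is hypothetical, and pairing the header with `Content-Type: application/json` is an assumption based on the body type listed under Body.

```python
def auth_headers(token: str) -> dict:
    """Build headers carrying a Bearer token (illustrative helper, not part of the API)."""
    if not token:
        raise ValueError("token must be a non-empty string")
    return {
        # Documented form: "Bearer <token>"
        "Authorization": f"Bearer {token}",
        # Assumed from the application/json body type
        "Content-Type": "application/json",
    }


headers = auth_headers("my-auth-token")  # placeholder token
```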

Body

application/json

name

string

required

Required string length: 1 - 128

speech_to_text

SpeechToTextConfig · object

required

speech_to_text.provider

string

required

Required string length: 1 - 128

speech_to_text.watson_stt_config

WatsonSTTConfig · object

speech_to_text.watson_stt_config.api_url

string

required

Required string length: 1 - 2048

speech_to_text.watson_stt_config.model

string

required

Required string length: 1 - 256

speech_to_text.watson_stt_config.api_key

string | null

Required string length: 1 - 2048

speech_to_text.watson_stt_config.bearer_token

string | null

Required string length: 1 - 2048

speech_to_text.watson_stt_config.end_of_phrase_silence_time

number | null

speech_to_text.watson_stt_config.background_audio_suppression

number | null

Background audio suppression level (0.0 to 1.0). Default 0.0

Required range: 0 <= x <= 1

speech_to_text.watson_stt_config.language_customization_id

string | null

Language customization ID

Required string length: 1 - 256

speech_to_text.watson_stt_config.inactivity_timeout

integer | null

Seconds of inactivity before the service stops listening. Default 30

Required range: x >= -1

speech_to_text.watson_stt_config.profanity_filter

boolean | null

Filter profanity in the transcript. Default true

speech_to_text.watson_stt_config.smart_formatting

boolean | null

Enable smart formatting (beta). Default false

speech_to_text.watson_stt_config.speaker_labels

boolean | null

Enable speaker labels (beta). Default false

speech_to_text.watson_stt_config.redaction

boolean | null

Enable PII redaction (beta). Default false

speech_to_text.watson_stt_config.low_latency

boolean | null

Enable low latency mode. Default false

speech_to_text.watson_stt_config.learning_opt_out

boolean | null

Opt out of data collection for learning. Default true

speech_to_text.watson_stt_config.watson_metadata

string | null

Value for x-watson-metadata header.

Required string length: 1 - 512

speech_to_text.watson_stt_config.smart_formatting_version

integer | null

Version of smart formatting to use.

Required range: x >= 0

speech_to_text.watson_stt_config.customization_weight

number | null

Weight for custom language model (0.0 to 1.0). Default 0.5

Required range: 0 <= x <= 1

speech_to_text.watson_stt_config.character_insertion_bias

number | null

Bias for character insertion (-1.0 to 1.0). Default 0.0

Required range: -1 <= x <= 1
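The Watson STT fields above can be assembled and range-checked before sending. The following is a sketch only: the helper name, the provider value `"watson"`, the placeholder URL, and the model name are illustrative assumptions, while the numeric ranges are the ones listed above.

```python
def watson_stt_config(api_url, model, **optional):
    """Assemble a watson_stt_config object and check the documented ranges."""
    ranges = {
        "background_audio_suppression": (0.0, 1.0),
        "customization_weight": (0.0, 1.0),
        "character_insertion_bias": (-1.0, 1.0),
    }
    for name, (low, high) in ranges.items():
        value = optional.get(name)
        if value is not None and not (low <= value <= high):
            raise ValueError(f"{name} must be in [{low}, {high}], got {value}")
    timeout = optional.get("inactivity_timeout")
    if timeout is not None and timeout < -1:
        raise ValueError("inactivity_timeout must be >= -1")
    return {"api_url": api_url, "model": model, **optional}


speech_to_text = {
    "provider": "watson",  # assumed provider identifier
    "watson_stt_config": watson_stt_config(
        api_url="https://stt.example.com",  # placeholder URL
        model="en-US_Telephony",  # illustrative model name
        inactivity_timeout=30,
        customization_weight=0.5,
        profanity_filter=True,
    ),
}
```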

speech_to_text.emotech_stt_config

EmotechSTTConfig · object

speech_to_text.emotech_stt_config.api_url

string

required

Required string length: 1 - 2048

speech_to_text.emotech_stt_config.api_key

string | null

Required string length: 1 - 2048

speech_to_text.emotech_stt_config.positive_speech_threshold

number | null

default:0.25

Confidence threshold above which audio is classified as speech, default is 0.25

speech_to_text.emotech_stt_config.negative_speech_threshold

number | null

default:0.25

Confidence threshold below which audio is classified as non-speech, default is 0.25

speech_to_text.emotech_stt_config.partial_interval

integer | null

default:500

Time interval (in ms) between partial transcription results, default is 500 ms.

speech_to_text.emotech_stt_config.silence_threshold

integer | null

default:500

Silence duration (in ms) after speech used to determine end of utterance, default is 1500 ms.
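A sketch of an Emotech speech_to_text object follows. The provider value `"emotech"` and the URL are assumptions. Because the listed default and the description for `silence_threshold` show different values (500 and 1500 ms), the sketch sets the field explicitly rather than relying on a default.

```python
import json

emotech_stt = {
    "provider": "emotech",  # assumed provider identifier
    "emotech_stt_config": {
        "api_url": "https://emotech.example.com/stt",  # placeholder URL
        "api_key": None,  # nullable per the schema
        "positive_speech_threshold": 0.25,
        "negative_speech_threshold": 0.25,
        "partial_interval": 500,    # ms between partial results
        "silence_threshold": 1500,  # ms of silence that ends an utterance
    },
}

# None serializes to JSON null.
body_fragment = json.dumps(emotech_stt)
```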

text_to_speech

TextToSpeechConfig · object

required

text_to_speech.provider

string

required

Required string length: 1 - 128

text_to_speech.watson_tts_config

WatsonTTSConfig · object

text_to_speech.watson_tts_config.api_url

string

required

Required string length: 1 - 2048

text_to_speech.watson_tts_config.voice

string

required

Required string length: 1 - 128

text_to_speech.watson_tts_config.api_key

string | null

Required string length: 1 - 2048

text_to_speech.watson_tts_config.bearer_token

string | null

Required string length: 1 - 2048

text_to_speech.watson_tts_config.rate_percentage

integer | null

default:0

Rate percentage for speech synthesis, default is 0

text_to_speech.watson_tts_config.pitch_percentage

integer | null

default:0

Pitch percentage for speech synthesis, default is 0

text_to_speech.watson_tts_config.language

string | null

Language code for the voice, e.g., 'en-US'

Required string length: 2 - 16

text_to_speech.watson_tts_config.customization_id

string | null

Custom ID for the Watson TTS service

Required string length: 1 - 256

text_to_speech.watson_tts_config.meta_id

string | null

Meta ID for the Watson TTS service

Required string length: 1 - 256

text_to_speech.watson_tts_config.learning_opt_out

boolean | null

Set to true to opt out of data collection for learning purposes
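The Watson TTS fields above might be assembled as in this sketch. The helper name, the provider value `"watson"`, the URL, and the voice name are illustrative assumptions. The length check on `language` mirrors the 2 - 16 constraint listed above.

```python
def watson_tts_config(api_url, voice, *, rate_percentage=0, pitch_percentage=0,
                      language=None, api_key=None):
    """Assemble a watson_tts_config object, omitting unset nullable fields."""
    if language is not None and not (2 <= len(language) <= 16):
        raise ValueError("language must be 2 to 16 characters long")
    config = {
        "api_url": api_url,
        "voice": voice,
        "rate_percentage": rate_percentage,    # documented default: 0
        "pitch_percentage": pitch_percentage,  # documented default: 0
        "language": language,
        "api_key": api_key,
    }
    # Drop fields left as None so the service applies its own defaults.
    return {key: value for key, value in config.items() if value is not None}


text_to_speech = {
    "provider": "watson",  # assumed provider identifier
    "watson_tts_config": watson_tts_config(
        "https://tts.example.com",  # placeholder URL
        "en-US_AllisonV3Voice",     # illustrative voice name
        rate_percentage=10,
        language="en-US",
    ),
}
```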

text_to_speech.emotech_tts_config

EmotechTTSConfig · object

text_to_speech.emotech_tts_config.api_url

string

required

Required string length: 1 - 2048

text_to_speech.emotech_tts_config.api_key

string | null

Required string length: 1 - 2048

text_to_speech.emotech_tts_config.voice

string | null

Required string length: 1 - 128

text_to_speech.elevenlabs_tts_config

ElevenLabsTTSConfig · object

text_to_speech.elevenlabs_tts_config.model_id

string

required

The ID of the ElevenLabs model to use

Required string length: 1 - 128

text_to_speech.elevenlabs_tts_config.voice_id

string

required

The ID of the ElevenLabs voice to use

Required string length: 1 - 128

text_to_speech.elevenlabs_tts_config.api_key

string | null

required

ElevenLabs API key

Required string length: 1 - 2048

text_to_speech.elevenlabs_tts_config.apply_text_normalization

string | null

Whether to apply text normalization

text_to_speech.elevenlabs_tts_config.language_code

string | null

Language code for the voice, e.g., 'en', 'es'

Required string length: 2 - 16

text_to_speech.elevenlabs_tts_config.optimize_streaming_latency

integer | null

Optimize streaming latency (0-4)

text_to_speech.elevenlabs_tts_config.apply_language_text_normalization

boolean | null

Whether to apply language-specific text normalization

text_to_speech.elevenlabs_tts_config.pronunciation_dictionary_locators

ElevenLabsPronounciationDict · object[] | null

List of pronunciation dictionary locators

text_to_speech.elevenlabs_tts_config.pronunciation_dictionary_locators.pronunciation_dictionary_id

string

required

ID of the pronunciation dictionary

text_to_speech.elevenlabs_tts_config.pronunciation_dictionary_locators.version_id

string

required

Version ID of the pronunciation dictionary

text_to_speech.elevenlabs_tts_config.seed

integer | null

Seed for deterministic audio generation

text_to_speech.elevenlabs_tts_config.previous_text

string | null

Previous text for context

text_to_speech.elevenlabs_tts_config.next_text

string | null

Next text for context

text_to_speech.elevenlabs_tts_config.voice_settings

ElevenLabsVoiceSettings · object

Voice settings for the ElevenLabs TTS

text_to_speech.elevenlabs_tts_config.voice_settings.speed

number

default:1

Speech speed. Defaults to 1.0

text_to_speech.elevenlabs_tts_config.voice_settings.style

number

default:0

Style exaggeration: the higher the value, the more computational resources are used. Defaults to 0.0

text_to_speech.elevenlabs_tts_config.voice_settings.stability

number

default:0.5

Stability: how stable the voice is and the randomness between each generation. Defaults to 0.5

text_to_speech.elevenlabs_tts_config.voice_settings.similarity_boost

number

default:0.75

Similarity boost: how closely the AI should adhere to the original voice. Defaults to 0.75

text_to_speech.elevenlabs_tts_config.voice_settings.use_speaker_boost

boolean

default:true

Whether to use speaker boost. Defaults to true
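One way to model the ElevenLabs fields above is a small dataclass whose defaults match the listed voice_settings defaults. This is a sketch: the class and function names are hypothetical, and the model ID, voice ID, and dictionary IDs are placeholders.

```python
from dataclasses import asdict, dataclass


@dataclass
class VoiceSettings:
    """Mirrors voice_settings; defaults are the ones listed above."""
    speed: float = 1.0
    style: float = 0.0
    stability: float = 0.5
    similarity_boost: float = 0.75
    use_speaker_boost: bool = True


def elevenlabs_tts_config(model_id, voice_id, api_key, *, voice_settings=None,
                          optimize_streaming_latency=None, locators=None):
    """Assemble an elevenlabs_tts_config object (illustrative helper)."""
    if optimize_streaming_latency is not None and optimize_streaming_latency not in range(5):
        raise ValueError("optimize_streaming_latency must be between 0 and 4")
    config = {
        "model_id": model_id,
        "voice_id": voice_id,
        "api_key": api_key,
        "voice_settings": asdict(voice_settings or VoiceSettings()),
    }
    if optimize_streaming_latency is not None:
        config["optimize_streaming_latency"] = optimize_streaming_latency
    if locators:
        # Each locator needs both a dictionary ID and a version ID.
        config["pronunciation_dictionary_locators"] = [
            {"pronunciation_dictionary_id": dict_id, "version_id": version_id}
            for dict_id, version_id in locators
        ]
    return config


elevenlabs = elevenlabs_tts_config(
    "eleven_multilingual_v2",  # illustrative model ID
    "VOICE_ID",                # placeholder
    "ELEVENLABS_API_KEY",      # placeholder
    voice_settings=VoiceSettings(stability=0.6),
    optimize_streaming_latency=2,
    locators=[("DICT_ID", "VERSION_ID")],  # placeholders
)
```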

llm_aggregation_timeout_seconds

number | null

default:0.8

Maximum time to wait for additional transcription content before pushing aggregated result.

language

string

default:en-us

Default language code, e.g., 'en-us'

Required string length: 2 - 16

additional_languages

Additional Languages · object

Additional language configurations keyed by language code

additional_languages.{key}

LanguageVoiceConfig · object

Voice configuration for a specific language

additional_languages.{key}.text_to_speech

TextToSpeechConfig · object

additional_languages.{key}.text_to_speech.provider

string

required

Required string length: 1 - 128

additional_languages.{key}.text_to_speech.watson_tts_config

WatsonTTSConfig · object

additional_languages.{key}.text_to_speech.watson_tts_config.api_url

string

required

Required string length: 1 - 2048

additional_languages.{key}.text_to_speech.watson_tts_config.voice

string

required

Required string length: 1 - 128

additional_languages.{key}.text_to_speech.watson_tts_config.api_key

string | null

Required string length: 1 - 2048

additional_languages.{key}.text_to_speech.watson_tts_config.bearer_token

string | null

Required string length: 1 - 2048

additional_languages.{key}.text_to_speech.watson_tts_config.rate_percentage

integer | null

default:0

Rate percentage for speech synthesis, default is 0

additional_languages.{key}.text_to_speech.watson_tts_config.pitch_percentage

integer | null

default:0

Pitch percentage for speech synthesis, default is 0

additional_languages.{key}.text_to_speech.watson_tts_config.language

string | null

Language code for the voice, e.g., 'en-US'

Required string length: 2 - 16

additional_languages.{key}.text_to_speech.watson_tts_config.customization_id

string | null

Custom ID for the Watson TTS service

Required string length: 1 - 256

additional_languages.{key}.text_to_speech.watson_tts_config.meta_id

string | null

Meta ID for the Watson TTS service

Required string length: 1 - 256

additional_languages.{key}.text_to_speech.watson_tts_config.learning_opt_out

boolean | null

Set to true to opt out of data collection for learning purposes

additional_languages.{key}.text_to_speech.emotech_tts_config

EmotechTTSConfig · object

additional_languages.{key}.text_to_speech.emotech_tts_config.api_url

string

required

Required string length: 1 - 2048

additional_languages.{key}.text_to_speech.emotech_tts_config.api_key

string | null

Required string length: 1 - 2048

additional_languages.{key}.text_to_speech.emotech_tts_config.voice

string | null

Required string length: 1 - 128

additional_languages.{key}.text_to_speech.elevenlabs_tts_config

ElevenLabsTTSConfig · object

additional_languages.{key}.text_to_speech.elevenlabs_tts_config.model_id

string

required

The ID of the ElevenLabs model to use

Required string length: 1 - 128

additional_languages.{key}.text_to_speech.elevenlabs_tts_config.voice_id

string

required

The ID of the ElevenLabs voice to use

Required string length: 1 - 128

additional_languages.{key}.text_to_speech.elevenlabs_tts_config.api_key

string | null

required

ElevenLabs API key

Required string length: 1 - 2048

additional_languages.{key}.text_to_speech.elevenlabs_tts_config.apply_text_normalization

string | null

Whether to apply text normalization

additional_languages.{key}.text_to_speech.elevenlabs_tts_config.language_code

string | null

Language code for the voice, e.g., 'en', 'es'

Required string length: 2 - 16

additional_languages.{key}.text_to_speech.elevenlabs_tts_config.optimize_streaming_latency

integer | null

Optimize streaming latency (0-4)

additional_languages.{key}.text_to_speech.elevenlabs_tts_config.apply_language_text_normalization

boolean | null

Whether to apply language-specific text normalization

additional_languages.{key}.text_to_speech.elevenlabs_tts_config.pronunciation_dictionary_locators

ElevenLabsPronounciationDict · object[] | null

List of pronunciation dictionary locators

additional_languages.{key}.text_to_speech.elevenlabs_tts_config.pronunciation_dictionary_locators.pronunciation_dictionary_id

string

required

ID of the pronunciation dictionary

additional_languages.{key}.text_to_speech.elevenlabs_tts_config.pronunciation_dictionary_locators.version_id

string

required

Version ID of the pronunciation dictionary

additional_languages.{key}.text_to_speech.elevenlabs_tts_config.seed

integer | null

Seed for deterministic audio generation

additional_languages.{key}.text_to_speech.elevenlabs_tts_config.previous_text

string | null

Previous text for context

additional_languages.{key}.text_to_speech.elevenlabs_tts_config.next_text

string | null

Next text for context

additional_languages.{key}.text_to_speech.elevenlabs_tts_config.voice_settings

ElevenLabsVoiceSettings · object

Voice settings for the ElevenLabs TTS

additional_languages.{key}.text_to_speech.elevenlabs_tts_config.voice_settings.speed

number

default:1

Speech speed. Defaults to 1.0

additional_languages.{key}.text_to_speech.elevenlabs_tts_config.voice_settings.style

number

default:0

Style exaggeration: the higher the value, the more computational resources are used. Defaults to 0.0

additional_languages.{key}.text_to_speech.elevenlabs_tts_config.voice_settings.stability

number

default:0.5

Stability: how stable the voice is and the randomness between each generation. Defaults to 0.5

additional_languages.{key}.text_to_speech.elevenlabs_tts_config.voice_settings.similarity_boost

number

default:0.75

Similarity boost: how closely the AI should adhere to the original voice. Defaults to 0.75

additional_languages.{key}.text_to_speech.elevenlabs_tts_config.voice_settings.use_speaker_boost

boolean

default:true

Whether to use speaker boost. Defaults to true

additional_languages.{key}.speech_to_text

SpeechToTextConfig · object

additional_languages.{key}.speech_to_text.provider

string

required

Required string length: 1 - 128

additional_languages.{key}.speech_to_text.watson_stt_config

WatsonSTTConfig · object

additional_languages.{key}.speech_to_text.watson_stt_config.api_url

string

required

Required string length: 1 - 2048

additional_languages.{key}.speech_to_text.watson_stt_config.model

string

required

Required string length: 1 - 256

additional_languages.{key}.speech_to_text.watson_stt_config.api_key

string | null

Required string length: 1 - 2048

additional_languages.{key}.speech_to_text.watson_stt_config.bearer_token

string | null

Required string length: 1 - 2048

additional_languages.{key}.speech_to_text.watson_stt_config.end_of_phrase_silence_time

number | null

additional_languages.{key}.speech_to_text.watson_stt_config.background_audio_suppression

number | null

Background audio suppression level (0.0 to 1.0). Default 0.0

Required range: 0 <= x <= 1

additional_languages.{key}.speech_to_text.watson_stt_config.language_customization_id

string | null

Language customization ID

Required string length: 1 - 256

additional_languages.{key}.speech_to_text.watson_stt_config.inactivity_timeout

integer | null

Seconds of inactivity before the service stops listening. Default 30

Required range: x >= -1

additional_languages.{key}.speech_to_text.watson_stt_config.profanity_filter

boolean | null

Filter profanity in the transcript. Default true

additional_languages.{key}.speech_to_text.watson_stt_config.smart_formatting

boolean | null

Enable smart formatting (beta). Default false

additional_languages.{key}.speech_to_text.watson_stt_config.speaker_labels

boolean | null

Enable speaker labels (beta). Default false

additional_languages.{key}.speech_to_text.watson_stt_config.redaction

boolean | null

Enable PII redaction (beta). Default false

additional_languages.{key}.speech_to_text.watson_stt_config.low_latency

boolean | null

Enable low latency mode. Default false

additional_languages.{key}.speech_to_text.watson_stt_config.learning_opt_out

boolean | null

Opt out of data collection for learning. Default true

additional_languages.{key}.speech_to_text.watson_stt_config.watson_metadata

string | null

Value for x-watson-metadata header.

Required string length: 1 - 512

additional_languages.{key}.speech_to_text.watson_stt_config.smart_formatting_version

integer | null

Version of smart formatting to use.

Required range: x >= 0

additional_languages.{key}.speech_to_text.watson_stt_config.customization_weight

number | null

Weight for custom language model (0.0 to 1.0). Default 0.5

Required range: 0 <= x <= 1

additional_languages.{key}.speech_to_text.watson_stt_config.character_insertion_bias

number | null

Bias for character insertion (-1.0 to 1.0). Default 0.0

Required range: -1 <= x <= 1

additional_languages.{key}.speech_to_text.emotech_stt_config

EmotechSTTConfig · object

additional_languages.{key}.speech_to_text.emotech_stt_config.api_url

string

required

Required string length: 1 - 2048

additional_languages.{key}.speech_to_text.emotech_stt_config.api_key

string | null

Required string length: 1 - 2048

additional_languages.{key}.speech_to_text.emotech_stt_config.positive_speech_threshold

number | null

default:0.25

Confidence threshold above which audio is classified as speech, default is 0.25

additional_languages.{key}.speech_to_text.emotech_stt_config.negative_speech_threshold

number | null

default:0.25

Confidence threshold below which audio is classified as non-speech, default is 0.25

additional_languages.{key}.speech_to_text.emotech_stt_config.partial_interval

integer | null

default:500

Time interval (in ms) between partial transcription results, default is 500 ms.

additional_languages.{key}.speech_to_text.emotech_stt_config.silence_threshold

integer | null

default:500

Silence duration (in ms) after speech used to determine end of utterance, default is 1500 ms.
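Since additional_languages is keyed by language code and each entry reuses the text_to_speech and speech_to_text shapes, a per-language override could be attached as sketched below. The helper name `add_language`, the provider values, the URLs, and the voice and model names are assumptions for illustration.

```python
def add_language(body, code, *, text_to_speech=None, speech_to_text=None):
    """Attach a per-language voice configuration under additional_languages[code]."""
    entry = {}
    if text_to_speech is not None:
        entry["text_to_speech"] = text_to_speech
    if speech_to_text is not None:
        entry["speech_to_text"] = speech_to_text
    # Create the additional_languages map on first use.
    body.setdefault("additional_languages", {})[code] = entry
    return body


body = {"name": "support-line", "language": "en-us"}
add_language(
    body,
    "es-es",
    text_to_speech={
        "provider": "watson",  # assumed provider identifier
        "watson_tts_config": {
            "api_url": "https://tts.example.com",  # placeholder URL
            "voice": "es-ES_LauraV3Voice",         # illustrative voice name
        },
    },
    speech_to_text={
        "provider": "watson",  # assumed provider identifier
        "watson_stt_config": {
            "api_url": "https://stt.example.com",  # placeholder URL
            "model": "es-ES_Telephony",            # illustrative model name
        },
    },
)
```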

dtmf_input

DTMFInputConfig · object

dtmf_input.inter_digit_timeout_ms

integer | null

default:2500

The amount of time (ms) to wait for a new DTMF digit, default is 2500 ms.

dtmf_input.termination_key

string | null

The DTMF termination key that signals the end of DTMF input.

dtmf_input.maximum_count

integer | null

Maximum number of digits a user can enter.

dtmf_input.ignore_speech

boolean | null

default:true

Disable speech recognition during collection of DTMF digits, default is true.
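A sketch of a dtmf_input object using the documented defaults follows. The helper name is hypothetical, and using `"#"` as the termination key and 4 as the digit limit are illustrative choices, not documented values.

```python
def dtmf_input_config(inter_digit_timeout_ms=2500, termination_key=None,
                      maximum_count=None, ignore_speech=True):
    """Assemble a dtmf_input object; keyword defaults follow the values listed above."""
    if inter_digit_timeout_ms < 0:
        raise ValueError("inter_digit_timeout_ms must not be negative")
    return {
        "inter_digit_timeout_ms": inter_digit_timeout_ms,
        "termination_key": termination_key,
        "maximum_count": maximum_count,
        "ignore_speech": ignore_speech,
    }


# Collect up to 4 digits, ending early when the caller presses "#".
dtmf_input = dtmf_input_config(termination_key="#", maximum_count=4)
```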

vad

VADConfig · object

vad.enabled

boolean | null

default:true

Enable Voice Activity Detection, default is true.

vad.provider

string | null

default:silero_vad

Required string length: 1 - 128

vad.silero_vad_config

SileroVADConfig · object

vad.silero_vad_config.confidence

number | null

default:0.7

The confidence threshold for speech detection (between 0.0 and 1.0), default is 0.7

vad.silero_vad_config.start_seconds

number | null

default:0.2

The time in seconds speech must be detected before transitioning to SPEAKING state, default is 0.2

vad.silero_vad_config.stop_seconds

number | null

default:0.8

The time in seconds silence must be detected before transitioning to QUIET state, default is 0.8

vad.silero_vad_config.min_volume

number | null

default:0.6

The minimum audio volume threshold for speech detection (between 0.0 and 1.0), default is 0.6
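The VAD fields above can be assembled as in this sketch, which checks the two thresholds documented as lying between 0.0 and 1.0. The helper name is hypothetical. The provider value `silero_vad` is the documented default.

```python
def silero_vad_config(confidence=0.7, start_seconds=0.2, stop_seconds=0.8, min_volume=0.6):
    """Assemble a silero_vad_config object; keyword defaults follow the values listed above."""
    for name, value in (("confidence", confidence), ("min_volume", min_volume)):
        if not (0.0 <= value <= 1.0):
            raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")
    return {
        "confidence": confidence,
        "start_seconds": start_seconds,  # speech needed before SPEAKING
        "stop_seconds": stop_seconds,    # silence needed before QUIET
        "min_volume": min_volume,
    }


vad = {
    "enabled": True,
    "provider": "silero_vad",
    # Wait a little longer than the default before treating silence as the end of speech.
    "silero_vad_config": silero_vad_config(stop_seconds=1.0),
}
```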

Response

Successful Response
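Putting the pieces together, the sketch below builds a request carrying the three required body fields (name, speech_to_text, text_to_speech). The endpoint URL is hypothetical, since this page does not state the path, and the POST method, the provider values, and the model and voice names are assumptions. The request is constructed but not sent.

```python
import json
import urllib.request

# Hypothetical endpoint: substitute the real URL for your instance.
API_URL = "https://example.invalid/voice_configurations"


def build_request(token, body):
    """Build (but do not send) a JSON request with Bearer authentication."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        method="POST",  # assumed for a create operation
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


body = {
    "name": "support-line",
    "language": "en-us",
    "speech_to_text": {
        "provider": "watson",  # assumed provider identifier
        "watson_stt_config": {
            "api_url": "https://stt.example.com",  # placeholder URL
            "model": "en-US_Telephony",            # illustrative model name
        },
    },
    "text_to_speech": {
        "provider": "watson",  # assumed provider identifier
        "watson_tts_config": {
            "api_url": "https://tts.example.com",  # placeholder URL
            "voice": "en-US_AllisonV3Voice",       # illustrative voice name
        },
    },
}

request = build_request("my-auth-token", body)
# To send: urllib.request.urlopen(request)
```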
