With the ADK, you can create knowledge bases for your agents, either by connecting to your own Elasticsearch or Milvus instance or by uploading your documents.

Use YAML, JSON or Python files to create your knowledge bases for watsonx Orchestrate.

Creating built-in Milvus knowledge bases

If you don’t have an existing Milvus or Elasticsearch instance to connect to, you can create a knowledge base by simply uploading your documents. These documents will be ingested into the built-in Milvus instance, which will serve as the backend for your knowledge base.

Once the knowledge base is created, you can check its status to see when it’s ready for use.

spec_version: v1
kind: knowledge_base 
name: knowledge_base_name
description: >
   A description of what information this knowledge base addresses
documents:
   - "/file-path-1.pdf"
   - "relative-path/file-path-2.pdf"
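
Because knowledge bases can also be defined in JSON, the same specification can be generated programmatically. The following is a minimal sketch, assuming the JSON form uses the same field names as the YAML above. The helper name build_knowledge_base_spec is hypothetical and not part of the ADK.

```python
import json


def build_knowledge_base_spec(name, description, documents):
    """Return a dict mirroring the YAML knowledge base fields shown above.

    Assumption: a JSON spec uses the same keys as the YAML spec.
    """
    return {
        "spec_version": "v1",
        "kind": "knowledge_base",
        "name": name,
        "description": description,
        "documents": list(documents),
    }


spec = build_knowledge_base_spec(
    name="knowledge_base_name",
    description="A description of what information this knowledge base addresses",
    documents=["/file-path-1.pdf", "relative-path/file-path-2.pdf"],
)

# Serialize to JSON text; writing it to a file is left to the caller.
spec_json = json.dumps(spec, indent=2)
print(spec_json)
```

Building the spec from a function like this can help when many knowledge bases share the same structure and differ only in name and document list.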

Creating external knowledge bases

External knowledge bases allow you to connect your existing Milvus or Elasticsearch databases as a knowledge source for your agent. To configure a knowledge base with your external database, use the conversational_search_tool.index_config to define the connection details for your Milvus or Elasticsearch instance.

Use the field_mapping in your index_config to specify which fields from the search results are used for the title, body, and, optionally, url of each search result.
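
To make the role of field_mapping concrete, the sketch below applies a mapping to a raw search hit and produces the title, body, and optional url of a result. It only illustrates the idea: the function apply_field_mapping and the sample hit are hypothetical, and treating title and body as required is an assumption drawn from the wording above. The product performs this step internally.

```python
def apply_field_mapping(hit, field_mapping):
    """Pick title, body and (optionally) url out of a raw search hit."""
    # Assumption: title and body must be mapped, url may be omitted.
    for required in ("title", "body"):
        if required not in field_mapping:
            raise ValueError(f"field_mapping is missing '{required}'")

    result = {
        "title": hit.get(field_mapping["title"]),
        "body": hit.get(field_mapping["body"]),
    }
    # Include url only when the mapping configures it.
    if "url" in field_mapping:
        result["url"] = hit.get(field_mapping["url"])
    return result


field_mapping = {"title": "title-field", "body": "text-field", "url": "url-field"}
hit = {
    "title-field": "Returns policy",
    "text-field": "Items can be returned within 30 days.",
    "url-field": "https://example.com/returns",
    "score": 0.87,  # fields that are not mapped are ignored
}
result = apply_field_mapping(hit, field_mapping)
print(result)
```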

Milvus

When connecting to a Milvus instance:

Ensure the provided embedding_model_id is the one used when ingesting the documents in your index.

Additionally, ensure you use the gRPC host and port from your Milvus instance. Connections will fail if you use the HTTP host or port.

spec_version: v1
kind: knowledge_base 
name: knowledge_base_name
description: >
   A description of what information this knowledge base addresses
prioritize_built_in_index: false
conversational_search_tool:
   index_config:
      - milvus:
         grpc_host: my.grpc-host.com
         grpc_port: "1234"
         database: database-name
         collection: collection-name
         index: index-name
         embedding_model_id: ibm/slate-125m-english-rtrvr
         filter: <filter for search>
         field_mapping:
            title: title-field
            body: text-field
            url: url-field

Elasticsearch

For Elasticsearch, you can provide a custom query_body that will be sent as the POST body in the search request. This allows for advanced query customization.

  • If provided, the query_body must include the $QUERY token, which will be replaced by the user’s query at runtime.
  • If no custom query_body is provided, a keyword search will be used.

To further customize the Elasticsearch query, set result_filter to an array of Elasticsearch filters. If you use both query_body and result_filter, the query_body must include the $FILTER token, which will be replaced by the result_filter array at runtime.
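
The token rules above can be illustrated with a short sketch that checks a query_body for the required tokens and substitutes them. This is not the runtime's actual implementation. The function render_query_body and the sample user query are hypothetical, and replacing $FILTER with an empty list when no result_filter is given is an assumption.

```python
import copy
import json


def render_query_body(query_body, user_query, result_filter=None):
    """Return a copy of query_body with $QUERY and $FILTER substituted."""
    template_text = json.dumps(query_body)
    if "$QUERY" not in template_text:
        raise ValueError("query_body must include the $QUERY token")
    if result_filter is not None and "$FILTER" not in template_text:
        raise ValueError("query_body must include $FILTER when result_filter is set")

    def substitute(node):
        if isinstance(node, dict):
            return {key: substitute(value) for key, value in node.items()}
        if isinstance(node, list):
            return [substitute(item) for item in node]
        if node == "$FILTER":
            # Assumption: with no result_filter, the token becomes an empty list.
            return copy.deepcopy(result_filter) if result_filter is not None else []
        if isinstance(node, str):
            return node.replace("$QUERY", user_query)
        return node

    return substitute(query_body)


query_body = {
    "size": 10,
    "query": {
        "bool": {
            "should": [
                {"text_expansion": {"ml.tokens": {
                    "model_id": ".elser_model_2_linux-x86_64",
                    "model_text": "$QUERY",
                }}}
            ],
            "filter": "$FILTER",
        }
    },
}
result_filter = [{"match": {"title": "A_keyword_in_title"}}]
rendered = render_query_body(query_body, "how do I reset my password?", result_filter)
print(json.dumps(rendered, indent=2))
```

Running a check like this before importing a knowledge base can catch a missing $QUERY or $FILTER token early.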

spec_version: v1
kind: knowledge_base 
name: knowledge_base_name
description: >
   A description of what information this knowledge base addresses
prioritize_built_in_index: false
conversational_search_tool:
   index_config:
      - elastic_search:
         url: https://my.elasticsearch-instance.com
         index: my-index-name
         port: "1234"
         query_body: {"size":10,"query":{"bool":{"should":[{"text_expansion":{"ml.tokens":{"model_id":".elser_model_2_linux-x86_64","model_text":"$QUERY"}}}],"filter":"$FILTER"}}}
         result_filter: [{"match":{"title":"A_keyword_in_title"}},{"match":{"text":"A_keyword_in_text"}},{"match":{"id":"A_specific_ID"}}]
         field_mapping:
            title: title-field
            body: text-field
            url: url-field

For more information about customizing the Elasticsearch query body and filters, see How to configure the advanced Elasticsearch settings.

Custom search engine

You can create a knowledge base for your own custom search engine, as in the following example:

spec_version: v1
kind: knowledge_base 
name: knowledge_base_name
description: >
   A description of what information this knowledge base addresses
prioritize_built_in_index: false
conversational_search_tool:
   index_config:
      - custom_search:
         url: https://my.custom-server.com
         filter: my custom filter
         metadata:
            foo: bar

Configuring generation options

With the ADK, you can further fine-tune how your agent uses knowledge through the conversational_search_tool configuration in your knowledge base.

You can apply these settings to both built-in Milvus knowledge bases and external knowledge bases. Below are the configurable options available within the conversational_search_tool section:

| Configuration | Description |
| --- | --- |
| prompt_instruction | Set this under generation. If specified, this instruction will be included in the prompt sent to the language model to guide response generation. |
| generated_response_length | Set this under generation to one of Concise, Moderate or Verbose. This setting adjusts the prompt to request responses of the specified length. If not set, the default is Moderate. |
| retrieval_confidence_threshold | Set this under confidence_thresholds to one of Lowest, Low, High or Highest. This threshold determines the minimum confidence required that the retrieved documents answer the user’s query. If the confidence is below the threshold, the agent will return a default “I don’t know” response instead of generating a response. The default is Low. |
| response_confidence_threshold | Set this under confidence_thresholds to one of Lowest, Low, High or Highest. This threshold evaluates the confidence that both the generated response and the retrieved documents answer the user’s query. If the confidence is below the threshold, the agent will return a default “I don’t know” response. The default is Low. |
| query_rewrite | If enabled, the user’s query is rewritten using the context of the conversation to support multi-turn interactions. This setting is enabled by default. |
| citations_shown | Set this under citations. This controls the maximum number of citations shown to the user in a knowledge-based response. If not set, the default is -1, which means all available citations will be displayed. |

spec_version: v1
kind: knowledge_base 
name: knowledge_base_name
description: >
   A description of what information this knowledge base addresses
documents:
   - "/file-path-1.pdf"
   - "relative-path/file-path-2.pdf"
conversational_search_tool:
   generation:
      prompt_instruction: Custom instruction
      generated_response_length: Moderate
   confidence_thresholds:
      retrieval_confidence_threshold: Low
      response_confidence_threshold: Low
   query_rewrite:
      enabled: true
   citations:
      citations_shown: -1
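
Since several of these options accept only a fixed set of values, it can help to check a configuration before importing it. The sketch below validates a conversational_search_tool section that has been loaded into a Python dict. The function validate_search_tool is hypothetical and not part of the ADK, and rejecting citations_shown values below -1 is an assumption.

```python
ALLOWED_LENGTHS = {"Concise", "Moderate", "Verbose"}
ALLOWED_THRESHOLDS = {"Lowest", "Low", "High", "Highest"}


def validate_search_tool(tool):
    """Return a list of problems found in a conversational_search_tool dict."""
    problems = []

    # Unset options fall back to the documented defaults.
    length = tool.get("generation", {}).get("generated_response_length", "Moderate")
    if length not in ALLOWED_LENGTHS:
        problems.append(f"generated_response_length: unexpected value {length!r}")

    thresholds = tool.get("confidence_thresholds", {})
    for key in ("retrieval_confidence_threshold", "response_confidence_threshold"):
        value = thresholds.get(key, "Low")
        if value not in ALLOWED_THRESHOLDS:
            problems.append(f"{key}: unexpected value {value!r}")

    # Assumption: -1 means "all citations"; anything lower is treated as invalid.
    shown = tool.get("citations", {}).get("citations_shown", -1)
    if isinstance(shown, bool) or not isinstance(shown, int) or shown < -1:
        problems.append(f"citations_shown: expected -1 or a non-negative integer, got {shown!r}")

    return problems


good_tool = {
    "generation": {"prompt_instruction": "Custom instruction", "generated_response_length": "Moderate"},
    "confidence_thresholds": {
        "retrieval_confidence_threshold": "Low",
        "response_confidence_threshold": "Low",
    },
    "query_rewrite": {"enabled": True},
    "citations": {"citations_shown": -1},
}
bad_tool = {
    "generation": {"generated_response_length": "Short"},
    "confidence_thresholds": {"retrieval_confidence_threshold": "Medium"},
}
print(validate_search_tool(good_tool))
print(validate_search_tool(bad_tool))
```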

Configuring the Hate, Abuse, and Profanity (HAP) filter

The Hate, Abuse, and Profanity (HAP) filter helps maintain an inclusive environment by identifying and addressing hate speech, abuse, and profanity. It filters content to prevent the generation of hate speech, abuse, and profanity, and provides a generic fallback response if such content is detected.

You can configure HAP settings for your knowledge bases by using the enabled and threshold parameters. You must set both parameters under conversational_search_tool > hap_filtering > output in the knowledge base schema.

| Parameter | Description |
| --- | --- |
| enabled | Turn HAP on or off in your knowledge base. Set it to true to enable HAP filtering, or false to disable it. |
| threshold | Set how sensitive the HAP filter is. Use a value between 0 and 1. Closer to 0: the filter is stricter, so more content gets flagged and the generic fallback response might be returned more often. Closer to 1: the filter is more lenient, so fewer responses might get flagged as HAP. |

spec_version: v1
kind: knowledge_base
name: ibm_knowledge_base
description: General information about IBM and its history
documents:
 - IBM_wikipedia.pdf
 - history_of_ibm.pdf
conversational_search_tool:
  hap_filtering:
    output:
      enabled: true
      threshold: 0.5
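
The effect of the threshold can be sketched in a few lines. The example below validates the two HAP parameters and shows how a lower threshold flags more content. Both functions are hypothetical, and the comparison rule (a response is flagged when its HAP score reaches the threshold) is an assumption made for illustration, not a description of the actual filter.

```python
def validate_hap_output(hap_output):
    """Check the enabled and threshold values under hap_filtering > output."""
    enabled = hap_output.get("enabled")
    threshold = hap_output.get("threshold")
    if not isinstance(enabled, bool):
        raise ValueError("enabled must be true or false")
    if isinstance(threshold, bool) or not isinstance(threshold, (int, float)):
        raise ValueError("threshold must be a number")
    if not 0 <= threshold <= 1:
        raise ValueError("threshold must be between 0 and 1")
    return enabled, float(threshold)


def would_flag(score, hap_output):
    """Assumption: content is flagged when its HAP score reaches the threshold."""
    enabled, threshold = validate_hap_output(hap_output)
    return enabled and score >= threshold


strict = {"enabled": True, "threshold": 0.2}
lenient = {"enabled": True, "threshold": 0.8}
disabled = {"enabled": False, "threshold": 0.2}

score = 0.4  # hypothetical HAP score for one generated response
print(would_flag(score, strict))
print(would_flag(score, lenient))
print(would_flag(score, disabled))
```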