Creating Knowledge Bases

With the ADK, you can create knowledge bases for your agents, either by connecting to your own Elasticsearch or Milvus instance or by uploading your documents.

Use YAML, JSON or Python files to create your knowledge bases for watsonx Orchestrate.

Creating built-in Milvus knowledge bases

If you don’t have an existing Milvus or Elasticsearch instance to connect to, you can create a knowledge base by simply uploading your documents. These documents will be ingested into the built-in Milvus instance, which will serve as the backend for your knowledge base.

Uploaded documents must meet the following requirements:

The maximum size of a single file is 25 MB.
A single batch can contain up to 20 files, with a maximum combined size of 50 MB.
You can upload up to 50 files in total; for larger document sets, use an external data source.
Supported file types are txt, pdf, csv, docx, xlsx, pptx, and html.
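The upload limits above can be checked locally before you upload anything. The following is a minimal sketch of such a pre-check, not part of the ADK: `check_batch` is a hypothetical helper, and it assumes that "MB" means 2**20 bytes, which the limits themselves do not specify.

```python
from pathlib import PurePath

# Assumption: "MB" is read as 2**20 bytes; the documented limits do not say.
MB = 1024 * 1024

MAX_FILE_BYTES = 25 * MB    # per-file limit
MAX_BATCH_FILES = 20        # files per upload batch
MAX_BATCH_BYTES = 50 * MB   # combined size of one batch
MAX_TOTAL_FILES = 50        # total uploaded files
SUPPORTED_EXTENSIONS = {".txt", ".pdf", ".csv", ".docx", ".xlsx", ".pptx", ".html"}


def check_batch(files: dict[str, int], already_uploaded: int = 0) -> list[str]:
    """Return the problems found in a batch of {path: size_in_bytes}; an empty list means OK."""
    problems = []
    for path, size in files.items():
        ext = PurePath(path).suffix.lower()
        if ext not in SUPPORTED_EXTENSIONS:
            problems.append(f"{path}: unsupported file type {ext or '(none)'}")
        if size > MAX_FILE_BYTES:
            problems.append(f"{path}: larger than 25 MB")
    if len(files) > MAX_BATCH_FILES:
        problems.append(f"batch has {len(files)} files (max {MAX_BATCH_FILES})")
    if sum(files.values()) > MAX_BATCH_BYTES:
        problems.append("batch is larger than 50 MB in total")
    if already_uploaded + len(files) > MAX_TOTAL_FILES:
        problems.append("more than 50 files in total; consider an external data source")
    return problems


print(check_batch({"guide.pdf": 3 * MB, "notes.md": 1024}))
# ['notes.md: unsupported file type .md']
```

The extension check is case-insensitive, so `Report.PDF` passes. Pass `already_uploaded` to account for files that are already in the knowledge base when you check a follow-up batch.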

Once the knowledge base is created, you can check its status to see when it’s ready for use.

spec_version: v1
kind: knowledge_base 
name: knowledge_base_name
description: >
   A description of what information this knowledge base addresses
documents:
   - "/file-path-1.pdf"
   - "relative-path/file-path-2.pdf"
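Because knowledge bases can also be defined in JSON, the sketch below builds the same fields as the YAML example above as a Python dictionary and writes them out with the standard `json` module. It assumes that the JSON form uses the same keys as the YAML form; the output file name `knowledge_base.json` is hypothetical.

```python
import json

# Same fields as the YAML example; assumes the JSON form uses identical keys.
knowledge_base = {
    "spec_version": "v1",
    "kind": "knowledge_base",
    "name": "knowledge_base_name",
    "description": "A description of what information this knowledge base addresses",
    # documents is a plain list of file paths (absolute or relative).
    "documents": [
        "/file-path-1.pdf",
        "relative-path/file-path-2.pdf",
    ],
}

# "knowledge_base.json" is a hypothetical file name.
with open("knowledge_base.json", "w", encoding="utf-8") as f:
    json.dump(knowledge_base, f, indent=2)
```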

Creating external knowledge bases

External knowledge bases allow you to connect your existing Milvus or Elasticsearch databases as a knowledge source for your agent. To configure a knowledge base with your external database, use the conversational_search_tool.index_config to define the connection details for your Milvus or Elasticsearch instance.

Use field_mapping in your index_config to specify which fields from the search results are used for the title, the body, and, optionally, the URL of each search result.
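To make the role of field_mapping concrete, here is a small sketch of what the mapping expresses: given one raw search hit, pick out the title, body, and optional URL fields. This only illustrates the idea; it is not how watsonx Orchestrate processes results internally. `apply_field_mapping` is a hypothetical helper, and it assumes that each hit is a flat dictionary of field names to values.

```python
from typing import Any


def apply_field_mapping(hit: dict[str, Any], field_mapping: dict[str, str]) -> dict[str, Any]:
    """Pick the title, body and optional url out of one raw search hit."""
    return {
        "title": hit.get(field_mapping["title"]),
        "body": hit.get(field_mapping["body"]),
        # url is optional in field_mapping, so fall back to None when it is not mapped.
        "url": hit.get(field_mapping["url"]) if "url" in field_mapping else None,
    }


mapping = {"title": "title-field", "body": "text-field", "url": "url-field"}
hit = {
    "title-field": "Reset your password",
    "text-field": "Open Settings and choose Security.",
    "url-field": "https://example.com/reset",
    "score": 0.87,  # fields that are not mapped are ignored
}
print(apply_field_mapping(hit, mapping))
# {'title': 'Reset your password', 'body': 'Open Settings and choose Security.', 'url': 'https://example.com/reset'}
```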

Milvus

When connecting to a Milvus instance:

Ensure that embedding_model_id matches the embedding model that was used to ingest the documents into your index.

Also, ensure that you use the gRPC host and port of your Milvus instance. Connections fail if you use the HTTP host or port.

spec_version: v1
kind: knowledge_base 
name: knowledge_base_name
description: >
   A description of what information this knowledge base addresses
prioritize_built_in_index: false
conversational_search_tool:
   index_config:
      - milvus:
         grpc_host: my.grpc-host.com
         grpc_port: "1234"
         database: database-name
         collection: collection-name
         index: index-name
         embedding_model_id: ibm/slate-125m-english-rtrvr
         filter: <filter for search>
         field_mapping:
            title: title-field
            body: text-field
            url: url-field

Elasticsearch

For Elasticsearch, you can provide a custom query_body that will be sent as the POST body in the search request. This allows for advanced query customization.

If provided, the query_body must include the $QUERY token, which will be replaced by the user’s query at runtime.
If no custom query_body is provided, a keyword search will be used.

To further customize the Elasticsearch query, set result_filter to an array of Elasticsearch filters. If you use both query_body and result_filter, the query_body must include the $FILTER token, which is replaced by the result_filter array at runtime.

spec_version: v1
kind: knowledge_base 
name: knowledge_base_name
description: >
   A description of what information this knowledge base addresses
prioritize_built_in_index: false
conversational_search_tool:
   index_config:
      - elastic_search:
         url: https://my.elasticsearch-instance.com
         index: my-index-name
         port: "1234"
         query_body: {"size":10,"query":{"bool":{"should":[{"text_expansion":{"ml.tokens":{"model_id":".elser_model_2_linux-x86_64","model_text":"$QUERY"}}}],"filter":"$FILTER"}}}
         result_filter: [{"match":{"title":"A_keyword_in_title"}},{"match":{"text":"A_keyword_in_text"}},{"match":{"id":"A_specific_ID"}}]
         field_mapping:
            title: title-field
            body: text-field
            url: url-field
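The token substitution in the example above happens inside watsonx Orchestrate at runtime. The sketch below only illustrates the idea so you can predict the request body your query_body produces. `render_query_body` is hypothetical, and it assumes the following: $QUERY is replaced wherever it appears inside a JSON string value, $FILTER is replaced as a whole value by the filter array, and an empty list is used when no result_filter is given.

```python
import json
from typing import Any, Optional


def render_query_body(template: Any, user_query: str, result_filter: Optional[list] = None) -> Any:
    """Return a copy of a parsed query_body with the $QUERY and $FILTER tokens filled in."""
    if isinstance(template, dict):
        return {k: render_query_body(v, user_query, result_filter) for k, v in template.items()}
    if isinstance(template, list):
        return [render_query_body(v, user_query, result_filter) for v in template]
    if template == "$FILTER":
        # The whole value is replaced by the filter array.
        # Assumption: with no result_filter, an empty filter list is used.
        return result_filter if result_filter is not None else []
    if isinstance(template, str):
        # $QUERY is replaced wherever it appears in a string value.
        return template.replace("$QUERY", user_query)
    return template  # numbers, booleans and null pass through unchanged


query_body = json.loads(
    '{"size":10,"query":{"bool":{"should":[{"text_expansion":{"ml.tokens":'
    '{"model_id":".elser_model_2_linux-x86_64","model_text":"$QUERY"}}}],"filter":"$FILTER"}}}'
)
result_filter = [{"match": {"title": "A_keyword_in_title"}}]

body = render_query_body(query_body, "How do I reset my password?", result_filter)
print(json.dumps(body["query"]["bool"]["filter"]))
# [{"match": {"title": "A_keyword_in_title"}}]
```

The sketch works on the parsed JSON rather than on raw text, so a user query that contains quotes is escaped correctly when the body is serialized again.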

For more information about customizing the Elasticsearch query body and filters, see How to configure the advanced Elasticsearch settings.

Custom search engine

You can also connect a knowledge base to your own custom search engine, as in the following example:

spec_version: v1
kind: knowledge_base 
name: knowledge_base_name
description: >
   A description of what information this knowledge base addresses
prioritize_built_in_index: false
conversational_search_tool:
   index_config:
      - custom_search:
         url: https://my.custom-server.com
         filter: my custom filter
         metadata:
            foo: bar

Configuring generation options

With the ADK, you can further fine-tune how your agent uses knowledge through the conversational_search_tool configuration in your knowledge base.

You can apply these settings to both built-in Milvus knowledge bases and external knowledge bases. Below are the configurable options available within the conversational_search_tool section:

Configuration	Description
`prompt_instruction`	Set this under `generation`. If specified, this instruction will be included in the prompt sent to the language model to guide response generation.
`generated_response_length`	Set this under `generation` to one of `Concise`, `Moderate` or `Verbose`. This setting adjusts the prompt to request responses of the specified length. If not set, the default is `Moderate`.
`retrieval_confidence_threshold`	Set this under `confidence_thresholds` to one of `Lowest`, `Low`, `High` or `Highest`. This threshold sets the minimum confidence that the retrieved documents answer the user’s query. If the confidence is below the threshold, the agent returns a default “I don’t know” response instead of generating one. The default is `Low`.
`response_confidence_threshold`	Set this under `confidence_thresholds` to one of `Lowest`, `Low`, `High` or `Highest`. This threshold evaluates the confidence that both the generated response and the retrieved documents answer the user’s query. If the confidence is below the threshold, the agent returns a default “I don’t know” response. The default is `Low`.
`query_rewrite`	Set `enabled` under `query_rewrite` to `true` or `false`. When enabled, the user’s query is rewritten using the context of the conversation to support multi-turn interactions. This setting is enabled by default.
`citations_shown`	Set this under `citations`. This controls the maximum number of citations shown to the user in a knowledge-based response. If not set, the default is `-1`, which means all available citations are displayed.

spec_version: v1
kind: knowledge_base 
name: knowledge_base_name
description: >
   A description of what information this knowledge base addresses
documents:
   - "/file-path-1.pdf"
   - "relative-path/file-path-2.pdf"
conversational_search_tool:
   generation:
      prompt_instruction: Custom instruction
      generated_response_length: Moderate
   confidence_thresholds:
      retrieval_confidence_threshold: Low
      response_confidence_threshold: Low
   query_rewrite:
      enabled: True
   citations:
      citations_shown: -1
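The defaults in the table above mean that most of the conversational_search_tool section can be left out. The following sketch is a hypothetical local helper, `resolve_search_options`, and not an ADK function. It takes the section as a parsed dictionary, fills in the documented defaults, and rejects values outside the listed options. A missing `prompt_instruction` is represented as `None`, because the table gives no default for it.

```python
from typing import Any

RESPONSE_LENGTHS = {"Concise", "Moderate", "Verbose"}
THRESHOLDS = {"Lowest", "Low", "High", "Highest"}


def resolve_search_options(cfg: dict[str, Any]) -> dict[str, Any]:
    """Fill in the documented defaults for a conversational_search_tool section and check enum values."""
    generation = cfg.get("generation", {})
    thresholds = cfg.get("confidence_thresholds", {})
    resolved = {
        "prompt_instruction": generation.get("prompt_instruction"),  # no documented default
        "generated_response_length": generation.get("generated_response_length", "Moderate"),
        "retrieval_confidence_threshold": thresholds.get("retrieval_confidence_threshold", "Low"),
        "response_confidence_threshold": thresholds.get("response_confidence_threshold", "Low"),
        "query_rewrite": cfg.get("query_rewrite", {}).get("enabled", True),
        "citations_shown": cfg.get("citations", {}).get("citations_shown", -1),  # -1 = show all
    }
    if resolved["generated_response_length"] not in RESPONSE_LENGTHS:
        raise ValueError(f"generated_response_length must be one of {sorted(RESPONSE_LENGTHS)}")
    for key in ("retrieval_confidence_threshold", "response_confidence_threshold"):
        if resolved[key] not in THRESHOLDS:
            raise ValueError(f"{key} must be one of {sorted(THRESHOLDS)}")
    return resolved


print(resolve_search_options({"generation": {"generated_response_length": "Concise"}}))
```

In this call, only the response length is overridden. Every other option comes back with its documented default: `Low` thresholds, query rewriting enabled, and `-1` citations.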

Configuring the Hate, Abuse, and Profanity (HAP) filter

A Hate, Abuse, and Profanity (HAP) filter identifies and addresses hate speech, abuse, and profanity to help maintain an inclusive, safe environment for users. It filters content to prevent the generation of hate speech, abuse, and profanity, and returns a generic fallback response when such content is detected.

You can configure HAP settings for your knowledge bases by using the enabled and threshold parameters. You must set both parameters under conversational_search_tool > hap_filtering > output in the knowledge base schema.

Parameter	Description
`enabled`	Turn HAP on or off in your knowledge base. Set it to `true` to enable HAP filtering, or `false` to disable it.
`threshold`	Set how sensitive the HAP filter is, as a value between 0 and 1. Closer to 0, the filter is stricter: more content is flagged, and the generic fallback response might be returned more often. Closer to 1, the filter is more lenient: fewer responses are flagged as HAP.

spec_version: v1
kind: knowledge_base
name: ibm_knowledge_base
description: General information about IBM and its history
documents:
 - IBM_wikipedia.pdf
 - history_of_ibm.pdf
conversational_search_tool:
  hap_filtering:
    output:
      enabled: true
      threshold: 0.5
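The HAP scoring model itself is not described here, so the direction of `threshold` can feel counterintuitive. The following toy model makes that direction concrete. It assumes that each response receives a hypothetical HAP score between 0 and 1 and is flagged when the score is at or above the threshold. Under that assumption, a lower threshold flags more content, which matches the parameter description. `is_flagged` and the scores are illustrative only.

```python
def is_flagged(hap_score: float, threshold: float) -> bool:
    """Toy model: flag content whose hypothetical HAP score (0 to 1) is at or above the threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return hap_score >= threshold


scores = [0.1, 0.4, 0.7]  # made-up scores for three responses
for threshold in (0.2, 0.5, 0.9):
    flagged = sum(is_flagged(s, threshold) for s in scores)
    print(f"threshold={threshold}: {flagged} of {len(scores)} flagged")
# threshold=0.2: 2 of 3 flagged
# threshold=0.5: 1 of 3 flagged
# threshold=0.9: 0 of 3 flagged
```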
