Pre-requisites
Run the following command to enable watsonx Orchestrate Developer Edition to process documents:
Note: Allocate at least 20 GB of RAM to your Docker engine when you install watsonx Orchestrate Developer Edition. The document processing features need this memory to run.
Note: To run the document field extractor, you must define the WATSONX_SPACE_ID, WATSONX_APIKEY, and WATSONX_PROJECT_ID credentials in your .env file. For more information on configuring the .env file, see Installing the watsonx Orchestrate Developer Edition.

Configuring document extractor node in agentic workflows
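
Before configuring the node, it can help to confirm that the .env file defines the three credentials named in the prerequisites. The following is a minimal sketch using only the Python standard library. The helper names `parse_env_text` and `missing_credentials` are hypothetical, and the parser assumes plain `KEY=VALUE` lines; it is not part of watsonx Orchestrate.

```python
# Credential names taken from the note in the prerequisites.
REQUIRED_KEYS = ("WATSONX_SPACE_ID", "WATSONX_APIKEY", "WATSONX_PROJECT_ID")


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_credentials(env_text: str) -> list[str]:
    """Return the required keys that are absent or empty in the .env text."""
    values = parse_env_text(env_text)
    return [key for key in REQUIRED_KEYS if not values.get(key)]
```

To check a real file, read it first, for example `missing_credentials(open(".env").read())`. An empty list means all three credentials are present.
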
- Define the fields to extract
Create a class that defines the fields you want to extract. Each field must follow this structure:
Class example:
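
As a rough illustration of a field-definition class, the sketch below uses plain dataclasses as a stand-in. The `ExtractField` type, its `name` and `field_name` attributes, and the `InvoiceFields` class are all assumptions made for this example. They are not types from the watsonx Orchestrate ADK, so check the ADK reference for the actual field structure.

```python
from dataclasses import dataclass, field, fields


@dataclass(frozen=True)
class ExtractField:
    """Hypothetical stand-in for one field definition (not an ADK type)."""
    name: str        # assumed: human-readable label, e.g. "Invoice number"
    field_name: str  # assumed: key under which the extracted value is stored


@dataclass
class InvoiceFields:
    """Hypothetical example: the fields to pull from an invoice."""
    invoice_number: ExtractField = field(
        default_factory=lambda: ExtractField("Invoice number", "invoice_number")
    )
    total_amount: ExtractField = field(
        default_factory=lambda: ExtractField("Total amount", "total_amount")
    )


def field_names(spec) -> list[str]:
    """List the extraction keys declared on a field-definition instance."""
    return [getattr(spec, f.name).field_name for f in fields(spec)]
```

Grouping the definitions in one class keeps the full set of fields in a single object that can be passed along as a unit.
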
- Configure the document extractor node
Use the docext() method to extract fields from a document. This method accepts the following input arguments:
| Parameter | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Unique identifier for the node. |
| llm | string | Yes | The LLM used for field extraction. |
| display_name | string | No | Display name for the node. |
| fields | object | Yes | The fields you want to extract. |
| description | string | No | Description of the node. |
| input_map | DataMap | No | Define input mappings using a structured collection of Assignment objects. |
| enable_hw | bool | No | Enable the handwritten feature by setting this to true. |
| min_confidence | float | No | The minimum acceptable confidence for an extracted field value. |
| review_fields | List[string] | No | The fields that require user review. |
| enable_review | bool | No | Enables or disables the human-in-the-loop feature. Set to True to activate it and False to deactivate. The default value is False. |
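
To make the table concrete, the sketch below collects the arguments into a dictionary and checks the three required ones before they are handed to docext(). Only the parameter names come from the table. The helper `build_docext_args` and every value in the usage note are hypothetical placeholders.

```python
REQUIRED = ("name", "llm", "fields")  # marked "Yes" in the table
OPTIONAL = (
    "display_name", "description", "input_map", "enable_hw",
    "min_confidence", "review_fields", "enable_review",
)


def build_docext_args(**kwargs) -> dict:
    """Validate argument names against the table and return them as a dict."""
    unknown = sorted(set(kwargs) - set(REQUIRED) - set(OPTIONAL))
    if unknown:
        raise TypeError(f"Unknown docext arguments: {unknown}")

    missing = [key for key in REQUIRED if kwargs.get(key) is None]
    if missing:
        raise ValueError(f"Missing required docext arguments: {missing}")

    review_fields = kwargs.get("review_fields")
    if review_fields is not None and not all(
        isinstance(item, str) for item in review_fields
    ):
        raise TypeError("review_fields must be a list of strings")

    return dict(kwargs)
```

For example, `build_docext_args(name="extract_invoice", llm="example-model-id", fields=my_fields, min_confidence=0.8, review_fields=["total_amount"], enable_review=True)` returns a dictionary that could then be unpacked into the docext() call. Leaving out `fields` raises a `ValueError` before the node is ever created.
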
Note: The min_confidence and review_fields settings control the human-in-the-loop feature. This feature works only when you run the flow from a chat session. If a field is extracted with a confidence lower than min_confidence and its name appears in review_fields, the agent opens a review window in the chat, where you can review and confirm the extracted values.

The input to the docext node is expected to be of type DocExtInput, from the module ibm_watsonx_orchestrate.flow_builder.types.
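
The review rule described in the note can be sketched as a small function. This is an illustration of the stated condition, not the product's implementation. The function name and the shape of the `confidences` mapping are hypothetical, and it is an assumption here that nothing is flagged unless enable_review is True.

```python
def fields_needing_review(
    confidences: dict[str, float],
    min_confidence: float,
    review_fields: list[str],
    enable_review: bool = True,
) -> list[str]:
    """Return the fields that would be shown in the review window.

    A field is flagged only when both conditions from the note hold:
    its confidence is lower than min_confidence, and its name appears
    in review_fields.
    """
    if not enable_review:  # assumed: review is skipped when disabled
        return []
    watched = set(review_fields)
    return [
        name
        for name, score in confidences.items()
        if name in watched and score < min_confidence
    ]
```

With confidences of 0.95 for `invoice_number`, 0.62 for `total_amount`, and 0.40 for `buyer`, a `min_confidence` of 0.8, and `review_fields` set to `["invoice_number", "total_amount"]`, only `total_amount` is flagged. `buyer` has low confidence but is not listed in `review_fields`, and `invoice_number` is listed but meets the threshold.
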
Example use of the docext node in an agentic workflow:

