Pre-requisites
Run the following command to enable watsonx Orchestrate Developer Edition to process documents:
Note:
Allocate at least 20 GB of RAM to your Docker engine when you install watsonx Orchestrate Developer Edition. Document processing features require this minimum.
Note:
To run the document field extractor, you must define the WATSONX_SPACE_ID, WATSONX_APIKEY, and WATSONX_PROJECT_ID credentials in your .env file. For more information on configuring the .env file, see Installing the watsonx Orchestrate Developer Edition.
Configuring document extractor node in agentic workflows
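Before configuring the node, it can help to confirm that the .env file defines the three credentials named in the prerequisites. The following is a minimal sketch in plain Python, not part of the watsonx Orchestrate tooling. The helper name `missing_credentials` and the sample values are hypothetical, and it assumes a simple `KEY=VALUE` file layout.

```python
# Hypothetical helper: checks .env-style text for the credentials named in the prerequisites.
REQUIRED_KEYS = ("WATSONX_SPACE_ID", "WATSONX_APIKEY", "WATSONX_PROJECT_ID")


def missing_credentials(env_text: str) -> list[str]:
    """Return the required keys that are absent or empty in .env-style text."""
    defined = {}
    for raw_line in env_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        defined[key.strip()] = value.strip().strip('"').strip("'")
    return [key for key in REQUIRED_KEYS if not defined.get(key)]


# Placeholder content standing in for a real .env file.
sample = """
# placeholder values
WATSONX_SPACE_ID=my-space-id
WATSONX_APIKEY=
"""

print(missing_credentials(sample))  # ['WATSONX_APIKEY', 'WATSONX_PROJECT_ID']
```

To check a real file, pass its contents instead of `sample`, for example `missing_credentials(open(".env").read())`.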
- Define the fields to extract.
Create a class that defines the fields you want to extract. Each field must follow this structure:
Class example:
- Configure the document extractor node.
Use the docext() method to extract a field from a document. This method accepts the following input arguments:
Unique identifier for the node.
The LLM used for field extraction. The default value is watsonx/meta-llama/llama-3-2-90b-vision-instruct.
Display name for the node.
The fields you want to extract.
Description of the node.
Define input mappings using a structured collection of Assignment objects.
Enable the handwritten feature by setting this to True.
The minimum acceptable confidence for an extracted field value.
The fields that require user review.
Enables or disables the human-in-the-loop feature. Set to True to activate it and False to deactivate it. The default value is False.
Selects the Document Extractor runtime. The default value is classic, which uses the Unstructured Document Extractor. To use the Structured Document Extractor, set the value to layout.
Note:
The min_confidence and review_fields settings control the human-in-the-loop feature. This feature only works when you run the Flow from a chat session.
If a field is extracted with a confidence lower than min_confidence, and its name appears in review_fields, the agent opens a review window in the chat. You can then review and confirm the extracted values.
The input to the docext node is expected to be of type DocExtInput, from the module ibm_watsonx_orchestrate.flow_builder.types.
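The review rule described in the note can be sketched in plain Python. This is only an illustration of the stated condition (confidence below min_confidence and field name listed in review_fields), not the library's implementation. The function name `fields_needing_review`, the sample field names, and the 0 to 1 confidence scale are assumptions.

```python
def fields_needing_review(confidences, min_confidence, review_fields):
    """Return the field names that would trigger a review under the rule in the note.

    confidences: mapping of field name to extraction confidence (assumed 0 to 1 scale).
    """
    return [
        name
        for name, confidence in confidences.items()
        # Both conditions must hold: low confidence AND listed for review.
        if confidence < min_confidence and name in review_fields
    ]


# Hypothetical extraction results for illustration.
confidences = {"buyer": 0.95, "invoice_total": 0.62, "due_date": 0.40}

flagged = fields_needing_review(
    confidences,
    min_confidence=0.8,
    review_fields=["invoice_total", "buyer"],
)
print(flagged)  # ['invoice_total']
```

Here `due_date` has low confidence but is not in review_fields, and `buyer` is in review_fields but exceeds the threshold, so only `invoice_total` would open a review window.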
Example use of the docext node in an agentic workflow:

