Document classifier node (Public preview)

Use the document classifier node to classify your documents.

This feature is currently in public preview. Functionality and behavior may change in future updates.

Pre-requisites

Run the following command to enable watsonx Orchestrate Developer Edition to process documents:

BASH

orchestrate server start -e <.env file path> -d

Note: Allocate at least 20 GB of RAM to your Docker engine when you install watsonx Orchestrate Developer Edition. Document processing features require this minimum.

Note: To run the document classifier, you must define the WO_INSTANCE, WO_API_KEY, and AUTHORIZATION_URL credentials in your .env file. For more information on configuring the .env file, see Installing the watsonx Orchestrate Developer Edition.
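
Before starting the server, it can help to confirm that the three credentials are actually present in your .env file. The following is a minimal sketch, not part of watsonx Orchestrate: the helper name `missing_env_keys` and the sample values are hypothetical, and the parser assumes simple `KEY=value` lines.

```python
# Hypothetical helper (not part of the ADK): checks .env-style text for the
# credentials that the document classifier requires.
REQUIRED_KEYS = ("WO_INSTANCE", "WO_API_KEY", "AUTHORIZATION_URL")


def missing_env_keys(env_text, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in .env-style text."""
    defined = {}
    for raw_line in env_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # Tolerate an optional "export " prefix.
        if line.startswith("export "):
            line = line[len("export "):]
        key, sep, value = line.partition("=")
        if sep:
            defined[key.strip()] = value.strip().strip('"').strip("'")
    # A key counts as missing when it is undefined or has an empty value.
    return [key for key in required if not defined.get(key)]


# Placeholder content for illustration only; substitute your own file's text.
sample = """
# Placeholder values
WO_INSTANCE=https://example.invalid/instances/placeholder
WO_API_KEY=
"""
print(missing_env_keys(sample))
```

To check a real file, read it first, for example `missing_env_keys(open(".env").read())`, and start the server only when the returned list is empty.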

Configuring document classifier node in agentic workflows

Define document classes

Create a class that defines the document classes to classify. Each document class must follow this structure:
Python
```
class CustomClasses(BaseModel):
    invoice: DocClassifierClass = Field(default=DocClassifierClass(class_name="Invoice"))
```
Configure the document classifier node

Include a call to the docclassifier() method in your agentic workflow to classify the document. This method accepts the following input arguments:

Parameter	Type	Required	Description
name	string	Yes	Unique identifier for the node.
llm	string	Yes	The LLM used for document classification.
display_name	string	No	Display name for the node.
classes	object	Yes	The document classification classes.
description	string	No	Description of the node.
min_confidence	float	No	Minimum confidence threshold for classification.
review_fields	List[string]	No	The fields that require user review.
input_map	DataMap	No	Define input mappings using a structured collection of Assignment objects.
enable_review	bool	No	Enables or disables the human-in-the-loop feature. Set to `True` to activate it and `False` to deactivate. The default value is `False`.

Note: The min_confidence and review_fields settings control the human-in-the-loop feature. This feature only works when you run the flow from a chat session. If a field is extracted with confidence lower than min_confidence, and its name appears in review_fields, the agent opens a review window in the chat. You can then review and confirm the extracted values.
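
The review rule described in the note can be sketched as a small predicate. This is only an illustration of the stated condition, not the node's actual implementation: the function name `needs_review` is hypothetical, and treating enable_review as an overall on/off switch for the check is an assumption based on the parameter table.

```python
# Illustrative sketch only; the real check happens inside the node.
def needs_review(field_name, confidence, min_confidence, review_fields,
                 enable_review=False):
    """Return True when a field would be sent to the chat review window."""
    # Assumption: nothing is reviewed unless human-in-the-loop is enabled.
    if not enable_review:
        return False
    # Stated rule: confidence below the threshold AND the field is listed.
    return confidence < min_confidence and field_name in review_fields


# Hypothetical field names and confidence values.
print(needs_review("invoice", 0.42, 0.80, ["invoice"], enable_review=True))
print(needs_review("invoice", 0.95, 0.80, ["invoice"], enable_review=True))
print(needs_review("receipt", 0.42, 0.80, ["invoice"], enable_review=True))
```

Under these assumptions only the first call returns `True`: the second has confidence above the threshold, and the third names a field that is not in review_fields.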

Example use of the docclassifier node in an agentic workflow:

Python

from pydantic import BaseModel, Field
from ibm_watsonx_orchestrate.flow_builder.flows import (
    Flow, flow, START, END
)
from ibm_watsonx_orchestrate.flow_builder.types import DocClassifierClass, DocumentProcessingCommonInput, DocumentClassificationResponse


class CustomClasses(BaseModel):
    buyer: DocClassifierClass = Field(default=DocClassifierClass(class_name="Buyer"))
    seller: DocClassifierClass = Field(default=DocClassifierClass(class_name="Seller"))
    agreement_date: DocClassifierClass = Field(default=DocClassifierClass(class_name="Agreement_Date"))


@flow(
    name="custom_flow_docclassifier_example",
    display_name="custom_flow_docclassifier_example",
    description="Extraction of custom classes from a document, specified by the user.",
    input_schema=DocumentProcessingCommonInput
)
def build_docclassifier_flow(aflow: Flow = None) -> Flow:
    # aflow.docclassifier returns a DocClassifierNode object.
    # DocumentClassificationResponse is the output schema of DocClassifierNode; it can be used as the input schema for the next node.

    doc_classifier_node = aflow.docclassifier(
        name="document_classifier_node",
        display_name="document_classifier_node",
        description="Classify custom classes from a document",
        llm="watsonx/meta-llama/llama-3-2-90b-vision-instruct",
        classes=CustomClasses(),
    )

    aflow.sequence(START, doc_classifier_node, END)
    return aflow
