# Document classifier node

Use the document classifier node to classify documents into custom classes that you define.

## Prerequisites

Run the following command to enable watsonx Orchestrate Developer Edition to process documents:

```bash BASH theme={null}
orchestrate server start -e <.env file path> -d
```

<Note>
  **Note:**
  You must allocate at least 20 GB of RAM to your Docker engine when you install watsonx Orchestrate Developer Edition to support document processing features.
</Note>

<Note>
  **Note:**
  To run the document classifier, you must define the `WO_INSTANCE`, `WO_API_KEY`, and `AUTHORIZATION_URL` credentials in your `.env` file. For more information on configuring the `.env` file, see [Installing the watsonx Orchestrate Developer Edition](../../developer_edition/wxOde_setup).
</Note>
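Before you start the server, you can confirm that your `.env` file defines the three credentials listed in the note. The following is a minimal sketch that uses only the Python standard library. The helpers `parse_env_file`, `missing_credentials`, and `check_env_file` are hypothetical and are not part of watsonx Orchestrate, and the parser assumes simple `KEY=VALUE` lines.

```python
from pathlib import Path

# Credentials the document classifier needs (from the note above).
REQUIRED_KEYS = ("WO_INSTANCE", "WO_API_KEY", "AUTHORIZATION_URL")


def parse_env_file(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional single or double quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_credentials(text: str) -> list[str]:
    """Return the required keys that are absent or have an empty value."""
    values = parse_env_file(text)
    return [key for key in REQUIRED_KEYS if not values.get(key)]


def check_env_file(path: str = ".env") -> list[str]:
    """Read an .env file from disk and report any missing credentials."""
    return missing_credentials(Path(path).read_text())
```

For example, `check_env_file("my.env")` returns an empty list when the file is ready to pass to `orchestrate server start -e my.env -d`.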

## Configuring the document classifier node in agentic workflows

1. Define document classes.
   Create a Pydantic model that lists the classes the classifier can assign. Each field must be a `DocClassifierClass` object whose `class_name` sets the label for that class:

   ```py Python theme={null}
   from pydantic import BaseModel, Field
   from ibm_watsonx_orchestrate.flow_builder.types import DocClassifierClass


   class CustomClasses(BaseModel):
       """
       Configuration schema for document classification classes.

       Defines the document types/classes that the classifier can identify.
       Each class is configured with a DocClassifierClass that specifies the
       class name used for categorizing input documents. The classifier uses
       an LLM to analyze documents and assign them to one of these classes.

       Example custom classes:
           invoice: Configuration for identifying invoice documents
           contract: Configuration for identifying contract documents
           tax_form: Configuration for identifying tax form documents
           bill_of_lading: Configuration for identifying bill of lading documents
       """
       invoice: DocClassifierClass = Field(default=DocClassifierClass(class_name="Invoice"))
       contract: DocClassifierClass = Field(default=DocClassifierClass(class_name="Contract"))
       tax_form: DocClassifierClass = Field(default=DocClassifierClass(class_name="TaxForm"))
       bill_of_lading: DocClassifierClass = Field(default=DocClassifierClass(class_name="BillOfLading"))
   ```

2. Configure the document classifier node.

Call the `docclassifier()` method on the `Flow` object in your agentic workflow to classify the document. This method accepts the following input arguments:

| Parameter       | Type    | Required | Description                                                                                                                              |
| --------------- | ------- | -------- | ---------------------------------------------------------------------------------------------------------------------------------------- |
| name            | string  | Yes      | Unique identifier for the node.                                                                                                          |
| llm             | string  | Yes      | The LLM used for document classification. The default value is `groq/openai/gpt-oss-120b`.                                               |
| display\_name   | string  | No       | Display name for the node.                                                                                                               |
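| classes         | object  | Yes      | The document classification classes, passed as an instance of a Pydantic model whose fields are `DocClassifierClass` objects, such as `CustomClasses()`. |
| description     | string  | No       | Description of the node.                                                                                                                 |
| min\_confidence | float   | No       | Minimum confidence threshold for classification. Classifications below this threshold trigger human review.                              |
| input\_map      | DataMap | No       | Input mappings, defined as a structured collection of `Assignment` objects.                                                              |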
| enable\_review  | bool    | No       | Enables or disables the human-in-the-loop feature. Set to `True` to activate it and `False` to deactivate. The default value is `False`. |

<Note>
  **Note:**

  The `min_confidence` setting controls when the human-in-the-loop feature opens a review. This feature works only when you run the flow from a chat session.
  If the document is classified with a confidence lower than `min_confidence`, or as `Other`, the agent opens a review window in the chat, where you can review and confirm the classification.
</Note>
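The review rule in the note can be expressed as a small predicate. This is a minimal sketch of the documented behavior, not the watsonx Orchestrate implementation. The names `needs_review`, `review_queue`, and `SAMPLE_RESULTS` are hypothetical, and the sketch assumes that the confidence is a float on the same scale as `min_confidence` and that no review opens when `enable_review` is `False`.

```python
# Hypothetical classifier outputs as (predicted class, confidence) pairs.
SAMPLE_RESULTS = [("Invoice", 0.93), ("Contract", 0.55), ("Other", 0.97)]


def needs_review(
    predicted_class: str,
    confidence: float,
    min_confidence: float,
    enable_review: bool = True,
) -> bool:
    """Return True when a classification should go to human review.

    Mirrors the documented rule: review opens when the confidence is below
    min_confidence or the document is classified as "Other". Assumption:
    nothing is reviewed when enable_review is False.
    """
    if not enable_review:
        return False
    return predicted_class == "Other" or confidence < min_confidence


def review_queue(results, min_confidence: float):
    """Keep only the results that would open a review window."""
    return [
        (label, score)
        for label, score in results
        if needs_review(label, score, min_confidence)
    ]
```

With `min_confidence=0.7`, `review_queue(SAMPLE_RESULTS, 0.7)` keeps the low-confidence `Contract` result and the `Other` result, even though the `Other` confidence is high.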

Example use of the `docclassifier` node in an agentic workflow:

```py Python [expandable] theme={null}
from pydantic import BaseModel, Field
from ibm_watsonx_orchestrate.flow_builder.flows import (
    Flow, flow, START, END
)
from ibm_watsonx_orchestrate.flow_builder.types import DocClassifierClass, DocumentProcessingCommonInput, DocumentClassificationResponse


class CustomClasses(BaseModel):
    """
    Configuration schema for document classification classes.
    
    Defines the document types/classes that the classifier can identify.
    Each class is configured with a DocClassifierClass that specifies the
    class name used for categorizing input documents. The classifier uses
    an LLM to analyze documents and assign them to one of these classes.
    
    Example custom classes:
        invoice: Configuration for identifying invoice documents
        contract: Configuration for identifying contract documents
        tax_form: Configuration for identifying tax form documents
        bill_of_lading: Configuration for identifying bill of lading documents
    """
    invoice: DocClassifierClass = Field(default=DocClassifierClass(class_name="Invoice"))
    contract: DocClassifierClass = Field(default=DocClassifierClass(class_name="Contract"))
    tax_form: DocClassifierClass = Field(default=DocClassifierClass(class_name="TaxForm"))
    bill_of_lading: DocClassifierClass = Field(default=DocClassifierClass(class_name="BillOfLading"))


@flow(
    name="custom_flow_docclassifier_example",
    display_name="custom_flow_docclassifier_example",
    description="Classifies documents into custom classes.",
    input_schema=DocumentProcessingCommonInput
)
def build_docclassifier_flow(aflow: Flow = None) -> Flow:
    # aflow.docclassifier() returns a DocClassifierNode object.
    # The output schema of a DocClassifierNode is a DocumentClassificationResponse object.

    doc_classifier_node = aflow.docclassifier(
        name="document_classifier_node",
        display_name="document_classifier_node",
        description="Classifies documents into one custom class.",
        llm="groq/openai/gpt-oss-120b",
        classes=CustomClasses(),
    )

    aflow.sequence(START, doc_classifier_node, END)

    return aflow

```
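To check which labels a classes model such as `CustomClasses` exposes, you can walk its fields. The sketch below runs without the watsonx Orchestrate ADK because it replaces `DocClassifierClass` and Pydantic with a hypothetical stand-in dataclass, `ClassSpec`. The helpers `class_labels` and `duplicate_labels` are also hypothetical. The sketch shows that the field name (for example, `tax_form`) and the assigned label (`TaxForm`) are separate values, and it flags labels that collide.

```python
from dataclasses import dataclass, field, fields


@dataclass(frozen=True)
class ClassSpec:
    """Hypothetical stand-in for DocClassifierClass; holds only class_name."""
    class_name: str


@dataclass
class CustomClasses:
    # Mirrors the Pydantic model above, using the stand-in type.
    invoice: ClassSpec = field(default_factory=lambda: ClassSpec("Invoice"))
    contract: ClassSpec = field(default_factory=lambda: ClassSpec("Contract"))
    tax_form: ClassSpec = field(default_factory=lambda: ClassSpec("TaxForm"))
    bill_of_lading: ClassSpec = field(default_factory=lambda: ClassSpec("BillOfLading"))


def class_labels(classes) -> dict[str, str]:
    """Map each field name to the label the classifier can assign."""
    return {f.name: getattr(classes, f.name).class_name for f in fields(classes)}


def duplicate_labels(classes) -> list[str]:
    """Return lowercased labels used by more than one field."""
    counts: dict[str, int] = {}
    for label in class_labels(classes).values():
        key = label.lower()
        counts[key] = counts.get(key, 0) + 1
    return sorted(label for label, count in counts.items() if count > 1)
```

For example, `class_labels(CustomClasses())["tax_form"]` is `"TaxForm"`, and `duplicate_labels(CustomClasses(contract=ClassSpec("invoice")))` reports `["invoice"]` because two fields map to the same label.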
