This feature is currently in public preview. Functionality and behavior may change in future updates.

Enabling document processing in watsonx Orchestrate Developer Edition

Run the following command to enable watsonx Orchestrate Developer Edition to process documents:

[BASH]
orchestrate server start -e <.env file path> -d

Configuring document processing in flows

In your flow, call the docproc() method to process a document. This method accepts the following input arguments:

Parameter | Type | Required | Description
name | string | Yes | Unique identifier for the node.
task | string | Yes | Specifies which information is extracted from the document upon processing; supported values are:
  • text_extraction: Extracts plain text from documents.
  • kvp_invoices_extraction: Extracts structured fields from invoices.
  • kvp_utility_bills_extraction: Extracts structured fields from utility bills.
display_name | string | No | Display name for the node.
description | string | No | Description of the node.
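
The argument rules above can be checked before a flow is built. The following is a minimal sketch, assuming a hypothetical helper named check_docproc_args and a hypothetical constant SUPPORTED_TASKS; neither is part of ibm_watsonx_orchestrate. The helper only mirrors the parameter list above and does not verify that a node name is unique within a flow.

```python
# Hypothetical helper, not part of ibm_watsonx_orchestrate: it mirrors the
# documented docproc() arguments so mistakes surface before the flow is built.
SUPPORTED_TASKS = (
    "text_extraction",
    "kvp_invoices_extraction",
    "kvp_utility_bills_extraction",
)


def check_docproc_args(name, task, display_name=None, description=None):
    """Return keyword arguments for docproc() after basic validation."""
    # name and task are the two required arguments.
    if not isinstance(name, str) or not name:
        raise ValueError("name is required and must be a non-empty string")
    if task not in SUPPORTED_TASKS:
        raise ValueError(f"task must be one of {SUPPORTED_TASKS}, got {task!r}")

    kwargs = {"name": name, "task": task}
    # Optional arguments are passed through only when provided.
    if display_name is not None:
        kwargs["display_name"] = display_name
    if description is not None:
        kwargs["description"] = description
    return kwargs


invoice_args = check_docproc_args(
    name="invoice_extraction",
    task="kvp_invoices_extraction",
    description="Extract structured fields from an invoice.",
)
print(invoice_args)
```

Inside a flow builder function, the validated dictionary could then be unpacked into the call, for example aflow.docproc(**invoice_args).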

The input to a docproc node is expected to be of type DocumentContent, from the module ibm_watsonx_orchestrate.flow_builder.types.

Example use of the docproc node in a flow:

[Python]
from ibm_watsonx_orchestrate.flow_builder.flows import (
    Flow, flow, START, END
)

from ibm_watsonx_orchestrate.flow_builder.types import DocumentContent


@flow(
    name="text_extraction_flow_example",
    display_name="text_extraction_flow_example",
    description="This flow consists of one node: a docproc node, which extracts text from the input document",
    input_schema=DocumentContent
)
def build_docproc_flow(aflow: Flow = None) -> Flow:
    doc_proc_node = aflow.docproc(
        name="text_extraction",
        display_name="text_extraction",
        description="Extract text out of a document's contents.",
        task="text_extraction"
    )

    aflow.sequence(START, doc_proc_node, END)
    return aflow

You can find more examples in the flow_builder folder, such as invoice_extraction, text_extraction, and utilities_bill_extraction.