Enabling document processing in watsonx Orchestrate Developer Edition

Start the server with the -d option to enable document processing in watsonx Orchestrate Developer Edition:

[BASH]
orchestrate server start -e <.env file path> -d
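
If you start the server from a script, the same command can be assembled in Python. This is a minimal sketch rather than part of the product: the helper names build_start_command and start_server and the ./.env path are hypothetical, and only the command and its -e and -d options are taken from the line above.

```python
import shlex
import subprocess


def build_start_command(env_file):
    """Return the argument list for the start command shown above."""
    # -e points at the .env file; -d is the option used in the command above.
    return ["orchestrate", "server", "start", "-e", str(env_file), "-d"]


def start_server(env_file):
    """Run the command and raise CalledProcessError on a non-zero exit code."""
    subprocess.run(build_start_command(env_file), check=True)


if __name__ == "__main__":
    # Print the command instead of running it; call start_server() to execute.
    # "./.env" is a placeholder path, not a required location.
    print(shlex.join(build_start_command("./.env")))
```

Passing the arguments as a list avoids shell quoting problems when the .env path contains spaces.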

Configuring document processing in flows

In your flow, call the docproc() method to add a node that processes a document. The method accepts the following arguments:

| Parameter | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Unique identifier for the node. |
| task | string | Yes | Specifies which information is extracted from the document. Supported values: text_extraction (extracts plain text from documents); kvp_invoices_extraction (extracts structured fields from invoices); kvp_utility_bills_extraction (extracts structured fields from utility bills). |
| display_name | string | No | Display name for the node. |
| description | string | No | Description of the node. |
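
The parameter rules above can be checked before a node is created. The following is a sketch under stated assumptions: SUPPORTED_TASKS and docproc_kwargs are hypothetical names that are not part of the ibm_watsonx_orchestrate package, and the task values are copied from the table.

```python
# Task values listed in the table above, with their documented meaning.
SUPPORTED_TASKS = {
    "text_extraction": "Extracts plain text from documents.",
    "kvp_invoices_extraction": "Extracts structured fields from invoices.",
    "kvp_utility_bills_extraction": "Extracts structured fields from utility bills.",
}


def docproc_kwargs(name, task, display_name=None, description=None):
    """Hypothetical helper: validate docproc() arguments and return them as a dict."""
    if not name:
        raise ValueError("name is required and must be non-empty")
    if task not in SUPPORTED_TASKS:
        raise ValueError(
            f"unsupported task {task!r}; expected one of {sorted(SUPPORTED_TASKS)}"
        )
    kwargs = {"name": name, "task": task}
    # Optional arguments are included only when the caller provides them.
    if display_name is not None:
        kwargs["display_name"] = display_name
    if description is not None:
        kwargs["description"] = description
    return kwargs


# Inside a flow definition, the result could be unpacked as:
#     node = aflow.docproc(**docproc_kwargs("invoice_fields", "kvp_invoices_extraction"))
```

Validating the task name up front surfaces a typo when the flow is built rather than when a document is processed.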

The input to a docproc node is expected to be of type DocumentContent, from the module ibm_watsonx_orchestrate.flow_builder.types.

Example use of the docproc node in a flow:

[Python]
from ibm_watsonx_orchestrate.flow_builder.flows import (
    Flow, flow, START, END
)

from ibm_watsonx_orchestrate.flow_builder.types import DocumentContent


@flow(
    name="text_extraction_flow_example",
    display_name="text_extraction_flow_example",
    description="This flow consists of one node: a docproc node, which extracts text from the input document",
    input_schema=DocumentContent
)
def build_docproc_flow(aflow: Flow = None) -> Flow:
    # Add a docproc node that extracts plain text from the input document.
    doc_proc_node = aflow.docproc(
        name="text_extraction",
        display_name="text_extraction",
        description="Extract text out of a document's contents.",
        task="text_extraction"
    )

    # Connect the node between the start and end of the flow.
    aflow.sequence(START, doc_proc_node, END)
    return aflow

You can find more examples in the flow_builder folder, such as invoice_extraction, text_extraction, and utilities_bill_extraction.