A package to support Indico IPA development

Indico-Toolkit

A library to assist Indico IPA development

Available Functionality

The indico-toolkit provides classes and functions to help achieve the following:

  • Easy batch workflow submission and retrieval.
  • Classes that simplify dataset/doc-extraction functionality.
  • Row and line item association.
  • Staggered loop learning retrieval and reformatting.
  • Train a document classification model without labeling.
  • Train a first page classification model (for bundle splitting) without labeling.
  • Helpful Scripted/Auto Review processing and submission.
  • Common manipulation of prediction/workflow results.
  • Objects to simplify parsing OCR responses.
  • Finder class to quickly obtain associated model/dataset/workflow IDs.
  • Class to spoof a human reviewer.

Example Usage

For scripted examples of how to use the toolkit, see the examples directory.

Tests

To run the test suite, set the following environment variables: HOST_URL and API_TOKEN_PATH. You can also set WORKFLOW_ID (a workflow with a single extraction model), MODEL_NAME (the extraction model's name), and DATASET_ID (an uploaded dataset). If you don't set these three variables, the test configuration will upload a dataset and create a workflow.

pytest
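
Before invoking pytest, it can help to confirm which of the variables named above are set. The following is a minimal sketch and not part of the toolkit: the helper name `missing_test_env_vars` is hypothetical, and only the variable names come from this README.

```python
import os

# Variable names taken from the Tests section of this README.
REQUIRED_VARS = ("HOST_URL", "API_TOKEN_PATH")
OPTIONAL_VARS = ("WORKFLOW_ID", "MODEL_NAME", "DATASET_ID")


def missing_test_env_vars(env=None):
    """Return (missing_required, missing_optional) lists of variable names.

    `env` defaults to os.environ; passing a plain dict makes the check easy to test.
    Hypothetical helper, not provided by indico-toolkit.
    """
    env = os.environ if env is None else env
    # A variable counts as missing when it is absent or set to an empty string.
    missing_required = [name for name in REQUIRED_VARS if not env.get(name)]
    missing_optional = [name for name in OPTIONAL_VARS if not env.get(name)]
    return missing_required, missing_optional
```

Calling `missing_test_env_vars()` with no arguments inspects the current environment. An empty first list means the required variables are present. A non-empty second list means the test configuration is expected to upload a dataset and create a workflow, as described above.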

To see test coverage

coverage run --omit 'venv/*' -m pytest
coverage report -m

Example

How to get prediction results and write the results to CSV

from indico_toolkit.indico_wrapper import Workflow
from indico_toolkit.pipelines import FileProcessing
from indico_toolkit import create_client

WORKFLOW_ID = 1418
HOST = "app.indico.io"
API_TOKEN_PATH = "./indico_api_token.txt"

# Instantiate the workflow class
client = create_client(HOST, API_TOKEN_PATH)
wflow = Workflow(client)

# Collect files to submit
fp = FileProcessing()
fp.get_file_paths_from_dir("./datasets/disclosures/")

# Submit documents in batches of 10, await the results, and append them to a CSV file
for paths in fp.batch_files(batch_size=10):
    submission_ids = wflow.submit_documents_to_workflow(WORKFLOW_ID, paths)
    submission_results = wflow.get_submission_results_from_ids(submission_ids)
    for filename, result in zip(paths, submission_results):
        result.predictions.to_csv("./results.csv", filename=filename, append_if_exists=True)
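
The batching in the loop above can be pictured with a standalone sketch. This is not the toolkit's implementation of `batch_files`; the `chunked` helper and the file names are hypothetical and only illustrate splitting a list of paths into groups of at most `batch_size`.

```python
from typing import Iterator, List, Sequence


def chunked(paths: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `paths`, each holding at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(paths), batch_size):
        yield list(paths[start:start + batch_size])


# Hypothetical file names, used only for illustration.
example_paths = [f"doc_{i}.pdf" for i in range(25)]
batch_sizes = [len(batch) for batch in chunked(example_paths, batch_size=10)]
print(batch_sizes)  # [10, 10, 5]
```

Submitting in small batches like this keeps each round of submission and retrieval bounded, so results for earlier files are written out before later files are sent.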

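The `append_if_exists=True` argument in the example suggests that repeated calls add rows to the same file rather than overwriting it. How the toolkit does this internally is not shown here; the sketch below illustrates the general pattern with Python's standard `csv` module. The `append_rows` helper and the column names are assumptions for illustration, not the toolkit's output schema.

```python
import csv
import os
import tempfile


def append_rows(csv_path, fieldnames, rows):
    """Append dict rows to csv_path, writing the header only when the file is new or empty."""
    is_new = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
    with open(csv_path, "a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        if is_new:
            writer.writeheader()
        writer.writerows(rows)


# Illustrative data only; these columns are assumed, not taken from the toolkit.
fields = ["filename", "label", "text"]
out_path = os.path.join(tempfile.mkdtemp(), "results.csv")
append_rows(out_path, fields, [{"filename": "a.pdf", "label": "Name", "text": "Acme"}])
append_rows(out_path, fields, [{"filename": "b.pdf", "label": "Name", "text": "Globex"}])
```

After both calls the file holds one header row followed by one row per document, which is the shape you would want when results from several batches accumulate in a single CSV.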