Indico-Toolkit
A library to assist Indico IPA development
Available Functionality
The indico-toolkit provides classes and functions to help achieve the following:
- Easy batch workflow submission and retrieval.
- Classes that simplify dataset/doc-extraction functionality.
- Tools to assist with positioning, e.g. row association, distance between preds, relative position validation.
- Tools to assist with creating and copying workflow structures.
- Get metrics for all model IDs in a model group to see how well fields are performing after more labeling.
- Compare two models via bar plot and data tables.
- Train a document classification model without labeling.
- An AutoReview class to assist with automated acceptance/rejection of model predictions.
- Common manipulation of prediction/workflow results.
- Objects to simplify parsing OCR responses.
- Snapshot merging and manipulation.
Installation
pip install indico_toolkit
- Note: If you are on Indico 6.X, install a 6.X version of indico_toolkit. If you are on 5.X, install a 2.X version.
- Note: If you are on a version of the Indico IPA platform earlier than 5.1, install indico-toolkit==1.2.3.
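The version notes above can be summarized as a small lookup. The following is only a sketch: the function name `toolkit_requirement` is hypothetical and not part of the toolkit, the pip range specifiers are one way to express "a 6.X version" and "a 2.X version", and it assumes that the pre-5.1 note takes precedence for Indico 5.0.

```python
def toolkit_requirement(platform_version: str) -> str:
    """Return a pip requirement string for an Indico platform version.

    Hypothetical helper based only on the installation notes in this README.
    """
    # Only the major and minor components matter for the notes above.
    numbers = [int(part) for part in platform_version.split(".")[:2]]
    major = numbers[0]
    minor = numbers[1] if len(numbers) > 1 else 0

    # Assumption: the "pre-5.1" note wins over the "5.X" note for 5.0.
    if (major, minor) < (5, 1):
        return "indico-toolkit==1.2.3"
    if major == 5:
        return "indico-toolkit>=2,<3"
    if major == 6:
        return "indico-toolkit>=6,<7"
    # The README does not document other platform versions.
    raise ValueError(f"No toolkit version is documented for Indico {platform_version}")


print(toolkit_requirement("6.0.2"))
```

The returned string can be passed to pip directly, for example `pip install "indico-toolkit>=6,<7"`.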
Example Usage
For scripted examples of how to use the toolkit, see the examples directory.
Tests
To run the test suite, you need to set the following environment variables: HOST_URL and API_TOKEN_PATH. You can also set WORKFLOW_ID (a workflow with a single extraction model), MODEL_NAME (the extraction model name) and DATASET_ID (an uploaded dataset). If you don't set these three environment variables, the test configuration uploads a dataset and creates a workflow.
pytest
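Before running pytest, it can help to confirm which of the variables above are set. The check below is a minimal sketch and not part of the toolkit or its test configuration; `summarize_test_env` is a hypothetical helper, and what the test suite does when only some of the optional variables are set is not documented here, so the sketch only reports that case.

```python
import os
from typing import Dict, List, Mapping, Union

# Variable names taken from the paragraph above.
REQUIRED = ("HOST_URL", "API_TOKEN_PATH")
OPTIONAL = ("WORKFLOW_ID", "MODEL_NAME", "DATASET_ID")


def summarize_test_env(environ: Mapping[str, str]) -> Dict[str, Union[List[str], bool]]:
    """Report which test-suite variables are missing from `environ`."""
    missing_required = [name for name in REQUIRED if not environ.get(name)]
    missing_optional = [name for name in OPTIONAL if not environ.get(name)]
    return {
        "missing_required": missing_required,
        "missing_optional": missing_optional,
        # True only when all three optional variables are present.
        "uses_existing_resources": not missing_optional,
    }


summary = summarize_test_env(os.environ)
if summary["missing_required"]:
    print("Set these before running pytest:", summary["missing_required"])
```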
Example
How to get prediction results and write the results to CSV
from indico_toolkit.indico_wrapper import Workflow
from indico_toolkit.pipelines import FileProcessing
from indico_toolkit import create_client
WORKFLOW_ID = 1418
HOST = "app.indico.io"
API_TOKEN_PATH = "./indico_api_token.txt"
# Instantiate the workflow class
client = create_client(HOST, API_TOKEN_PATH)
wflow = Workflow(client)
# Collect files to submit
fp = FileProcessing()
fp.get_file_paths_from_dir("./datasets/disclosures/")
# Submit documents, await the results and write the results to CSV in batches of 10
for paths in fp.batch_files(batch_size=10):
    submission_ids = wflow.submit_documents_to_workflow(WORKFLOW_ID, paths)
    submission_results = wflow.get_submission_results_from_ids(submission_ids)
    for filename, result in zip(paths, submission_results):
        result.predictions.to_csv("./results.csv", filename=filename, append_if_exists=True)
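The loop above processes files in fixed-size batches. As a library-independent illustration of that pattern, here is a minimal sketch using only the standard library. The `batched` helper and the file names are hypothetical; this is not the toolkit's `batch_files` implementation, whose internals are not described in this README.

```python
from typing import Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# 25 placeholder paths split into batches of 10.
paths = [f"doc_{i}.pdf" for i in range(25)]
sizes = [len(batch) for batch in batched(paths, batch_size=10)]
print(sizes)  # [10, 10, 5]
```

Submitting one batch at a time bounds the number of submissions awaiting results at any moment, and the last batch may be smaller than the others.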
Contributing
If you are adding new features to Indico Toolkit, make sure to:
- Add robust integration and unit tests.
- Add a sample usage script to the 'examples/' directory.
- Add a bullet point for what the feature does to the list at the top of this README.md.
- Ensure the full test suite is passing locally before creating a pull request.
- Add docstrings for methods where usage is non-obvious.
- If you are using new pip-installed libraries, make sure they are added to setup.py and pyproject.toml.