Konfuzio Software Development Kit
Konfuzio SDK
The Konfuzio Software Development Kit (Konfuzio SDK) provides a Python API to interact with the Konfuzio Server.
Features
The SDK allows you to retrieve visual and text features to build your own document models. Konfuzio Server serves as a UI to define the data structure, manage training and test data, and deploy your models as an API.
Function | Public Host Free* | On-Site (Paid) |
---|---|---|
OCR Text | :heavy_check_mark: | :heavy_check_mark: |
OCR Handwriting | :heavy_check_mark: | :heavy_check_mark: |
Text Annotation | :heavy_check_mark: | :heavy_check_mark: |
PDF Annotation | :heavy_check_mark: | :heavy_check_mark: |
Image Annotation | :heavy_check_mark: | :heavy_check_mark: |
Table Annotation | :heavy_check_mark: | :heavy_check_mark: |
Download HOCR | :heavy_check_mark: | :heavy_check_mark: |
Download Images | :heavy_check_mark: | :heavy_check_mark: |
Download PDF with OCR | :heavy_check_mark: | :heavy_check_mark: |
Deploy AI models | :heavy_multiplication_x: | :heavy_check_mark: |
\* Under fair use policy: we will eventually impose throttling of 10 pages/hour.
:ledger: Docs | Read the docs |
:floppy_disk: Installation | How to install the Konfuzio SDK |
:mortar_board: Tutorials | See what the Konfuzio SDK can do with our Notebooks & Scripts |
:bulb: Explanations | Here are links to teaching material about the Konfuzio SDK. |
:gear: API Reference | Python classes, methods, and functions |
:heart: Contributing | Learn how to contribute! |
:bug: Issue Tracker | Report and monitor Konfuzio SDK issues |
:telescope: Changelog | Review the release notes |
:newspaper: MIT License | Review the license |
Installation
As a developer, register for free on our public host: https://app.konfuzio.com
Then use pip to install the Konfuzio SDK and run init:
pip install konfuzio_sdk
konfuzio_sdk init
The init command creates a Token to connect to the Konfuzio Server. It stores the variables `KONFUZIO_USER`, `KONFUZIO_TOKEN` and `KONFUZIO_HOST` in an `.env` file in your working directory.
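To confirm that init wrote all three variables, a small check such as the following can help. This is a minimal sketch using only the standard library. It assumes the `.env` file uses plain `KEY=VALUE` lines, which the source does not spell out, and the helper names `read_env_file` and `missing_keys` are hypothetical, not part of the SDK.

```python
from pathlib import Path

# Variable names taken from the installation notes above.
REQUIRED_KEYS = ("KONFUZIO_USER", "KONFUZIO_TOKEN", "KONFUZIO_HOST")


def read_env_file(path):
    """Parse simple KEY=VALUE lines (assumed format); skip blanks and # comments."""
    values = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(path=".env"):
    """Return the required variable names that are absent or empty in the file."""
    values = read_env_file(path)
    return [key for key in REQUIRED_KEYS if not values.get(key)]
```

Calling `missing_keys()` in the working directory returns an empty list when all three variables are present.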
By default, the SDK is installed without AI-related dependencies such as `torch` or `transformers`, which allows you to use only the data-related SDK concepts but not the AI models. To install the SDK with the AI components, run the following command:
pip install konfuzio_sdk[ai]
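To see whether the AI-related dependencies are available in the current environment, a quick check like the one below may be useful. It is a sketch, not an SDK feature: it only tests for `torch` and `transformers`, the two packages named above, and the `[ai]` extra may pull in others. The function name `missing_ai_packages` is hypothetical.

```python
from importlib.util import find_spec

# Only the two packages named in the installation notes; the [ai] extra
# may include further dependencies not listed here.
AI_PACKAGES = ("torch", "transformers")


def missing_ai_packages(packages=AI_PACKAGES):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in packages if find_spec(name) is None]
```

An empty result from `missing_ai_packages()` suggests the AI components were installed; a non-empty list names what is still missing.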
Find the full installation guide here, or set up PyCharm as described here.
CLI
The CLI provides a basic command to create a new Project:
konfuzio_sdk create_project YOUR_PROJECT_NAME
You will see "Project `{YOUR_PROJECT_NAME}` (ID `{YOUR_PROJECT_ID}`) was created successfully!" printed.
You can also download any Project via its ID:
konfuzio_sdk export_project YOUR_PROJECT_ID
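If you want to call these two commands from a script, one option is a thin wrapper around `subprocess`. The sketch below assumes the `konfuzio_sdk` executable is on your `PATH` and that its messages go to standard output; neither is confirmed by the text above. The helper names `build_cli_command` and `run_cli` are hypothetical.

```python
import subprocess


def build_cli_command(subcommand, argument):
    """Build the argument list for a konfuzio_sdk CLI call."""
    return ["konfuzio_sdk", subcommand, str(argument)]


def run_cli(subcommand, argument, dry_run=False):
    """Run the CLI command and return its standard output.

    With dry_run=True the command is returned as a list and not executed.
    """
    command = build_cli_command(subcommand, argument)
    if dry_run:
        return command
    # check=True raises CalledProcessError if the CLI exits with an error code.
    result = subprocess.run(command, check=True, capture_output=True, text=True)
    return result.stdout
```

For example, `run_cli("export_project", 123, dry_run=True)` returns the command list without contacting the server, which is handy for checking the arguments first.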
Tutorials
You can find detailed examples of how to set up and run document AI pipelines in our Tutorials, including:
- Split a file into separate Documents
- Document Categorization
- Train a Konfuzio SDK Model to Extract Information From Payslip Documents
Basics
Here we show how to use the Konfuzio SDK to retrieve data hosted on a Konfuzio Server instance.
from konfuzio_sdk.data import Project, Document
# Initialize the Project
YOUR_PROJECT_ID: int
my_project = Project(id_=YOUR_PROJECT_ID)
# Get any Document online
DOCUMENT_ID_ONLINE: int
doc: Document = my_project.get_document_by_id(DOCUMENT_ID_ONLINE)
# Get the Annotations in a Document
doc.annotations()
# Filter Annotations by Label
MY_OWN_LABEL_NAME: str
label = my_project.get_label_by_name(MY_OWN_LABEL_NAME)
doc.annotations(label=label)
# Or get all Annotations that belong to one Category
YOUR_CATEGORY_ID: int
category = my_project.get_category_by_id(YOUR_CATEGORY_ID)
label.annotations(categories=[category])
# Force a Project update. To save time Documents will only be updated if they have changed.
my_project.get(update=True)
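Once you have a list of Annotations, a common next step is to group their text by Label. The helper below is a sketch that works on any objects exposing `label.name` and `offset_string`; treating those as the attribute names on the SDK's Annotation objects is an assumption here, so check the API Reference before relying on it. The function name `group_text_by_label` is hypothetical.

```python
from collections import defaultdict


def group_text_by_label(annotations):
    """Map each Label name to the list of annotated text values.

    Assumes every item has `label.name` and `offset_string` attributes.
    """
    grouped = defaultdict(list)
    for annotation in annotations:
        grouped[annotation.label.name].append(annotation.offset_string)
    return dict(grouped)
```

With the objects from the example above, this could be called as `group_text_by_label(doc.annotations())`.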