Konfuzio Software Development Kit

Konfuzio SDK

The Konfuzio Software Development Kit (Konfuzio SDK) provides a Python API to interact with the Konfuzio Server.

Features

The SDK allows you to retrieve visual and text features to build your own document models. Konfuzio Server serves as a UI to define the data structure, manage training and test data, and deploy your models as an API.

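| Function              | Public Host (Free*) | On-Site (Paid) |
|-----------------------|---------------------|----------------|
| OCR Text              | ✔️                  | ✔️             |
| OCR Handwriting       | ✔️                  | ✔️             |
| Text Annotation       | ✔️                  | ✔️             |
| PDF Annotation        | ✔️                  | ✔️             |
| Image Annotation      | ✔️                  | ✔️             |
| Table Annotation      | ✔️                  | ✔️             |
| Download Images       | ✔️                  | ✔️             |
| Download PDF with OCR | ✔️                  | ✔️             |
| Deploy AI models      | ✖️                  | ✔️             |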

* Under a fair use policy: usage of the free public host will eventually be throttled to 10 pages per hour.
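To get a feel for what the 10 pages per hour figure means in practice, here is a minimal sketch that estimates how long a batch would take once throttling applies. The function name `hours_to_process` is hypothetical and not part of the SDK, and the calculation assumes the limit is applied as a steady rate, which the note above does not specify.

```python
# Limit quoted in the fair use note; how and when it is enforced is an assumption.
PAGES_PER_HOUR = 10


def hours_to_process(page_count, pages_per_hour=PAGES_PER_HOUR):
    """Rough estimate of the hours needed to process page_count pages."""
    if page_count < 0:
        raise ValueError("page_count must not be negative")
    if pages_per_hour <= 0:
        raise ValueError("pages_per_hour must be positive")
    return page_count / pages_per_hour


if __name__ == "__main__":
    # A 250-page batch at 10 pages per hour.
    print(hours_to_process(250))
```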

📒 Docs Read the docs
💾 Installation How to install the Konfuzio SDK
🎓 Tutorials See what the Konfuzio SDK can do with our tutorials
💡 Explanations Here are links to teaching material about the Konfuzio SDK.
⚙️ API Reference Python classes, methods, and functions
🐛 Issue Tracker Report Konfuzio SDK issues
🔭 Changelog Review the release notes
📰 MIT License Review the license in the section below

Installation

As a developer, register for free on our public host: https://app.konfuzio.com

Then use pip to install the Konfuzio SDK and run init:

pip install konfuzio_sdk

konfuzio_sdk init

The init command creates a token to connect to the Konfuzio Server. It stores the variables KONFUZIO_USER, KONFUZIO_TOKEN and KONFUZIO_HOST in a .env file in your working directory.
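If you want to confirm that init wrote the three variables, a small standard-library check like the following may help. It is only a sketch: the helper names `read_env_file` and `missing_konfuzio_keys` are hypothetical, and it assumes the .env file uses plain KEY=VALUE lines.

```python
from pathlib import Path

# Variable names taken from the paragraph above.
REQUIRED_KEYS = ("KONFUZIO_USER", "KONFUZIO_TOKEN", "KONFUZIO_HOST")


def read_env_file(path):
    """Parse simple KEY=VALUE lines from a .env file into a dict."""
    values = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_konfuzio_keys(path=".env"):
    """Return the required variable names that are absent or empty."""
    env_path = Path(path)
    if not env_path.exists():
        return list(REQUIRED_KEYS)
    values = read_env_file(env_path)
    return [key for key in REQUIRED_KEYS if not values.get(key)]


if __name__ == "__main__":
    missing = missing_konfuzio_keys()
    print("Missing variables:", missing or "none")
```

The check only reads the file; it never prints the token itself.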

By default, the SDK is installed without AI-related dependencies such as torch or transformers. This default installation supports the Data-related SDK concepts but not the AI models. To install the SDK with the AI components, run the following command:

pip install konfuzio_sdk[ai]
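To see whether the AI-related dependencies are present in the current environment, you could probe for them without importing them. This is a minimal sketch: `missing_packages` is a hypothetical helper, and the package list only covers the two examples named above, not necessarily everything the `[ai]` extra installs.

```python
from importlib.util import find_spec

# The two AI dependencies named in the text; the [ai] extra may install more.
AI_PACKAGES = ("torch", "transformers")


def missing_packages(names=AI_PACKAGES):
    """Return the top-level package names that cannot be found."""
    # find_spec returns None when a top-level package is not installed.
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("The checked AI packages are available.")
```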

Find the full installation guide here. To configure a PyCharm setup, follow the instructions here.

CLI

The CLI provides a basic command to create a new Project:

konfuzio_sdk create_project YOUR_PROJECT_NAME

You will see "Project {YOUR_PROJECT_NAME} (ID {YOUR_PROJECT_ID}) was created successfully!" printed.

You can also download any Project by its ID:

konfuzio_sdk export_project YOUR_PROJECT_ID
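When scripting these two commands, the Project ID from the success message can be fed into the export call. The sketch below assumes the message appears exactly in the format quoted above and that the ID is numeric. `parse_created_project` and `export_command` are hypothetical helpers, not part of the SDK.

```python
import re

# Mirrors the message quoted above; the exact wording is assumed to be stable.
CREATED_PATTERN = re.compile(
    r"Project (?P<name>.+) \(ID (?P<id>\d+)\) was created successfully!"
)


def parse_created_project(output):
    """Return (name, project_id) from the CLI output, or None if not found."""
    match = CREATED_PATTERN.search(output)
    if match is None:
        return None
    return match.group("name"), int(match.group("id"))


def export_command(project_id):
    """Build the argument list for the export_project command shown above."""
    return ["konfuzio_sdk", "export_project", str(project_id)]


if __name__ == "__main__":
    sample_output = "Project Invoices (ID 123) was created successfully!"
    parsed = parse_created_project(sample_output)
    if parsed is not None:
        name, project_id = parsed
        # Pass the list to subprocess.run(...) to execute the export.
        print(export_command(project_id))
```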

Tutorials

You can find detailed examples of how to set up and run document AI pipelines in our Tutorials, including:

Basics

Here we show how to use the Konfuzio SDK to retrieve data hosted on a Konfuzio Server instance.

from konfuzio_sdk.data import Project, Document

# Replace the placeholder values below with the IDs and names from your own Project.
# Initialize the Project
YOUR_PROJECT_ID: int = 0  # placeholder
my_project = Project(id_=YOUR_PROJECT_ID)

# Get any online Document
DOCUMENT_ID_ONLINE: int = 0  # placeholder
doc: Document = my_project.get_document_by_id(DOCUMENT_ID_ONLINE)

# Get the Annotations in a Document
doc.annotations()

# Filter Annotations by Label
MY_OWN_LABEL_NAME: str = "MY_OWN_LABEL_NAME"  # placeholder
label = my_project.get_label_by_name(MY_OWN_LABEL_NAME)
doc.annotations(label=label)

# Or get all Annotations of that Label that belong to one Category
YOUR_CATEGORY_ID: int = 0  # placeholder
category = my_project.get_category_by_id(YOUR_CATEGORY_ID)
label.annotations(categories=[category])

# Force a Project update. To save time, Documents are only updated if they have changed.
my_project.get(update=True)
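Once you have a list of Annotations, a common next step is to group the annotated text by Label. The sketch below uses stand-in classes so it runs without a server connection. It assumes that each Annotation exposes `label.name` and an `offset_string` list of text pieces, which you should verify against the API Reference. `FakeLabel`, `FakeAnnotation` and `texts_by_label` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class FakeLabel:
    """Stand-in for an SDK Label; only the name is modelled."""
    name: str


@dataclass
class FakeAnnotation:
    """Stand-in for an SDK Annotation (attribute names are assumptions)."""
    label: FakeLabel
    offset_string: List[str] = field(default_factory=list)


def texts_by_label(annotations: Iterable) -> Dict[str, List[str]]:
    """Collect the annotated text pieces under their Label name."""
    grouped = defaultdict(list)
    for annotation in annotations:
        grouped[annotation.label.name].extend(annotation.offset_string)
    return dict(grouped)


if __name__ == "__main__":
    total = FakeLabel("Total")
    date = FakeLabel("Date")
    sample = [
        FakeAnnotation(total, ["119.00"]),
        FakeAnnotation(date, ["2025-03-14"]),
        FakeAnnotation(total, ["19.00"]),
    ]
    print(texts_by_label(sample))
```

With real data, the same function could be called on the result of `doc.annotations()`, provided the attribute names match.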

License

MIT License

Copyright (c) 2025 Helm & Nagel GmbH

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
