
Official client for Siftrics' Hydra API, which is a text recognition documents-to-database service

Project description

This repository contains the official Hydra API Python client. The Hydra API is a text recognition service.

Quickstart

  1. Install the package.
pip install hydra-api

or

poetry add hydra-api

or the equivalent command for your package manager.

  2. Create a new data source on siftrics.com.
  3. Grab an API key from the page of your newly created data source.
  4. Create a client, passing your API key into the constructor.
  5. Use the client to process documents, passing in the ID of a data source and the file paths of the documents.
import hydra_api

client = hydra_api.Client('xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx')

rows = client.recognize('my_data_source_id', ['invoice.pdf', 'receipt_1.png'])

rows looks like this:

[
  {
    "Error": "",
    "FileIndex": 0,
    "RecognizedText": { ... }
  },
  ...
]

FileIndex is the index of this file in the original request's "files" array, that is, its position in the list of documents you passed to the client.

RecognizedText is a dictionary mapping labels to values. Labels are the titles of the bounding boxes drawn during the creation of the data source. Values are the recognized text inside those bounding boxes.
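
A minimal sketch of consuming rows, using FileIndex to map each row back to the file it came from. It assumes that an empty "Error" string means the file was processed successfully and a non-empty one carries an error message; the README shows the field but does not document it. The helper split_rows is hypothetical and not part of hydra_api, and the sample response is hand-written.

```python
def split_rows(rows, filepaths):
    """Map each row back to its input file using FileIndex.

    Returns (results, errors): results maps file path -> RecognizedText dict,
    errors maps file path -> error message. Assumes (not documented) that a
    non-empty "Error" string marks a failed file.
    """
    results = {}
    errors = {}
    for row in rows:
        path = filepaths[row["FileIndex"]]
        if row.get("Error"):
            errors[path] = row["Error"]
        else:
            results[path] = row.get("RecognizedText", {})
    return results, errors


# Hand-written response shaped like the example above (labels are made up).
filepaths = ["invoice.pdf", "receipt_1.png"]
rows = [
    {"Error": "", "FileIndex": 0, "RecognizedText": {"Total": "42.00"}},
    {"Error": "example error", "FileIndex": 1, "RecognizedText": {}},
]
results, errors = split_rows(rows, filepaths)
print(results)  # {'invoice.pdf': {'Total': '42.00'}}
print(errors)   # {'receipt_1.png': 'example error'}
```

Keying by FileIndex rather than by list position avoids relying on rows coming back in the same order as the input files.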

Using Base64 Strings Instead of File Paths

There is another function, client.recognizeBase64(dataSourceId, base64Files, doFaster=False), which accepts base64 strings (file contents) instead of file paths. Because the MIME type cannot easily be inferred from a file's contents, you must specify the MIME type of each base64 file string: base64Files must be a list of dict objects, each containing two fields, "mimeType" and "base64File". Example:

    base64Files = [
        {
            'mimeType': 'image/png',
            'base64File': '...',
        },
        {
            'mimeType': 'application/pdf',
            'base64File': '...',
        },
    ]
    rows = client.recognizeBase64('my_data_source_id', base64Files, doFaster=True)
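
One way to build base64Files from files on disk is sketched below. It assumes the service expects plain standard base64 (no "data:" URL prefix) and that the MIME type can be guessed from the file extension with Python's mimetypes module; load_base64_files is a hypothetical helper, not part of hydra_api.

```python
import base64
import mimetypes


def load_base64_files(paths):
    """Build the list of {"mimeType", "base64File"} dicts for recognizeBase64.

    Assumptions: plain standard base64 is accepted, and the file extension
    is enough to determine the MIME type.
    """
    base64_files = []
    for path in paths:
        mime_type, _encoding = mimetypes.guess_type(path)
        if mime_type is None:
            raise ValueError(f"cannot guess MIME type for {path!r}")
        with open(path, "rb") as f:
            encoded = base64.b64encode(f.read()).decode("ascii")
        base64_files.append({"mimeType": mime_type, "base64File": encoded})
    return base64_files
```

The result could then be passed as client.recognizeBase64('my_data_source_id', load_base64_files(['invoice.pdf', 'receipt_1.png'])). Raising on an unknown extension is deliberate: since the MIME type must be stated explicitly, failing early seems safer than guessing.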

Returning Transformed / Pre-Processed Images

Hydra can transform input documents so they are cropped and aligned with the original image used to create the data source.

The recognize and recognizeBase64 functions accept an optional parameter, returnTransformedImages, which defaults to False. If it is set to True, Siftrics transforms each document so that it is aligned with the original image and returns the transformed images.

Returned images are available in the "TransformedImages" field of each element of "Rows" in the response:

{
  "Rows": [
    {
      "Error": "",
      "FileIndex": 0,
      "RecognizedText": {
        "My Field 1": "text from your document...",
        "My Field 2": "text from your document...",
        ...
      },
      "TransformedImages": [
        {
          "Base64Image": ...,
          "PageNumber": 1
        },
        ...
      ]
    },
    ...
  ]
}
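
A minimal sketch of saving these images to disk. It takes the rows list returned by the client (the elements of "Rows" above) and assumes Base64Image holds plain standard base64 image bytes. The helper save_transformed_images and its file-naming scheme are hypothetical, not part of hydra_api.

```python
import base64
import os


def save_transformed_images(rows, out_dir, extension="png"):
    """Decode each row's TransformedImages and write them to out_dir.

    Use extension="jpg" if the request was made with returnJpgs=True.
    Returns the list of paths written.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for row in rows:
        # The field may be missing or empty if returnTransformedImages was not set.
        for image in row.get("TransformedImages") or []:
            name = f"file{row['FileIndex']}_page{image['PageNumber']}.{extension}"
            path = os.path.join(out_dir, name)
            with open(path, "wb") as f:
                f.write(base64.b64decode(image["Base64Image"]))
            written.append(path)
    return written
```

Including both FileIndex and PageNumber in the file name keeps pages of multi-page documents from overwriting each other.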

Faster Results

The recognize and recognizeBase64 functions accept an optional parameter, doFaster, which defaults to False. If it is set to True, Siftrics processes the documents faster at the risk of lower text recognition accuracy. Experimentally, doFaster=True does not seem to affect accuracy when none of the documents to be processed is rotated by more than 45 degrees.

Exporting JPGs instead of PNGs

The recognize and recognizeBase64 functions also accept the optional parameters returnJpgs=False and jpgQuality=85. If returnJpgs is set to True, Siftrics returns cropped images in JPG format instead of PNG format. jpgQuality must be an integer between 1 and 100 inclusive.
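
The jpgQuality constraint can be checked client-side before sending a request, as in the sketch below. The helper image_options is hypothetical, not part of hydra_api; it also sets returnTransformedImages=True on the assumption that returnJpgs only affects the returned (cropped) images.

```python
def image_options(return_jpgs=False, jpg_quality=85):
    """Build keyword arguments for recognize/recognizeBase64 image output.

    Enforces the documented rule that jpgQuality is an integer from 1 to 100
    inclusive. Assumes returnJpgs only matters when images are returned.
    """
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(jpg_quality, bool) or not isinstance(jpg_quality, int):
        raise TypeError("jpgQuality must be an integer")
    if not 1 <= jpg_quality <= 100:
        raise ValueError("jpgQuality must be between 1 and 100 inclusive")
    return {
        "returnTransformedImages": True,
        "returnJpgs": return_jpgs,
        "jpgQuality": jpg_quality,
    }
```

Usage might look like client.recognize('my_data_source_id', ['invoice.pdf'], **image_options(return_jpgs=True, jpg_quality=70)).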

Official API Documentation

Here is the official documentation for the Hydra API.

Apache V2 License

This code is licensed under Apache V2.0. The full text of the license can be found in the "LICENSE" file.



Download files


Source Distribution

hydra-api-1.2.0.tar.gz (8.6 kB)

Built Distribution


hydra_api-1.2.0-py2.py3-none-any.whl (8.6 kB)

File details

Details for the file hydra-api-1.2.0.tar.gz.

File metadata

  • Download URL: hydra-api-1.2.0.tar.gz
  • Size: 8.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.0.3 CPython/3.8.5 Linux/4.5.1-1-ARCH

File hashes

Hashes for hydra-api-1.2.0.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | c40cbf72569d521446b1de5c27fdb24c2c0cebddd1e2787121dcbd1a18215409 |
| MD5 | 4df7fd854ff1b461d1ea5fc1ade6b823 |
| BLAKE2b-256 | a20f3d734bf8b512a1e4423454dca40db912dbcd551161a9cee2393413af4cc8 |


File details

Details for the file hydra_api-1.2.0-py2.py3-none-any.whl.

File metadata

  • Download URL: hydra_api-1.2.0-py2.py3-none-any.whl
  • Size: 8.6 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.0.3 CPython/3.8.5 Linux/4.5.1-1-ARCH

File hashes

Hashes for hydra_api-1.2.0-py2.py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | a906aacede6a5a8eefa021a323743c313b5f10ab5ac5b2fbce388e4e53e02f78 |
| MD5 | 8fc5596e8b96f9bbc62344f503221903 |
| BLAKE2b-256 | 86aebec4cbd4e0089f48b2eb95e95853275d19538bb6f3512c3417705d8bd8a1 |
