Python SDK for Sensible, the developer-first platform for extracting structured data from documents so that you can build document-automation features into your SaaS products

Sensible Python SDK

Welcome! Sensible is a developer-first platform for extracting structured data from documents, for example, business forms in PDF format. Use Sensible to build document-automation features into your SaaS products. Sensible is highly configurable: you can get simple data in minutes by leveraging GPT-4 and other large language models (LLMs), or you can tackle complex and idiosyncratic document formatting with Sensible's powerful layout-based document primitives.

This open-source Sensible SDK offers convenient access to the Sensible API. Use this SDK to:

  • Extract: Extract structured data from your custom documents. Configure the extractions for a set of similar documents, or document type, in the Sensible app or Sensible API, then run extractions for documents of the type with this SDK.
  • Classify: Classify documents by the types you define, for example, bank statements or tax forms. Use classification to determine which documents to extract prior to calling a Sensible extraction endpoint, or route each document in a system of record.

Documentation

Versions

  • The latest version of this SDK is v0.
  • The latest version of the Sensible API is v0.

Python support

  • This SDK supports all non-end-of-life versions of Python.

Install

In an environment in which you've installed Python, create a directory for a test project, open a command prompt in the directory, and install the dependencies:

pip install sensibleapi

To import Sensible into your project, create an index.py file in your test project, and add the following line to the file:

from sensibleapi import SensibleSDK

Initialize

Get an account at sensible.so if you don't have one already.

To initialize the SDK, paste the following code into your index.py file and replace YOUR_API_KEY with your API key:

sensible = SensibleSDK("YOUR_API_KEY")

Note: Ensure you secure your API key in production, for example as a GitHub secret.
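One common way to keep the key out of source code is to read it from an environment variable at startup. The following is a minimal sketch under stated assumptions: the variable name SENSIBLE_API_KEY and the helper load_api_key are illustrative and are not part of the SDK.

```python
import os


def load_api_key(env_var: str = "SENSIBLE_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    The variable name is a hypothetical convention; the SDK does not read it itself.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable to your Sensible API key")
    return api_key


# Usage (requires `pip install sensibleapi`):
#   from sensibleapi import SensibleSDK
#   sensible = SensibleSDK(load_api_key())
```

Failing fast on a missing key gives a clearer error than an authentication failure on the first API call.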

Quickstart

To extract data from a sample document at a URL:

  1. Install the Sensible SDK using the steps in the Install section.
  2. Paste the following code into an empty index.py file:
import asyncio
from sensibleapi import SensibleSDK

async def main():
  sensible = SensibleSDK("YOUR_API_KEY")  # replace with your API key
  request = await sensible.extract(
      url="https://github.com/sensible-hq/sensible-docs/raw/main/readme-sync/assets/v0/pdfs/contract.pdf",
      document_type="sensible_instruct_basics",
      environment="development"
  )
  results = await sensible.wait_for(request)  # polls every 5 seconds. Optional if you configure a webhook
  print(results)

asyncio.run(main())
  3. Replace YOUR_API_KEY with your API key.
  4. In a command prompt in the same directory as your index.py file, run the code with the following command:
python index.py

The code extracts data from an example document (contract.pdf) using an example document type (sensible_instruct_basics) and an example extraction configuration.

Results

You should see the following extracted document text in the parsed_document object in the logged response:

{
  "purchase_price": {
    "source": "$400,000",
    "value": 400000,
    "unit": "$",
    "type": "currency"
  },
  "street_address": {
    "value": "1234 ABC COURT City of SALT LAKE CITY County of Salt Lake -\nState of Utah, Zip 84108",
    "type": "address"
  }
}
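To consume this output in code, you might reduce each field to its value. The following is a minimal sketch, assuming each field is either a dict with a "value" key (as in the output above) or empty; the helper name field_values is hypothetical and not part of the SDK.

```python
from typing import Any, Dict


def field_values(parsed_document: Dict[str, Any]) -> Dict[str, Any]:
    """Map each field name to its extracted value, or None if the field is empty."""
    values = {}
    for name, field in parsed_document.items():
        # Assumption: a populated field is a dict with a "value" key;
        # anything else (for example None) is reported as None.
        values[name] = field.get("value") if isinstance(field, dict) else None
    return values


# The parsed_document from the output above, written as a Python dict.
parsed_document = {
    "purchase_price": {
        "source": "$400,000",
        "value": 400000,
        "unit": "$",
        "type": "currency",
    },
    "street_address": {
        "value": "1234 ABC COURT City of SALT LAKE CITY County of Salt Lake -\nState of Utah, Zip 84108",
        "type": "address",
    },
}

print(field_values(parsed_document))
```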

Optional: Understand extraction

Navigate to the example in the SenseML editor to see how the extraction you just ran works in the Sensible app. You can add more fields to the left pane to extract more data.

Usage: Extract document data

You can use this SDK to extract data from a document, as specified by the extraction configurations and document types defined in your Sensible account.

Overview

See the following steps for an overview of the SDK's workflow for document data extraction:

  1. Instantiate an SDK object with SensibleSDK().
  2. Request a document extraction with sensible.extract(). Use the following required parameters:
    1. (required) Specify the document from which to extract data using the url, path, or file parameter.
    2. (required) Specify the user-defined document type or types using the document_type or document_types parameter.
  3. Wait for the results. Use sensible.wait_for(), or use a webhook.
  4. Optionally convert extractions to an Excel file with generate_excel().
  5. Consume the data.
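The required parameters in step 2 can be checked before you call sensible.extract(). The following is a minimal sketch; the helper build_extract_kwargs is hypothetical, and the "exactly one" rule is an assumption read from the steps above rather than documented SDK behavior.

```python
from typing import Any, Dict


def build_extract_kwargs(**options: Any) -> Dict[str, Any]:
    """Validate options for sensible.extract() and drop unset (None) values."""
    kwargs = {key: value for key, value in options.items() if value is not None}

    # Assumption: the document source is given by exactly one of these parameters.
    sources = [key for key in ("url", "path", "file") if key in kwargs]
    if len(sources) != 1:
        raise ValueError("Specify exactly one of url, path, or file")

    # Assumption: a single document type or a list of types, not both.
    types = [key for key in ("document_type", "document_types") if key in kwargs]
    if len(types) != 1:
        raise ValueError("Specify exactly one of document_type or document_types")

    return kwargs


# Usage inside an async function (requires the SDK):
#   kwargs = build_extract_kwargs(path="./1040_john_doe.pdf", document_type="tax_forms")
#   request = await sensible.extract(**kwargs)
#   results = await sensible.wait_for(request)
```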

Extraction configuration

You can configure options for document data extraction:

request = await sensible.extract(
    path="./1040_john_doe.pdf",
    document_type="tax_forms",
    configuration_name="1040_2021",
    environment="development",
    webhook={
        "url": "YOUR_WEBHOOK_URL",
        "payload": "additional info, for example, a UUID for verification",
    }
)

See the following table for information about configuration options:

key | value | description
path | string | The path to the document you want to extract from. For more information about supported file size and types, see Supported file types.
file | string | The non-encoded bytes of the document you want to extract from.
url | string | The URL of the document you want to extract from. The URL must respond to a GET request with the bytes of the document you want to extract data from, and must be either publicly accessible or presigned with a security token as part of the URL path. To check if the URL meets these criteria, open the URL with a web browser. The browser must either render the document as a full-page view with no other data, or download the document, without prompting for authentication.
document_type | string | Type of document to extract from. Create your custom type in the Sensible app (for example, rate_confirmation, certificate_of_insurance, or home_inspection_report), or use Sensible's library of out-of-the-box supported document types.
document_types | array | Types of documents to extract from. Use this parameter to extract from multiple documents that are packaged into one file (a "portfolio"). This parameter specifies the document types contained in the portfolio. Sensible then segments the portfolio into documents using the specified document types (for example, 1099, w2, and bank_statement) and then runs extractions for each document. For more information, see Multi-doc extraction.
configuration_name | string | If specified, Sensible uses the specified config to extract data from the document instead of automatically choosing the configuration. If unspecified, Sensible automatically chooses the best-scoring extraction from the configs in the document type. Not applicable for portfolios.
document_name | string | If you specify the file name of the document using this parameter, then Sensible returns the file name in the extraction response and populates the file name in the Sensible app's list of recent extractions.
environment | "production" or "development" (default: "production") | If you specify development, Sensible extracts preferentially using config versions published to the development environment in the Sensible app. The extraction runs all configs in the doc type before picking the best fit. For each config, Sensible falls back to the production version if no development version of the config exists.
webhook | object | Returns extraction results to the specified webhook URL as soon as they're complete, so you don't have to poll for results status. Sensible also calls this webhook on error. The webhook object has the following parameters: url (string), the webhook destination, to which Sensible will POST when the extraction is complete; and payload (string, number, boolean, object, or array), information additional to the API response, for example a UUID for verification.
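The webhook payload can carry a per-request UUID that your receiver checks before trusting a callback. The following is a minimal sketch; the helpers make_webhook and is_expected_callback are hypothetical, and it assumes the POST body is JSON that echoes the payload under a top-level "payload" key, which you should confirm against the API reference.

```python
import uuid
from typing import Any, Dict


def make_webhook(url: str) -> Dict[str, Any]:
    """Build a webhook option whose payload is a fresh UUID string."""
    return {"url": url, "payload": str(uuid.uuid4())}


def is_expected_callback(body: Dict[str, Any], webhook: Dict[str, Any]) -> bool:
    """Check that a received callback body echoes the payload that was sent.

    Assumption: the parsed JSON body has the payload under a top-level "payload" key.
    """
    return body.get("payload") == webhook["payload"]


# Usage inside an async function (requires the SDK):
#   webhook = make_webhook("YOUR_WEBHOOK_URL")
#   request = await sensible.extract(path="./1040_john_doe.pdf",
#                                    document_type="tax_forms", webhook=webhook)
```

Store the webhook dict alongside the request so the receiver can look up the expected UUID when the callback arrives.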

Extraction results

Get extraction results by using a webhook or calling the Wait For method.

For the schema for the results of an extraction request, see Extract data from a document and expand the 200 responses in the middle pane and the right pane to see the model and an example, respectively.

Usage: Classify documents by type

You can use this SDK to classify a document by type, as specified by the document types defined in your Sensible account. For more information, see Classifying documents by type.

Overview

See the following steps for an overview of the SDK's workflow for document classification:

  1. Instantiate an SDK object with SensibleSDK().

  2. Request a document classification with sensible.classify(). Specify the document to classify using the path or file parameter.

  3. Poll for the result with sensible.wait_for().

  4. Consume the data.

Classification configuration

You can configure options for document classification:

import asyncio
from sensibleapi import SensibleSDK

async def main():
    sensible = SensibleSDK(api_key="YOUR_API_KEY")  # Replace with your API key
    request = await sensible.classify(path="./boa_sample.pdf")
    results = await sensible.wait_for(request)
    print(results)

asyncio.run(main())

See the following table for information about configuration options:

key | value | description
path | string | The path to the document you want to classify. For information about supported file size and types, see Supported file types.
file | string | The non-encoded bytes of the document you want to classify.
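If the document is already in memory or you prefer to read it yourself, the file parameter can be used in place of path. The following is a minimal sketch; the helper classify_file_bytes is hypothetical, and it assumes the sensible argument is an initialized SensibleSDK object whose classify() accepts raw bytes through file, as the table describes.

```python
from pathlib import Path
from typing import Any


async def classify_file_bytes(sensible: Any, pdf_path: str) -> Any:
    """Read a document from disk and classify it by passing its bytes."""
    data = Path(pdf_path).read_bytes()  # non-encoded bytes of the document
    request = await sensible.classify(file=data)
    # Poll until the classification finishes, then return the results.
    return await sensible.wait_for(request)


# Usage (requires the SDK):
#   import asyncio
#   from sensibleapi import SensibleSDK
#   sensible = SensibleSDK("YOUR_API_KEY")
#   print(asyncio.run(classify_file_bytes(sensible, "./boa_sample.pdf")))
```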

Classification results

Get results from this method by calling the Wait For method. For the schema for the results of a classification request, see Classify document by type (sync) and expand the 200 responses in the middle pane and the right pane to see the model and an example, respectively.
