SDK to interact with the NuMind models API.

Project description

NuMind SDK

Python SDK to interact with NuMind's models API: NuExtract and NuMarkdown.

Installation

pip install numind

Usage and code examples

Create a client

You must first get an API key on the NuExtract platform.

import os

from numind import NuMind

# Create a client object to interact with the API
# Providing the `api_key` is not required if the `NUMIND_API_KEY` environment variable
# is already set.
client = NuMind(api_key=os.environ["NUMIND_API_KEY"])

Create an async client

You can create an async client by using the NuMindAsync class:

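import asyncio

from numind import NuMindAsync

client = NuMindAsync(api_key="API_KEY")
project_id = "YOUR_PROJECT_ID"  # found in the "API" tab of the project
# One dictionary of keyword arguments per extraction request
requests = [
    {"input_text": "First text to process"},
    {"input_text": "Second text to process"},
]


async def main():
    # Send the requests concurrently and collect the responses in order
    return await asyncio.gather(
        *(
            client.extract_structured_data(project_id, **request_kwargs)
            for request_kwargs in requests
        )
    )


responses = asyncio.run(main())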

The methods and their usages are the same as for the sync NuMind client except that API methods are coroutines that must be awaited.
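When you have many inputs, starting every request at the same time may hit rate limits or overload your connection. A common pattern is to bound concurrency with an asyncio.Semaphore. The sketch below is generic asyncio code, not part of the numind SDK: gather_bounded, fake_request and the default limit of 4 are illustrative choices, and it assumes the NuMindAsync methods return ordinary awaitables, as described above.

```python
import asyncio
from typing import Any, Awaitable, Callable, Iterable


async def gather_bounded(
    calls: Iterable[Callable[[], Awaitable[Any]]], max_concurrency: int = 4
) -> list[Any]:
    """Await every call concurrently, with at most `max_concurrency` in flight.

    Each element of `calls` is a zero-argument function returning an awaitable,
    so a request is only created once a slot is free. Results keep the order
    of `calls`.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run(call: Callable[[], Awaitable[Any]]) -> Any:
        async with semaphore:  # wait for a free slot before starting the call
            return await call()

    return await asyncio.gather(*(run(call) for call in calls))


async def fake_request(i: int) -> int:
    # Stand-in for client.extract_structured_data(...) so the sketch runs offline
    await asyncio.sleep(0.01)
    return i * i


print(asyncio.run(gather_bounded([lambda i=i: fake_request(i) for i in range(5)], max_concurrency=2)))
# [0, 1, 4, 9, 16]

# Hypothetical usage with NuMindAsync, with project_id and requests holding your own values:
# responses = asyncio.run(gather_bounded(
#     [lambda kw=kw: client.extract_structured_data(project_id, **kw) for kw in requests],
#     max_concurrency=4,
# ))
```

The `lambda kw=kw:` default argument captures each request's keyword arguments at creation time, so every deferred call uses its own input.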

NuExtract: Extract structured information "on the fly"

If you want to extract structured information from data without creating a project, just by providing the template, you can use the extract_structured_data method, which provides a more user-friendly way to interact with the API:

template = {
    "destination": {
        "name": "verbatim-string",
        "zip_code": "string",
        "country": "string"
    },
    "accommodation": "verbatim-string",
    "activities": ["verbatim-string"],
    "duration": {
        "time_unit": ["day", "week", "month", "year"],
        "time_quantity": "integer"
    }
}
input_text = """My dream vacation would be a month-long escape to the stunning islands of Tahiti.
I’d stay in an overwater bungalow in Bora Bora, waking up to crystal-clear turquoise waters and breathtaking sunrises.
Days would be spent snorkeling with vibrant marine life, paddleboarding over coral gardens, and basking on pristine white-sand beaches.
I’d explore lush rainforests, hidden waterfalls, and the rich Polynesian culture through traditional dance, music, and cuisine.
Evenings would be filled with romantic beachside dinners under the stars, with the soothing sound of waves as the perfect backdrop."""

output = client.extract_structured_data(template=template, input_text=input_text)
print(output)

# It also works with files; replace the path with your own
# from pathlib import Path
# output = client.extract(template=template, input_file="file.ppt")
{
    "destination": {
        "name": "Tahiti",
        "zip_code": "98730",
        "country": "France"
    },
    "accommodation": "overwater bungalow in Bora Bora",
    "activities": [
        "snorkeling",
        "paddleboarding",
        "basking",
        "explore lush rainforests, hidden waterfalls, and the rich Polynesian culture"
    ],
    "duration": {
        "time_unit": null,
        "time_quantity": null
    }
}

Create a good template

NuExtract uses JSON schemas as extraction templates. A template specifies the information to retrieve and the type of each field, which can be:

  • string: a free-form text whose value can be abstract, i.e. it may be deduced from calculations, reasoning or external knowledge;
  • verbatim-string: a purely extractive text whose value must be present in the document. Some flexibility may be allowed in the formatting, e.g. new lines and escaped characters (such as \n) in a document may be represented by a space;
  • integer: an integer number;
  • number: any number, that may be a floating point number or an integer;
  • boolean: a boolean whose value should be either true or false;
  • date-time: a date or time whose value should follow the ISO 8601 standard (YYYY-MM-DDThh:mm:ss). Reduced accuracy is allowed, i.e. components that are not useful in a given case may be omitted. For example, if the extracted value is a date, YYYY-MM-DD is a valid format; the same applies to times with the hh:mm:ss format (keeping the leading T symbol). The least significant component may also be omitted when it is not required or specified: a specific month and year can be written YYYY-MM, omitting the day DD, and a specific hour can be written hh, omitting the minutes and seconds. When combining a date and a time, only the least significant time components can be omitted, e.g. YYYY-MM-DDThh:mm omits the seconds.

Additionally, the value of a field can be:

  • a nested dictionary, i.e. another branch, describing elements associated with their parent node (key);
  • an array of items of the form ["type"], whose values are elements of a given "type", which can also be a dictionary of unspecified depth;
  • an enum, i.e. a list of elements to choose from, of the form ["choice1", "choice2", ...]. For values of this type, set the value to the chosen item, e.g. "choice1", rather than to an array containing that item such as ["choice1"];
  • a multi-enum, i.e. a list from which multiple elements can be picked, of the form [["choice1", "choice2", ...]] (double square brackets).
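To make these type rules concrete, here is a minimal sketch of a local checker that tests whether an extracted value is consistent with a template. It is not part of the numind SDK: check_value, SCALAR_TYPES, the whitespace normalization used for verbatim strings and the exact date-time patterns are assumptions of this sketch. It also assumes any field may be left empty (None), and treats a one-item list whose item is a type name or a dictionary as an array rather than an enum.

```python
import re
from typing import Any

# Scalar field types supported by NuExtract templates
SCALAR_TYPES = {"string", "verbatim-string", "integer", "number", "boolean", "date-time"}

# Reduced-accuracy ISO 8601 forms (assumption of this sketch): YYYY-MM[-DD],
# Thh[:mm[:ss]], and a full date followed by Thh[:mm[:ss]]
_DATE = r"\d{4}-\d{2}(?:-\d{2})?"
_TIME = r"T\d{2}(?::\d{2}(?::\d{2})?)?"
_FULL_DATE = r"\d{4}-\d{2}-\d{2}"
DATE_TIME_RE = re.compile("(?:" + _DATE + "|" + _TIME + "|" + _FULL_DATE + _TIME + ")")


def _normalize(text: str) -> str:
    # Collapse new lines, tabs and repeated spaces into single spaces
    return " ".join(text.split())


def check_value(spec: Any, value: Any, document: str) -> bool:
    """Return True if `value` is consistent with the template `spec` (local sketch)."""
    if value is None:  # any field may be left empty
        return True
    if isinstance(spec, dict):  # nested branch
        return isinstance(value, dict) and all(
            check_value(sub_spec, value.get(key), document) for key, sub_spec in spec.items()
        )
    if isinstance(spec, list):
        if len(spec) == 1 and isinstance(spec[0], list):  # multi-enum: [["a", "b"]]
            return isinstance(value, list) and all(item in spec[0] for item in value)
        if len(spec) == 1 and (isinstance(spec[0], dict) or spec[0] in SCALAR_TYPES):
            # Array of items of a given type: ["type"] or [{...}]
            return isinstance(value, list) and all(
                check_value(spec[0], item, document) for item in value
            )
        return value in spec  # enum: a single chosen item, never ["item"]
    if spec == "verbatim-string":
        return isinstance(value, str) and _normalize(value) in _normalize(document)
    if spec == "string":
        return isinstance(value, str)
    if spec == "integer":
        return isinstance(value, int) and not isinstance(value, bool)
    if spec == "number":
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if spec == "boolean":
        return isinstance(value, bool)
    if spec == "date-time":
        return isinstance(value, str) and DATE_TIME_RE.fullmatch(value) is not None
    raise ValueError(f"Unknown template type: {spec!r}")


template = {
    "destination": {"name": "verbatim-string", "country": "string"},
    "activities": ["verbatim-string"],
    "time_unit": ["day", "week", "month", "year"],
    "tags": [["beach", "mountain", "city"]],
    "arrival": "date-time",
}
document = "We fly to Tahiti on 2025-07-14 for snorkeling and paddleboarding."
output = {
    "destination": {"name": "Tahiti", "country": "France"},
    "activities": ["snorkeling", "paddleboarding"],
    "time_unit": "week",
    "tags": ["beach"],
    "arrival": "2025-07-14",
}
print(check_value(template, output, document))  # True
# An enum value wrapped in an array is rejected
print(check_value({"time_unit": ["day", "week"]}, {"time_unit": ["week"]}, document))  # False
```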

Inferring a template

The "infer_template" method lets you quickly create, from a text description, a template that you can start working with.

from numind.openapi_client import TemplateRequest

description = "Create a template that extracts key information from an order confirmation email. The template should be able to pull details like the order ID, customer ID, date and time of the order, status, total amount, currency, item details (product ID, quantity, and unit price), shipping address, any customer requests or delivery preferences, and the estimated delivery date."
# Ask the API to infer a template from the description
input_schema = client.post_api_infer_template(
    template_request=TemplateRequest(description=description)
)

Create a project

A project lets you define an information extraction task from a template and a set of examples.

from numind.openapi_client import CreateProjectRequest

project_id = client.post_api_structured_extraction(
    CreateProjectRequest(
        name="vacation",
        description="Extraction of locations and activities",
        template=template,
    )
)

The project_id can also be found in the "API" tab of a project on the NuExtract website.

Add examples to a project to teach NuExtract via ICL (In-Context Learning)

from pathlib import Path

# Prepare examples, here a text and a file
example_1_input = "This is a text example"
example_1_expected_output = {
    "destination": {"name": None, "zip_code": None, "country": None}
}
with Path("example_2.odt").open("rb") as file:  # read bytes
    example_2_input = file.read()
example_2_expected_output = {
    "destination": {"name": None, "zip_code": None, "country": None}
}
examples = [
    (example_1_input, example_1_expected_output),
    (example_2_input, example_2_expected_output),
]

# Add the examples to the project
client.add_examples_to_structured_extraction_project(project_id, examples)
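When you have many examples, you may prefer to keep them on disk. The following sketch builds the list of (input, expected_output) pairs used above from a folder in which each input file sits next to a JSON file with the same name stem. The folder layout and the load_examples helper are hypothetical, not part of the SDK; every input is read as raw bytes, like example_2 above.

```python
import json
from pathlib import Path


def load_examples(directory: Path) -> list[tuple[bytes, dict]]:
    """Pair every input file in `directory` with its expected output `<stem>.json`.

    Hypothetical layout: invoice_1.pdf + invoice_1.json, invoice_2.odt + invoice_2.json, ...
    Inputs without a matching JSON file are skipped.
    """
    examples = []
    for input_path in sorted(directory.iterdir()):
        if not input_path.is_file() or input_path.suffix == ".json":
            continue
        expected_path = input_path.with_suffix(".json")
        if not expected_path.is_file():
            continue  # no expected output for this input
        expected_output = json.loads(expected_path.read_text(encoding="utf-8"))
        examples.append((input_path.read_bytes(), expected_output))
    return examples


# Hypothetical usage with the client and project created above:
# examples = load_examples(Path("examples"))
# client.add_examples_to_structured_extraction_project(project_id, examples)
```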

Extract structured information from text

output_schema = client.extract_structured_data(project_id, input_text=input_text)

Extract structured information from a file

from pathlib import Path

file_path = Path("document.odt")
with file_path.open("rb") as file:
    input_file = file.read()
output_schema = client.extract(project_id, input_file=input_file)

NuMarkdown: Convert a document to a RAG-ready Markdown

from pathlib import Path

file_path = Path("document.pdf")
with file_path.open("rb") as file:
    input_file = file.read()
markdown = client.extract_content(input_file)
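Since the converted Markdown is meant to feed a RAG pipeline, a common next step is to split it into chunks. The sketch below assumes extract_content returns the Markdown as a str (an assumption) and splits it at level-1 and level-2 headings. split_markdown_by_headings is a hypothetical helper, and this naive version would also split on lines starting with # inside fenced code blocks.

```python
import re


def split_markdown_by_headings(markdown: str, max_level: int = 2) -> list[str]:
    """Split Markdown into chunks that each start at a heading of level <= max_level.

    Text before the first heading becomes its own chunk; empty chunks are dropped.
    """
    heading = re.compile(rf"^#{{1,{max_level}}}\s", re.MULTILINE)
    starts = [match.start() for match in heading.finditer(markdown)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep the text that precedes the first heading
    bounds = starts + [len(markdown)]
    chunks = [markdown[start:end].strip() for start, end in zip(bounds, bounds[1:])]
    return [chunk for chunk in chunks if chunk]


sample = "Intro\n# Title\nSome text\n## Section\nDetails\n### Sub\nMore\n"
print(split_markdown_by_headings(sample))
# ['Intro', '# Title\nSome text', '## Section\nDetails\n### Sub\nMore']
```

With the converted document, this would be `chunks = split_markdown_by_headings(markdown)`; level-3 headings and below stay inside their parent chunk.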

Documentation

Extracting Information from Documents

Once your project is ready, you can use it to extract information from documents in real time via this RESTful API.

Each project has its own extraction endpoint:

https://nuextract.ai/api/projects/{projectId}/extract

You send it a document, and it returns the information extracted according to the task defined in the project. To use it, you need to:

  • create an API key in the Account section;
  • replace {projectId} with the project ID found in the API tab of the project.

You can test your extraction endpoint from a terminal with the following curl command (make sure that you replace the values of NUEXTRACT_API_KEY, PROJECT_ID and FILE_NAME):

NUEXTRACT_API_KEY="_your_api_key_here_"; \
PROJECT_ID="a24fd84a-44ab-4fd4-95a9-bebd46e4768b"; \
FILE_NAME="document.odt"; \
curl "https://nuextract.ai/api/projects/${PROJECT_ID}/extract" \
  -X POST \
  -H "Authorization: Bearer ${NUEXTRACT_API_KEY}" \
  -H "Content-Type: application/octet-stream" \
  --data-binary @"${FILE_NAME}"

You can also use the Python SDK by replacing the values of the api_key, project_id and file_path variables in the following code:

from pathlib import Path

from numind import NuMind

api_key = "_your_api_key_here_"
project_id = "_your_project_id_here_"

client = NuMind(api_key=api_key)
file_path = Path("path", "to", "document.odt")
with file_path.open("rb") as file:
    input_file = file.read()
output_schema = client.post_api_projects_projectid_extract(project_id, input_file)
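The curl call above can also be reproduced with only the Python standard library, e.g. where installing the SDK is not an option. This sketch mirrors that request (a POST of the raw file bytes with a Bearer token and an application/octet-stream content type). The helper names and the 120-second timeout are arbitrary choices, and decoding the response body as JSON is an assumption.

```python
import json
import urllib.request

BASE_URL = "https://nuextract.ai"


def build_extract_request(project_id: str, api_key: str, data: bytes) -> urllib.request.Request:
    """Mirror the curl call: POST the raw file bytes to the project's extract endpoint."""
    return urllib.request.Request(
        url=f"{BASE_URL}/api/projects/{project_id}/extract",
        data=data,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/octet-stream",
        },
    )


def extract_file(project_id: str, api_key: str, data: bytes, timeout: float = 120.0):
    """Send the request and decode the body, assuming the endpoint answers with JSON.

    urlopen raises urllib.error.HTTPError for non-2xx responses.
    """
    request = build_extract_request(project_id, api_key, data)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical usage, with your own project ID, API key and file:
# from pathlib import Path
# output = extract_file(project_id, api_key, Path("document.odt").read_bytes())
```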

Using the Platform via API

Everything you can do on the web platform can also be done via the API; check the user guide to learn how the platform works. This can be useful, for example, to create projects automatically or to make your production pipeline more robust.

Main resources

  • Project - user project, identified by projectId
  • File - uploaded file, identified by fileId, stored for up to two weeks if not tied to an Example
  • Document - internal representation of a document, identified by documentId, created from a File or a text, stored for up to two weeks if not tied to an Example
  • Example - document-extraction pair given to teach NuExtract, identified by exampleId, created from a Document

Most common API operations

  • Creating a Project via POST /api/projects
  • Changing the template of a Project via PATCH /api/projects/{projectId}
  • Uploading a file as a File via POST /api/files (stored for up to two weeks)
  • Creating a Document from a text via POST /api/documents/text, or from a File via POST /api/files/{fileId}/convert-to-document
  • Adding an Example to a Project via POST /api/projects/{projectId}/examples
  • Changing Project settings via POST /api/projects/{projectId}/settings
  • Locking a Project via POST /api/projects/{projectId}/lock
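If you call these endpoints without the SDK, it can help to keep the operations listed above in one place. The sketch below maps them to (HTTP method, URL) pairs; the operation names and the route helper are hypothetical, and path parameters are URL-encoded with urllib.parse.quote.

```python
from urllib.parse import quote

BASE_URL = "https://nuextract.ai"

# The most common operations listed above, as (HTTP method, path template) pairs
OPERATIONS = {
    "create_project": ("POST", "/api/projects"),
    "update_template": ("PATCH", "/api/projects/{projectId}"),
    "upload_file": ("POST", "/api/files"),
    "text_to_document": ("POST", "/api/documents/text"),
    "file_to_document": ("POST", "/api/files/{fileId}/convert-to-document"),
    "add_example": ("POST", "/api/projects/{projectId}/examples"),
    "update_settings": ("POST", "/api/projects/{projectId}/settings"),
    "lock_project": ("POST", "/api/projects/{projectId}/lock"),
}


def route(operation: str, **path_params: str) -> tuple[str, str]:
    """Return (method, absolute URL) for an operation.

    Each path parameter is URL-encoded; str.format raises KeyError if one is missing.
    """
    method, template = OPERATIONS[operation]
    encoded = {name: quote(value, safe="") for name, value in path_params.items()}
    return method, BASE_URL + template.format(**encoded)


print(route("file_to_document", fileId="abc"))
# ('POST', 'https://nuextract.ai/api/files/abc/convert-to-document')
```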

This Python package is automatically generated by the OpenAPI Generator project:

  • API version:
  • Package version: 1.0.0
  • Generator version: 7.21.0
  • Build package: org.openapitools.codegen.languages.PythonClientCodegen

Documentation for API Endpoints

All URIs are relative to https://nuextract.ai

| Class | Method | HTTP request |
| --- | --- | --- |
| ContentExtractionApi | get_api_content_extraction_jobs_contentextractionjobid | GET /api/content-extraction/jobs/{contentExtractionJobId} |
| ContentExtractionApi | post_api_content_extraction_jobs | POST /api/content-extraction/jobs |
| ContentExtractionProjectManagementApi | get_api_content_extraction | GET /api/content-extraction |
| ContentExtractionProjectManagementApi | patch_api_content_extraction_contentprojectid | PATCH /api/content-extraction/{contentProjectId} |
| ContentExtractionProjectManagementApi | patch_api_content_extraction_contentprojectid_settings | PATCH /api/content-extraction/{contentProjectId}/settings |
| ContentExtractionProjectManagementApi | post_api_content_extraction | POST /api/content-extraction |
| ContentExtractionProjectManagementApi | post_api_content_extraction_contentprojectid_reset_settings | POST /api/content-extraction/{contentProjectId}/reset-settings |
| DefaultApi | get_api_debug_status_code | GET /api/debug/status/{code} |
| DefaultApi | get_api_health | GET /api/health |
| DefaultApi | get_api_inference_status | GET /api/inference-status |
| DefaultApi | get_api_ping | GET /api/ping |
| DefaultApi | get_api_version | GET /api/version |
| DocumentsApi | get_api_documents_documentid | GET /api/documents/{documentId} |
| DocumentsApi | get_api_documents_documentid_content | GET /api/documents/{documentId}/content |
| DocumentsApi | post_api_documents_documentid_new_owner | POST /api/documents/{documentId}/new-owner |
| DocumentsApi | post_api_documents_text | POST /api/documents/text |
| FilesApi | get_api_files_fileid | GET /api/files/{fileId} |
| FilesApi | get_api_files_fileid_content | GET /api/files/{fileId}/content |
| FilesApi | post_api_files | POST /api/files |
| FilesApi | post_api_files_fileid_convert_to_document | POST /api/files/{fileId}/convert-to-document |
| InferenceApi | post_api_content_extraction_contentprojectid_jobs_document_documentid | POST /api/content-extraction/{contentProjectId}/jobs/document/{documentId} |
| InferenceApi | post_api_structured_extraction_structuredprojectid_jobs_document_documentid | POST /api/structured-extraction/{structuredProjectId}/jobs/document/{documentId} |
| InferenceApi | post_api_structured_extraction_structuredprojectid_jobs_text | POST /api/structured-extraction/{structuredProjectId}/jobs/text |
| InferenceApi | post_api_template_generation_jobs_document_documentid | POST /api/template-generation/jobs/document/{documentId} |
| InferenceApi | post_api_template_generation_jobs_text | POST /api/template-generation/jobs/text |
| JobsApi | get_api_jobs | GET /api/jobs |
| JobsApi | get_api_jobs_jobid_status | GET /api/jobs/{jobId}/status |
| JobsApi | get_api_jobs_jobid_stream | GET /api/jobs/{jobId}/stream |
| StructuredDataExtractionApi | get_api_structured_extraction_jobs_structuredextractionjobid | GET /api/structured-extraction/jobs/{structuredExtractionJobId} |
| StructuredDataExtractionApi | post_api_structured_extraction_structuredprojectid_jobs | POST /api/structured-extraction/{structuredProjectId}/jobs |
| StructuredExtractionExamplesApi | delete_api_structured_extraction_structuredprojectid_examples_structuredexampleid | DELETE /api/structured-extraction/{structuredProjectId}/examples/{structuredExampleId} |
| StructuredExtractionExamplesApi | get_api_structured_extraction_structuredprojectid_examples | GET /api/structured-extraction/{structuredProjectId}/examples |
| StructuredExtractionExamplesApi | get_api_structured_extraction_structuredprojectid_examples_structuredexampleid | GET /api/structured-extraction/{structuredProjectId}/examples/{structuredExampleId} |
| StructuredExtractionExamplesApi | post_api_structured_extraction_structuredprojectid_examples | POST /api/structured-extraction/{structuredProjectId}/examples |
| StructuredExtractionExamplesApi | put_api_structured_extraction_structuredprojectid_examples_structuredexampleid | PUT /api/structured-extraction/{structuredProjectId}/examples/{structuredExampleId} |
| StructuredExtractionProjectManagementApi | delete_api_structured_extraction_structuredprojectid | DELETE /api/structured-extraction/{structuredProjectId} |
| StructuredExtractionProjectManagementApi | get_api_structured_extraction | GET /api/structured-extraction |
| StructuredExtractionProjectManagementApi | get_api_structured_extraction_structuredprojectid | GET /api/structured-extraction/{structuredProjectId} |
| StructuredExtractionProjectManagementApi | patch_api_structured_extraction_structuredprojectid | PATCH /api/structured-extraction/{structuredProjectId} |
| StructuredExtractionProjectManagementApi | patch_api_structured_extraction_structuredprojectid_settings | PATCH /api/structured-extraction/{structuredProjectId}/settings |
| StructuredExtractionProjectManagementApi | post_api_structured_extraction | POST /api/structured-extraction |
| StructuredExtractionProjectManagementApi | post_api_structured_extraction_structuredprojectid_duplicate | POST /api/structured-extraction/{structuredProjectId}/duplicate |
| StructuredExtractionProjectManagementApi | post_api_structured_extraction_structuredprojectid_lock | POST /api/structured-extraction/{structuredProjectId}/lock |
| StructuredExtractionProjectManagementApi | post_api_structured_extraction_structuredprojectid_reset_settings | POST /api/structured-extraction/{structuredProjectId}/reset-settings |
| StructuredExtractionProjectManagementApi | post_api_structured_extraction_structuredprojectid_share | POST /api/structured-extraction/{structuredProjectId}/share |
| StructuredExtractionProjectManagementApi | post_api_structured_extraction_structuredprojectid_unlock | POST /api/structured-extraction/{structuredProjectId}/unlock |
| StructuredExtractionProjectManagementApi | post_api_structured_extraction_structuredprojectid_unshare | POST /api/structured-extraction/{structuredProjectId}/unshare |
| TemplateGenerationApi | get_api_template_generation_jobs_templatejobid | GET /api/template-generation/jobs/{templateJobId} |
| TemplateGenerationApi | post_api_template_generation_jobs | POST /api/template-generation/jobs |

Documentation For Models

Documentation For Authorization

Authentication schemes defined for the API:

oauth2Auth

Download files

Source Distribution

numind-0.2.2.tar.gz (463.1 kB)

Uploaded Source

Built Distribution

numind-0.2.2-py3-none-any.whl (294.7 kB)

Uploaded Python 3

File details

Details for the file numind-0.2.2.tar.gz.

File metadata

  • Download URL: numind-0.2.2.tar.gz
  • Upload date:
  • Size: 463.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for numind-0.2.2.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | eeab23445690bac11fea177e0c653e654981bca7355c27e4fee447327c37f5d9 |
| MD5 | f3a888631b854b3075226a31d3ab287f |
| BLAKE2b-256 | c602a387e88c820c259514b6a9016031d43bfbef52669743ef2e08916f160030 |

Provenance

The following attestation bundles were made for numind-0.2.2.tar.gz:

Publisher: publish-pypi.yml on numindai/nuextract-platform-sdk

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file numind-0.2.2-py3-none-any.whl.

File metadata

  • Download URL: numind-0.2.2-py3-none-any.whl
  • Upload date:
  • Size: 294.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for numind-0.2.2-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | df99962e99061e43273d7260471ce7fd97d5d4d4f924875357ca07b0febabf1a |
| MD5 | 29d2404797fe807a8578ee204ebe311f |
| BLAKE2b-256 | bdc969cc8cf954984bdb71b317dbdf184eba5afd19f4badc23719ad9bee82af9 |

Provenance

The following attestation bundles were made for numind-0.2.2-py3-none-any.whl:

Publisher: publish-pypi.yml on numindai/nuextract-platform-sdk

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
