SDK to interact with the NuMind models API.
NuMind SDK
Python SDK to interact with NuMind's models API: NuExtract and NuMarkdown.
Installation
pip install numind
Usage and code examples
Create a client
You must first get an API key on the NuExtract platform.
import os
from numind import NuMind
# Create a client object to interact with the API
# Providing the `api_key` is not required if the `NUMIND_API_KEY` environment variable
# is already set.
client = NuMind(api_key=os.environ["NUMIND_API_KEY"])
Create an async client
You can create an async client by using the NuMindAsync class:
import asyncio
from numind import NuMindAsync
client = NuMindAsync(api_key="API_KEY")
project_id = "PROJECT_ID"  # replace with the ID of your project
# One dict of keyword arguments per request
requests = [{}]
async def main():
    return [
        await client.extract_structured_data(project_id, **request_kwargs)
        for request_kwargs in requests
    ]
responses = asyncio.run(main())
The methods and their usage are the same as for the synchronous NuMind client, except that the API methods are coroutines and must be awaited.
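The example above awaits each request in turn. As a minimal sketch of running several awaited calls concurrently instead, the block below uses asyncio.gather. The fake_extract coroutine is a hypothetical stand-in for an SDK call (it is not part of numind) so that the example runs without network access or an API key.

```python
import asyncio


async def fake_extract(project_id: str, input_text: str) -> dict:
    """Hypothetical stand-in for an awaited SDK call; it only echoes its input."""
    await asyncio.sleep(0.01)  # simulate network latency
    return {"project_id": project_id, "length": len(input_text)}


async def run_all(project_id: str, texts: list[str]) -> list[dict]:
    # gather schedules all coroutines at once and returns results in input order
    return await asyncio.gather(*(fake_extract(project_id, t) for t in texts))


results = asyncio.run(run_all("demo-project", ["first text", "second"]))
print(results)
```

Whether concurrent requests are appropriate depends on the rate limits of your account, which this sketch does not model.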
NuExtract: Extract structured information "on the fly"
To extract structured information without creating a project, you can pass the template directly to the extraction method, which provides a more user-friendly way to interact with the API:
template = {
    "destination": {
        "name": "verbatim-string",
        "zip_code": "string",
        "country": "string"
    },
    "accommodation": "verbatim-string",
    "activities": ["verbatim-string"],
    "duration": {
        "time_unit": ["day", "week", "month", "year"],
        "time_quantity": "integer"
    }
}
input_text = """My dream vacation would be a month-long escape to the stunning islands of Tahiti.
I’d stay in an overwater bungalow in Bora Bora, waking up to crystal-clear turquoise waters and breathtaking sunrises.
Days would be spent snorkeling with vibrant marine life, paddleboarding over coral gardens, and basking on pristine white-sand beaches.
I’d explore lush rainforests, hidden waterfalls, and the rich Polynesian culture through traditional dance, music, and cuisine.
Evenings would be filled with romantic beachside dinners under the stars, with the soothing sound of waves as the perfect backdrop."""
output = client.extract_structured_data(template=template, input_text=input_text)
print(output)
# Can also work with files, replace the path with your own
# from pathlib import Path
# output = client.extract(template=template, input_file="file.ppt")
{
    "destination": {
        "name": "Tahiti",
        "zip_code": "98730",
        "country": "France"
    },
    "accommodation": "overwater bungalow in Bora Bora",
    "activities": [
        "snorkeling",
        "paddleboarding",
        "basking",
        "explore lush rainforests, hidden waterfalls, and the rich Polynesian culture"
    ],
    "duration": {
        "time_unit": null,
        "time_quantity": null
    }
}
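Several fields in the template above are typed verbatim-string, meaning their values are expected to be copied from the input. A minimal sketch of a local sanity check for such values follows. The helper names are hypothetical and not part of the SDK, and the whitespace normalization is an assumption about what counts as a match.

```python
def normalize(text: str) -> str:
    """Collapse runs of whitespace (including new lines) into single spaces."""
    return " ".join(text.split())


def verbatim_values_present(values: list[str], source_text: str) -> dict[str, bool]:
    """Map each extracted value to whether it occurs in the source text."""
    haystack = normalize(source_text)
    return {value: normalize(value) in haystack for value in values}


sample_text = """I'd stay in an overwater bungalow
in Bora Bora, snorkeling every day."""
checks = verbatim_values_present(
    ["overwater bungalow in Bora Bora", "snorkeling", "paddleboarding"],
    sample_text,
)
print(checks)
```

A value that fails this check is not necessarily wrong, since some formatting flexibility may be allowed, but it is a cheap signal worth reviewing.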
Create a good template
NuExtract uses JSON schemas as extraction templates. A template specifies the information to retrieve and the type of each value. The available types are:
- string: free text whose value can be abstract, i.e. it does not have to appear in the document and can be deduced from calculations, reasoning, or external knowledge;
- verbatim-string: purely extractive text whose value must be present in the document. Some flexibility might be allowed on the formatting, e.g. new lines and escaped characters (e.g. \n) in a document might be represented with a space;
- integer: an integer number;
- number: any number, either a floating point number or an integer;
- boolean: a boolean whose value should be either true or false;
- date-time: a date or time whose value should follow the ISO 8601 standard (YYYY-MM-DDThh:mm:ss). It may feature "reduced" accuracy, i.e. omit date or time components that are not useful in a specific case. For example, if the extracted value is a date, YYYY-MM-DD is a valid format. The same applies to times with the hh:mm:ss format (without omitting the leading T symbol). Additionally, the "least significant" component might be omitted if it is not required or specified. For example, a specific month and year can be written as YYYY-MM, omitting the day component DD, and a specific hour can be written as hh, omitting the minutes and seconds. When combining dates and times, only the least significant time components can be omitted, e.g. YYYY-MM-DDThh:mm, which omits the seconds.
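The reduced-accuracy date-time rules above can be illustrated with a small shape check. This is a sketch of one reading of those rules, not the validation NuExtract performs: it accepts only the formats named in the text and does not verify calendar validity (for example, month 13 would pass). The function name is hypothetical.

```python
import re

DATE = r"\d{4}-\d{2}(?:-\d{2})?"            # YYYY-MM or YYYY-MM-DD
TIME = r"T\d{2}(?::\d{2}(?::\d{2})?)?"      # Thh, Thh:mm or Thh:mm:ss
FULL_DATE = r"\d{4}-\d{2}-\d{2}"            # a combined value needs a complete date

PATTERN = re.compile(rf"(?:{DATE}|{TIME}|{FULL_DATE}{TIME})")


def looks_like_date_time(value: str) -> bool:
    """Return True if the value has one of the shapes described above."""
    return PATTERN.fullmatch(value) is not None


for candidate in ["2024-05-17", "2024-05", "T14:30", "2024-05-17T14:30", "14:30:00"]:
    print(candidate, looks_like_date_time(candidate))
```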
Additionally, the value of a field can be:
- a nested dictionary, i.e. another branch, describing elements associated with their parent node (key);
- an array of items of the form ["type"], whose values are elements of a given "type", which can also be a dictionary of unspecified depth;
- an enum, i.e. a list of elements to choose from, of the form ["choice1", "choice2", ...]. For values of this type, just set the value to the chosen item, e.g. "choice1", and do not set it to an array containing the item, such as ["choice1"];
- a multi-enum, i.e. a list from which multiple elements can be picked, of the form [["choice1", "choice2", ...]] (double square brackets).
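To make the conventions above concrete, here is a minimal sketch that walks a template and labels each field by kind. The function names are hypothetical, and treating any single-element list as an array (rather than a one-choice enum) is an assumption of this sketch.

```python
def describe_field(value) -> str:
    """Classify a non-dictionary template value."""
    if isinstance(value, str):
        return f"type:{value}"
    if isinstance(value, list) and len(value) == 1:
        # [[...]] is a multi-enum; ["type"] or [{...}] is an array
        return "multi-enum" if isinstance(value[0], list) else "array"
    if isinstance(value, list) and len(value) > 1:
        return "enum"
    raise TypeError(f"Unsupported template value: {value!r}")


def walk(template: dict, prefix: str = ""):
    """Yield (dotted path, kind) pairs, recursing into nested dictionaries."""
    for key, value in template.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            yield from walk(value, prefix=f"{path}.")
        else:
            yield path, describe_field(value)


template = {
    "destination": {"name": "verbatim-string", "zip_code": "string", "country": "string"},
    "accommodation": "verbatim-string",
    "activities": ["verbatim-string"],
    "duration": {
        "time_unit": ["day", "week", "month", "year"],
        "time_quantity": "integer",
    },
}
kinds = dict(walk(template))
for path, kind in kinds.items():
    print(f"{path}: {kind}")
```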
Inferring a template
The "infer_template" method lets you quickly generate, from a text description, a template that you can start working with.
from numind.openapi_client import TemplateRequest
from pydantic import StrictStr
description = "Create a template that extracts key information from an order confirmation email. The template should be able to pull details like the order ID, customer ID, date and time of the order, status, total amount, currency, item details (product ID, quantity, and unit price), shipping address, any customer requests or delivery preferences, and the estimated delivery date."
input_schema = client.post_api_infer_template(
    template_request=TemplateRequest(description=StrictStr(description))
)
Create a project
A project defines an information extraction task from a template and examples.
from numind.openapi_client import CreateProjectRequest
project_id = client.post_api_structured_extraction(
    CreateProjectRequest(
        name="vacation",
        description="Extraction of locations and activities",
        template=template,
    )
)
The project_id can also be found in the "API" tab of a project on the NuExtract website.
Add examples to a project to teach NuExtract via ICL (In-Context Learning)
from pathlib import Path
# Prepare examples, here a text and a file
example_1_input = "This is a text example"
example_1_expected_output = {
    "destination": {"name": None, "zip_code": None, "country": None}
}
with Path("example_2.odt").open("rb") as file:  # read bytes
    example_2_input = file.read()
example_2_expected_output = {
    "destination": {"name": None, "zip_code": None, "country": None}
}
examples = [
    (example_1_input, example_1_expected_output),
    (example_2_input, example_2_expected_output),
]
# Add the examples to the project
client.add_examples_to_structured_extraction_project(project_id, examples)
Extract structured information from text
output_schema = client.extract_structured_data(project_id, input_text=input_text)
Extract structured information from a file
from pathlib import Path
file_path = Path("document.odt")
with file_path.open("rb") as file:
    input_file = file.read()
output_schema = client.extract(project_id, input_file=input_file)
NuMarkdown: Convert a document to a RAG-ready Markdown
from pathlib import Path
file_path = Path("document.pdf")
with file_path.open("rb") as file:
    input_file = file.read()
markdown = client.extract_content(input_file)
Documentation
Extracting Information from Documents
Once your project is ready, you can use it to extract information from documents in real time via this RESTful API.
Each project has its own extraction endpoint:
https://nuextract.ai/api/projects/{projectId}/extract
You provide it a document and it returns the extracted information according to the task defined in the project. To use it, you need:
- To create an API key in the Account section
- To replace {projectId} with the project ID found in the API tab of the project
You can test your extraction endpoint in your terminal using this command-line example with curl (make sure that you replace the values of PROJECT_ID, NUEXTRACT_API_KEY and FILE_NAME):
NUEXTRACT_API_KEY="_your_api_key_here_"; \
PROJECT_ID="a24fd84a-44ab-4fd4-95a9-bebd46e4768b"; \
FILE_NAME="path/to/document.odt"; \
curl "https://nuextract.ai/api/projects/${PROJECT_ID}/extract" \
  -X POST \
  -H "Authorization: Bearer ${NUEXTRACT_API_KEY}" \
  -H "Content-Type: application/octet-stream" \
  --data-binary @"${FILE_NAME}"
You can also use the Python SDK, by replacing the project_id, api_key and file_path variables in the following code:
from numind import NuMind
from pathlib import Path
client = NuMind(api_key=api_key)
file_path = Path("path", "to", "document.odt")
with file_path.open("rb") as file:
    input_file = file.read()
output_schema = client.post_api_projects_projectid_extract(project_id, input_file)
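For readers who want to see what the curl call amounts to without the SDK, the sketch below builds the same request with Python's standard library. It only constructs the request object and does not send it. The build_extract_request helper is a hypothetical name, and the URL and headers are taken from the curl example above.

```python
import urllib.request

BASE_URL = "https://nuextract.ai"


def build_extract_request(
    project_id: str, api_key: str, payload: bytes
) -> urllib.request.Request:
    """Build (but do not send) a POST request to a project's extraction endpoint."""
    return urllib.request.Request(
        f"{BASE_URL}/api/projects/{project_id}/extract",
        data=payload,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/octet-stream",
        },
    )


request = build_extract_request(
    "a24fd84a-44ab-4fd4-95a9-bebd46e4768b", "_your_api_key_here_", b"document bytes"
)
print(request.get_method(), request.full_url)
# Passing the request to urllib.request.urlopen would perform the actual call.
```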
Using the Platform via API
Everything you can do on the web platform can also be done via the API; check the user guide to learn how the platform works. This is useful, for example, to create projects automatically or to make your production setup more robust.
Main resources
- Project - user project, identified by projectId
- File - uploaded file, identified by fileId, stored up to two weeks if not tied to an Example
- Document - internal representation of a document, identified by documentId, created from a File or a text, stored up to two weeks if not tied to an Example
- Example - document-extraction pair given to teach NuExtract, identified by exampleId, created from a Document
Most common API operations
- Creating a Project via POST /api/projects
- Changing the template of a Project via PATCH /api/projects/{projectId}
- Uploading a file as a File via POST /api/files (up to 2 weeks storage)
- Creating a Document from a text or a File via POST /api/documents/text or POST /api/files/{fileId}/convert-to-document, respectively
- Adding an Example to a Project via POST /api/projects/{projectId}/examples
- Changing Project settings via POST /api/projects/{projectId}/settings
- Locking a Project via POST /api/projects/{projectId}/lock
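The operations listed above can be summarized as a small lookup table that resolves an operation to its HTTP method and full URL. This is only an illustrative sketch: the operation names used as dictionary keys and the resolve helper are hypothetical labels, while the methods, paths, and base URL are the ones given in this document.

```python
BASE_URL = "https://nuextract.ai"

# Keys are hypothetical labels; methods and paths come from the list above.
OPERATIONS = {
    "create_project": ("POST", "/api/projects"),
    "change_template": ("PATCH", "/api/projects/{projectId}"),
    "upload_file": ("POST", "/api/files"),
    "create_document_from_text": ("POST", "/api/documents/text"),
    "create_document_from_file": ("POST", "/api/files/{fileId}/convert-to-document"),
    "add_example": ("POST", "/api/projects/{projectId}/examples"),
    "change_settings": ("POST", "/api/projects/{projectId}/settings"),
    "lock_project": ("POST", "/api/projects/{projectId}/lock"),
}


def resolve(operation: str, **path_params: str) -> tuple[str, str]:
    """Return the HTTP method and full URL for an operation."""
    method, path = OPERATIONS[operation]
    return method, BASE_URL + path.format(**path_params)


print(resolve("add_example", projectId="my-project-id"))
print(resolve("create_document_from_file", fileId="my-file-id"))
```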
This Python package is automatically generated by the OpenAPI Generator project:
- API version:
- Package version: 1.0.0
- Generator version: 7.21.0
- Build package: org.openapitools.codegen.languages.PythonClientCodegen
Documentation for API Endpoints
All URIs are relative to https://nuextract.ai
| Class | Method | HTTP request | Description |
|---|---|---|---|
| ContentExtractionApi | get_api_content_extraction_jobs_contentextractionjobid | GET /api/content-extraction/jobs/{contentExtractionJobId} | |
| ContentExtractionApi | post_api_content_extraction_jobs | POST /api/content-extraction/jobs | |
| ContentExtractionProjectManagementApi | get_api_content_extraction | GET /api/content-extraction | |
| ContentExtractionProjectManagementApi | patch_api_content_extraction_contentprojectid | PATCH /api/content-extraction/{contentProjectId} | |
| ContentExtractionProjectManagementApi | patch_api_content_extraction_contentprojectid_settings | PATCH /api/content-extraction/{contentProjectId}/settings | |
| ContentExtractionProjectManagementApi | post_api_content_extraction | POST /api/content-extraction | |
| ContentExtractionProjectManagementApi | post_api_content_extraction_contentprojectid_reset_settings | POST /api/content-extraction/{contentProjectId}/reset-settings | |
| DefaultApi | get_api_debug_status_code | GET /api/debug/status/{code} | |
| DefaultApi | get_api_health | GET /api/health | |
| DefaultApi | get_api_inference_status | GET /api/inference-status | |
| DefaultApi | get_api_ping | GET /api/ping | |
| DefaultApi | get_api_version | GET /api/version | |
| DocumentsApi | get_api_documents_documentid | GET /api/documents/{documentId} | |
| DocumentsApi | get_api_documents_documentid_content | GET /api/documents/{documentId}/content | |
| DocumentsApi | post_api_documents_documentid_new_owner | POST /api/documents/{documentId}/new-owner | |
| DocumentsApi | post_api_documents_text | POST /api/documents/text | |
| FilesApi | get_api_files_fileid | GET /api/files/{fileId} | |
| FilesApi | get_api_files_fileid_content | GET /api/files/{fileId}/content | |
| FilesApi | post_api_files | POST /api/files | |
| FilesApi | post_api_files_fileid_convert_to_document | POST /api/files/{fileId}/convert-to-document | |
| InferenceApi | post_api_content_extraction_contentprojectid_jobs_document_documentid | POST /api/content-extraction/{contentProjectId}/jobs/document/{documentId} | |
| InferenceApi | post_api_structured_extraction_structuredprojectid_jobs_document_documentid | POST /api/structured-extraction/{structuredProjectId}/jobs/document/{documentId} | |
| InferenceApi | post_api_structured_extraction_structuredprojectid_jobs_text | POST /api/structured-extraction/{structuredProjectId}/jobs/text | |
| InferenceApi | post_api_template_generation_jobs_document_documentid | POST /api/template-generation/jobs/document/{documentId} | |
| InferenceApi | post_api_template_generation_jobs_text | POST /api/template-generation/jobs/text | |
| JobsApi | get_api_jobs | GET /api/jobs | |
| JobsApi | get_api_jobs_jobid_status | GET /api/jobs/{jobId}/status | |
| JobsApi | get_api_jobs_jobid_stream | GET /api/jobs/{jobId}/stream | |
| StructuredDataExtractionApi | get_api_structured_extraction_jobs_structuredextractionjobid | GET /api/structured-extraction/jobs/{structuredExtractionJobId} | |
| StructuredDataExtractionApi | post_api_structured_extraction_structuredprojectid_jobs | POST /api/structured-extraction/{structuredProjectId}/jobs | |
| StructuredExtractionExamplesApi | delete_api_structured_extraction_structuredprojectid_examples_structuredexampleid | DELETE /api/structured-extraction/{structuredProjectId}/examples/{structuredExampleId} | |
| StructuredExtractionExamplesApi | get_api_structured_extraction_structuredprojectid_examples | GET /api/structured-extraction/{structuredProjectId}/examples | |
| StructuredExtractionExamplesApi | get_api_structured_extraction_structuredprojectid_examples_structuredexampleid | GET /api/structured-extraction/{structuredProjectId}/examples/{structuredExampleId} | |
| StructuredExtractionExamplesApi | post_api_structured_extraction_structuredprojectid_examples | POST /api/structured-extraction/{structuredProjectId}/examples | |
| StructuredExtractionExamplesApi | put_api_structured_extraction_structuredprojectid_examples_structuredexampleid | PUT /api/structured-extraction/{structuredProjectId}/examples/{structuredExampleId} | |
| StructuredExtractionProjectManagementApi | delete_api_structured_extraction_structuredprojectid | DELETE /api/structured-extraction/{structuredProjectId} | |
| StructuredExtractionProjectManagementApi | get_api_structured_extraction | GET /api/structured-extraction | |
| StructuredExtractionProjectManagementApi | get_api_structured_extraction_structuredprojectid | GET /api/structured-extraction/{structuredProjectId} | |
| StructuredExtractionProjectManagementApi | patch_api_structured_extraction_structuredprojectid | PATCH /api/structured-extraction/{structuredProjectId} | |
| StructuredExtractionProjectManagementApi | patch_api_structured_extraction_structuredprojectid_settings | PATCH /api/structured-extraction/{structuredProjectId}/settings | |
| StructuredExtractionProjectManagementApi | post_api_structured_extraction | POST /api/structured-extraction | |
| StructuredExtractionProjectManagementApi | post_api_structured_extraction_structuredprojectid_duplicate | POST /api/structured-extraction/{structuredProjectId}/duplicate | |
| StructuredExtractionProjectManagementApi | post_api_structured_extraction_structuredprojectid_lock | POST /api/structured-extraction/{structuredProjectId}/lock | |
| StructuredExtractionProjectManagementApi | post_api_structured_extraction_structuredprojectid_reset_settings | POST /api/structured-extraction/{structuredProjectId}/reset-settings | |
| StructuredExtractionProjectManagementApi | post_api_structured_extraction_structuredprojectid_share | POST /api/structured-extraction/{structuredProjectId}/share | |
| StructuredExtractionProjectManagementApi | post_api_structured_extraction_structuredprojectid_unlock | POST /api/structured-extraction/{structuredProjectId}/unlock | |
| StructuredExtractionProjectManagementApi | post_api_structured_extraction_structuredprojectid_unshare | POST /api/structured-extraction/{structuredProjectId}/unshare | |
| TemplateGenerationApi | get_api_template_generation_jobs_templatejobid | GET /api/template-generation/jobs/{templateJobId} | |
| TemplateGenerationApi | post_api_template_generation_jobs | POST /api/template-generation/jobs | |
Documentation For Models
- ContentExtractionResponse
- ContentProjectResponse
- ContentProjectSettingsResponse
- ConvertRequest
- CreateContentProjectRequest
- CreateOrUpdateStructuredExampleRequest
- CreateStructuredProjectRequest
- DocumentInfo
- DocumentResponse
- Error
- FileResponse
- HealthResponse
- ImageInfo
- InferenceStatus
- InferenceValidationError
- InformationResponse
- InvalidInformation
- JobIdResponse
- JobResponse
- JobStatusResponse
- PaginatedResponseJobResponse
- PaginatedResponseStructuredExampleResponse
- ServiceStatus
- StructuredExampleResponse
- StructuredExtractionResponse
- StructuredInferenceExample
- StructuredProjectResponse
- StructuredProjectSettingsResponse
- TemplateRequest
- TemplateResponse
- TextInfo
- TextRequest
- UpdateContentProjectRequest
- UpdateContentProjectSettingsRequest
- UpdateStructuredProjectRequest
- UpdateStructuredProjectSettingsRequest
- ValidInformation
- VersionResponse
Documentation For Authorization
Authentication schemes defined for the API:
oauth2Auth
- Type: OAuth
- Flow: accessCode
- Authorization URL: https://users.numind.ai/realms/extract-platform/protocol/openid-connect/auth
- Scopes:
- openid: OpenID connect
- profile: view profile
- email: view email
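As a rough illustration of how the accessCode (authorization code) flow starts, the sketch below composes an authorization request URL from the endpoint and scopes listed above, using standard OAuth 2.0 query parameters. The client_id, redirect_uri, and state values are placeholders, and the helper is a hypothetical name. The SDK examples in this document authenticate with an API key, so this is background rather than a required step.

```python
from urllib.parse import parse_qs, urlencode, urlparse

AUTH_URL = "https://users.numind.ai/realms/extract-platform/protocol/openid-connect/auth"
SCOPES = ["openid", "profile", "email"]


def build_authorization_url(client_id: str, redirect_uri: str, state: str) -> str:
    """Compose the URL a user would be sent to in order to grant access."""
    query = urlencode(
        {
            "response_type": "code",  # authorization code flow
            "client_id": client_id,  # placeholder value in this sketch
            "redirect_uri": redirect_uri,  # placeholder value in this sketch
            "scope": " ".join(SCOPES),
            "state": state,  # opaque value echoed back to guard against CSRF
        }
    )
    return f"{AUTH_URL}?{query}"


url = build_authorization_url("my-client-id", "https://example.com/callback", "xyz123")
print(url)
```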