Novalad: AI-powered platform for transforming unstructured documents like PDFs and PowerPoints into machine-readable, structured data.
Novalad is an AI-powered platform that transforms chaotic, unstructured files—such as PDFs and PowerPoints—into beautifully organized, machine-readable data 💡. Designed for developers, data teams, and enterprises, Novalad efficiently handles complex layouts, tables, graphs, and multi-format data using a multi-model, map-reduce approach 🧩.
Installation 🚀
Install the Novalad package using pip:
pip install novalad
Usage 📚
Generate API Key:
Log in to Novalad (https://app.novalad.ai) and generate your API key. Copy the key and keep it handy.
Importing and Initializing the Client
Begin by importing NovaladClient from the package and initializing it with your API key. You can either set the NOVALAD_API_KEY environment variable or pass the key directly to the client:
from novalad import NovaladClient
# Initialize the client with your API key (or set the NOVALAD_API_KEY environment variable)
client = NovaladClient(api_key="YOUR_API_KEY")
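If the same script runs in several environments, it can help to resolve the key in one place. The following is a minimal sketch, not part of the novalad package: resolve_api_key is a hypothetical helper that prefers an explicitly passed key and falls back to the NOVALAD_API_KEY environment variable mentioned above.

```python
import os


def resolve_api_key(explicit_key=None, env_var="NOVALAD_API_KEY"):
    """Return the explicit key if given, otherwise the value of env_var.

    Hypothetical helper for illustration; not part of the novalad package.
    """
    key = explicit_key or os.environ.get(env_var)
    if not key:
        # Fail early with a clear message instead of a later auth error
        raise RuntimeError(
            f"No API key found: pass one explicitly or set {env_var}."
        )
    return key
```

The returned value could then be passed along as NovaladClient(api_key=resolve_api_key()).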
Uploading a File from Your Local System
If you have a file stored locally (e.g., a PDF document), specify its file path and use the upload method to send the file for processing.
Note: Only run this code if you are processing a local file. If your file is hosted online (via URL or cloud storage), skip this step.
# Define the path to your document
path = r"C:\path\to\your\document.pdf"
# Upload the file
client.upload(file_path=path)
After uploading your file, trigger the processing job using the run method:
# Start processing the uploaded file
client.run()
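Since an inaccessible file path is one of the failure causes listed under Troubleshooting, it may be worth checking the path locally before uploading. This is a small sketch using only the standard library; checked_document_path is a hypothetical helper name, not a novalad function.

```python
from pathlib import Path


def checked_document_path(raw_path):
    """Return the absolute path as a string if raw_path is an existing file.

    Hypothetical helper for illustration; raises FileNotFoundError otherwise.
    """
    path = Path(raw_path).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"No such file: {path}")
    return str(path.resolve())
```

With this in place, the upload call could be written as client.upload(file_path=checked_document_path(path)).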
OR
Processing a Document Directly from a URL
If your document is hosted online (such as in cloud storage or via a public URL), you can process it directly by passing its URL to the run method. This approach avoids the local upload step.
# Process document directly by passing the file URL
client.run(
    url="https://d2uars7xkdmztq.cloudfront.net/app_resources/8049/documentation/91320_en.pdf"
)
Supported URL Types:
- HTTPS URLs
- AWS S3 pre-signed URLs
- GCP Storage Signed URLs
- Azure Blob HTTPS public URLs
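A quick local check can catch obviously unsupported inputs before a job is started. The sketch below assumes that every supported URL type is served over HTTPS, which the list above suggests but does not state outright; looks_like_https_url is a hypothetical helper, and it does not verify that the URL is reachable or that a signature is still valid.

```python
from urllib.parse import urlparse


def looks_like_https_url(url):
    """Return True if url uses the https scheme and includes a host name."""
    parsed = urlparse(url)
    return parsed.scheme == "https" and bool(parsed.netloc)
```

For example, a bare s3:// URI would be rejected by this check, whereas an S3 pre-signed HTTPS URL would pass.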
Checking Job Status
Monitor the status of your processing job by calling the status method. The job continues until the status is either "success" or "failed":
import time
while True:
    status = client.status()
    # Stop polling once the job reaches a terminal state
    if status["status"] in ["success", "failed"]:
        break
    time.sleep(60)  # Check every 60 seconds
    print(".", end="")
print("\n", status)
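The loop above never exits if a job stays in a non-terminal state. One possible variation adds an upper bound on the waiting time. This is a sketch under stated assumptions: only client.status() returning a dict with a "status" key comes from this README, while wait_for_job, interval, and timeout are hypothetical names with illustrative defaults.

```python
import time


def wait_for_job(client, interval=60, timeout=1800):
    """Poll client.status() until the job succeeds, fails, or times out.

    Hypothetical helper; interval and timeout are given in seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = client.status()
        # "success" and "failed" are the terminal states named in the README
        if status["status"] in ("success", "failed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job still running after {timeout} seconds")
        time.sleep(interval)
```

Calling status = wait_for_job(client) would then replace the while loop, and a TimeoutError signals that the job should be investigated rather than waited on further.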
Retrieving and Rendering Outputs
After the job is complete, you can retrieve and render the results in various formats:
| Format | Description |
|---|---|
| JSON 🧾 | Raw layout and structured element data (ideal for developers) |
| Markdown 📘 | Clean, human-readable content for documentation and wikis |
| Knowledge Graph 🕸️ | Visual representation of semantic relations and entities |
| LangChain Docs 🔗 | Plug-and-play format optimized for LLM pipelines |
JSON Output
Retrieve the raw JSON response containing structured data, metadata, and extracted text:
json_response = client.output(format="json")
print(json_response)
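Printing a large response is rarely practical, so it may be more convenient to write it to disk. The sketch below assumes json_response is a JSON-serializable object such as a dict or list, which this README implies but does not confirm; save_json is a hypothetical helper built on the standard library.

```python
import json
from pathlib import Path


def save_json(data, out_path):
    """Write data to out_path as indented UTF-8 JSON and return the path."""
    out_path = Path(out_path)
    # Create the target directory if it does not exist yet
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w", encoding="utf-8") as fh:
        json.dump(data, fh, indent=2, ensure_ascii=False)
    return out_path
```

A call such as save_json(json_response, "output/result.json") would keep the extraction result available for later inspection.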
Markdown Output
Get a Markdown version of the output. In a notebook, you can render it with the render_markdown helper covered in the rendering section:
markdown_output = client.output(format="markdown")
print(markdown_output)
LangChain Document Format Output
Retrieve the output as a structured document object for further processing:
documents = client.output(format="document")
print(documents)
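To get a quick overview instead of printing every object in full, the documents can be summarized. This sketch assumes the returned value is an iterable of LangChain-style objects exposing page_content and metadata attributes; that shape is an assumption based on the format name, and summarize_documents is a hypothetical helper.

```python
def summarize_documents(documents, preview_chars=80):
    """Return one (metadata, text preview) tuple per document-like object.

    Assumes each item has page_content and metadata attributes; missing
    attributes fall back to an empty string and an empty dict.
    """
    rows = []
    for doc in documents:
        text = getattr(doc, "page_content", "")
        metadata = getattr(doc, "metadata", {})
        rows.append((metadata, text[:preview_chars]))
    return rows
```

Looping over summarize_documents(documents) and printing each tuple gives a compact view of what was extracted.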
Knowledge Graph Output
Retrieve the relationships and entities within the document as a knowledge graph:
kg_output = client.output(format="graph")
print(kg_output)
Rendering the Outputs (Notebook Only)
If you are using Jupyter Notebook, Google Colab, or Kaggle, you can render or view the output formats directly in your notebook cells.
Render JSON Output:
This code renders images displaying the PDF document page-wise with elements and layouts highlighted.
Note: You can also save the rendered images to a local directory by passing save_dir=r"C:\path\to\save\visualization" to the render_elements function.
from novalad import render_elements
render_elements(path, json_response)
# To save images locally:
# render_elements(path, json_response, save_dir=r"C:\path\to\save\visualization")
Render Markdown Output:
from novalad import render_markdown
render_markdown(markdown_output)
Render Knowledge Graph:
from novalad import render_knowledge_graph
render_knowledge_graph(kg_output)
Troubleshooting 🛠️
- Job Failure: Verify that your API key is correct and the file path is accessible. Review the status output for error messages.
- File Path Issues: Ensure the file path is correctly formatted (use raw strings for Windows paths).
- URL Issues: Confirm that the document URL is correct and publicly accessible.
- API Key Problems: Verify that your API key is active and valid. If authentication issues persist, please contact support.
For any other issue, please email us at info@novalad.ai.
License 📄
This project is licensed under the Apache License.
Support 🙋♂️🙋♀️
For additional help or to report issues, please refer to the official documentation or contact support at info@novalad.ai
Thank you for choosing Novalad! 🚀