
Novalad: AI-powered platform for transforming unstructured documents like PDFs and PowerPoints into machine-readable, structured data.


Novalad is an AI-powered platform that transforms chaotic, unstructured files—such as PDFs and PowerPoints—into beautifully organized, machine-readable data 💡. Designed for developers, data teams, and enterprises, Novalad efficiently handles complex layouts, tables, graphs, and multi-format data using a multi-model, map-reduce approach 🧩.



Installation 🚀

Install the Novalad package using pip:

pip install novalad

Usage 📚

  1. Generate API Key:
    Log in to Novalad (https://app.novalad.ai) and generate your API key. Copy the key and keep it handy.

  2. Import and Initialize the Client:
    Import NovaladClient from the package and initialize it with your API key. You can either pass the key to the client directly or set the NOVALAD_API_KEY environment variable.

    from novalad import NovaladClient
    
    # Initialize client with your API key
    client = NovaladClient(api_key="YOUR_API_KEY") # or set env NOVALAD_API_KEY 
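If you rely on the environment variable, a small check can fail early with a clear message when no key is configured. The helper below is hypothetical (resolve_api_key is not part of the novalad package) and only mirrors the two options described in step 2. In this sketch an explicit key takes precedence; otherwise NOVALAD_API_KEY is read.

```python
import os
from typing import Optional


def resolve_api_key(api_key: Optional[str] = None) -> str:
    """Hypothetical helper (not part of novalad).

    Returns the explicit key if given, otherwise the NOVALAD_API_KEY
    environment variable; raises if neither is set.
    """
    key = api_key or os.environ.get("NOVALAD_API_KEY")
    if not key:
        raise RuntimeError(
            "No API key found: pass api_key=... or set the "
            "NOVALAD_API_KEY environment variable"
        )
    return key
```

For example: client = NovaladClient(api_key=resolve_api_key()).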
    

Uploading a File from Your Local System

If you have a file stored locally (e.g., a PDF document), specify its file path and use the upload method to send the file for processing.
Note: Only run this code if you are processing a local file. If your file is hosted online (via URL or cloud storage), skip this step.
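The Troubleshooting section lists file path issues as a common cause of failures, and a quick local check before calling client.upload can catch them early. This is a minimal sketch: check_local_file is a hypothetical helper, and the set of accepted extensions is an assumption based on the PDF and PowerPoint examples in this README, not a published list of supported types.

```python
from pathlib import Path

# Assumption: extensions accepted by this check. The README mentions PDFs and
# PowerPoints but does not publish an exhaustive list of supported types.
ASSUMED_EXTENSIONS = {".pdf", ".ppt", ".pptx"}


def check_local_file(path: str) -> Path:
    """Hypothetical pre-upload check: the path must be an existing file
    with one of the assumed extensions (case-insensitive)."""
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"Not a file: {p}")
    if p.suffix.lower() not in ASSUMED_EXTENSIONS:
        raise ValueError(f"Unexpected file type {p.suffix!r} for {p.name}")
    return p
```

Once path is defined, you could call client.upload(file_path=str(check_local_file(path))).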

# Define the path to your document
path = r"C:\path\to\your\document.pdf"

# Upload the file
client.upload(file_path=path)

After uploading your file, trigger the processing job using the run method:

# Start processing the uploaded file
client.run()

OR

Processing a Document Directly from a URL

If your document is hosted online (such as in cloud storage or via a public URL), you can process it directly by passing its URL to the run method. This approach avoids the local upload step.

# Process document directly by passing the file URL
client.run(
    url="https://d2uars7xkdmztq.cloudfront.net/app_resources/8049/documentation/91320_en.pdf"
)

Supported URL Types:

  • HTTPS URLs
  • AWS S3 pre-signed URLs
  • GCP Storage Signed URLs
  • Azure Blob HTTPS public URLs
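All four supported URL types are HTTPS URLs, so a cheap client-side check can reject local paths or other schemes (such as s3:// URIs) before calling client.run. check_document_url is a hypothetical helper; it cannot tell whether a signed URL has expired or whether the object is actually readable, which only the service can confirm.

```python
from urllib.parse import urlparse


def check_document_url(url: str) -> str:
    """Hypothetical sanity check: require an https:// URL with a host."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError(f"Expected an https:// URL, got: {url!r}")
    return url
```

For example: client.run(url=check_document_url(document_url)), where document_url is your own URL.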

Checking Job Status

Monitor the status of your processing job by calling the status method, polling until the status is either "success" or "failed":

import time

# Poll the job until it reaches a terminal state
while True:
    status = client.status()
    if status["status"] in ("success", "failed"):
        break
    time.sleep(30)  # check every 30 seconds
    print(".", end="", flush=True)  # progress indicator
print()
print(status)
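In scripts, you may want the same polling loop with a timeout so that a stuck job cannot block forever. Here is a minimal sketch; wait_for_job is a hypothetical helper, not part of the novalad package, and it assumes client.status() returns a dict with a "status" key, as in the loop above. The one-hour default timeout is an arbitrary choice.

```python
import time
from typing import Any, Callable, Dict


def wait_for_job(
    get_status: Callable[[], Dict[str, Any]],
    interval: float = 30.0,
    timeout: float = 3600.0,
) -> Dict[str, Any]:
    """Hypothetical helper: call get_status() until it reports "success"
    or "failed", sleeping `interval` seconds between calls.

    Raises TimeoutError if no terminal status is seen within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status.get("status") in ("success", "failed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job still running after {timeout} s: {status}")
        time.sleep(interval)
```

Usage: final = wait_for_job(client.status), then inspect final["status"]. Passing the method rather than the client keeps the helper easy to test with a fake status function.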

Retrieving and Rendering Outputs

After the job is complete, you can retrieve and render the results in various formats:

  • JSON 🧾: raw layout and structured element data (ideal for developers)
  • Markdown 📘: clean, human-readable content for documentation and wikis
  • Knowledge Graph 🕸️: visual representation of semantic relations and entities
  • LangChain Docs 🔗: plug-and-play format optimized for LLM pipelines
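Each of these formats is requested with a format value passed to client.output in the sections that follow: "json", "markdown", "document", and "graph". If you need several formats from one job, a small loop can collect them. fetch_outputs is a hypothetical helper; it only assumes client.output(format=...) behaves as in those examples.

```python
from typing import Any, Dict, Iterable

# Format strings taken from the client.output(...) examples in this README.
FORMATS = ("json", "markdown", "document", "graph")


def fetch_outputs(client: Any, formats: Iterable[str] = FORMATS) -> Dict[str, Any]:
    """Hypothetical helper: call client.output once per requested format
    and return the results keyed by format name."""
    return {fmt: client.output(format=fmt) for fmt in formats}
```

Usage: outputs = fetch_outputs(client), then outputs["markdown"], outputs["graph"], and so on.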

JSON Output

Retrieve the raw JSON response containing structured data, metadata, and extracted text:

json_response = client.output(format="json")
print(json_response)
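To keep the raw result for later inspection, you can write it to disk. The README does not say whether client.output(format="json") returns a JSON string or a parsed Python object, so this sketch handles both; save_json_output is a hypothetical helper.

```python
import json
from pathlib import Path
from typing import Any


def save_json_output(json_response: Any, out_path: str) -> Path:
    """Hypothetical helper: write the JSON output to out_path (UTF-8).

    Assumption: json_response is either a JSON string (written as-is)
    or a JSON-serializable object such as a dict or list.
    """
    path = Path(out_path)
    if isinstance(json_response, str):
        text = json_response
    else:
        text = json.dumps(json_response, indent=2, ensure_ascii=False)
    path.write_text(text, encoding="utf-8")
    return path
```

Usage: save_json_output(json_response, "novalad_output.json").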

Markdown Output

Retrieve a Markdown version of the output (in a notebook, you can also render it with the render_markdown helper, covered under Rendering the Outputs):

markdown_output = client.output(format="markdown")
print(markdown_output)
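Markdown output can be split into sections, for example to chunk content for an LLM pipeline. A minimal sketch, assuming markdown_output is a string that marks sections with ATX headings (#, ##, ...); split_markdown_sections is a hypothetical helper, and it does not skip # lines that appear inside fenced code blocks.

```python
import re
from typing import List, Tuple

# An ATX heading: 1-6 '#' characters, whitespace, then the heading text.
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def split_markdown_sections(markdown: str) -> List[Tuple[str, str]]:
    """Split Markdown into (heading, body) pairs at ATX headings.

    Text before the first heading is returned with an empty heading;
    a leading section with no text at all is skipped.
    """
    sections: List[Tuple[str, str]] = []
    heading, body = "", []
    for line in markdown.splitlines():
        m = HEADING.match(line)
        if m:
            # Close the current section before starting a new one
            if heading or any(ln.strip() for ln in body):
                sections.append((heading, "\n".join(body).strip()))
            heading, body = m.group(2).strip(), []
        else:
            body.append(line)
    if heading or any(ln.strip() for ln in body):
        sections.append((heading, "\n".join(body).strip()))
    return sections
```

Usage: for heading, body in split_markdown_sections(markdown_output): print(heading, len(body)).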

LangChain Document Format Output

Retrieve the output in LangChain document format for further processing in LLM pipelines:

documents = client.output(format="document")
print(documents)
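For a quick overview of the returned documents, you can build a one-line preview of each. This sketch assumes each item behaves like a LangChain Document, with page_content (a string) and metadata (a dict) attributes; document_previews is a hypothetical helper.

```python
from typing import Any, Iterable, List


def document_previews(documents: Iterable[Any], width: int = 80) -> List[str]:
    """Hypothetical helper: one preview line per document.

    Assumption: each item has .page_content (str) and .metadata (dict),
    as LangChain Document objects do.
    """
    lines = []
    for i, doc in enumerate(documents):
        text = " ".join(doc.page_content.split())  # collapse whitespace and newlines
        lines.append(f"[{i}] {doc.metadata} {text[:width]}")
    return lines
```

Usage: for line in document_previews(documents): print(line).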

Knowledge Graph Output

Retrieve the relationships and entities within the document as a knowledge graph:

kg_output = client.output(format="graph")
print(kg_output)

Rendering the Outputs (Notebook Only)

If you are using Jupyter Notebook, Google Colab, or Kaggle, you can render or view the output formats directly in your notebook cells.

Render JSON Output:
This code renders each page of the PDF as an image, with the detected elements and layouts highlighted. It uses the local file path (path) from the upload step.
Note: You can also save the rendered images to a local directory by passing save_dir=r"C:\path\to\save\visualization" to the render_elements function.

from novalad import render_elements

render_elements(path, json_response)
# To save images locally:
# render_elements(path, json_response, save_dir=r"C:\path\to\save\visualization")


Render Markdown Output:

from novalad import render_markdown

render_markdown(markdown_output)


Render Knowledge Graph:

from novalad import render_knowledge_graph

render_knowledge_graph(kg_output)



Troubleshooting 🛠️

  • Job Failure: Verify that your API key is correct and the file path is accessible. Review the status output for error messages.
  • File Path Issues: Ensure the file path is correctly formatted (use raw strings for Windows paths).
  • URL Issues: Confirm that the document URL is correct and publicly accessible.
  • API Key Problems: Verify that your API key is active and valid. If authentication issues persist, please contact support.

For any other issue, please email us at info@novalad.ai.


License 📄

This project is licensed under the Apache License.


Support 🙋‍♂️🙋‍♀️

For additional help or to report issues, please refer to the official documentation or contact support at info@novalad.ai.


Thank you for choosing Novalad! 🚀



Download files


Source Distribution

novalad-0.1.16.tar.gz (19.0 kB)

Uploaded Source

Built Distribution


novalad-0.1.16-py3-none-any.whl (20.0 kB)

Uploaded Python 3

File details

Details for the file novalad-0.1.16.tar.gz.

File metadata

  • Download URL: novalad-0.1.16.tar.gz
  • Upload date:
  • Size: 19.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.4 CPython/3.12.8 Linux/6.11.0-1018-azure

File hashes

Hashes for novalad-0.1.16.tar.gz
  • SHA256: 48e876a00c4dca5387580319a79c3688ba1890bebda93dbd934e091d55dc3e7f
  • MD5: b3343f2e57e5f136f50e7d5d50d97031
  • BLAKE2b-256: 660c292b939f24fec72d668b7cd03d1feaf4252beaa3e4abd8adf29d8e88afeb
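To check a downloaded archive against the published digest, you can hash it locally with Python's standard hashlib module. The expected value below is the SHA256 listed for the source distribution; sha256_of and verify_download are illustrative helpers.

```python
import hashlib
from pathlib import Path

# Published SHA256 for novalad-0.1.16.tar.gz (from the hash list above)
EXPECTED_SDIST_SHA256 = "48e876a00c4dca5387580319a79c3688ba1890bebda93dbd934e091d55dc3e7f"


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_download(path: str, expected: str = EXPECTED_SDIST_SHA256) -> bool:
    """True if the file's SHA-256 matches the expected hex digest."""
    return sha256_of(path) == expected.lower()
```

Usage: verify_download("novalad-0.1.16.tar.gz"). For the wheel, pass its own SHA256 digest as expected.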


File details

Details for the file novalad-0.1.16-py3-none-any.whl.

File metadata

  • Download URL: novalad-0.1.16-py3-none-any.whl
  • Upload date:
  • Size: 20.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.4 CPython/3.12.8 Linux/6.11.0-1018-azure

File hashes

Hashes for novalad-0.1.16-py3-none-any.whl
  • SHA256: 29d2337ab343594d650fc9b9797a4ba710dc0e8e29010451aba1b44b8c10537e
  • MD5: 1f2e5d471281640a48a8ff6b9c99cdb1
  • BLAKE2b-256: aaf467516c8cd6f146e6f47890866d3154d8236911b82654ca7a4edaca4d239d

