blazegraph-io

Python SDK for Blazegraph — parse PDFs into typed semantic document graphs.

Install

pip install blazegraph-io

Requires Python 3.9+. The only dependency is httpx.

Quick Start

import blazegraphio as bg

# Local mode — no account needed, runs on your machine
graph = bg.parse_pdf("document.pdf")

print(f"{len(graph.nodes)} nodes, {len(graph.sections)} sections")

for section in graph.sections:
    print(section.content.text)
    print(section.location.semantic.breadcrumbs)
    print(f"  Page {section.location.physical.page}")

On first run, the SDK automatically downloads the blazegraph-cli binary and a JRE. Subsequent runs reuse the downloaded files and skip this step.
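As a small illustration of working with the parsed result, the sketch below groups section texts by page. It relies only on the attributes shown in the Quick Start (`graph.sections`, `section.content.text`, `section.location.physical.page`). The helper name `sections_by_page` and the `SimpleNamespace` stand-in graph are assumptions made so the example runs without the SDK or a PDF; they are not part of blazegraph-io.

```python
from collections import defaultdict
from types import SimpleNamespace


def sections_by_page(graph):
    """Map page number -> list of section texts, in document order."""
    pages = defaultdict(list)
    for section in graph.sections:
        page = section.location.physical.page
        pages[page].append(section.content.text)
    return dict(pages)


# Stand-in for the object returned by bg.parse_pdf(); NOT the SDK's real type.
def _fake_section(text, page):
    return SimpleNamespace(
        content=SimpleNamespace(text=text),
        location=SimpleNamespace(physical=SimpleNamespace(page=page)),
    )


fake_graph = SimpleNamespace(
    sections=[
        _fake_section("Introduction", 1),
        _fake_section("Background", 1),
        _fake_section("Method", 2),
    ]
)

index = sections_by_page(fake_graph)
print(index)  # {1: ['Introduction', 'Background'], 2: ['Method']}
```

With the SDK installed, passing the result of `bg.parse_pdf("document.pdf")` instead of `fake_graph` should work the same way, provided the attributes behave as shown above.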

API Mode

Switch to the hosted API with one line:

bg.configure(api_key="blaze_prod_...")
graph = bg.parse_pdf("document.pdf")  # same code, cloud processing
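One way to avoid hard-coding the key is to read it from the environment and fall back to local mode when it is absent. The sketch below is only an illustration: the variable name `BLAZEGRAPH_API_KEY` and the helper `api_key_from_env` are hypothetical and not defined by the SDK.

```python
import os


def api_key_from_env(var_name="BLAZEGRAPH_API_KEY", environ=None):
    """Return the API key stored in `var_name`, or None if unset or blank.

    The variable name is an assumption for this sketch, not an SDK convention.
    """
    environ = os.environ if environ is None else environ
    key = environ.get(var_name, "").strip()
    return key or None


key = api_key_from_env()
mode = "api" if key else "local"
print(f"Would run in {mode} mode")

# With the SDK installed, the key would then be passed to the call shown above:
#     if key:
#         bg.configure(api_key=key)
```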

Async is also supported:

# Must run inside a coroutine (for example, one started with asyncio.run)
graph = await bg.parse_pdf_async("document.pdf")
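The async entry point makes it possible to parse several files concurrently. The following is a minimal sketch using `asyncio.gather` with a semaphore to cap concurrency. The helper `parse_many`, its `max_concurrent` parameter, and the `fake_parse` stand-in are assumptions for illustration. The source does not say whether the SDK tolerates concurrent calls, particularly in local mode, so treat that as something to verify.

```python
import asyncio


async def parse_many(paths, parse, max_concurrent=4):
    """Parse several PDFs concurrently, at most `max_concurrent` at a time.

    `parse` is any coroutine function taking a path, e.g. bg.parse_pdf_async.
    Results come back in the same order as `paths`.
    """
    semaphore = asyncio.Semaphore(max_concurrent)

    async def parse_one(path):
        async with semaphore:
            return await parse(path)

    return await asyncio.gather(*(parse_one(p) for p in paths))


# Stand-in parser so the sketch runs without the SDK or any PDF files.
async def fake_parse(path):
    await asyncio.sleep(0.01)
    return f"graph for {path}"


results = asyncio.run(parse_many(["a.pdf", "b.pdf", "c.pdf"], fake_parse))
print(results)  # ['graph for a.pdf', 'graph for b.pdf', 'graph for c.pdf']
```

In real use, `bg.parse_pdf_async` would be passed in place of `fake_parse`.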

What You Get

Every node has typed fields with IDE autocomplete:

node = graph.sections[0]

node.content.text                        # "Introduction"
node.token_count                         # 206
node.location.semantic.path              # "2.3"
node.location.semantic.breadcrumbs       # ["paper.pdf", "Introduction"]
node.location.physical.page              # 1
node.location.physical.bounding_box      # BoundingBox(x=91.9, y=585.9, ...)

Navigate the tree:

parent = node.get_parent(graph)
children = node.get_children(graph)
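Building on `get_children`, a depth-first walk can produce an indented outline of a subtree. The sketch below assumes `get_children(graph)` returns an iterable of child nodes, which the source implies but does not spell out. `FakeNode` and the dict-based graph are stand-ins so the example runs without the SDK; they are not the SDK's classes.

```python
class FakeNode:
    """Stand-in exposing the same get_children(graph) call; NOT the SDK's class."""

    def __init__(self, node_id, text):
        self.id = node_id
        self.text = text

    def get_children(self, graph):
        return [graph["nodes"][cid] for cid in graph["children"].get(self.id, [])]


def walk(node, graph, depth=0):
    """Yield (depth, node) for `node` and all descendants, depth-first."""
    yield depth, node
    for child in node.get_children(graph):
        yield from walk(child, graph, depth + 1)


# Tiny stand-in graph: a dict of nodes plus a parent -> children id map.
graph = {
    "nodes": {
        "root": FakeNode("root", "paper.pdf"),
        "intro": FakeNode("intro", "Introduction"),
        "bg": FakeNode("bg", "Background"),
        "prior": FakeNode("prior", "Prior work"),
    },
    "children": {"root": ["intro", "bg"], "bg": ["prior"]},
}

outline = [("  " * depth) + node.text for depth, node in walk(graph["nodes"]["root"], graph)]
print("\n".join(outline))
```

With real nodes, `node.content.text` would take the place of the stand-in's `text` attribute.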

Render text:

print(node.render(graph, breadcrumbs=True))
# [paper.pdf > Introduction]
# The fundamental problem of communication...

License

MIT
