
Python SDK for Blazegraph — parse PDFs into typed semantic document graphs


blazegraph-io


Install

pip install blazegraph-io

Requires Python 3.9+. The only Python package dependency is httpx.

Quick Start

import blazegraphio as bg

# Local mode — no account needed, runs on your machine
graph = bg.parse_pdf("document.pdf")

print(f"{len(graph.nodes)} nodes, {len(graph.sections)} sections")

for section in graph.sections:
    print(section.content.text)
    print(section.location.semantic.breadcrumbs)
    print(f"  Page {section.location.physical.page}")

On the first local run, the SDK automatically downloads the blazegraph-cli binary and a JRE. Later runs reuse them and skip that setup step.
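Because each section carries both its text and its physical page, a common next step is to regroup the output by page. The sketch below relies only on the attribute paths used above (`content.text`, `location.physical.page`). `sections_by_page` is a hypothetical helper, not part of the SDK. It duck-types the nodes, so it works with any objects that expose those attributes.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def sections_by_page(sections: Iterable[Any]) -> Dict[int, List[str]]:
    """Group section texts by section.location.physical.page (hypothetical helper).

    Only the attribute paths shown in the README are used:
    section.content.text and section.location.physical.page.
    """
    pages: Dict[int, List[str]] = defaultdict(list)
    for section in sections:
        pages[section.location.physical.page].append(section.content.text)
    # Return a plain dict ordered by page number for predictable iteration.
    return dict(sorted(pages.items()))
```

Usage: `for page, texts in sections_by_page(graph.sections).items(): print(page, len(texts))`. Sections keep their original order within each page.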

API Mode

Switch to the hosted API with one line:

bg.configure(api_key="blaze_prod_...")
graph = bg.parse_pdf("document.pdf")  # same code, cloud processing
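Hard-coding the key works for a quick test. Shared code usually reads it from the environment instead. The following minimal sketch assumes a variable named `BLAZEGRAPH_API_KEY`, which is a hypothetical name because the README does not specify one. When the variable is unset, the sketch falls back to local mode.

```python
import os
from typing import Mapping, Optional

# Hypothetical variable name, chosen for this example only.
API_KEY_ENV_VAR = "BLAZEGRAPH_API_KEY"


def resolve_api_key(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the API key from the environment, or None if it is unset or blank."""
    if env is None:
        env = os.environ
    key = env.get(API_KEY_ENV_VAR, "").strip()
    return key or None


# Usage with the SDK (not executed here):
#   key = resolve_api_key()
#   if key:
#       bg.configure(api_key=key)          # hosted API
#   graph = bg.parse_pdf("document.pdf")   # local mode if no key was found
```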

An async variant is also available. Await it from inside a coroutine:

graph = await bg.parse_pdf_async("document.pdf")
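Because `parse_pdf_async` is awaited, it has to run inside an event loop. The sketch below drives it with `asyncio.run` and parses several files concurrently. `parse_many` and its `max_concurrency` limit are hypothetical. The parser is passed in as an argument, so the sketch assumes nothing about the SDK beyond the call shown above. The README does not document whether local mode or the hosted API benefits from concurrent calls, so treat the limit as a tuning knob.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence, TypeVar

T = TypeVar("T")


async def parse_many(
    paths: Sequence[str],
    parse: Callable[[str], Awaitable[T]],
    max_concurrency: int = 4,
) -> List[T]:
    """Await parse(path) for every path, with at most max_concurrency in flight.

    Results come back in the same order as paths (asyncio.gather preserves order).
    """
    if max_concurrency < 1:
        raise ValueError("max_concurrency must be at least 1")
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(path: str) -> T:
        # Wait for a free slot before starting this parse.
        async with semaphore:
            return await parse(path)

    return list(await asyncio.gather(*(run_one(p) for p in paths)))


# Usage with the SDK (not executed here):
#   import blazegraphio as bg
#   graphs = asyncio.run(parse_many(["a.pdf", "b.pdf"], bg.parse_pdf_async))
```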

What You Get

Every node has typed fields with IDE autocomplete:

node = graph.sections[0]

node.content.text                        # "Introduction"
node.token_count                         # 206
node.location.semantic.path              # "2.3"
node.location.semantic.breadcrumbs       # ["paper.pdf", "Introduction"]
node.location.physical.page              # 1
node.location.physical.bounding_box      # BoundingBox(x=91.9, y=585.9, ...)
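These typed fields map directly onto flat records, which makes it easy to load sections into a search index or vector store. The sketch below assumes only the attribute paths listed above. `to_record` and `write_jsonl` are hypothetical helpers. The bounding box is left out because the README shows only some of its fields (`x`, `y`, ...).

```python
import json
from typing import Any, Dict, Iterable, TextIO


def to_record(node: Any) -> Dict[str, Any]:
    """Flatten one node into a JSON-serialisable dict (hypothetical helper)."""
    return {
        "text": node.content.text,
        "token_count": node.token_count,
        "path": node.location.semantic.path,
        "breadcrumbs": list(node.location.semantic.breadcrumbs),
        "page": node.location.physical.page,
    }


def write_jsonl(nodes: Iterable[Any], out: TextIO) -> int:
    """Write one JSON object per line to out and return how many were written."""
    count = 0
    for node in nodes:
        out.write(json.dumps(to_record(node), ensure_ascii=False) + "\n")
        count += 1
    return count
```

Usage: `with open("sections.jsonl", "w", encoding="utf-8") as f: write_jsonl(graph.sections, f)`.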

Navigate the tree:

parent = node.get_parent(graph)
children = node.get_children(graph)
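`get_children` is enough to walk a whole subtree. The minimal depth-first sketch below assumes `get_children(graph)` returns the child nodes in document order and returns an empty result for leaves, as the snippet above suggests. `walk` is a hypothetical helper. It uses an explicit stack rather than recursion, so deeply nested documents cannot hit Python's recursion limit.

```python
from typing import Any, Iterator, List, Tuple


def walk(root: Any, graph: Any) -> Iterator[Tuple[int, Any]]:
    """Yield (depth, node) pairs in pre-order, starting with root at depth 0."""
    stack: List[Tuple[int, Any]] = [(0, root)]
    while stack:
        depth, node = stack.pop()
        yield depth, node
        children = list(node.get_children(graph) or [])
        # Push children in reverse so the first child is popped (visited) next.
        for child in reversed(children):
            stack.append((depth + 1, child))
```

Usage: `for depth, n in walk(node, graph): print("  " * depth + n.content.text)` prints an indented outline of the subtree under `node`.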

Render text:

print(node.render(graph, breadcrumbs=True))
# [paper.pdf > Introduction]
# The fundamental problem of communication...
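Rendered sections with breadcrumbs form self-describing chunks for retrieval or LLM prompts. One way to combine them with `token_count` is to pack consecutive sections into batches under a token budget. `pack_by_tokens` and `max_tokens` are hypothetical. The sketch assumes `token_count` is a reasonable size estimate for the rendered text. The README does not say which tokenizer produces it, and the breadcrumb header adds a few tokens on top.

```python
from typing import Any, List, Sequence


def pack_by_tokens(sections: Sequence[Any], graph: Any, max_tokens: int) -> List[str]:
    """Greedily join rendered sections into chunks whose summed token_count fits max_tokens.

    A single section larger than the budget becomes its own oversized chunk;
    it is not split.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    chunks: List[str] = []
    current: List[str] = []
    used = 0
    for section in sections:
        size = section.token_count
        # Close the current chunk if adding this section would exceed the budget.
        if current and used + size > max_tokens:
            chunks.append("\n\n".join(current))
            current, used = [], 0
        current.append(section.render(graph, breadcrumbs=True))
        used += size
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Usage: `chunks = pack_by_tokens(graph.sections, graph, max_tokens=512)`. Greedy packing keeps sections in reading order, which preserves local context better than sorting by size.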


License

MIT



Download files


Source Distribution

blazegraph_io-0.1.2.tar.gz (17.4 kB)

Built Distribution

blazegraph_io-0.1.2-py3-none-any.whl (13.4 kB)

File details

Details for the file blazegraph_io-0.1.2.tar.gz.

File metadata

  • Download URL: blazegraph_io-0.1.2.tar.gz
  • Upload date:
  • Size: 17.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.8

File hashes

Hashes for blazegraph_io-0.1.2.tar.gz
Algorithm Hash digest
SHA256 d34299723d443bbd6e7c086d5c59a03c5d0893cedd6ab8aa48678eedea39dc3d
MD5 48c79dace591466841a034db8a91afdf
BLAKE2b-256 71ed17bfcb0927a0c4af1d4148791c8661a6971b3df136399eda79dff512f1fa


File details

Details for the file blazegraph_io-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: blazegraph_io-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 13.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.8

File hashes

Hashes for blazegraph_io-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 ee6e62f5eb84202060e6203f2f44eaf4a226d6733a3d65c8b104af543a6010c4
MD5 952e11f2531de22eda8d476a48d13073
BLAKE2b-256 143fb71b410aa2aff93fd303ff867ed7fafd1260608978ad6d6732021060d7a5

