blazegraph-io

Python SDK for Blazegraph — parse PDFs into typed semantic document graphs.

Install

pip install blazegraph-io

Requires Python 3.9+. The only dependency is httpx.

Quick Start

import blazegraphio as bg

# Local mode — no account needed, runs on your machine
graph = bg.parse_pdf("document.pdf")

print(f"{len(graph.nodes)} nodes, {len(graph.sections)} sections")

for section in graph.sections:
    print(section.content.text)
    print(section.location.semantic.breadcrumbs)
    print(f"  Page {section.location.physical.page}")

On first run, the SDK automatically downloads the blazegraph-cli binary and a JRE. Subsequent runs skip the download and start right away.

API Mode

Switch to the hosted API with one line:

bg.configure(api_key="blaze_prod_...")
graph = bg.parse_pdf("document.pdf")  # same code, cloud processing

Async is also supported:

graph = await bg.parse_pdf_async("document.pdf")
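
A bare await only works inside a coroutine, so a script needs an entry point such as asyncio.run. The sketch below shows one way to parse several files concurrently with a cap on how many run at once. It is an illustration under assumptions, not part of the SDK: parse_many is a hypothetical helper, and fake_parse_pdf_async is a stand-in so the snippet runs without blazegraph-io installed. In real code you would pass bg.parse_pdf_async instead, assuming it is an ordinary coroutine function that takes a path as shown above.

```python
import asyncio
from typing import Any, Awaitable, Callable, List


async def parse_many(
    parse: Callable[[str], Awaitable[Any]],
    paths: List[str],
    max_concurrent: int = 4,
) -> List[Any]:
    """Parse several PDFs concurrently, at most `max_concurrent` at a time."""
    limiter = asyncio.Semaphore(max_concurrent)

    async def parse_one(path: str) -> Any:
        async with limiter:
            return await parse(path)

    # asyncio.gather returns results in the same order as `paths`.
    return await asyncio.gather(*(parse_one(p) for p in paths))


async def fake_parse_pdf_async(path: str) -> str:
    """Hypothetical stand-in for bg.parse_pdf_async (not part of the SDK)."""
    await asyncio.sleep(0)
    return f"graph for {path}"


if __name__ == "__main__":
    graphs = asyncio.run(parse_many(fake_parse_pdf_async, ["a.pdf", "b.pdf"]))
    print(graphs)
```

The semaphore keeps the number of in-flight requests bounded, which is usually a sensible default when calling a hosted API.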

What You Get

Every node has typed fields with IDE autocomplete:

node = graph.sections[0]

node.content.text                        # "Introduction"
node.token_count                         # 206
node.location.semantic.path              # "2.3"
node.location.semantic.breadcrumbs       # ["paper.pdf", "Introduction"]
node.location.physical.page              # 1
node.location.physical.bounding_box      # BoundingBox(x=91.9, y=585.9, ...)
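
As an illustration of working with these fields, the sketch below sums token_count per physical page. The dataclasses are hypothetical stand-ins that mirror only the attributes listed above, so the snippet runs without the SDK; the real classes may carry more fields or differ in detail. tokens_per_page is likewise a made-up helper name.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


# Hypothetical stand-ins mirroring the attributes shown above;
# these are NOT the SDK's actual classes.
@dataclass
class Content:
    text: str


@dataclass
class Semantic:
    path: str
    breadcrumbs: List[str]


@dataclass
class Physical:
    page: int


@dataclass
class Location:
    semantic: Semantic
    physical: Physical


@dataclass
class Node:
    content: Content
    token_count: int
    location: Location


def tokens_per_page(nodes: Iterable[Node]) -> Dict[int, int]:
    """Sum token_count for all nodes that sit on the same physical page."""
    totals: Dict[int, int] = defaultdict(int)
    for node in nodes:
        totals[node.location.physical.page] += node.token_count
    return dict(totals)


def make_node(text: str, tokens: int, page: int) -> Node:
    """Build a stand-in node for the demo below."""
    location = Location(Semantic("1", ["paper.pdf", text]), Physical(page))
    return Node(Content(text), tokens, location)


if __name__ == "__main__":
    demo = [make_node("Introduction", 206, 1), make_node("Background", 40, 1),
            make_node("Method", 10, 2)]
    print(tokens_per_page(demo))
```

With real data, the same function should accept graph.sections directly, assuming every section exposes the attributes shown above.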

Navigate the tree:

parent = node.get_parent(graph)
children = node.get_children(graph)
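
These two calls are enough to build common traversals. The sketch below collects a node's ancestors and walks a subtree depth-first. It assumes that get_parent returns None at the root and that get_children returns an iterable of nodes; neither is confirmed by this README. FakeNode is a hypothetical stand-in so the snippet runs without the SDK.

```python
from typing import Any, Iterator, List, Optional, Tuple


def ancestors(node: Any, graph: Any) -> List[Any]:
    """Return the chain of parents from `node` up to the root, nearest first."""
    chain = []
    parent = node.get_parent(graph)
    while parent is not None:  # assumption: the root's parent is None
        chain.append(parent)
        parent = parent.get_parent(graph)
    return chain


def walk(node: Any, graph: Any, depth: int = 0) -> Iterator[Tuple[int, Any]]:
    """Yield (depth, node) pairs in depth-first pre-order."""
    yield depth, node
    for child in node.get_children(graph):
        yield from walk(child, graph, depth + 1)


class FakeNode:
    """Hypothetical stand-in exposing the two navigation methods."""

    def __init__(self, name: str, parent: Optional["FakeNode"] = None) -> None:
        self.name = name
        self.parent = parent
        self.children: List["FakeNode"] = []
        if parent is not None:
            parent.children.append(self)

    def get_parent(self, graph: Any) -> Optional["FakeNode"]:
        return self.parent

    def get_children(self, graph: Any) -> List["FakeNode"]:
        return list(self.children)


if __name__ == "__main__":
    root = FakeNode("paper.pdf")
    intro = FakeNode("Introduction", root)
    FakeNode("Background", intro)
    for depth, n in walk(root, graph=None):
        print("  " * depth + n.name)
```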

Render text:

print(node.render(graph, breadcrumbs=True))
# [paper.pdf > Introduction]
# The fundamental problem of communication...
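
If you need the same layout for text assembled outside the SDK, the format in the sample output can be approximated by hand. The function below is a guess based only on the two output lines above, not the SDK's implementation; format_with_breadcrumbs is a hypothetical name.

```python
from typing import Sequence


def format_with_breadcrumbs(breadcrumbs: Sequence[str], text: str) -> str:
    """Prefix `text` with a bracketed breadcrumb trail on its own line."""
    if not breadcrumbs:
        return text  # nothing to prefix
    header = "[" + " > ".join(breadcrumbs) + "]"
    return header + "\n" + text


if __name__ == "__main__":
    print(format_with_breadcrumbs(
        ["paper.pdf", "Introduction"],
        "The fundamental problem of communication...",
    ))
```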

Documentation

License

MIT
