blazegraph-io

Python SDK for Blazegraph — parse PDFs into typed semantic document graphs.

Install

pip install blazegraph-io

Requires Python 3.9+. The only dependency is httpx.

Quick Start

import blazegraphio as bg

# Local mode — no account needed, runs on your machine
graph = bg.parse_pdf("document.pdf")

print(f"{len(graph.nodes)} nodes, {len(graph.sections)} sections")

for section in graph.sections:
    print(section.content.text)
    print(section.location.semantic.breadcrumbs)
    print(f"  Page {section.location.physical.page}")

On first run, the SDK automatically downloads the blazegraph-cli binary and a JRE. Subsequent runs skip the download.

API Mode

Switch to the hosted API with one line:

bg.configure(api_key="blaze_prod_...")
graph = bg.parse_pdf("document.pdf")  # same code, cloud processing

Async is also supported:

graph = await bg.parse_pdf_async("document.pdf")  # call from inside an async function
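The async entry point makes it possible to parse several files concurrently. The following is a minimal sketch, not part of the SDK: `parse_many` and `max_concurrency` are hypothetical names, and the parse coroutine is passed in as an argument so the helper does not assume anything about the SDK beyond a coroutine that takes a path.

```python
import asyncio


async def parse_many(paths, parse_async, max_concurrency=4):
    """Parse several PDFs concurrently, keeping results in input order.

    parse_async is any coroutine function that accepts a path, for
    example bg.parse_pdf_async. This helper is hypothetical, not SDK API.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def parse_one(path):
        # Limit how many parses are in flight at once.
        async with semaphore:
            return await parse_async(path)

    # gather() returns results in the same order as the awaitables passed in.
    return await asyncio.gather(*(parse_one(p) for p in paths))
```

With the SDK installed, this could be called as `asyncio.run(parse_many(["a.pdf", "b.pdf"], bg.parse_pdf_async))`. The semaphore is a design choice to avoid starting every parse at once; whether that matters depends on the mode and on any rate limits, which this README does not describe.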

What You Get

Every node has typed fields with IDE autocomplete:

node = graph.sections[0]

node.content.text                        # "Introduction"
node.token_count                         # 206
node.location.semantic.path              # "2.3"
node.location.semantic.breadcrumbs       # ["paper.pdf", "Introduction"]
node.location.physical.page              # 1
node.location.physical.bounding_box      # BoundingBox(x=91.9, y=585.9, ...)
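Because the fields are plain attributes, ordinary comprehensions are enough to query a graph. The sketch below uses only the attributes shown above (`graph.sections`, `location.physical.page`, `token_count`); the helper names are hypothetical, and it assumes `token_count` is always an integer.

```python
def sections_on_page(graph, page):
    """Return the sections whose physical location is on the given page."""
    return [s for s in graph.sections if s.location.physical.page == page]


def total_tokens(graph):
    """Sum token_count across all sections (assumes it is never None)."""
    return sum(s.token_count for s in graph.sections)
```

For example, `sections_on_page(graph, 1)` would collect everything the parser placed on page 1.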

Navigate the tree:

parent = node.get_parent(graph)
children = node.get_children(graph)
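These two calls are enough to traverse a whole document. The following sketch builds a depth-first walk and an ancestor lookup on top of them. Both helper names are hypothetical, and the code assumes that `get_children` returns an iterable (empty for leaf nodes) and that `get_parent` returns `None` at the root; neither behavior is confirmed by this README.

```python
def walk(node, graph, depth=0):
    """Yield (depth, node) pairs for node and its descendants, depth-first."""
    yield depth, node
    # Assumption: get_children returns an iterable, empty for leaves.
    for child in node.get_children(graph):
        yield from walk(child, graph, depth + 1)


def ancestors(node, graph):
    """Return the chain of parents from node up to the root, nearest first."""
    chain = []
    # Assumption: get_parent returns None once the root is reached.
    parent = node.get_parent(graph)
    while parent is not None:
        chain.append(parent)
        parent = parent.get_parent(graph)
    return chain
```

An indented outline could then be printed with `for depth, n in walk(root, graph): print("  " * depth + n.content.text)`, where `root` is whichever node you start from.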

Render text:

print(section.render(graph, breadcrumbs=True))
# [paper.pdf > Introduction]
# The fundamental problem of communication...
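Rendered text and `token_count` can be combined to batch sections for downstream processing. The sketch below groups rendered sections into chunks under a token budget. `chunk_sections` and `max_tokens` are hypothetical names, and the budget is only approximate, since it is unclear from this README whether `token_count` includes the breadcrumb line added by `render`.

```python
def chunk_sections(graph, max_tokens=1000):
    """Group rendered sections into chunks that stay within max_tokens.

    A single section larger than max_tokens still gets a chunk of its own.
    """
    chunks, current, used = [], [], 0
    for section in graph.sections:
        # Start a new chunk when adding this section would exceed the budget.
        if current and used + section.token_count > max_tokens:
            chunks.append("\n\n".join(current))
            current, used = [], 0
        current.append(section.render(graph, breadcrumbs=True))
        used += section.token_count
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Sections are kept whole and in document order, so each chunk starts at a section boundary with its breadcrumb header.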

License

MIT
