Parallex

What it does

  • Converts a PDF into images
  • Sends requests to Azure OpenAI to convert the images to markdown using the Batch API
  • Polls for batch completion, then converts the AI responses into structured output keyed to the corresponding PDF page
  • Runs optional post-batch processing so you can do what you wish with the resulting markdown
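
The polling step in the list above can be pictured with a small sketch. This is not Parallex's actual implementation: `poll_until_done`, `get_status`, the default timings, and the status strings are all hypothetical names and values chosen for illustration, and it uses only the standard library.

```python
import asyncio
from typing import Awaitable, Callable, Iterable


async def poll_until_done(
    get_status: Callable[[], Awaitable[str]],  # hypothetical status lookup
    interval: float = 5.0,
    timeout: float = 3600.0,
    done_states: Iterable[str] = ("completed",),  # assumed status names
    failed_states: Iterable[str] = ("failed", "cancelled", "expired"),
) -> str:
    """Call get_status repeatedly until a terminal state or the timeout."""
    done, failed = set(done_states), set(failed_states)
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        status = await get_status()
        if status in done:
            return status
        if status in failed:
            raise RuntimeError(f"batch ended in state {status!r}")
        if loop.time() >= deadline:
            raise TimeoutError(f"batch still {status!r} after {timeout} seconds")
        # Sleep without blocking other coroutines on the event loop.
        await asyncio.sleep(interval)
```

Passing the status lookup in as a callable keeps the loop independent of any particular client library, which also makes it easy to exercise with a fake.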

Requirements

Parallex uses GraphicsMagick to convert PDFs to images. Install it before use, for example with Homebrew:

brew install graphicsmagick

Example usage

import os
from parallex.models.parallex_callable_output import ParallexCallableOutput
from parallex.parallex import parallex

os.environ["AZURE_OPENAI_API_KEY"] = "key"
os.environ["AZURE_OPENAI_ENDPOINT"] = "your-endpoint.com"
os.environ["AZURE_OPENAI_API_VERSION"] = "deployment_version"
os.environ["AZURE_OPENAI_API_DEPLOYMENT"] = "deployment_name"

model = "gpt-4o"

async def some_operation(file_url: str) -> None:
    # Convert the PDF at file_url and wait for the structured result.
    response_data: ParallexCallableOutput = await parallex(
        model=model,
        pdf_source_url=file_url,
        post_process_callable=example_post_process,  # Optional
        concurrency=2,  # Optional
        prompt_text="Turn images into markdown",  # Optional
        log_level="ERROR",  # Optional
    )
    pages = response_data.pages  # list of PageResponse objects

def example_post_process(output: ParallexCallableOutput) -> None:
    # Called with the batch output; do what you wish with each page here.
    file_name = output.file_name
    pages = output.pages
    for page in pages:
        markdown_for_page = page.output_content
        pdf_page_number = page.page_number
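
`some_operation` is a coroutine, so it needs an event loop to run. The following is a minimal sketch, assuming you want to process several PDFs at once. `process_all`, `max_in_flight`, and the example URL are hypothetical and not part of Parallex; the helper relies only on the standard library's `asyncio`.

```python
import asyncio
from typing import Awaitable, Callable, Iterable


async def process_all(
    file_urls: Iterable[str],
    operation: Callable[[str], Awaitable[None]],
    max_in_flight: int = 2,
) -> None:
    """Run `operation` for every URL, at most `max_in_flight` at a time."""
    semaphore = asyncio.Semaphore(max_in_flight)

    async def run_one(url: str) -> None:
        # The semaphore caps how many operations run concurrently.
        async with semaphore:
            await operation(url)

    await asyncio.gather(*(run_one(url) for url in file_urls))


# Usage, with some_operation defined as in the example above:
# asyncio.run(process_all(["https://example.com/a.pdf"], some_operation))
```

For a single file, `asyncio.run(some_operation(file_url))` is enough.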
        

Responses have the following structure:

from uuid import UUID

from pydantic import BaseModel, Field

# PageResponse is defined first so the list[PageResponse] annotation resolves.
class PageResponse(BaseModel):
    output_content: str = Field(description="Markdown generated for the page")
    page_number: int = Field(description="Page number of the associated PDF")

class ParallexCallableOutput(BaseModel):
    file_name: str = Field(description="Name of file that is processed")
    pdf_source_url: str = Field(description="Given URL of the source of output")
    trace_id: UUID = Field(description="Unique trace for each file")
    pages: list[PageResponse] = Field(description="List of PageResponse objects")
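
Since each page carries its own markdown and page number, a post-process step might stitch the pages back into one document. A minimal sketch, assuming only the two `PageResponse` fields shown above: `PageStandIn` and `combine_pages` are hypothetical names, and the dataclass merely stands in for `PageResponse` so the snippet runs without Parallex installed.

```python
from dataclasses import dataclass


@dataclass
class PageStandIn:
    """Stand-in with the same two fields as PageResponse (illustration only)."""
    output_content: str
    page_number: int


def combine_pages(pages, separator="\n\n"):
    """Join page markdown in page order, skipping pages with no content."""
    ordered = sorted(pages, key=lambda page: page.page_number)
    chunks = [page.output_content.strip() for page in ordered]
    return separator.join(chunk for chunk in chunks if chunk)


pages = [
    PageStandIn("## Results", 2),
    PageStandIn("", 3),  # e.g. a page that could not be parsed
    PageStandIn("# Title", 1),
]
print(combine_pages(pages))
```

Because the function only reads `output_content` and `page_number`, it should also accept `output.pages` inside a post-process callable.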

The default prompt is:

"""
    Convert the following PDF page to markdown.
    Return only the markdown with no explanation text.
    Leave out any page numbers and redundant headers or footers.
    Do not include any code blocks (e.g. "```markdown" or "```") in the response.
    If unable to parse, return an empty string.
"""
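
The prompt asks the model not to wrap its answer in a code block, but a model may not always comply. As a defensive sketch, a post-process step could strip a single wrapping fence from a page's markdown. `strip_outer_fence` is a hypothetical helper and not part of Parallex.

```python
import re

# Matches text that is entirely wrapped in one fence, e.g. ```markdown ... ```
_OUTER_FENCE = re.compile(r"^```[\w-]*[ \t]*\n(.*?)\n?```[ \t]*$", re.DOTALL)


def strip_outer_fence(markdown: str) -> str:
    """Return the markdown without a single wrapping code fence, if present."""
    text = markdown.strip()
    match = _OUTER_FENCE.match(text)
    return match.group(1) if match else text
```

Text that is not fully wrapped in a fence is returned unchanged apart from trimmed outer whitespace, so fenced snippets inside a page are left alone.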
