Parallex
What it does
- Converts a PDF into images
- Makes requests to Azure OpenAI to convert the images to markdown using the Batch API
- Polls for batch completion, then converts the AI responses into structured output keyed to the corresponding PDF page
- Supports post-batch processing, so you can do what you wish with the resulting markdown
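The third step, mapping batch responses back to PDF pages, can be sketched roughly as follows. This is not Parallex's internal code. It assumes the batch output is JSONL in the shape the Batch API documents (one JSON object per line with a `custom_id` and a `response.body`). The `page-<n>` naming scheme for `custom_id` and the helper name `pages_from_batch_output` are hypothetical.

```python
import json


def pages_from_batch_output(jsonl_text: str) -> list[dict]:
    """Turn batch output lines into page records sorted by page number."""
    pages = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        # Hypothetical convention: custom_id was set to "page-<n>" on submission.
        page_number = int(record["custom_id"].rsplit("-", 1)[1])
        message = record["response"]["body"]["choices"][0]["message"]
        pages.append(
            {"page_number": page_number, "output_content": message["content"]}
        )
    # Output lines may not come back in submission order, so sort explicitly.
    return sorted(pages, key=lambda page: page["page_number"])
```

Sorting by a page number carried in `custom_id` keeps the result independent of the order in which the batch returns its lines.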
Requirements
Parallex uses GraphicsMagick to convert PDFs to images. On macOS, it can be installed with Homebrew:
brew install graphicsmagick
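A quick way to confirm the requirement is met before running a job is to look for the GraphicsMagick command-line tool, `gm`, on `PATH`. This is a minimal sketch. The helper name `graphicsmagick_available` is hypothetical, and it is an assumption that finding `gm` on `PATH` is enough for Parallex, since how the library locates GraphicsMagick is not stated here.

```python
import shutil


def graphicsmagick_available() -> bool:
    """Return True if the GraphicsMagick `gm` executable is on PATH."""
    return shutil.which("gm") is not None
```

Calling `graphicsmagick_available()` at startup lets a script fail early with a clear message instead of partway through a PDF conversion.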
Example usage
import os

from parallex.models.parallex_callable_output import ParallexCallableOutput
from parallex.parallex import parallex

os.environ["AZURE_OPENAI_API_KEY"] = "key"
os.environ["AZURE_OPENAI_ENDPOINT"] = "your-endpoint.com"
os.environ["AZURE_OPENAI_API_VERSION"] = "deployment_version"
os.environ["AZURE_OPENAI_API_DEPLOYMENT"] = "deployment_name"

model = "gpt-4o"


async def some_operation(file_url: str) -> None:
    response_data: ParallexCallableOutput = await parallex(
        model=model,
        pdf_source_url=file_url,
        post_process_callable=example_post_process,  # Optional
        concurrency=2,  # Optional
        prompt_text="Turn images into markdown",  # Optional
        log_level="ERROR",  # Optional
    )
    pages = response_data.pages


def example_post_process(output: ParallexCallableOutput) -> None:
    file_name = output.file_name
    pages = output.pages
    for page in pages:
        markdown_for_page = page.output_content
        pdf_page_number = page.page_number
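Because `some_operation` is a coroutine function, it has to be awaited or driven by an event loop. The sketch below shows one way to call such a function from synchronous code with the standard library's `asyncio.run`. The helper name `run_operation` is hypothetical and not part of Parallex.

```python
import asyncio
from typing import Any, Callable, Coroutine


def run_operation(
    operation: Callable[[str], Coroutine[Any, Any, Any]], file_url: str
) -> Any:
    """Drive an async operation to completion from synchronous code."""
    # asyncio.run creates an event loop, runs the coroutine, and closes the loop.
    return asyncio.run(operation(file_url))
```

With the example above this would be called as `run_operation(some_operation, "https://example.com/sample.pdf")`, where the URL is a placeholder. Inside code that is already async, `await some_operation(file_url)` is used directly instead.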
Responses have the following structure:
class ParallexCallableOutput(BaseModel):
    file_name: str = Field(description="Name of file that is processed")
    pdf_source_url: str = Field(description="Given URL of the source of output")
    trace_id: UUID = Field(description="Unique trace for each file")
    pages: list[PageResponse] = Field(description="List of PageResponse objects")


class PageResponse(BaseModel):
    output_content: str = Field(description="Markdown generated for the page")
    page_number: int = Field(description="Page number of the associated PDF")
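Given this structure, a post-processing callable might assemble the per-page markdown into a single document. The sketch below relies only on the `pages`, `page_number`, and `output_content` attributes shown above. The function name `combine_pages` is hypothetical, and `SimpleNamespace` objects stand in for the real models so the snippet runs without Parallex installed.

```python
from types import SimpleNamespace


def combine_pages(output) -> str:
    """Join each page's markdown into one document, ordered by page number."""
    ordered = sorted(output.pages, key=lambda page: page.page_number)
    return "\n\n".join(page.output_content for page in ordered)


# Stand-in objects with the same attribute names as the models above.
example = SimpleNamespace(
    file_name="sample.pdf",
    pages=[
        SimpleNamespace(page_number=2, output_content="## Second"),
        SimpleNamespace(page_number=1, output_content="# First"),
    ],
)
print(combine_pages(example))
```

Sorting inside the function is a defensive choice: it is an assumption, not a documented guarantee, that `pages` arrives in page order.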
The default prompt is:
"""
Convert the following PDF page to markdown.
Return only the markdown with no explanation text.
Leave out any page numbers and redundant headers or footers.
Do not include any code blocks (e.g. "```markdown" or "```") in the response.
If unable to parse, return an empty string.
"""