Parsefy Python SDK

Official Python SDK for Parsefy - AI-powered document data extraction.

Extract structured data from PDF and DOCX documents using Pydantic models. Define a schema as a Pydantic model, pass it to extract() along with the document, and get back a typed result.

Installation

pip install parsefy

Quick Start

from parsefy import Parsefy
from pydantic import BaseModel, Field

# Initialize client (reads PARSEFY_API_KEY from environment)
client = Parsefy()

# Define your extraction schema
class Invoice(BaseModel):
    invoice_number: str = Field(description="The invoice number")
    date: str = Field(description="Invoice date in YYYY-MM-DD format")
    total: float = Field(description="Total amount")
    currency: str = Field(description="3-letter currency code")

# Extract data from a document
result = client.extract(file="invoice.pdf", schema=Invoice)

if result.error is None:
    print(f"Invoice #{result.data.invoice_number}")
    print(f"Total: {result.data.total} {result.data.currency}")
    print(f"Credits used: {result.metadata.credits}")
else:
    print(f"Error: {result.error.message}")

Features

Type-safe extraction - Full type inference with Pydantic models
Sync & async support - Both extract() and extract_async() methods
Multiple input types - File paths, bytes, or file-like objects
Detailed metadata - Processing time, token usage, and credits consumed
Client-side validation - File type, size, and existence checks before upload

Authentication

Set your API key via the PARSEFY_API_KEY environment variable:

export PARSEFY_API_KEY=pk_your_api_key

Or pass it directly:

client = Parsefy(api_key="pk_your_api_key")

Usage Examples

Basic Extraction

from parsefy import Parsefy
from pydantic import BaseModel, Field

client = Parsefy()

class Person(BaseModel):
    name: str = Field(description="Full name of the person")
    email: str = Field(description="Email address")
    phone: str | None = Field(default=None, description="Phone number if present")

result = client.extract(file="contact.pdf", schema=Person)

if result.error is None:
    print(result.data.name)
    print(result.data.email)

Complex Schemas

from parsefy import Parsefy
from pydantic import BaseModel, Field

client = Parsefy()

class LineItem(BaseModel):
    description: str = Field(description="Item description")
    quantity: int = Field(description="Quantity ordered")
    unit_price: float = Field(description="Price per unit")
    total: float = Field(description="Line total")

class Invoice(BaseModel):
    invoice_number: str = Field(description="Invoice number")
    vendor: str = Field(description="Vendor company name")
    date: str = Field(description="Invoice date (YYYY-MM-DD)")
    line_items: list[LineItem] = Field(description="List of items on the invoice")
    subtotal: float = Field(description="Subtotal before tax")
    tax: float = Field(description="Tax amount")
    total: float = Field(description="Total amount due")

result = client.extract(file="invoice.pdf", schema=Invoice)

if result.error is None:
    for item in result.data.line_items:
        print(f"{item.description}: {item.quantity} x ${item.unit_price}")
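Extracted numbers are worth a sanity check before they reach downstream systems. The following is a minimal sketch, not an SDK feature: plain dataclasses stand in for the Pydantic models above so the snippet runs without the SDK, `check_invoice_totals` is a hypothetical helper, and the 0.01 tolerance is an arbitrary choice.

```python
import math
from dataclasses import dataclass


@dataclass
class LineItem:  # stand-in for the Pydantic LineItem model
    description: str
    quantity: int
    unit_price: float
    total: float


@dataclass
class Invoice:  # stand-in with only the numeric fields needed here
    line_items: list[LineItem]
    subtotal: float
    tax: float
    total: float


def check_invoice_totals(invoice: Invoice, tolerance: float = 0.01) -> list[str]:
    """Return a list of arithmetic inconsistencies (empty if none found)."""
    problems = []
    for item in invoice.line_items:
        expected = item.quantity * item.unit_price
        if not math.isclose(item.total, expected, abs_tol=tolerance):
            problems.append(
                f"line '{item.description}': {item.total} != {expected:.2f}"
            )
    items_sum = sum(item.total for item in invoice.line_items)
    if not math.isclose(items_sum, invoice.subtotal, abs_tol=tolerance):
        problems.append(f"line items sum to {items_sum:.2f}, not {invoice.subtotal}")
    expected_total = invoice.subtotal + invoice.tax
    if not math.isclose(expected_total, invoice.total, abs_tol=tolerance):
        problems.append(
            f"subtotal + tax = {expected_total:.2f}, but total is {invoice.total}"
        )
    return problems
```

With the real models, the same function could be called on `result.data`, since it only reads attributes.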

Async Usage

import asyncio
from parsefy import Parsefy
from pydantic import BaseModel, Field

class Receipt(BaseModel):
    store_name: str = Field(description="Name of the store")
    total: float = Field(description="Total amount paid")

async def process_receipts():
    async with Parsefy() as client:
        tasks = [
            client.extract_async(file=f"receipt_{i}.pdf", schema=Receipt)
            for i in range(1, 4)
        ]
        results = await asyncio.gather(*tasks)
        
        for i, result in enumerate(results, 1):
            if result.error is None:
                print(f"Receipt {i}: {result.data.store_name} - ${result.data.total}")

asyncio.run(process_receipts())
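`asyncio.gather` starts every request at once, which is fine for three receipts but may hit rate limits on larger batches. One common pattern is to cap concurrency with `asyncio.Semaphore`. This is a sketch under assumptions: `gather_bounded` is a hypothetical helper, and `fake_extract` is a stand-in for `client.extract_async` so the snippet runs without the SDK or network access.

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import TypeVar

T = TypeVar("T")


async def gather_bounded(
    jobs: list[Callable[[], Awaitable[T]]], limit: int = 3
) -> list[T]:
    """Run zero-argument coroutine factories with at most `limit` in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def run(job: Callable[[], Awaitable[T]]) -> T:
        async with semaphore:
            return await job()

    # gather preserves the order of its inputs in the result list.
    return await asyncio.gather(*(run(job) for job in jobs))


async def fake_extract(name: str) -> str:
    # Stand-in for client.extract_async(file=name, schema=...).
    await asyncio.sleep(0)
    return f"processed {name}"
```

With the real client, each job would look like `lambda p=path: client.extract_async(file=p, schema=Receipt)`; the default argument binds the current path at definition time.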

Different Input Types

from parsefy import Parsefy
from pydantic import BaseModel
from pathlib import Path

client = Parsefy()

class Document(BaseModel):
    title: str
    content: str

# From file path string
result = client.extract(file="document.pdf", schema=Document)

# From Path object
result = client.extract(file=Path("document.pdf"), schema=Document)

# From bytes
with open("document.pdf", "rb") as f:
    file_bytes = f.read()
result = client.extract(file=file_bytes, schema=Document)

# From file object
with open("document.pdf", "rb") as f:
    result = client.extract(file=f, schema=Document)

Error Handling

from parsefy import Parsefy, APIError, ValidationError
from pydantic import BaseModel

client = Parsefy()

class Invoice(BaseModel):
    number: str
    total: float

try:
    result = client.extract(file="invoice.pdf", schema=Invoice)
    
    if result.error is None:
        print(result.data)
    else:
        # Extraction-level error (API returned 200 but extraction failed)
        print(f"Extraction failed: {result.error.code}")
        print(f"Message: {result.error.message}")

except ValidationError as e:
    # Client-side validation error (file not found, wrong type, etc.)
    print(f"Validation error: {e.message}")

except APIError as e:
    # HTTP error from API (401, 429, 500, etc.)
    print(f"API error {e.status_code}: {e.message}")
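Some HTTP errors, such as 429, are often transient, so a retry with exponential backoff can help. The sketch below is not part of the SDK: `call_with_retry` is a hypothetical helper, `FakeAPIError` is a stand-in for `parsefy.APIError` so the snippet runs on its own, and the set of retryable status codes is an assumption.

```python
import time
from collections.abc import Callable
from typing import TypeVar

T = TypeVar("T")

RETRYABLE_STATUS = {429, 500, 502, 503}  # assumption: statuses worth retrying


class FakeAPIError(Exception):
    """Stand-in for parsefy.APIError so the sketch runs without the SDK."""

    def __init__(self, status_code: int, message: str) -> None:
        super().__init__(message)
        self.status_code = status_code
        self.message = message


def call_with_retry(
    func: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying retryable errors with exponential backoff."""
    for attempt in range(attempts):
        try:
            return func()
        except FakeAPIError as exc:
            last_attempt = attempt == attempts - 1
            if exc.status_code not in RETRYABLE_STATUS or last_attempt:
                raise
            sleep(base_delay * 2**attempt)  # 0.5s, 1s, 2s, ...
    raise ValueError("attempts must be at least 1")
```

To use this with the real client, catch `APIError` instead of the stand-in and pass `lambda: client.extract(file="invoice.pdf", schema=Invoice)` as `func`. The injectable `sleep` parameter keeps the helper testable without real delays.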

API Reference

`Parsefy` Client

client = Parsefy(
    api_key: str | None = None,      # API key (or set PARSEFY_API_KEY env var)
    timeout: float = 60.0,           # Request timeout in seconds
)

`extract()` / `extract_async()`

result = client.extract(
    file: str | Path | bytes | BinaryIO,  # Document to extract from
    schema: type[T],                       # Pydantic model class
) -> ExtractResult[T]

`ExtractResult[T]`

Field	Type	Description
`data`	`T | None`	Extracted data (or None on error)
`metadata`	`ExtractionMetadata`	Processing metadata
`error`	`APIErrorDetail | None`	Error details (or None on success)

`ExtractionMetadata`

Field	Type	Description
`processing_time_ms`	`int`	Processing time in milliseconds
`input_tokens`	`int`	Input tokens used
`output_tokens`	`int`	Output tokens generated
`credits`	`int`	Credits consumed (1 credit = 1 page)
`fallback_triggered`	`bool`	Whether fallback model was used

Supported File Types

PDF (.pdf)
Microsoft Word (.docx)

Maximum file size: 10 MB

Requirements

Python 3.10+
Pydantic 2.0+
httpx 0.25+

License

MIT License - see LICENSE for details.

Release history

1.1.2 - Jan 12, 2026
1.1.1 - Jan 12, 2026
1.1.0 - Jan 10, 2026
1.0.0 - Jan 7, 2026 (this version)
0.0.1 - Dec 6, 2025

Download files

Source Distribution

parsefy-1.0.0.tar.gz (7.1 kB)

Uploaded Jan 7, 2026 Source

Built Distribution

parsefy-1.0.0-py3-none-any.whl (9.1 kB)

Uploaded Jan 7, 2026 Python 3

File details

Details for the file parsefy-1.0.0.tar.gz.

File metadata

Download URL: parsefy-1.0.0.tar.gz
Upload date: Jan 7, 2026
Size: 7.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.4

File hashes

Hashes for parsefy-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`dab11e40882596075d90fb8dd6f70e1d4e88cbbf68d1b2e0b1d15361a868d813`
MD5	`e6969fefde29267386963222dc345ff9`
BLAKE2b-256	`9566708b5fec9ff3707df6fcd942c49832d91301346fc4054e3b1f5d89bab227`

File details

Details for the file parsefy-1.0.0-py3-none-any.whl.

File metadata

Download URL: parsefy-1.0.0-py3-none-any.whl
Upload date: Jan 7, 2026
Size: 9.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.4

File hashes

Hashes for parsefy-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`72874c5d3080e853397c1b91f941f76b6c71ad6f5513774fd5653c4217a3aeae`
MD5	`7b5fb35e2d412776d20bc627358930df`
BLAKE2b-256	`b92b118bdb5890f26a2e8efc382bdb1c8fff0da3de2f9ba1e4a0639ed36f64f7`
