Python client and MCP server for the AILANG Parse document parsing API

AILANG Parse Python SDK

Python client and MCP server for the AILANG Parse document parsing API. Parse 13 formats and generate 8, with zero dependencies for Office formats and pluggable AI for PDFs.

Install

pip install ailang-parse

MCP Server (Claude Desktop, Cursor, VS Code)

Run as a stdio MCP server that bridges to the hosted AILANG Parse API. Stdlib only — works in any Python >= 3.8 environment.

{
  "mcpServers": {
    "ailang-parse": {
      "command": "uvx",
      "args": ["ailang-parse", "mcp"]
    }
  }
}

Add this to claude_desktop_config.json (Claude Desktop), .cursor/mcp.json (Cursor), or .vscode/settings.json (VS Code). The server provides 7 tools: parse, convert, formats, estimate, auth, auth-poll, and account.
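
If the config file already lists other servers, the entry has to be merged in rather than pasted over the file. The following is a minimal sketch of that merge, not part of the SDK: merge_mcp_server and update_config_file are hypothetical helper names, and the "mcpServers" key is taken from the snippet above.

```python
import json
from pathlib import Path

# Server entry copied from the JSON snippet above.
AILANG_PARSE_SERVER = {"command": "uvx", "args": ["ailang-parse", "mcp"]}


def merge_mcp_server(config, name="ailang-parse", server=None):
    """Return a copy of `config` with one MCP server entry added or replaced."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = dict(server or AILANG_PARSE_SERVER)
    merged["mcpServers"] = servers
    return merged


def update_config_file(path):
    """Hypothetical helper: read a JSON config, merge the entry, write it back."""
    path = Path(path)
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    config = json.loads(text) if text.strip() else {}
    path.write_text(
        json.dumps(merge_mcp_server(config), indent=2) + "\n", encoding="utf-8"
    )
```

Existing keys and other server entries are kept as they are. The sketch assumes strict JSON; a settings file that contains comments would need a more tolerant parser.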

Quick Start

from ailang_parse import DocParse

client = DocParse(api_key="dp_your_key_here")

# Parse a document
result = client.parse("report.docx")
print(f"{len(result.blocks)} blocks, format: {result.format}")

for block in result.blocks:
    if block.type == "heading":
        print(f"  H{block.level}: {block.text}")
    elif block.type == "table":
        print(f"  Table: {len(block.headers)} cols, {len(block.rows)} rows")
    elif block.type == "change":
        print(f"  {block.change_type} by {block.author}: {block.text}")
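    else:
        # Not every block type carries text (see Block Types), so fall back to "".
        text = getattr(block, "text", "") or ""
        print(f"  {block.type}: {text[:80]}")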

Parse Documents

# Parse with different output formats
result = client.parse("report.docx")                        # Block ADT (default)
result = client.parse("report.docx", output_format="markdown")  # Markdown
result = client.parse("report.docx", output_format="html")      # HTML
result = client.parse("report.docx", output_format="markdown+metadata")  # Markdown with sections

# Upload a local file (multipart)
result = client.parse_file("local/report.docx")

# Parse from a signed URL (GCS, S3, Azure Blob — no local file needed)
result = client.parse_url(
    "https://storage.googleapis.com/bucket/doc.docx?X-Goog-Signature=...",
    output_format="markdown+metadata",
)

# Access structured data
print(result.status)          # "success"
print(result.filename)        # "report.docx"
print(result.format)          # "zip-office"
print(result.blocks)          # List[Block]
print(result.metadata.title)  # Document title
print(result.metadata.author) # Document author
print(result.summary.tables)  # Number of tables found

# markdown+metadata format includes sections
print(result.markdown)        # Full rendered markdown
for section in result.sections:
    print(f"  {section.heading}: {section.markdown[:60]}...")
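
When inputs are a mix of local paths and signed URLs, a small dispatcher can choose between parse and parse_url. This is a sketch only: is_remote and parse_any are hypothetical helpers, and the choice of "markdown" as the default output format is an assumption made for illustration.

```python
from urllib.parse import urlparse


def is_remote(source):
    """True when `source` looks like an http(s) URL rather than a local path."""
    return urlparse(str(source)).scheme.lower() in ("http", "https")


def parse_any(client, source, output_format="markdown"):
    """Route URLs to client.parse_url and everything else to client.parse."""
    if is_remote(source):
        return client.parse_url(source, output_format=output_format)
    return client.parse(source, output_format=output_format)
```

Checking the scheme explicitly keeps Windows paths such as C:\docs\report.docx on the local branch, because their drive letter parses as a non-HTTP scheme.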

Response Metadata

Every parse result includes quota and request metadata from response headers:

result = client.parse("report.docx")
meta = result.response_meta

print(meta.request_id)            # "req_abc123"
print(meta.tier)                  # "free", "pro", or "business"
print(meta.quota_remaining_day)   # Requests left today
print(meta.quota_remaining_month) # Requests left this month
print(meta.quota_remaining_ai)    # AI requests remaining
print(meta.format)                # Detected input format ("docx", etc.)
print(meta.replayable)            # Whether this request can be replayed
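
One use for these fields is to pause a batch job before the quota runs out. The sketch below assumes the remaining-quota attributes are numbers (or numeric strings) and may be None when a header is absent; quota_low and its thresholds are hypothetical, and the sample object only stands in for result.response_meta.

```python
from types import SimpleNamespace


def quota_low(meta, min_day=5, min_month=50):
    """True when a known remaining-quota value is below its threshold."""
    checks = (
        (getattr(meta, "quota_remaining_day", None), min_day),
        (getattr(meta, "quota_remaining_month", None), min_month),
    )
    for remaining, threshold in checks:
        # Unknown values (None) are skipped rather than treated as zero.
        if remaining is not None and int(remaining) < threshold:
            return True
    return False


# Stand-in for result.response_meta, using made-up values.
example_meta = SimpleNamespace(quota_remaining_day=3, quota_remaining_month=400)
print(quota_low(example_meta))
```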

Supported Formats

formats = client.formats()
print(formats.parse)       # ['docx', 'pptx', 'xlsx', 'odt', 'odp', 'ods', 'html', 'md', 'csv', 'epub', 'pdf', 'png', 'jpg']
print(formats.generate)    # ['docx', 'pptx', 'xlsx', 'odt', 'odp', 'ods', 'html', 'md']
print(formats.ai_required) # ['pdf', 'png', 'jpg', 'gif', 'bmp', 'tiff']
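
These lists can be used to decide, before uploading, whether a file will need an AI-backed parse. The following sketch uses the sample values printed above as defaults; a live client.formats() call may return different lists, and classify is a hypothetical helper.

```python
from pathlib import Path

# Sample values copied from the example output above.
PARSE = ["docx", "pptx", "xlsx", "odt", "odp", "ods", "html", "md",
         "csv", "epub", "pdf", "png", "jpg"]
AI_REQUIRED = ["pdf", "png", "jpg", "gif", "bmp", "tiff"]


def classify(filename, parse_formats=PARSE, ai_required=AI_REQUIRED):
    """Return "ai", "direct", or "unsupported" based on the file extension."""
    ext = Path(filename).suffix.lower().lstrip(".")
    if ext in ai_required:
        return "ai"
    if ext in parse_formats:
        return "direct"
    return "unsupported"
```

In practice the two lists would come from formats.parse and formats.ai_required rather than from hard-coded defaults.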

Block Types

AILANG Parse returns 9 block types:

Type     Fields                           Description
-------  -------------------------------  -------------------------------
text     text, style, level               Paragraphs, code blocks
heading  text, level (1-6)                Document headings
table    headers, rows                    Tables with merge tracking
list     items, ordered                   Ordered/unordered lists
image    description, mime, data_length   Embedded images
audio    transcription, mime              Audio transcriptions
video    description, mime                Video descriptions
section  kind, children                   Slides, sheets, headers/footers
change   change_type, author, date, text  Track changes

Table cells

Table cells can be simple strings or merged cells:

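for block in result.blocks:
    if block.type == "table":
        for cell in block.headers:
            if isinstance(cell, str):  # simple cell
                print(f"  {cell}")
            else:  # merged cell with span information
                print(f"  {cell.text} (colspan={cell.col_span}, merged={cell.merged})")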
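
Because a cell may be either a plain string or a merged-cell object, downstream code usually wants one normalized form. The sketch below flattens a table into a list of dicts. It assumes block.rows is a list of rows, each a list of cells shaped like the header cells; cell_text and table_to_records are hypothetical helpers, and the sample data is made up.

```python
from types import SimpleNamespace


def cell_text(cell):
    """Return the text of a cell, whether it is a string or a cell object."""
    return cell if isinstance(cell, str) else getattr(cell, "text", "")


def table_to_records(headers, rows):
    """Pair each row's cell text with the matching header text."""
    names = [cell_text(h) for h in headers]
    return [dict(zip(names, (cell_text(c) for c in row))) for row in rows]


# Made-up sample data standing in for block.headers and block.rows.
headers = ["Region", SimpleNamespace(text="Total", col_span=1, merged=False)]
rows = [["EMEA", "120"], ["APAC", SimpleNamespace(text="95", col_span=1, merged=False)]]
print(table_to_records(headers, rows))
```

Span information is discarded here, so tables with repeated header names or wide merged cells would need extra handling.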

Nested sections

Section blocks contain child blocks (slides, sheets, headers/footers):

for block in result.blocks:
    if block.type == "section":
        print(f"Section: {block.kind}")  # "slide", "sheet", "header", "footer", etc.
        for child in block.children:
            text = getattr(child, "text", "") or ""  # tables, lists, etc. carry no text
            print(f"  {child.type}: {text[:50]}")
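
Sections can hold further sections, so a single loop over children only reaches one level. A recursive generator is one way to visit every block; the sketch below assumes only the type and children attributes described above, and the sample tree is made up. iter_blocks is a hypothetical helper, not an SDK function.

```python
from types import SimpleNamespace


def iter_blocks(blocks, depth=0):
    """Yield (depth, block) for every block, descending into section children."""
    for block in blocks:
        yield depth, block
        if getattr(block, "type", None) == "section":
            children = getattr(block, "children", None) or []
            yield from iter_blocks(children, depth + 1)


# Made-up sample tree standing in for result.blocks.
sample = [
    SimpleNamespace(type="heading", text="Q3 Review", level=1),
    SimpleNamespace(type="section", kind="slide", children=[
        SimpleNamespace(type="text", text="Revenue grew."),
        SimpleNamespace(type="section", kind="footer", children=[
            SimpleNamespace(type="text", text="Confidential"),
        ]),
    ]),
]

for depth, block in iter_blocks(sample):
    print("  " * depth + block.type)
```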

API Key Management

API key resolution (checked in order):

  1. Explicit api_key parameter
  2. DOCPARSE_API_KEY environment variable
  3. Saved credentials in ~/.config/ailang-parse/credentials.json
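
The lookup order above can be sketched as a small function. This illustrates the order only and is not the SDK's implementation: resolve_api_key is a hypothetical name, and the "api_key" field inside credentials.json is an assumption about the file layout.

```python
import json
import os
from pathlib import Path

CREDENTIALS_PATH = Path(os.path.expanduser("~/.config/ailang-parse/credentials.json"))


def resolve_api_key(api_key=None, env=None, credentials_path=CREDENTIALS_PATH):
    """Return the first key found: explicit argument, env var, then saved file."""
    if api_key:
        return api_key
    env = os.environ if env is None else env
    if env.get("DOCPARSE_API_KEY"):
        return env["DOCPARSE_API_KEY"]
    try:
        data = json.loads(Path(credentials_path).read_text(encoding="utf-8"))
    except (OSError, ValueError):
        # Missing, unreadable, or malformed credentials file.
        return None
    # Assumed layout: {"api_key": "dp_..."}
    return data.get("api_key") if isinstance(data, dict) else None
```

Passing env and credentials_path as parameters keeps the function easy to exercise without touching the real environment or home directory.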

Use the device auth flow to get an API key. The user signs in once — the key is saved automatically and reused in future sessions.

from ailang_parse import DocParse

# First time: device_auth() opens browser, user signs in, key saved to disk
client = DocParse()
client.device_auth(label="my-agent")

# Future sessions: key auto-loaded from ~/.config/ailang-parse/credentials.json
client = DocParse()
result = client.parse("report.docx")

# Or set env var: export DOCPARSE_API_KEY=dp_your_key
client = DocParse()
result = client.parse("report.docx")

# Check usage
usage = client.keys.usage(key_id="abc123", user_id="user123")
print(f"Requests today: {usage.usage.requests_today} / {usage.quota.requests_per_day}")

# Rotate (new key, old one revoked, same tier)
new_key = client.keys.rotate(key_id="abc123", user_id="user123")
print(new_key.key)  # New key

# Revoke
client.keys.revoke(key_id="abc123", user_id="user123")

Migrating from Unstructured

Change the import and point server_url at AILANG Parse:

# Before
from unstructured_client import UnstructuredClient
client = UnstructuredClient(server_url="https://api.unstructured.io")

# After
from ailang_parse import UnstructuredClient
client = UnstructuredClient(
    server_url="https://api.parse.sunholo.com"
)

# All existing code works unchanged
elements = client.general.partition(file="report.docx")
for el in elements:
    print(f"{el.type}: {el.text[:80]}")
    print(f"  metadata: {el.metadata.filename}")

Error Handling

from ailang_parse import DocParse, DocParseError, AuthError, QuotaError

client = DocParse(api_key="dp_invalid")

try:
    result = client.parse("file.docx")
except AuthError as e:
    print(f"Bad key: {e}")           # 401
except QuotaError as e:
    print(f"Quota exceeded: {e}")    # 429
except DocParseError as e:
    print(f"API error ({e.status_code}): {e}")
    print(f"  suggested fix: {e.suggested_fix}")
    print(f"  details: {e.details}")       # Structured error details dict
    print(f"  request_id: {e.request_id}") # For support/debugging

Configuration

client = DocParse(
    api_key="dp_your_key",
    base_url="https://your-deployment.run.app",  # Custom endpoint
    timeout=120,                                 # Request timeout (seconds)
)

License

Apache 2.0 — see LICENSE for details.
