Anyparser LangChain Integration

Anyparser LangChain: Seamless Integration of Anyparser with LangChain

https://anyparser.com

Integrate Anyparser's content extraction with LangChain. This package makes Anyparser's document processing and data extraction features available inside LangChain applications, so you can use them as steps in your AI pipelines.

Installation

pip install anyparser-langchain

Anyparser LangChain Examples

The examples directory contains scripts that demonstrate different ways to use the Anyparser LangChain integration. Run each one from the repository root:

python examples/01_single_file_json.py
python examples/02_single_file_markdown.py
python examples/03_multiple_files_json.py
python examples/04_multiple_files_markdown.py
python examples/05_load_folder.py
python examples/06_ocr_markdown.py
python examples/07_ocr_json.py
python examples/08_crawler.py

Setup

Before running the examples, make sure to set your Anyparser API credentials as environment variables:

export ANYPARSER_API_KEY="your-api-key"
export ANYPARSER_API_URL="https://anyparserapi.com"
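A script can check for both variables before making any requests, so that a missing value fails with a clear message. The sketch below uses only the standard library. `load_anyparser_credentials` is a hypothetical helper, not part of anyparser-langchain, and it does not show how the package itself reads these variables.

```python
from __future__ import annotations

import os


def load_anyparser_credentials() -> tuple[str, str]:
    """Read the Anyparser credentials from the environment (hypothetical helper).

    Raises RuntimeError naming every missing or empty variable, instead of
    letting a later API call fail with a less specific error.
    """
    required = ("ANYPARSER_API_KEY", "ANYPARSER_API_URL")
    missing = [name for name in required if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return os.environ["ANYPARSER_API_KEY"], os.environ["ANYPARSER_API_URL"]
```

Call it once at startup, for example `api_key, api_url = load_anyparser_credentials()`.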

Examples

1. Single File Processing

  • 01_single_file_json.py: Process a single file with JSON output
  • 02_single_file_markdown.py: Process a single file with markdown output

2. Multiple File Processing

  • 03_multiple_files_json.py: Process multiple files with JSON output
  • 04_multiple_files_markdown.py: Process multiple files with markdown output
  • 05_load_folder.py: Load and process all files from a folder (max 5 files)
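The folder example processes at most five files. If you want to choose those files yourself before handing them to the loader, a selection step might look like the sketch below. It assumes files are taken in sorted name order, with hidden files and subdirectories skipped; that may differ from what 05_load_folder.py actually does, and `select_folder_files` is a hypothetical name.

```python
from __future__ import annotations

from pathlib import Path

MAX_FILES = 5  # limit stated for 05_load_folder.py


def select_folder_files(folder: str | Path, limit: int = MAX_FILES) -> list[Path]:
    """Return up to `limit` regular files from `folder`, in sorted name order.

    Assumption: subdirectories and dotfiles are skipped. Sorting keeps the
    selection deterministic between runs.
    """
    root = Path(folder)
    if not root.is_dir():
        raise NotADirectoryError(f"{root} is not a directory")
    files = sorted(
        p for p in root.iterdir() if p.is_file() and not p.name.startswith(".")
    )
    return files[:limit]
```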

3. OCR Processing

  • 06_ocr_markdown.py: Process images/scans with OCR (markdown output)
  • 07_ocr_json.py: Process images/scans with OCR (JSON output)

4. Web Crawling

  • 08_crawler_basic.py: Basic web crawling with essential settings

Features Demonstrated

Document Processing

  • Different output formats (markdown, JSON)
  • Multiple file handling
  • Folder processing
  • Metadata handling

OCR Capabilities

  • Language support (ISO 639-2 codes)
  • OCR presets (fast, balanced, scan)
  • Image and table extraction
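Before passing OCR settings along, it can help to validate them locally. This sketch checks that each language looks like a three-letter lowercase ISO 639-2 code (for example `eng` or `deu`; it checks the shape only, not membership in the standard) and that the preset is one of the three listed above. The returned keys `ocr_language` and `ocr_preset`, the `balanced` default, and the function name are hypothetical; check the OCR examples for the real parameter names.

```python
from __future__ import annotations

import re

OCR_PRESETS = {"fast", "balanced", "scan"}  # presets listed in this README
ISO_639_2_SHAPE = re.compile(r"[a-z]{3}")  # shape check only, not a full code list


def build_ocr_options(languages: list[str], preset: str = "balanced") -> dict:
    """Validate OCR settings and return them as a plain dict.

    Hypothetical: the dict keys and the "balanced" default are illustrative,
    not taken from anyparser-langchain.
    """
    if not languages:
        raise ValueError("at least one OCR language is required")
    bad = [code for code in languages if not ISO_639_2_SHAPE.fullmatch(code)]
    if bad:
        raise ValueError(f"not three-letter ISO 639-2 style codes: {bad}")
    if preset not in OCR_PRESETS:
        raise ValueError(
            f"unknown preset {preset!r}; expected one of {sorted(OCR_PRESETS)}"
        )
    return {"ocr_language": list(languages), "ocr_preset": preset}
```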

Web Crawling

  • Basic crawling with depth and scope control
  • Advanced URL and content filtering
  • Crawling strategies (BFS, LIFO)
  • Rate limiting and robots.txt respect
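The difference between the two crawling strategies comes down to which end of the frontier the crawler takes the next URL from. The sketch below illustrates this over an in-memory link graph instead of live HTTP: `bfs` takes the oldest queued URL (FIFO) and visits pages level by level, while `lifo` takes the newest one and goes down one branch first. It also applies a depth limit, keeps the crawl on the start URL's host, and consults robots.txt rules through the standard library's `urllib.robotparser`. It does not call Anyparser's crawler; `crawl_order` and its parameters are hypothetical.

```python
from __future__ import annotations

from collections import deque
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


def crawl_order(
    start: str,
    links: dict[str, list[str]],
    strategy: str = "bfs",
    max_depth: int = 2,
    robots: RobotFileParser | None = None,
    user_agent: str = "example-bot",  # hypothetical user-agent string
) -> list[str]:
    """Return the order in which pages would be visited.

    `links` maps each URL to the URLs found on that page (a stand-in for
    fetching). "bfs" pops from the front of the frontier; "lifo" pops from
    the back, which explores depth-first.
    """
    if strategy not in ("bfs", "lifo"):
        raise ValueError(f"unknown strategy {strategy!r}")
    host = urlparse(start).netloc
    frontier = deque([(start, 0)])  # (url, depth) pairs waiting to be visited
    seen = {start}  # URLs already queued, so none is queued twice
    visited: list[str] = []
    while frontier:
        url, depth = frontier.popleft() if strategy == "bfs" else frontier.pop()
        if robots is not None and not robots.can_fetch(user_agent, url):
            continue  # disallowed by robots.txt: skip and do not expand
        visited.append(url)
        if depth >= max_depth:
            continue  # depth limit reached: do not follow this page's links
        for nxt in links.get(url, []):
            # Scope control: only follow links on the start URL's host.
            if urlparse(nxt).netloc == host and nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return visited
```

To honor robots.txt, parse the rules with `RobotFileParser.parse()` (or fetch them with `set_url()` and `read()`) and pass the parser as `robots`. Rate limiting is not modeled here; a live crawler would also wait between requests.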

Notes

  • All examples use async/await, so waiting on API responses does not block other work
  • Error handling is included in all examples
  • Each example includes detailed comments explaining the options used
  • OCR examples support multiple languages
  • Crawler examples demonstrate various filtering and control options
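When processing several files, one common way to combine async/await with error handling is `asyncio.gather(..., return_exceptions=True)`, so that one failing file does not cancel the others. In this sketch `parse_file` is a hypothetical placeholder that simulates a request, not an anyparser-langchain function; swap in the real loader call.

```python
from __future__ import annotations

import asyncio


async def parse_file(path: str) -> str:
    """Hypothetical stand-in for an Anyparser request."""
    await asyncio.sleep(0.01)  # simulate network latency
    if not path.endswith((".pdf", ".docx")):
        raise ValueError(f"unsupported file type: {path}")
    return f"parsed {path}"


async def parse_all(paths: list[str]) -> dict[str, str]:
    """Parse files concurrently and record per-file errors instead of aborting."""
    results = await asyncio.gather(
        *(parse_file(p) for p in paths), return_exceptions=True
    )
    report: dict[str, str] = {}
    # gather() returns results in the same order as its arguments.
    for path, result in zip(paths, results):
        if isinstance(result, Exception):
            report[path] = f"error: {result}"
        else:
            report[path] = result
    return report
```

Run it with `asyncio.run(parse_all(["report.pdf", "notes.docx"]))`.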

License

Apache-2.0


Download files

Source Distribution

anyparser_langchain-0.0.2.tar.gz (8.1 kB)

Uploaded Source

Built Distribution

anyparser_langchain-0.0.2-py3-none-any.whl (9.0 kB)

Uploaded Python 3

File details

Details for the file anyparser_langchain-0.0.2.tar.gz.

File metadata

  • Download URL: anyparser_langchain-0.0.2.tar.gz
  • Upload date:
  • Size: 8.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.0

File hashes

Hashes for anyparser_langchain-0.0.2.tar.gz
Algorithm Hash digest
SHA256 bf9d18ea59e064545e2523b7cb9d653a30aeec58282f177c9918eb513352862d
MD5 713df7f21b4d7724ed7631bc342f3f37
BLAKE2b-256 24156ae735019c60aed797d5db81bea0129813ed4fa88c1ceaea671519f8b3d6

File details

Details for the file anyparser_langchain-0.0.2-py3-none-any.whl.

File hashes

Hashes for anyparser_langchain-0.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 721737b785524eb7eaf08f7398a81898801f8f673de45f053c7760c10289fd57
MD5 583b94aaa28ad9f59190d2ecbd9ac4cd
BLAKE2b-256 9aee1220fcec418e828c1eb64fbf3fadd30e5be7b0ac5886350ae46643fd2c7e
