An integration package connecting PaddleOCR and LangChain
langchain-paddleocr
This package provides access to PaddleOCR's capabilities within the LangChain ecosystem.
Quick Install
pip install langchain-paddleocr
Basic Usage
PaddleOCRVLLoader
The PaddleOCRVLLoader enables you to:
- Extract text and layout information from PDF and image files using models from Baidu's PaddleOCR-VL series (e.g., PaddleOCR-VL, PaddleOCR-VL-1.5)
- Process documents from local files or remote URLs
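Because a source can be either a local file or a remote URL, it can help to check which one you have before constructing the loader. How the loader itself tells the two apart is not described here; the following is only a minimal sketch using the standard library, and the helper names `is_remote_source` and `check_source` are hypothetical, not part of `langchain_paddleocr`.

```python
from pathlib import Path
from urllib.parse import urlparse


def is_remote_source(source: str) -> bool:
    """Return True when `source` looks like an http(s) URL (hypothetical helper)."""
    parsed = urlparse(source)
    # Require both a web scheme and a host, so Windows paths such as
    # "C:\\docs\\a.pdf" (parsed scheme "c") are not mistaken for URLs.
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def check_source(source: str) -> str:
    """Classify `source` as "remote" or "local"; fail early on a missing local file."""
    if is_remote_source(source):
        return "remote"
    if Path(source).is_file():
        return "local"
    raise FileNotFoundError(f"No such local file: {source}")
```

Calling `check_source(...)` before building the loader surfaces a mistyped local path immediately, rather than after a request has been sent to the API endpoint.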
Basic usage of PaddleOCRVLLoader looks as follows:
from langchain_paddleocr import PaddleOCRVLLoader
from pydantic import SecretStr

loader = PaddleOCRVLLoader(
    file_path="path/to/document.pdf",
    api_url="your-api-endpoint",
    access_token=SecretStr("your-access-token"),  # Optional if using environment variable `PADDLEOCR_ACCESS_TOKEN`
)

docs = loader.load()

for doc in docs[:2]:
    print(f"Content: {doc.page_content[:200]}...")
    print(f"Source: {doc.metadata['source']}")
    print("---")
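The comment on `access_token` says the argument is optional when the `PADDLEOCR_ACCESS_TOKEN` environment variable is set. How the loader resolves the token internally is not shown here. If you want the same "explicit value first, environment second" behaviour in your own application code, a minimal sketch could look like this; `resolve_access_token` is a hypothetical helper, not part of the package, and it returns a plain string that you would wrap in `SecretStr` yourself.

```python
import os
from typing import Optional

# Environment variable named in the usage example above.
ENV_VAR = "PADDLEOCR_ACCESS_TOKEN"


def resolve_access_token(explicit: Optional[str] = None) -> Optional[str]:
    """Hypothetical helper: prefer an explicit token, else fall back to the environment.

    Returns None when neither source provides a non-empty value.
    """
    if explicit:
        return explicit
    # Treat an unset, empty, or whitespace-only variable as "no token".
    token = os.environ.get(ENV_VAR, "").strip()
    return token or None
```

Returning `None` instead of raising lets the caller decide whether a missing token is an error for their deployment.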
📖 Documentation
For full documentation, see the API reference. For conceptual guides, tutorials, and usage examples, see the LangChain Docs.
Download files
Source Distribution
Built Distribution
File details
Details for the file langchain_paddleocr-0.1.0.tar.gz.
File metadata
- Download URL: langchain_paddleocr-0.1.0.tar.gz
- Upload date:
- Size: 268.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a2350af4d6c7aee833b2215e36d535b3500fd1b05217169f8269dc2e078aa482 |
| MD5 | e5bb242772cbf43dadbac28528516097 |
| BLAKE2b-256 | 33a159589c71a4c8bc93bb892b15c524d81f31677547e1b61351b5e59d452bda |
File details
Details for the file langchain_paddleocr-0.1.0-py3-none-any.whl.
File metadata
- Download URL: langchain_paddleocr-0.1.0-py3-none-any.whl
- Upload date:
- Size: 6.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 252a47f22899011fa95acbb9eaa19ddff0c321edfb8babad7cccdfa36b071311 |
| MD5 | 21279a015d9a25de1438952e1256adb8 |
| BLAKE2b-256 | d97a73943118df7d73d2458b924a199b0f740f362bccb642183e61fa96d7540d |
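The SHA256 digests listed for each file can be used to check a manually downloaded archive. The following is a minimal sketch using only the standard library; the helper names `sha256_of` and `matches_published` are hypothetical and chosen for illustration.

```python
import hashlib
from pathlib import Path
from typing import Union


def sha256_of(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        # iter() with a sentinel stops once read() returns b"" at end of file.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: Union[str, Path], expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, after downloading `langchain_paddleocr-0.1.0.tar.gz`, passing its path together with the SHA256 value listed for that file to `matches_published` should return `True` for an intact download.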