Vlense
A Python package for extracting text from images and PDFs using Vision Language Models (VLMs).
Features
- Extract text from images and PDFs
- Supports JSON, HTML, and Markdown formats
- Easy integration with Vision Language Models
- Asynchronous processing with batch support
- Custom JSON schema for structured output
Installation
pip install vlense
Usage
import asyncio
import os

from vlense import Vlense

# Inputs can mix images and PDFs in one list.
path = ["./images/image1.jpg", "test.pdf"]
output_dir = "./output"
model = "gemini/gemini-1.5-flash"
temp_dir = "./temp_images"

# Replace the placeholder with your own API key.
os.environ["GEMINI_API_KEY"] = "YOUR_API_KEY"


async def main():
    vlense = Vlense()
    responses = await vlense.ocr(
        file_path=path,
        model=model,
        output_dir=output_dir,
        temp_dir=temp_dir,
        batch_size=3,
        clean_temp_files=False,
    )
    # ocr() returns a dict of VlenseResponse objects (see the API section).
    print(responses)


if __name__ == "__main__":
    asyncio.run(main())
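Because file_path accepts a list of paths, it can be convenient to build that list from a folder instead of typing it by hand. The following is a minimal sketch using only the standard library. The helper name collect_inputs and the set of extensions are assumptions made for illustration; they are not part of vlense, and the extensions the package actually accepts are not documented here.

```python
from pathlib import Path
from typing import List

# Assumed set of extensions; adjust it to the files you actually have.
SUPPORTED_SUFFIXES = {".jpg", ".jpeg", ".png", ".pdf"}


def collect_inputs(folder: str) -> List[str]:
    """Return the sorted paths of image/PDF files directly inside `folder`."""
    return sorted(
        str(p)
        for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED_SUFFIXES
    )
```

The returned list can then be passed as file_path, for example file_path=collect_inputs("./images").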
API
Vlense.ocr()
Performs OCR on the provided files.
Parameters:
- file_path (Union[str, List[str]]): Path or list of paths to PDF/image files.
- model (str, optional): Model name for generating completions. Defaults to "gemini-1.5-flash".
- output_dir (Optional[str], optional): Directory to save output. Defaults to None.
- temp_dir (Optional[str], optional): Directory for temporary files. Defaults to the system temp directory.
- batch_size (int, optional): Number of concurrent processes. Defaults to 3.
- format (str, optional): Output format ('markdown', 'html', or 'json'). Defaults to 'markdown'.
- json_schema (Optional[Type[BaseModel]], optional): Pydantic model for JSON output. Required if format is 'json'.
- clean_temp_files (Optional[bool], optional): Whether to clean up temporary files after processing. Defaults to True.
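For structured output, format='json' has to be paired with a Pydantic model passed as json_schema. The sketch below shows what such a call might look like. The Invoice and LineItem models and the extract_invoice helper are hypothetical names invented for this example; only the file_path, format, and json_schema parameters come from the list above. It also assumes the API key for the default model is already set in the environment.

```python
from typing import List

from pydantic import BaseModel


class LineItem(BaseModel):
    # Hypothetical fields, chosen only to illustrate a nested schema.
    description: str
    amount: float


class Invoice(BaseModel):
    # Hypothetical top-level schema handed to vlense as json_schema.
    vendor: str
    total: float
    items: List[LineItem] = []


async def extract_invoice(path: str):
    # Imported lazily so the schema can be defined and checked
    # even where vlense itself is not installed.
    from vlense import Vlense

    return await Vlense().ocr(
        file_path=path,
        format="json",
        json_schema=Invoice,
    )
```

From a script, this would be driven with asyncio.run(extract_invoice("invoice.pdf")), where the file name is again only a placeholder.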
Returns:
- Dict[str, VlenseResponse] : Generated content.
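To make the batch_size parameter and the dictionary return value more concrete, here is a generic asyncio sketch of batched concurrency. It is not vlense's actual implementation: run_in_batches and the worker callback are hypothetical, and keying the results by input path is an assumption, since the keys of the returned dictionary are not specified above.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


async def run_in_batches(
    paths: List[str],
    worker: Callable[[str], Awaitable[str]],
    batch_size: int = 3,
) -> Dict[str, str]:
    """Process `paths` in groups of `batch_size`, returning results keyed by path."""
    results: Dict[str, str] = {}
    for start in range(0, len(paths), batch_size):
        batch = paths[start:start + batch_size]
        # Items within a batch run concurrently; batches run one after another.
        outputs = await asyncio.gather(*(worker(p) for p in batch))
        results.update(zip(batch, outputs))
    return results
```

With batch_size=3 and four inputs, the first three are awaited together and the fourth runs in a second batch, which caps how many requests are in flight at once.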
Contributing
Contributions are welcome! Please open an issue or submit a pull request.
License
This project is licensed under the MIT License. See the LICENSE file for details.
Contact
Author: Aditya Miskin
Email: adityamiskin98@gmail.com
Repository: https://github.com/adityamiskin/vlense
File details
Details for the file vlense-0.1.2.tar.gz.
File metadata
- Download URL: vlense-0.1.2.tar.gz
- Upload date:
- Size: 2.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.15
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3259840d0ca6849dcecd552f32c193b826d6a5d7a5d8ae8667ae56b5c59307f4 |
| MD5 | 7eef8d3de7799b21bddffa119b0f7b7e |
| BLAKE2b-256 | d388733bf6ad07beceda2b3520a256ae225546e519db641136261757c8e67e2a |
File details
Details for the file vlense-0.1.2-py3-none-any.whl.
File metadata
- Download URL: vlense-0.1.2-py3-none-any.whl
- Upload date:
- Size: 2.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.15
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2535b1a67a2e4cb05c76bb3be1df4f3ec27caecd84172a680db823c28f7eae59 |
| MD5 | a1ccbf501a2733d6ff5a65f6e918edda |
| BLAKE2b-256 | ced9b1e40779eb346b2d7016f8669589ff5a07ae8a849fa4bcb320e89adca5bc |