Automatically parse PDFs and text into dataclasses
piah
Piah automatically parses data from PDFs or text based only on the dataclass you provide, and returns an instance of that dataclass filled with the extracted values. Piah is based on OxyParser.
Installation
pip install piah
Example
from piah import Piah
from dataclasses import dataclass
@dataclass
class Person:
    name: str
    age: int
# Create a parser that uses the given LLM model
parser = Piah("gpt-3.5-turbo")
# Parse free text into the Person dataclass
result = parser.parse("Hello, I am Python and I am 33 years old", Person)
To parse PDFs:
from pathlib import Path
result = parser.parse("example.pdf", Person)
# or
result = parser.parse(Path("example.pdf"), Person)
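The examples above rely on the dataclass definition alone to say what should be extracted. As a rough illustration of why that is enough, the sketch below reads field names and types from a dataclass and builds an instance from raw string values. It is not piah's implementation: `describe_fields` and `build` are hypothetical helpers, and the coercion only works for simple types such as `str` and `int`.

```python
from dataclasses import dataclass, fields


@dataclass
class Person:
    name: str
    age: int


def describe_fields(cls):
    """Return (field name, type name) pairs read from a dataclass definition.

    Hypothetical helper: shows the information a parser can derive from the class.
    """
    return [(f.name, f.type.__name__) for f in fields(cls)]


def build(cls, raw):
    """Build an instance of cls from a dict of raw values.

    Hypothetical helper: each value is coerced by calling its annotated type,
    which only works for simple types such as str and int.
    """
    return cls(**{f.name: f.type(raw[f.name]) for f in fields(cls)})


schema = describe_fields(Person)
person = build(Person, {"name": "python", "age": "33"})
print(schema)
print(person)
```

A field list like `schema` is the kind of description that could be handed to an LLM, and `build` shows how raw answers could be turned back into the same dataclass.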
TODO
- Write docstrings
- Improve allowed types
- Improve system prompt
Known Issues
piah does not pass its tests on every run, because the LLM does not always parse large PDFs correctly.
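One possible way to cope with this on the caller's side is to check the returned dataclass and retry when a field has the wrong type. The sketch below is an assumption, not a feature of piah: `is_well_typed`, `parse_with_retry`, and the `flaky_parse` stub are hypothetical names, and the type check only handles plain classes such as `str` and `int`.

```python
from dataclasses import dataclass, fields


@dataclass
class Person:
    name: str
    age: int


def is_well_typed(obj):
    """Check that every field of a dataclass instance holds its annotated type.

    Only valid for plain class annotations (str, int, ...), not typing generics.
    """
    return all(isinstance(getattr(obj, f.name), f.type) for f in fields(obj))


def parse_with_retry(parse_fn, source, cls, attempts=3):
    """Call parse_fn(source, cls) until the result is well typed or attempts run out."""
    for _ in range(attempts):
        result = parse_fn(source, cls)
        if isinstance(result, cls) and is_well_typed(result):
            return result
    raise ValueError(f"no well-typed {cls.__name__} after {attempts} attempts")


# Hypothetical stand-in for an unreliable parser: the first call returns a bad age.
calls = []


def flaky_parse(source, cls):
    calls.append(source)
    return cls("python", "unknown") if len(calls) == 1 else cls("python", 33)


person = parse_with_retry(flaky_parse, "some text", Person)
print(person, "after", len(calls), "calls")
```

With a real parser, the call would look like `parse_with_retry(parser.parse, "example.pdf", Person)`. Each retry is another LLM request, so the number of attempts trades cost against reliability.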
License
piah is distributed under the terms of the MIT license.