Python client for Parsr - Transforms PDF, Documents and Images into Enriched Structured Data
Parsr Client
Provides a Python interface to the Parsr tool via its API. Parsr transforms PDFs, documents and images into enriched, structured data.
Find out all about Parsr (including download) at https://github.com/axa-group/Parsr.
1 Installation
pip install parsr-client
2 Usage
Make sure that the Parsr server is already running. The examples below assume that it is reachable at localhost:3001.
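Before connecting, it can help to confirm that something is actually listening at that address. The following is a minimal sketch using only the Python standard library; `server_is_reachable` is a hypothetical helper and not part of parsr-client, and it only checks that a TCP connection can be opened, not that the service answering is Parsr.

```python
import socket


def server_is_reachable(address: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to 'host:port' can be opened.

    Hypothetical helper (not part of parsr-client). It does not verify
    that the service listening on the port is actually Parsr.
    """
    # Split on the last colon so that 'localhost:3001' -> ('localhost', '3001').
    host, _, port = address.rpartition(':')
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except (OSError, ValueError):
        # OSError: connection refused, timeout, unknown host, ...
        # ValueError: the port part is missing or not a number.
        return False


# Example: server_is_reachable('localhost:3001')
```

If the check returns False, start the Parsr server (or correct the address) before creating the client.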
2.1 Connect to the Parsr server
from parsr_client import ParsrClient
parsr = ParsrClient('localhost:3001')
2.2 Send the document
parsr.send_document(
    file_path='README.pdf',
    config_path='defaultConfig.json',
    document_name='The Readme',
    save_request_id=True)
2.3 Retrieve results
- Get everything as JSON:
parsr.get_json()
- As Markdown:
parsr.get_markdown()
- As text:
parsr.get_text()
- Get the first table on the first page:
parsr.get_table(page=1, table=1)
- Get all the versions of the document:
parsr.get_revisions('The Readme')
- Get pretty diffs between each successive pair of a document's revisions:
parsr.compare_revisions('The Readme', pretty_html=True)
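The retrieved outputs can be written to disk for later use. The sketch below is one possible way to do that; `save_outputs` is a hypothetical helper, not part of parsr-client, and it assumes that `get_markdown()` and `get_text()` return their output as strings. It accepts any object exposing those two methods.

```python
from pathlib import Path


def save_outputs(client, out_dir, stem='document'):
    """Write the Markdown and text outputs of `client` into `out_dir`.

    Hypothetical helper: `client` is any object with get_markdown() and
    get_text() methods, such as the `parsr` object created above.
    Returns a dict mapping file suffix to the written path.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = {}
    for suffix, getter in (('.md', client.get_markdown),
                           ('.txt', client.get_text)):
        path = out / f'{stem}{suffix}'
        # Assumption: each getter returns the output as a string.
        path.write_text(str(getter()), encoding='utf-8')
        written[suffix] = path
    return written


# Example: save_outputs(parsr, 'results', stem='the_readme')
```

Passing the client in as a parameter keeps the helper easy to try out with a stand-in object when no server is running.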
3 Interpreting the whole JSON output locally
The supplied ParsrOutputInterpreter class can be used to interpret the downloaded JSON output and generate higher-level structures such as the text body.
The following example generates the text body of the first page of the document sent above.
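from parsr_client import ParsrOutputInterpreter
# Build the interpreter from the JSON output of the document sent above.
parsr_interpreter = ParsrOutputInterpreter(parsr.get_json())
# Extract the text body of the first page.
t = parsr_interpreter.get_text(page_number=1)
print(t)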
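To give a feel for what such an interpretation involves, here is a small standalone sketch that walks a nested JSON document and collects the text of one page. It does not use ParsrOutputInterpreter and is not how that class is implemented. The structure is an assumption made for illustration: a top-level `pages` list, each page with a `pageNumber` and a list of `elements`, and each element with a `content` field that is either a string or a list of child elements. The function names are hypothetical.

```python
def collect_text(element):
    """Recursively join the text found under an element's 'content' field."""
    content = element.get('content', '')
    if isinstance(content, str):
        # Leaf element (for example a word): the content is the text itself.
        return content
    if not isinstance(content, list):
        return ''
    # Container element: join the text of all children with spaces.
    return ' '.join(filter(None, (collect_text(child) for child in content)))


def page_text(document, page_number):
    """Return the text of one page, one top-level element per line."""
    for page in document.get('pages', []):
        if page.get('pageNumber') == page_number:
            blocks = (collect_text(e) for e in page.get('elements', []))
            return '\n'.join(b for b in blocks if b)
    return ''  # no such page


# Toy document following the assumed structure (not real Parsr output).
sample = {
    'pages': [{
        'pageNumber': 1,
        'elements': [{
            'type': 'paragraph',
            'content': [{
                'type': 'line',
                'content': [
                    {'type': 'word', 'content': 'Hello'},
                    {'type': 'word', 'content': 'world'},
                ],
            }],
        }],
    }],
}

print(page_text(sample, 1))  # prints "Hello world"
```

For real output, prefer the supplied class, which knows the actual schema.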
File details
Details for the file parsr-client-3.2.3.tar.gz.
File metadata
- Download URL: parsr-client-3.2.3.tar.gz
- Upload date:
- Size: 6.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.0.10 CPython/3.8.5 Linux/5.8.3-zen1-1-zen
File hashes
Algorithm | Hash digest
---|---
SHA256 | ef961b58cc42f6e4c0fa4111d6d940f3c699e0b4fb5d92b01743ed627d1a5113
MD5 | f83c015707b4ae436ceaea04f6e0b216
BLAKE2b-256 | fe51a5d20306713d74f6b002abbf82eecc82422da2b994e92658a2dbf808db78
File details
Details for the file parsr_client-3.2.3-py3-none-any.whl.
File metadata
- Download URL: parsr_client-3.2.3-py3-none-any.whl
- Upload date:
- Size: 7.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.0.10 CPython/3.8.5 Linux/5.8.3-zen1-1-zen
File hashes
Algorithm | Hash digest
---|---
SHA256 | 268d5898c533d20fd4acb41039a711a6737e0d148d295ea9862c0fbb42e205f2
MD5 | 08b9310bcd5123bd4794d719f73c8a44
BLAKE2b-256 | da7482edd9e307257b1a71d3139b60db7e18ce8be72906d2cc7753107b43c237