Python client for Parsr - Transforms PDF, Documents and Images into Enriched Structured Data

Parsr Client

Provides a Python interface to the Parsr tool via its API. Parsr transforms PDFs, documents and images into enriched, structured data.

Learn more about Parsr, including how to download it, at https://github.com/axa-group/Parsr.

1 Installation

pip install parsr-client

2 Usage

Make sure that the Parsr server is already running. The examples below assume that it is reachable at localhost:3001.
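Before connecting, it can help to confirm that something is actually listening at that address. The following is a minimal sketch using only the Python standard library; `server_is_reachable` is a hypothetical helper written for this example and is not part of parsr_client. It only checks that a TCP connection can be opened, not that the listener is a Parsr server.

```python
import socket


def server_is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper for illustration; not part of parsr_client.
    """
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


if __name__ == "__main__":
    # localhost:3001 is the example address used in this README.
    print(server_is_reachable("localhost", 3001))
```

If the check returns False, start the Parsr server (or correct the address) before creating the client.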

2.1 Connect to the Parsr server

from parsr_client import ParsrClient
parsr = ParsrClient('localhost:3001')

2.2 Send the document

parsr.send_document(
    file_path='README.pdf',
    config_path='defaultConfig.json',
    document_name='The Readme',
    save_request_id=True)
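Both paths in the call above must point to existing files, and the configuration is expected to be a JSON file. As a sketch under those assumptions, a small pre-flight check can catch a wrong path or a malformed configuration before anything is sent. `check_inputs` is a hypothetical helper, not part of parsr_client, and it does not validate the configuration against Parsr's own schema.

```python
import json
from pathlib import Path
from typing import Any


def check_inputs(file_path: str, config_path: str) -> Any:
    """Validate both paths and return the parsed configuration.

    Hypothetical helper for illustration; not part of parsr_client.
    """
    document = Path(file_path)
    config = Path(config_path)
    if not document.is_file():
        raise FileNotFoundError(f"Document not found: {document}")
    if not config.is_file():
        raise FileNotFoundError(f"Configuration not found: {config}")
    # json.loads raises json.JSONDecodeError (a ValueError) on malformed JSON.
    return json.loads(config.read_text(encoding="utf-8"))
```

Calling `check_inputs('README.pdf', 'defaultConfig.json')` just before `parsr.send_document(...)` would raise early with a readable message if either file is missing.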

2.3 Retrieve results

  1. Get everything as JSON:

    parsr.get_json()
    
  2. As Markdown:

    parsr.get_markdown()
    
  3. As text:

    parsr.get_text()
    
  4. Get the first table on the first page:

    parsr.get_table(
        page=1,
        table=1,
    )
    
  5. Get all the versions of the document:

    parsr.get_revisions('The Readme')
    
  6. Get pretty diffs between each successive pair of a document's revisions:

    parsr.compare_revisions('The Readme', pretty_html=True)
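To make "each successive pair" in item 6 concrete: with three revisions there are two comparisons, revision 0 against 1 and revision 1 against 2. The sketch below illustrates that pairing with the standard library's difflib on plain strings. It is not how `compare_revisions` is implemented, and since this README does not state what `get_revisions` returns, the `revisions` list of strings here is an assumption made for the example.

```python
import difflib
from typing import List


def diff_successive(revisions: List[str]) -> List[str]:
    """Return one unified diff per successive pair of revisions.

    Illustration only; parsr_client may compute its diffs differently.
    """
    diffs = []
    # zip(revisions, revisions[1:]) yields (rev0, rev1), (rev1, rev2), ...
    for index, (old, new) in enumerate(zip(revisions, revisions[1:])):
        lines = difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile=f"revision {index}",
            tofile=f"revision {index + 1}",
            lineterm="",
        )
        diffs.append("\n".join(lines))
    return diffs


# Three made-up revisions produce two diffs.
for diff in diff_successive(["a\nb", "a\nc", "a\nc\nd"]):
    print(diff)
    print()
```

A list of n revisions yields n - 1 diffs, and a single revision yields none.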
    

3 Interpreting the whole JSON output locally

The supplied ParsrOutputInterpreter class can be used to interpret the downloaded JSON output and generate higher-level structures such as the text body.

Here's an example that generates the text body of the first page, using the client from the example above.

from parsr_client import ParsrOutputInterpreter

parsr_interpreter = ParsrOutputInterpreter(
    parsr.get_json()
)

t = parsr_interpreter.get_text(
    page_number=1
)
print(t)
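For readers curious about what such an interpretation step involves, here is a rough sketch of extracting a page's text from a nested JSON document. It is not the ParsrOutputInterpreter code. The field names (`pages`, `pageNumber`, `elements`, `content`) and the hand-written `sample` document are assumptions made for illustration; check them against the JSON your server actually returns.

```python
from typing import Any, Dict, Iterator


def iter_words(node: Dict[str, Any]) -> Iterator[str]:
    """Yield every string found under nested 'content' fields."""
    content = node.get("content")
    if isinstance(content, str):
        # Leaf node: the content is the word itself.
        yield content
    elif isinstance(content, list):
        # Container node (e.g. paragraph or line): recurse into children.
        for child in content:
            yield from iter_words(child)


def page_text(document: Dict[str, Any], page_number: int) -> str:
    """Join the words of each top-level element on the requested page."""
    for page in document.get("pages", []):
        if page.get("pageNumber") == page_number:
            blocks = [
                " ".join(iter_words(element))
                for element in page.get("elements", [])
            ]
            # Separate non-empty elements with a blank line.
            return "\n\n".join(block for block in blocks if block)
    raise ValueError(f"No page numbered {page_number}")


# Hand-written stand-in for a parsed document (assumed structure).
sample = {
    "pages": [
        {
            "pageNumber": 1,
            "elements": [
                {"type": "paragraph", "content": [
                    {"type": "line", "content": [
                        {"type": "word", "content": "Parsr"},
                        {"type": "word", "content": "Client"},
                    ]},
                ]},
                {"type": "paragraph", "content": [
                    {"type": "line", "content": [
                        {"type": "word", "content": "Hello"},
                    ]},
                ]},
            ],
        }
    ]
}

print(page_text(sample, 1))
```

The recursion keeps the sketch independent of how many nesting levels sit between an element and its words.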
