PDF Annotation Utils

# pdfannot

This package creates a two-way link between an annotated PDF and an Excel data frame.

It allows you to:

- create an Excel file containing each annotated string of the PDF in a column 'annot_text', along with its
annotation in a column 'content'.

- annotate a PDF given an Excel file of the form described above.

It is especially useful for automatically generating annotated PDF documents with NLP models that can
infer annotations from raw text stored in a data frame.
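The workflow above can be sketched as follows. This is only an illustration: the regular expression stands in for a real NLP model, and `infer_annotations` and the label "date" are hypothetical names that are not part of pdfannot. The output rows use the 'annot_text' and 'content' columns described above.

```python
import re

# Placeholder for an NLP model: a real model would replace this regex.
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def infer_annotations(raw_texts):
    """Return one row per detected span, in the 'annot_text' / 'content' form."""
    rows = []
    for raw in raw_texts:
        for match in DATE_PATTERN.finditer(raw):
            rows.append({"annot_text": match.group(0), "content": "date"})
    return rows


rows = infer_annotations([
    "Invoice issued on 2019-03-14 and due on 2019-04-13.",
    "No dates in this sentence.",
])
for row in rows:
    print(row)
```

Rows of this shape can then be loaded into a data frame and saved to Excel before being passed to the annotation step.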


### Prerequisites

PyMuPDF (imported as fitz)

### Installing

pip install pymupdf
(or, with pipenv: pipenv install pymupdf)

Then, in Python:

import fitz
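To check that the prerequisite is importable before running anything else, a small standard-library test such as the one below can help. `has_module` is a hypothetical helper name, not something provided by pdfannot or PyMuPDF.

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(name) is not None


# PyMuPDF is imported under the name "fitz".
if not has_module("fitz"):
    print("PyMuPDF is missing; install it with: pip install pymupdf")
```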

### Authors

Arthur Renaud, Antoine Marullaz

### Examples

Your DataFrame describes what to annotate in the PDF:

- if it already has at least the columns 'text' (the texts to annotate),
'content' (the description of each annotation), and 'type' ('Square' or 'Highlight'), call:

annotate_pdf(DataFrame, path_to_corresponding_pdf, path_destination_annotated_pdf)

This annotates the PDF at the given path using your DataFrame and stores the result at the destination path.


- if it is a DataFrame with one column per annotation label (WARNING: each column must be named annot_{label_name}),
first call:

df_to_adf(DataFrame)

to convert it into the form accepted by annotate_pdf.

Then call:

annotate_pdf(DataFrame, path_to_corresponding_pdf, path_destination_annotated_pdf)

to annotate your PDF (this method supports highlights only).
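The two input shapes described above can be illustrated with plain Python rows. This sketch is not the package's implementation: `check_rows` and `wide_to_long` are hypothetical helpers, and mapping the label name to 'content' is an assumption about what df_to_adf produces. The import path in the final comment is also an assumption.

```python
REQUIRED_COLUMNS = {"text", "content", "type"}
ALLOWED_TYPES = {"Square", "Highlight"}
PREFIX = "annot_"


def check_rows(rows):
    """Raise ValueError if a row lacks a required column or has an unknown type."""
    for row in rows:
        missing = REQUIRED_COLUMNS - set(row)
        if missing:
            raise ValueError(f"missing columns: {sorted(missing)}")
        if row["type"] not in ALLOWED_TYPES:
            raise ValueError(f"unsupported annotation type: {row['type']!r}")
    return rows


def wide_to_long(records):
    """Reshape annot_{label_name} columns into text / content / type rows.

    Assumption: the label name becomes 'content' and every row is a
    'Highlight', since the wide form supports highlights only.
    """
    rows = []
    for record in records:
        for column, text in record.items():
            if column.startswith(PREFIX) and text:
                rows.append({
                    "text": text,
                    "content": column[len(PREFIX):],
                    "type": "Highlight",
                })
    return rows


wide = [
    {"annot_person": "Ada Lovelace", "annot_date": "1843"},
    {"annot_person": "Charles Babbage", "annot_date": ""},
]
long_rows = check_rows(wide_to_long(wide))
for row in long_rows:
    print(row)

# With pandas and pdfannot installed, the documented call would then look like
# this (the import path is an assumption):
# import pandas as pd
# from pdfannot import annotate_pdf
# annotate_pdf(pd.DataFrame(long_rows), "input.pdf", "annotated.pdf")
```

Validating the 'type' column up front keeps an unsupported value from reaching the annotation step.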

Source Distribution

pdfannot-0.0.7.tar.gz (360.3 kB)

Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/41.0.0 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.6.7

Hashes for pdfannot-0.0.7.tar.gz:

- SHA256: 807b58c9b466b1cc68850fdb9637fc7e55415decbbb809250e1b94da23ea0048
- MD5: b0a4554e5c8fa57f25bf0c60518cb2fe
- BLAKE2b-256: 3e156e121335997277b4386461d53857fb99f67488dc468b72c6d4949a6c390d