
Finds and highlights text in documents


txtmarker: Highlight text in documents


txtmarker highlights text in documents. It takes a list of (name, text) pairs, scans an input document, and creates a modified version with the highlights embedded.

Current file formats supported:

  • pdf

Installation

The easiest way to install is via pip and PyPI:

pip install txtmarker

You can also install txtmarker directly from GitHub. Using a Python Virtual Environment is recommended.

pip install git+https://github.com/neuml/txtmarker

Python 3.6+ is supported.

Examples

The examples directory has a series of examples and notebooks giving an overview of txtmarker. See the list of notebooks below.

Notebooks

  • Introducing txtmarker: Overview of the functionality provided by txtmarker
  • Highlighting with Transformers: AI-driven highlighting with Transformers

Configuration

The following section gives an overview of highlighters and available methods/configuration. See the notebooks above for detailed examples.

Create a new highlighter

from txtmarker.factory import Factory
highlighter = Factory.create("pdf")

extension

extension: string

Type of highlighter to create (e.g. pdf)

Optional constructor arguments:

formatter

formatter: callable

Formats queries and input text using this method. Helps with cleanup of files with lots of symbols and other content.
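As a minimal sketch of what a formatter might look like, the function below strips symbols and collapses whitespace. The name clean is hypothetical. The sketch assumes the formatter receives a string and returns a string, and that it is passed to Factory.create as a keyword argument named formatter; neither detail is confirmed by the text above.

```python
import re

def clean(text):
    """Hypothetical formatter: drops symbols and collapses runs of whitespace."""
    # Replace anything that is not a letter, digit or whitespace with a space
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text)
    # Collapse repeated whitespace (including newlines) into single spaces
    return re.sub(r"\s+", " ", text).strip()

def create_highlighter():
    # Imported lazily so clean() can be tried without txtmarker installed
    from txtmarker.factory import Factory

    # Assumption: the optional constructor argument is passed by keyword
    return Factory.create("pdf", formatter=clean)

print(clean("Hello,   world!\n(test)"))
```

Since the formatter applies to both queries and input text, the same normalization happens on each side before matching, which is why a lossy cleanup such as this one can still produce matches.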

chunks

chunks: int

Splits queries into multiple chunks. This is designed for very long text matches.
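To illustrate the idea of chunking a long query, the sketch below splits text into word groups of near-equal size. This is not txtmarker's implementation; split_query is a hypothetical helper, and passing chunks to Factory.create as a keyword argument is an assumption.

```python
def split_query(text, chunks):
    """Illustrative only: splits text into at most `chunks` groups of words."""
    words = text.split()
    if not words:
        return []
    # Ceiling division so every word lands in one of the groups
    size = -(-len(words) // chunks)
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def create_highlighter(chunks):
    # Imported lazily so split_query() can be tried without txtmarker installed
    from txtmarker.factory import Factory

    # Assumption: the optional constructor argument is passed by keyword
    return Factory.create("pdf", chunks=chunks)

print(split_query("a very long passage of text to match", 2))
```

Shorter pieces are less likely to be broken by line wraps or stray symbols in the extracted document text, which is the likely motivation for chunking very long matches.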

Highlight text

highlighter.highlight("input.pdf", "output.pdf", [("name", "text to highlight")])

infile

infile: string

Full path to input file

outfile

outfile: string

Full path to output file, i.e. the highlighted file

highlights

highlights: list of (string, string|regex)

List of highlight elements. Each pair has a name (can be None) and text value. The text can either be a string or a regular expression.
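A short sketch of a highlights list that mixes a plain string with a regular expression, including an entry whose name is None. It assumes a regular expression can be supplied as a pattern string; whether compiled patterns are also accepted is not stated above. The file paths and the highlight_file helper are hypothetical.

```python
import re

# Each element is a (name, text) pair; the name may be None
highlights = [
    ("Summary", "txtmarker highlights text in documents"),
    (None, r"Python 3\.\d+\+? is supported"),
]

def highlight_file(infile, outfile):
    # Imported lazily so the list above can be inspected without txtmarker installed
    from txtmarker.factory import Factory

    highlighter = Factory.create("pdf")
    highlighter.highlight(infile, outfile, highlights)

# Check the pattern against sample text with the standard re module first
print(bool(re.search(highlights[1][1], "Python 3.6+ is supported")))
```

Testing a pattern with re.search before running the highlighter is a cheap way to catch escaping mistakes, such as an unescaped dot or plus sign.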


