Finds and highlights text in documents
txtmarker: Highlight text in documents
txtmarker highlights text in documents. It takes a list of (name, text) pairs, scans an input document and creates a modified version with the highlights embedded.
Current file formats supported: PDF
Installation
The easiest way to install is via pip and PyPI:
pip install txtmarker
You can also install txtmarker directly from GitHub. Using a Python Virtual Environment is recommended.
pip install git+https://github.com/neuml/txtmarker
Python 3.6+ is supported
Examples
The examples directory has a series of examples and notebooks giving an overview of txtmarker. See the list of notebooks below.
Notebooks
Notebook | Description
---|---
Introducing txtmarker | Overview of the functionality provided by txtmarker
Highlighting with Transformers | AI-driven highlighting with Transformers
Configuration
The following section gives an overview of highlighters and available methods/configuration. See the notebooks above for detailed examples.
Create a new highlighter
from txtmarker.factory import Factory
highlighter = Factory.create("pdf")
extension
extension: string
Type of highlighter to create (i.e. pdf)
Optional constructor arguments:
formatter
formatter: callable
Formats queries and input text using this method. Helps with cleanup of files with lots of symbols and other content.
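As a rough illustration of what a formatter might look like, the sketch below assumes the callable takes a single string and returns the cleaned string. The function name `clean` and the cleanup rules are hypothetical, and passing it as `formatter=clean` is an assumption based on the argument name above.

```python
import re

def clean(text):
    """Hypothetical formatter: keep letters, digits and single spaces."""
    # Replace every character that is not a letter, digit or whitespace with a space
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text)
    # Collapse runs of whitespace into one space and trim the ends
    return re.sub(r"\s+", " ", text).strip()

# Assumed usage (keyword name taken from the argument list above):
# from txtmarker.factory import Factory
# highlighter = Factory.create("pdf", formatter=clean)
```

Because the same method is applied to both queries and input text, a symbol-heavy query and the matching passage in the file would be reduced to the same simplified form before comparison.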
chunks
chunks: int
Splits queries into multiple chunks. This is designed for very long text matches.
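To make the idea of chunking concrete, here is a minimal sketch of splitting a long query into word groups of near-equal size. It is an illustration only and not txtmarker's internal implementation; `split_query` is a hypothetical helper.

```python
def split_query(text, chunks):
    """Hypothetical helper: split text into at most `chunks` word groups."""
    words = text.split()
    # Nothing to split: return the text unchanged as a single chunk
    if chunks < 1 or not words:
        return [text]
    # Ceiling division gives the number of words per chunk
    size = -(-len(words) // chunks)
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

print(split_query("a very long passage that should match", 3))
```

The option itself would presumably be set when the highlighter is created, for example `Factory.create("pdf", chunks=4)`, assuming the keyword matches the argument name above.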
Highlight text
highlighter.highlight("input.pdf", "output.pdf", [("name", "text to highlight")])
infile
infile: string
Full path to input file
outfile
outfile: string
Full path to output file, i.e. the highlighted file
highlights
highlights: list of (string, string|regex)
List of highlight elements. Each pair has a name (can be None) and text value. The text can either be a string or a regular expression.
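The sketch below builds a highlights list that mixes plain strings, an unnamed entry and a regular expression. It assumes a regular expression is supplied as a compiled `re` pattern, which the description above does not spell out; `validate` is a hypothetical helper and not part of txtmarker.

```python
import re

# Type of a compiled pattern, obtained in a way that works on older Python 3 versions
PATTERN_TYPE = type(re.compile(""))

# Each entry is a (name, text) pair; the name may be None
highlights = [
    ("Summary", "text to highlight"),              # plain string
    (None, "another passage"),                     # unnamed highlight
    ("Year", re.compile(r"\b(?:19|20)\d{2}\b")),   # regex (assumed: compiled pattern)
]

def validate(pairs):
    """Hypothetical helper: check that each entry has the documented shape."""
    for name, text in pairs:
        if name is not None and not isinstance(name, str):
            raise TypeError("name must be a string or None")
        if not isinstance(text, (str, PATTERN_TYPE)):
            raise TypeError("text must be a string or a compiled regex")
    return pairs

# Usage with the calls shown earlier in this section:
# from txtmarker.factory import Factory
# highlighter = Factory.create("pdf")
# highlighter.highlight("input.pdf", "output.pdf", validate(highlights))
```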