High Performance Text Processing & Segmentation Framework
Pawpaw
Pawpaw is a high-performance text segmentation framework that lets you quickly create parsers whose outputs are tree graphs. The resulting trees can be serialized, traversed, and searched using a powerful structured query language.
- Indexed str and substr representation
  - Efficient memory utilization
  - Fast processing
  - Pythonic relative indexing and slicing
  - Runtime & polymorphic value extraction
- Rules Pipelining Engine
  - Develop complex lexical parsers with just a few lines of code
  - Quickly and easily convert unstructured text into structured, indexed, & searchable tree graphs
  - Pre-process text for downstream NLP/AI/ML consumers
- Search and Query
  - Hierarchical data structure for all indexed text
  - Search using an extensive structured query language
  - Optionally pre-compile queries for reuse to improve performance
- XML Processing
  - Features a drop-in replacement for ElementTree.XMLParser
  - Full text indexes for all Elements, Attributes, Tags, Text, etc.
  - Search XML using both XPath and the included structured query language
- Efficient pickling and JSON persistence
  - Security option enables persistence of index-only data, with reference strings re-injected during deserialization
- Stable
  - Over 2,100 unit tests and counting!
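The ideas of an indexed substring and of index-only persistence listed above can be illustrated without Pawpaw itself. The sketch below is not Pawpaw's implementation or API: `Span`, `to_index_json`, and `from_index_json` are hypothetical names invented for this example. It keeps a reference to the source string plus a pair of offsets, supports relative slicing, and serializes only the offsets, so the reference string has to be supplied again on load.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical indexed substring: a source string plus [start, stop) offsets."""
    src: str
    start: int
    stop: int

    def __str__(self) -> str:
        # Characters are copied only when the text is actually requested.
        return self.src[self.start:self.stop]

    def __len__(self) -> int:
        return self.stop - self.start

    def __getitem__(self, key: slice) -> "Span":
        # Slice relative to this span; the result still points into src.
        if not isinstance(key, slice):
            raise TypeError("only slices are supported in this sketch")
        lo, hi, step = key.indices(len(self))
        if step != 1:
            raise ValueError("step must be 1")
        return Span(self.src, self.start + lo, self.start + max(lo, hi))


def to_index_json(span: Span) -> str:
    """Serialize the offsets only; the source text is deliberately left out."""
    return json.dumps({"start": span.start, "stop": span.stop})


def from_index_json(data: str, src: str) -> Span:
    """Rebuild a span by re-injecting the reference string at load time."""
    offsets = json.loads(data)
    return Span(src, offsets["start"], offsets["stop"])


text = "Hello, World!"
word = Span(text, 0, len(text))[7:-1]      # relative slice, negative index allowed
payload = to_index_json(word)              # contains no characters from text
restored = from_index_json(payload, text)
print(str(word), payload, str(restored))
```

Because the payload holds offsets rather than characters, it can be stored or transmitted separately from the text it describes, which is the motivation the security option above points to.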
Getting Started
Prerequisites
Pawpaw has been written and tested using Python 3.10. The only dependency is regex, which will be fetched and installed automatically if you install Pawpaw with pip or conda.
Installation
- Install with pip from PyPI:
  pip install pawpaw
- Install with pip from GitHub:
  pip install git+https://github.com/rlayers/pawpaw.git
- Install with conda:
  source activate myenv
  conda install git pip
  pip install git+https://github.com/rlayers/pawpaw.git
- Clone the repo with git:
  git clone https://github.com/rlayers/pawpaw
Verify
Open a Python prompt and type:
>>> from pawpaw import Ito
>>> Ito('Hello, World!')
Ito('Hello, World!', 0, 13, None)
If your last line looks like this, you are up and running with Pawpaw!
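If the import fails, it can help to check the environment before reinstalling. The following is a minimal sketch using only the standard library; `check_environment` is a hypothetical helper, not part of Pawpaw. It treats Python 3.10 as a minimum, which is an assumption: the text above only says Pawpaw was written and tested with 3.10.

```python
import importlib.util
import sys


def check_environment(min_version=(3, 10), modules=("regex", "pawpaw")):
    """Return a dict mapping each check name to True or False."""
    # Assumption: 3.10 is used as a floor because it is the tested version.
    results = {"python": sys.version_info[:2] >= tuple(min_version)}
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        results[name] = importlib.util.find_spec(name) is not None
    return results


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'missing or too old'}")
```

A `False` entry for `regex` or `pawpaw` suggests the package was installed into a different interpreter or environment than the one running the prompt.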
Contributing
Contributions to Pawpaw are greatly appreciated. Please refer to the contributing guidelines for details.
License
Distributed under the MIT License. See LICENSE for more information.
Contacts
Robert L. Ayers: a.nov.guy@gmail.com
Download files
File details
Details for the file pawpaw-1.0.0a4.tar.gz.
File metadata
- Download URL: pawpaw-1.0.0a4.tar.gz
- Upload date:
- Size: 1.2 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.10.8
File hashes
Algorithm | Hash digest
---|---
SHA256 | bf2957349c19da5385ff61da328d0e55054ba0bb9cf30d4a8fb1ceda45862c1f
MD5 | 98c2c181f8fc0aac659b77c6ef803179
BLAKE2b-256 | 72d0b1be26a71fd269826ce4a812d5e46680d1cf2d361650741d5a3a0bb057f8
File details
Details for the file pawpaw-1.0.0a4-py3-none-any.whl.
File metadata
- Download URL: pawpaw-1.0.0a4-py3-none-any.whl
- Upload date:
- Size: 35.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.10.8
File hashes
Algorithm | Hash digest
---|---
SHA256 | 81c69de3585eb465621ad27a6b34f1144bb4b10868adc609e3e5afb02fef3569
MD5 | 9b7f1cd76270386031bdb93c87b1d14a
BLAKE2b-256 | 484974072a3966517c3b1cfe21f674c13286970aa97389e10e589918c5b1841f
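The SHA256 digests above can be used to check a downloaded file before installing it. The following is a minimal sketch using the standard library's `hashlib`; the helper names `sha256_of` and `verify` are hypothetical, and it assumes the file keeps its original name and sits at the path you pass in.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file hash tables above.
EXPECTED_SHA256 = {
    "pawpaw-1.0.0a4.tar.gz":
        "bf2957349c19da5385ff61da328d0e55054ba0bb9cf30d4a8fb1ceda45862c1f",
    "pawpaw-1.0.0a4-py3-none-any.whl":
        "81c69de3585eb465621ad27a6b34f1144bb4b10868adc609e3e5afb02fef3569",
}


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path):
    """Return True if the file's digest matches the published one.

    Raises KeyError if the file name is not one of the two listed above.
    """
    expected = EXPECTED_SHA256[Path(path).name]
    return sha256_of(path) == expected
```

For example, `verify("pawpaw-1.0.0a4.tar.gz")` returns `False` if the download was truncated or altered.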