
High Performance Text Processing & Segmentation Framework



Pawpaw High Performance Text Processing & Segmentation Framework

Botanical Drawing: Asimina triloba: the American papaw

Pawpaw is a high performance text segmentation framework that allows you to quickly create parsers whose outputs are tree graphs. The resulting trees can be serialized, traversed, and searched using a powerful structured query language.

  • Indexed str and substr representation
    • Efficient memory utilization
    • Fast processing
    • Pythonic relative indexing and slicing
    • Runtime & polymorphic value extraction
  • Rules Pipelining Engine
    • Develop complex lexical parsers with just a few lines of code
    • Quickly and easily convert unstructured text into structured, indexed, & searchable tree graphs
    • Pre-process text for downstream NLP/AI/ML consumers
  • Search and Query
    • Hierarchical data structure for all indexed text
    • Search using extensive structured query language
    • Optionally pre-compile queries for reuse to improve performance
  • XML Processing
    • Features a drop-in replacement for ElementTree.XMLParser
    • Full text indexes for all Elements, Attributes, Tags, Text, etc.
    • Search XML using both XPath and the included structured query language
  • Efficient pickling and JSON persistence
    • Security option enables persistence of index-only data, with reference strings re-injected during deserialization
  • Stable
    • Over 2,100 unit tests and counting!
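
The indexed-substring and index-only persistence ideas in the feature list can be illustrated without Pawpaw itself. The sketch below is not Pawpaw's implementation or API: `Span`, `to_index_json`, and `from_index_json` are hypothetical names chosen for illustration. It assumes a segment is stored as a reference to the source string plus start and stop offsets, that child segments form a tree, and that the JSON form holds offsets only, with the reference string supplied again at load time.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Span:
    """Hypothetical stand-in for an indexed substring: a source reference plus offsets."""
    src: str
    start: int
    stop: int
    desc: Optional[str] = None
    children: List["Span"] = field(default_factory=list)

    def __str__(self) -> str:
        # The substring is materialized only on demand.
        return self.src[self.start:self.stop]

    def sub(self, start: int, stop: int, desc: Optional[str] = None) -> "Span":
        """Create a child span from offsets relative to this span."""
        child = Span(self.src, self.start + start, self.start + stop, desc)
        self.children.append(child)
        return child

    def to_index(self) -> dict:
        """Index-only form: offsets and descriptors, never the text itself."""
        return {
            "start": self.start,
            "stop": self.stop,
            "desc": self.desc,
            "children": [c.to_index() for c in self.children],
        }


def to_index_json(span: Span) -> str:
    return json.dumps(span.to_index())


def from_index_json(data: str, src: str) -> Span:
    """Rebuild the tree, re-injecting the reference string supplied by the caller."""
    def build(node: dict) -> Span:
        span = Span(src, node["start"], node["stop"], node["desc"])
        span.children = [build(c) for c in node["children"]]
        return span
    return build(json.loads(data))


text = "Hello, World!"
root = Span(text, 0, len(text), "phrase")
hello = root.sub(0, 5, "word")
world = root.sub(7, 12, "word")

payload = to_index_json(root)              # offsets and descriptors only
restored = from_index_json(payload, text)  # text is re-attached here
print([str(c) for c in restored.children])  # ['Hello', 'World']
```

Because the payload carries no text, it can be stored or transmitted separately from the source string, which is the general motivation for an index-only persistence option.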

Getting Started

Prerequisites

Pawpaw has been written and tested using Python 3.10. The only dependency is regex, which will be fetched and installed automatically if you install Pawpaw with pip or conda.
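
A quick environment check along these lines can be run before installing. This is a minimal sketch using only the standard library; `check_prerequisites` is a hypothetical helper, and treating 3.10 as a minimum version is an assumption, since the text above only says Pawpaw was written and tested with that version.

```python
import importlib.util
import sys


def check_prerequisites(min_version: tuple = (3, 10)) -> dict:
    """Report whether the interpreter and the regex dependency look suitable.

    The 3.10 floor is an assumption, not a documented requirement.
    """
    return {
        "python_ok": sys.version_info[:2] >= min_version,
        # find_spec returns None when the package cannot be located.
        "regex_installed": importlib.util.find_spec("regex") is not None,
    }


print(check_prerequisites())
```

A missing regex package is not a problem when installing with pip, since it is fetched automatically; the check is mainly useful when working from a cloned repository.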

Installation

  1. Install with pip from PyPI:

    pip install pawpaw
    
  2. Install with pip from GitHub:

    pip install git+https://github.com/rlayers/pawpaw.git
    
  3. Install with conda:

    source activate myenv
    conda install git pip
    pip install git+https://github.com/rlayers/pawpaw.git
    
  4. Clone the repo with git:

    git clone https://github.com/rlayers/pawpaw
    

Verify

Open a Python prompt and type:

>>> from pawpaw import Ito
>>> Ito('Hello, World!')
Ito('Hello, World!', 0, 13, None)

If your last line looks like this, you are up and running with Pawpaw!
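
The same check can be scripted, for example as a smoke test in an automated setup. This is a sketch rather than an official test: `verify_pawpaw` and `EXPECTED_REPR` are hypothetical names, the only Pawpaw call used is the `Ito` constructor from the session above, and the expected string assumes the repr format does not change between versions.

```python
# Expected repr, copied from the interactive session above.
EXPECTED_REPR = "Ito('Hello, World!', 0, 13, None)"


def verify_pawpaw():
    """Return True or False for a repr match, or None if Pawpaw cannot be imported."""
    try:
        from pawpaw import Ito
    except ImportError:
        return None
    return repr(Ito("Hello, World!")) == EXPECTED_REPR


print(verify_pawpaw())
```

A result of `None` points to an installation problem, while `False` means Pawpaw imported but printed a different representation than the one shown here.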


Contributing

Contributions to Pawpaw are greatly appreciated - please refer to the contributing guidelines for details.


License

Distributed under the MIT License. See LICENSE for more information.


Contacts

Robert L. Ayers:  a.nov.guy@gmail.com


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pawpaw-1.0.0a4.tar.gz (1.2 MB view details)

Uploaded Source

Built Distribution

pawpaw-1.0.0a4-py3-none-any.whl (35.2 kB view details)

Uploaded Python 3

File details

Details for the file pawpaw-1.0.0a4.tar.gz.

File metadata

  • Download URL: pawpaw-1.0.0a4.tar.gz
  • Upload date:
  • Size: 1.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.10.8

File hashes

Hashes for pawpaw-1.0.0a4.tar.gz
Algorithm Hash digest
SHA256 bf2957349c19da5385ff61da328d0e55054ba0bb9cf30d4a8fb1ceda45862c1f
MD5 98c2c181f8fc0aac659b77c6ef803179
BLAKE2b-256 72d0b1be26a71fd269826ce4a812d5e46680d1cf2d361650741d5a3a0bb057f8


File details

Details for the file pawpaw-1.0.0a4-py3-none-any.whl.

File metadata

  • Download URL: pawpaw-1.0.0a4-py3-none-any.whl
  • Upload date:
  • Size: 35.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.10.8

File hashes

Hashes for pawpaw-1.0.0a4-py3-none-any.whl
Algorithm Hash digest
SHA256 81c69de3585eb465621ad27a6b34f1144bb4b10868adc609e3e5afb02fef3569
MD5 9b7f1cd76270386031bdb93c87b1d14a
BLAKE2b-256 484974072a3966517c3b1cfe21f674c13286970aa97389e10e589918c5b1841f

