Utility to more intuitively parse text

Text Walker

Overview

TextWalker allows for an intuitive way to parse unstructured text.

The TextWalker API emulates how a complex regular expression is built up iteratively: typically, when writing a regex, I construct one part, test it, and then build the next part.

>>> text = """CREATE TABLE dbo.car_inventory
(
    cp_car_sk        integer               not null,
    cp_car_make_id   char(16)              not null,
)
WITH (OPTION (STATS = ON))"""

>>> from text_walker import TextWalker
>>> tw = TextWalker(text)

>>> tw.walk_many(['CREATE', 'TABLE'])
>>> tname_match = tw.walk('dbo.[a-z0-9_]+')
>>> tablename = tname_match.replace('dbo.', '')
>>> print(f'table name is {tablename}')

table name is car_inventory

>>> tw.walk(r'\(')

# now print column names
>>> cols_text, _ = tw.walk_until('WITH')
>>> for col_def in cols_text.split(','):
        col_name = col_def.strip().split(' ')[0]
        if col_name in ('', ')'):  # skip the fragment left after the last comma
            continue
        print(f'column name is {col_name}')

column name is cp_car_sk
column name is cp_car_make_id
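
To make the walking idea concrete, here is a minimal sketch of the same session written with only Python's standard `re` module. `MiniWalker` is a hypothetical stand-in invented for illustration; it is not the TextWalker implementation, and its whitespace handling and return values are assumptions that may differ from the real library.

```python
import re


class MiniWalker:
    """Hypothetical stand-in built on `re`; not the TextWalker implementation."""

    def __init__(self, text):
        self.text = text
        self.pos = 0  # index of the next unconsumed character

    def _skip_whitespace(self):
        # Assumption: leading whitespace is ignored before each match.
        while self.pos < len(self.text) and self.text[self.pos].isspace():
            self.pos += 1

    def walk(self, pattern):
        """Match `pattern` at the current position, consume it, return the matched text."""
        self._skip_whitespace()
        match = re.compile(pattern).match(self.text, self.pos)
        if match is None:
            raise ValueError(f"{pattern!r} does not match at offset {self.pos}")
        self.pos = match.end()
        return match.group(0)

    def walk_until(self, pattern):
        """Consume and return everything before the next occurrence of `pattern`."""
        match = re.compile(pattern).search(self.text, self.pos)
        if match is None:
            raise ValueError(f"{pattern!r} not found after offset {self.pos}")
        consumed = self.text[self.pos:match.start()]
        self.pos = match.start()
        return consumed


ddl = """CREATE TABLE dbo.car_inventory
(
    cp_car_sk        integer               not null,
    cp_car_make_id   char(16)              not null,
)
WITH (OPTION (STATS = ON))"""

walker = MiniWalker(ddl)
walker.walk("CREATE")
walker.walk("TABLE")
table_name = walker.walk(r"dbo\.[a-z0-9_]+").replace("dbo.", "")
walker.walk(r"\(")
columns_text = walker.walk_until("WITH")

column_names = []
for col_def in columns_text.split(","):
    fields = col_def.split()
    # Skip fragments that are not column definitions, such as the closing ")".
    if len(fields) >= 2:
        column_names.append(fields[0])

print(table_name)    # car_inventory
print(column_names)  # ['cp_car_sk', 'cp_car_make_id']
```

Each call consumes only the part of the text it matched, so a failed step raises an error at a known offset and the pattern can be adjusted before moving on to the next piece.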

Supported Grammar

# parse literal

Installation

git clone https://github.com/spandanb/textwalker.git
cd textwalker
python -m venv venv
source venv/bin/activate  # on Windows: venv\Scripts\activate
pip install -r requirements.txt
python setup.py install

Run Tests

pytest

Steps

generate docs: pdoc --html --force textwalker

local install: pip install -e .

TODO (MISC)

  • properly define the grammar supported
  • add license

TODO (TECHNICAL)

  • add support for '{}'

  • add support for case (in)sensitive match?

  • add docs

  • cleanup docstrings

  • add tests -- split tests by the grammar being exercised

  • fix setup -- ideally I run setup.py once, and the tests and examples then just run

Download files

Source Distribution

textwalker-0.1.0.tar.gz (2.4 kB)

Uploaded Source

Built Distribution

textwalker-0.1.0-py3-none-any.whl (2.0 kB)

Uploaded Python 3
