Utility for parsing text more intuitively
Text Walker
Overview
TextWalker provides an intuitive way to parse unstructured text.
The TextWalker API emulates how a complex regular expression is built up iteratively. Typically, when constructing a regex, I write one part of it, test it, and then build the next part.
>>> text = """CREATE TABLE dbo.car_inventory
... (
... cp_car_sk integer not null,
... cp_car_make_id char(16) not null
... )
... WITH (OPTION (STATS = ON))"""
>>> from text_walker import TextWalker
>>> tw = TextWalker(text)
>>> tw.walk_many(['CREATE', 'TABLE'])
>>> tname_match = tw.walk('dbo.[a-z0-9_]+')
>>> tablename = tname_match.replace('dbo.', '')
>>> print(f'table name is {tablename}')
table name is car_inventory
>>> tw.walk(r'\(')
>>> # now print column names
>>> cols_text, _ = tw.walk_until('WITH')
>>> for col_def in cols_text.split(','):
...     col_name = col_def.strip().split(' ')[0]
...     print(f'column name is {col_name}')
...
column name is cp_car_sk
column name is cp_car_make_id
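For readers curious how an incremental, cursor-based walker like this can work, here is a minimal sketch built on Python's standard re module. MiniWalker and its methods are hypothetical names chosen to mirror the calls in the example above; this is not the textwalker source, and its semantics (skipping whitespace before each walk, walk_until consuming the match it stops at) are assumptions.

```python
import re


class MiniWalker:
    """Hypothetical cursor-based walker; a sketch, not textwalker's implementation.

    Each call matches a regular expression at the current cursor position
    (after skipping whitespace) and moves the cursor past the match, so a
    parser is built one small, testable pattern at a time.
    """

    def __init__(self, text: str) -> None:
        self.text = text
        self.pos = 0  # index of the next unconsumed character

    def _skip_whitespace(self) -> None:
        while self.pos < len(self.text) and self.text[self.pos].isspace():
            self.pos += 1

    def walk(self, pattern: str) -> str:
        """Match `pattern` exactly at the cursor and return the matched text."""
        self._skip_whitespace()
        # Pattern.match(string, pos) anchors the match at `pos`.
        match = re.compile(pattern).match(self.text, self.pos)
        if match is None:
            raise ValueError(f"{pattern!r} does not match at position {self.pos}")
        self.pos = match.end()
        return match.group()

    def walk_many(self, patterns: list[str]) -> list[str]:
        """Walk several patterns in order, returning each matched piece."""
        return [self.walk(p) for p in patterns]

    def walk_until(self, pattern: str) -> tuple[str, str]:
        """Return (text skipped before the next `pattern` match, the match).

        The cursor ends up just past the match.
        """
        match = re.compile(pattern).search(self.text, self.pos)
        if match is None:
            raise ValueError(f"{pattern!r} not found after position {self.pos}")
        skipped = self.text[self.pos:match.start()]
        self.pos = match.end()
        return skipped, match.group()


sql = """CREATE TABLE dbo.car_inventory
(
cp_car_sk integer not null,
cp_car_make_id char(16) not null
)
WITH (OPTION (STATS = ON))"""

walker = MiniWalker(sql)
walker.walk_many(["CREATE", "TABLE"])
table_name = walker.walk(r"dbo\.[a-z0-9_]+").replace("dbo.", "")
walker.walk(r"\(")
cols_text, _ = walker.walk_until("WITH")
column_names = [c.strip().split(" ")[0] for c in cols_text.split(",")]
```

Here table_name is "car_inventory" and column_names is ["cp_car_sk", "cp_car_make_id"]. Because every step is a separate small pattern, each one can be checked before the next is written, which is the workflow described in the overview.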
Supported Grammar
# parse literal
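For comparison, in Python's standard re module a literal string is matched verbatim by escaping its metacharacters with re.escape. The example above writes '\(' for a literal parenthesis, which suggests TextWalker patterns follow similar escaping rules, but that is an assumption; the snippet below only shows re behaviour, including how the unescaped '.' in 'dbo.[a-z0-9_]+' would accept any character under re semantics.

```python
import re

# In Python's re module, '.' and '(' are metacharacters; re.escape turns a
# literal string into a pattern that matches only that exact text.
literal_prefix = re.escape("dbo.")  # 'dbo\\.'
open_paren = re.escape("(")         # '\\('

# The unescaped '.' in 'dbo.' also accepts any character in that position,
# so "dbox_inventory" matches the loose pattern but not the escaped one.
loose_match = re.match("dbo.[a-z0-9_]+", "dbox_inventory")
strict_match = re.match(literal_prefix + "[a-z0-9_]+", "dbox_inventory")
```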
Installation
git clone https://github.com/spandanb/textwalker.git
cd textwalker
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python3 setup.py install
Run Tests
pytest
Steps
- generate docs: cd into the project root, then run pdoc --html --force textwalker
- local (editable) install: pip install -e .
TODO (MISC)
- properly define the supported grammar
- add license
TODO (TECHNICAL)
- add support for '{}'
- add support for case-(in)sensitive matching?
- add docs
- clean up docstrings
- add tests; split tests by the grammar being exercised
- fix setup: ideally, running setup.py once should be enough for the tests and examples to run
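Two of the items above, '{}' (presumably the {m,n} repetition quantifier) and case-(in)sensitive matching, correspond to features Python's standard re module already provides: {m,n} bounds how many times the preceding item repeats, and re.IGNORECASE makes matching case-insensitive. The sketch below uses a hypothetical match_at helper to show how a regex-backed walker could expose both; it illustrates that assumption and is not how textwalker implements, or plans to implement, either feature.

```python
import re


def match_at(text: str, pos: int, pattern: str, ignore_case: bool = False) -> tuple[str, int]:
    """Hypothetical helper: match `pattern` exactly at `pos`.

    Returns the matched text and the position just past it.
    """
    flags = re.IGNORECASE if ignore_case else 0
    match = re.compile(pattern, flags).match(text, pos)
    if match is None:
        raise ValueError(f"{pattern!r} does not match at position {pos}")
    return match.group(), match.end()


sql = "create table dbo.car_inventory (cp_car_make_id char(16) not null)"

# Case-insensitive match: the uppercase pattern matches lowercase input.
keyword, pos = match_at(sql, 0, "CREATE", ignore_case=True)

# '{m,n}' bounds repetition: 1 to 3 digits inside the parentheses of char(...).
width, _ = match_at(sql, sql.index("char("), r"char\([0-9]{1,3}\)")
```

With ignore_case left at its default of False, the same "CREATE" pattern raises ValueError on this lowercase input.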
Download files
Source Distribution
textwalker-0.1.0.tar.gz (2.4 kB)
Built Distribution
textwalker-0.1.0-py3-none-any.whl
Hashes for textwalker-0.1.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 29dc5513f3cc860dc40f9d64147f41146ff08a1a5053bf574aa87f5a500f5024
MD5 | a3bb957137dd5182aa98f7cf722c12ef
BLAKE2b-256 | 422fec923968555a80afb68c991d74438ca7d1ac9ad76ed9a201decbb4045646