Light Text Pre-processing lets you apply a chain of built-in regex rules to an input string.

Light Text Pre-processing

Light Text Pre-processing is an easy-to-use Python module that lets you apply a chain of built-in regex rules to an input string. Regex rules are stored in a separate YML file and compiled at run-time. The compiling mechanism and how to add a custom regex are described below.

How it works

The package reads a list of regexes from light_text_prepro/rules/regex.yml. Each row in regex.yml defines one regex rule, such as user_tag: '"@[0-9a-z](\.?[0-9a-z])*"'. In this row, user_tag is the key of the rule and '"@[0-9a-z](\.?[0-9a-z])*"' is its value.

At run-time, the package reads regex.yml and compiles a method for each regex; the method is named after the key of the row. For example, at the end of the process you will be able to call the user_tag() method, which matches user tags. Each method has an optional parameter, replace_with, that lets you replace the string matched by the regex rule with arbitrary text.
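
As a rough illustration of the mechanism described above, the sketch below builds one method per rule from a key/value mapping. It is not the package's actual code: the class name SketchPrePro, the RULES dictionary (a stand-in for the parsed regex.yml), the stripping of the outer double quotes, and the empty-string default for replace_with are all assumptions made for this example. The two patterns are copied from the List of Regex section.

```python
import re

# Hypothetical stand-in for the parsed contents of regex.yml (key -> quoted pattern).
RULES = {
    "user_tag": r'"(?<![\w@])@([\w@]+(?:[.!][\w@]+)*)"',
    "email": r'"([^@|\s]+@[^@]+\.[^@|\s]+)"',
}


class SketchPrePro:
    """Illustrative only; not the package's LPrePro class."""

    def __init__(self, rules):
        self._text = ""
        for key, quoted in rules.items():
            # Assumption: the outer double quotes are not part of the pattern.
            pattern = re.compile(quoted.strip('"'))
            # Attach a method named after the rule key.
            setattr(self, key, self._make_method(pattern))

    def _make_method(self, pattern):
        def method(replace_with=""):
            # Replace every match, then return self so calls can be chained.
            self._text = pattern.sub(replace_with, self._text)
            return self
        return method

    def set_text(self, text):
        self._text = text
        return self

    def get_text(self):
        return self._text


sketch = SketchPrePro(RULES)
result = (
    sketch.set_text("Hey @username, this is my email my@email.com")
    .user_tag(replace_with="[user]")
    .email(replace_with="[email]")
    .get_text()
)
print(result)
```

Returning self from each generated method is what makes the chained call style possible; the real package may achieve the same effect differently.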

Package installation

List of Regex

user_tag: '"(?<![\w@])@([\w@]+(?:[.!][\w@]+)*)"'
email: '"([^@|\s]+@[^@]+\.[^@|\s]+)"'
url: '"(https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|www\.[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9]+\.[^\s]{2,}|www\.[a-zA-Z0-9]+\.[^\s]{2,})"'
punctuation: '"[-!`?,.\":;]"'
parentheses: '"[\[\]{}()]"'
special_chars: '"[$%^&*_+|~=<>:;\\]"'
ip_address: '"(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$"'
html_tag: '"^<([a-z]+)([^<]+)*(?:>(.*)<\/\1>|\s+\/>)$"'
tab_new_line: '"(\n|\t|\r)"'
multiple_space: '"[ ]+"'
emoji: '"[^\u1F600-\u1F6FF\s]"'
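
To get a feel for what some of these rules match, a few of them can be tried directly with Python's re module. This is only a sketch: the patterns are copied from the list above without the outer quotes, and the helper name squash_whitespace and the sample strings are made up for illustration.

```python
import re

# Patterns copied from the list above, without the outer quotes.
TAB_NEW_LINE = re.compile(r"(\n|\t|\r)")
MULTIPLE_SPACE = re.compile(r"[ ]+")
IP_ADDRESS = re.compile(
    r"(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}"
    r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$"
)


def squash_whitespace(text):
    """Hypothetical helper: turn tabs/newlines into spaces, then collapse runs."""
    text = TAB_NEW_LINE.sub(" ", text)
    return MULTIPLE_SPACE.sub(" ", text)


cleaned = squash_whitespace("Hello\t\tworld\n  again")

# The ip_address pattern ends with `$`, so it only matches at the end of the text.
masked_at_end = IP_ADDRESS.sub("[ip]", "server at 192.168.0.1")
masked_in_middle = IP_ADDRESS.sub("[ip]", "192.168.0.1 is down")

print(cleaned)
print(masked_at_end)
print(masked_in_middle)
```

Note that some rules, such as ip_address and html_tag, are anchored with `^` or `$`, so they behave differently from unanchored rules when the target appears in the middle of a longer string.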

If you are happy with the list above, you can install the package via pip.

pip install light-text-prepro

How to use

from light_text_prepro.lprepro import LPrePro
...
obj = LPrePro()
...
result = obj.set_text('Hey @username, this is my email my@email.com') \
            .user_tag(replace_with='[user]') \
            .email(replace_with='[email]') \
            .get_text()
# result -> Hey [user], this is my email [email]

Otherwise, if you want to contribute to the package by adding your own regex rule, please follow the section below.

How to add a regex rule

Setup project

$> git clone https://github.com/Arfius/light-text-prepro.git
$> cd light-text-prepro
$> pip install poetry flake8
$> poetry install

Add new regex

  1. Open light_text_prepro/rules/regex.yml and add a new row. Make sure to use a unique key for the rule. If you have trouble adding the regex rule, use any online regex validation tool and export the rule for Python (e.g. https://regex101.com/ => FLAVOR python => Copy to clipboard).
  2. Add unit tests under the tests folder and make sure all tests pass. Use $> poetry run pytest to run the unit tests.
  3. Update the section List of Regex at the end of this file.
  4. Create a Pull Request.
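
Before adding a row to regex.yml, it can help to check the pattern on its own with pytest-style tests. The sketch below uses a hypothetical hashtag rule that is not part of the package; the rule name, the pattern, and the test names are all assumptions made for illustration.

```python
import re

# Hypothetical new rule, written the way rows appear in regex.yml:
#   hashtag: '"#\w+"'
HASHTAG = re.compile(r"#\w+")


def test_hashtag_is_replaced():
    text = "Loving #python and #regex today"
    assert HASHTAG.sub("[hashtag]", text) == "Loving [hashtag] and [hashtag] today"


def test_hashtag_ignores_plain_text():
    assert HASHTAG.search("no tags here") is None
```

Inside the project, the unit test would presumably exercise the generated method instead, following the chained pattern shown in How to use with the new key in place of user_tag.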
