Regular Expressions turbo-charged with notations for part-of-speech and dependency tree tags

Natural Language Expressions for Python

NatEx: Regular Expressions turbo-charged with notations for part-of-speech and dependency tree tags

In a Nutshell

from natex import natex

sentence = natex('Sloths eat steak in New York')

# check if string begins with noun:
sentence.match(r'@NOUN')
# returns <natex.Match object; span=(0, 6), match='Sloths'>

# find first occurrence of an adposition followed by a proper noun
sentence.search(r'@ADP <@PROPN>')  	
# returns <natex.Match object; span=(17, 28), match='in New York'>

# find all occurrences of nouns or proper nouns
sentence.findall(r'@(NOUN|PROPN)') 	
# returns ['Sloths', 'steak', 'New York']

# find all occurrences of nouns or proper nouns starting with an s (regardless of case)
sentence.findall(r's[^@]+@(NOUN|PROPN)', natex.I)
# returns ['Sloths', 'steak']

Goals & Design

The goal of NatEx is quick and simple parsing of tokens using their literal representation, part-of-speech, and dependency tree tags. Think of it as an extension of regular expressions for natural language processing. The generated part-of-speech and dependency tree tags are provided by stanza and merged into a string that can be searched through.
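To make the idea of "tags merged into a string" concrete, here is a minimal sketch using only the standard library. The `<word@POS#DEP>` layout, the hand-written tags, and the helper names `merge` and `words_with_pos` are assumptions made for illustration; they are not taken from NatEx's source, and the real internal representation may differ.

```python
import re

# Hand-written (word, universal POS, dependency relation) triples standing in
# for what a tagger such as stanza might produce; they are illustrative only.
TOKENS = [
    ("Sloths", "NOUN", "NSUBJ"),
    ("eat", "VERB", "ROOT"),
    ("steak", "NOUN", "OBJ"),
    ("in", "ADP", "CASE"),
    ("New", "PROPN", "COMPOUND"),
    ("York", "PROPN", "NMOD"),
]


def merge(tokens):
    """Join tokens into one string of the form <word@POS#DEP> (assumed layout)."""
    return "".join(f"<{word}@{pos}#{dep}>" for word, pos, dep in tokens)


def words_with_pos(merged, pos):
    """Return the literal words whose POS tag equals `pos`, using plain `re`."""
    pattern = rf"<([^@<>]+)@{re.escape(pos)}#[^>]*>"
    return re.findall(pattern, merged)


merged = merge(TOKENS)
print(merged)
print(words_with_pos(merged, "NOUN"))  # ['Sloths', 'steak']
```

Once the tags live in the same string as the words, an ordinary regular expression can filter on either, which is the general approach the paragraph above describes.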

Why not Tregex, Semgrex, or Tsurgeon?

NatEx was designed primarily with simplicity in mind. Libraries like Tregex, Semgrex, or Tsurgeon may be able to match more complex patterns, but they have a steep learning curve and their patterns are hard to read. NatEx is also written for Python: it wraps the built-in re package with an abstraction layer and thus performs almost as well as a plain regex.

Examples

You can use it for simple tagging (NLU):

from natex import natex

sentence = natex('book flights from Munich to Chicago')
origin_location, destination_location = sentence.findall('<@PROPN>')
# origin_location = 'Munich', destination_location = 'Chicago'

sentence = natex('I am being called Dan Borufka')
firstname, lastname = sentence.findall('<@PROPN>')
# firstname = 'Dan', lastname = 'Borufka'

sentence = natex('I need to go to Italy')
clause = sentence.search('<@ADP> <@PROPN>').match
# clause = 'to Italy'
destination = clause.split(' ')[1]

Or for simple response template generation (NLG):

from natex import natex

sentence = natex('Eat my shorts')

# look for token with imperative form
is_command = sentence.match(r'<!>')

if is_command:
    # search() returns a natex.Match object; its match property holds the substring
    action_verb = sentence.search(r'<@VERB!>').match.lower()
    action_recipient = sentence.search(r'<#OBJ>').match
    response = f'I will do my best to {action_verb} {action_recipient}!'

    # response will contain 'I will do my best to eat shorts!'

Even more (random) examples:

from natex import natex

sentence = natex('Sloths eat steak in New York')

# find first occurrence of character sequence "ea" in nouns only
sentence.search(r'ea@NOUN')			
# returns <natex.Match object; span=(11, 16), match='steak'>

# find first occurrence of character sequence "ea"
sentence.search(r'ea')
# returns <natex.Match object; span=(7, 9), match='ea'>

# find all occurrences of nouns or proper nouns starting with a lowercase s
sentence.findall(r's[^@]+@(NOUN|PROPN)') 
# returns ['steak']

sentence = natex('Ein Hund isst keinen Gurkensalat in New York.', 'de')

# replace the nominal subject with the literal 'Affe'
sentence.sub(r'#NSUBJ', 'Affe')
# returns 'Ein Affe isst keinen Gurkensalat in New York.'

Check out test.py for some more examples!

Installation

Run:

pip install natex

By default, NatEx only installs the English models for stanza. Use the following command to download a model for another language:

python -m natex setup <language_code>

e.g. for French use:

python -m natex setup fr

Visit https://github.com/secretsauceai/natex-py for a full list of supported language codes.

Usage

NatEx provides the same API as the re package, but adds the following special characters:

| Symbol | Meaning | Example pattern | Meaning of the example |
| --- | --- | --- | --- |
| < | token boundary (opening) | <New | Find tokens starting with "New" |
| : | either @ or # | <:ADV | Find tokens with e.g. universal POS "ADV" or dep. tree tag "ADVMOD" |
| @ | part of speech tag | @ADJ | Find tokens that are adjectives |
| # | dependency tree tag | #OBJ | Find the objects of the sentence |
| ! | imperative (mood) | <!> | Find any verbs that are in imperative form |
| > | token boundary (closing) | York> | Find all tokens ending in "York" |

If you combine features (e.g. seeking by part of speech and dependency tree simultaneously) make sure you provide them in the order of the table above.

✔ This will work:

natex('There goes a test sentence').findall(r'<@NOUN#OBJ>')

✘ But this won't:

natex('There goes a test sentence').findall(r'<#OBJ@NOUN>')
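One plausible reason the order matters can be sketched with plain `re`. The sketch assumes that each token is stored as `<word@POS#DEP>`, with the features in a fixed order; that layout and the sample tags are hypothetical and only serve to illustrate the rule, not to describe NatEx's internals.

```python
import re

# Assumed layout per token: literal word, then POS tag, then dependency tag.
merged = "<Sloths@NOUN#NSUBJ><eat@VERB#ROOT><steak@NOUN#OBJ>"

# Features listed in the stored order: the pattern can line up with a token.
in_order = re.compile(r"<[^@<>]+@NOUN#OBJ>")

# Same features, swapped: no token contains "#OBJ" followed by "@NOUN".
out_of_order = re.compile(r"<[^@<>]+#OBJ@NOUN>")

found_in_order = in_order.search(merged) is not None
found_out_of_order = out_of_order.search(merged) is not None

print(found_in_order)      # True
print(found_out_of_order)  # False
```

Under that assumption a regular expression reads the string left to right, so a pattern that names the features in a different order has nothing to match against.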

Calling the natex() function returns a NatEx instance; see API for a more detailed description. Like the methods of Python's built-in re package that return an re.Match, the NatEx equivalents return a natex.Match object. Its span and match properties hold the position and the matched substring of the sentence, respectively.
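For readers used to `re.Match`, the following sketch shows how a span and a matched substring relate to each other. `SketchMatch` and `search_plain` are hypothetical names invented here; they only mirror the two properties described above and are not part of NatEx.

```python
import re
from dataclasses import dataclass


@dataclass
class SketchMatch:
    """Hypothetical stand-in exposing a span and a match property."""
    span: tuple
    match: str


def search_plain(pattern, sentence):
    """Run re.search and repackage the result as span + matched substring."""
    m = re.search(pattern, sentence)
    if m is None:
        return None
    return SketchMatch(span=m.span(), match=m.group(0))


sentence = "Sloths eat steak in New York"
result = search_plain(r"ea", sentence)
print(result.span, result.match)  # (7, 9) ea

# The span indexes into the original sentence and recovers the substring.
start, end = result.span
print(sentence[start:end] == result.match)  # True
```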

Configuration

You can set the processing language of NatEx with the second parameter, language_code (defaults to 'en'). It accepts a two-letter language code and covers all languages officially supported by stanza.

sentence = natex('Das Faultier isst keinen Gurkensalat', 'de')

When you run NatEx for the first time, it checks whether the corresponding language models exist and downloads them if necessary. All subsequent calls to natex() skip that step.

API

The API is derived from Python's built-in re package:

NatEx

.match(pattern, flags) Checks, from the beginning of the string, whether the sentence matches pattern. Returns a natex.Match object, or None if there is no match.

.search(pattern, flags) Returns a natex.Match object describing the first substring matching pattern.

.findall(pattern, flags) Returns all found strings matching pattern as a list.

.split(pattern, flags) Splits the sentence by all occurrences of the found pattern and returns a list of strings.

.sub(pattern, replacement, flags) Replaces all occurrences of the found pattern by replacement and returns the changed string.
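Since the methods mirror the re package, it may help to recall what the five underlying re functions do on an untagged string. This sketch uses only the standard library; note that the re functions take the string as an argument, whereas the NatEx methods listed above operate on the sentence held by the instance.

```python
import re

sentence = "Sloths eat steak in New York"

# match: anchored at the start of the string, None if the start does not fit
at_start = re.match(r"Sloths", sentence)
not_at_start = re.match(r"steak", sentence)     # None

# search: first match anywhere in the string
first = re.search(r"New York", sentence)        # span (20, 28)

# findall: every non-overlapping match, as a list of strings
s_words = re.findall(r"\b[sS]\w+", sentence)    # ['Sloths', 'steak']

# split: cut the string at every match of the pattern
parts = re.split(r"\s+in\s+", sentence)         # ['Sloths eat steak', 'New York']

# sub: replace every match and return the changed string
replaced = re.sub(r"steak", "salad", sentence)  # 'Sloths eat salad in New York'

print(at_start.span(), not_at_start, first.span())
print(s_words, parts, replaced)
```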

Testing

The unit tests shipped with this package run under pytest. Install it with pip install pytest, then type pytest in your terminal to run them.
