DSL for building language rules
RITA DSL
RITA is a language, loosely based on Apache UIMA RUTA, for writing manual language rules. It compiles into spaCy-compatible patterns, which can be used for manual NER as well as for other processes such as retokenizing and pure matching.
Install
pip install rita-dsl
Simple Rules example
rules = """
cuts = {"fitted", "wide-cut"}
lengths = {"short", "long", "calf-length", "knee-length"}
fabric_types = {"soft", "airy", "crinkled"}
fabrics = {"velour", "chiffon", "knit", "woven", "stretch"}
{IN_LIST(cuts)?, IN_LIST(lengths), WORD("dress")}->MARK("DRESS_TYPE")
{IN_LIST(lengths), IN_LIST(cuts), WORD("dress")}->MARK("DRESS_TYPE")
{IN_LIST(fabric_types)?, IN_LIST(fabrics)}->MARK("DRESS_FABRIC")
"""
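To make the two DRESS_TYPE rules easier to read, the following sketch enumerates the phrases each one would accept. It is plain Python and does not use RITA: it assumes that "?" marks an optional element and that tokens are joined by single spaces, and the names rule_1 and rule_2 are hypothetical.

```python
from itertools import product

cuts = ["fitted", "wide-cut"]
lengths = ["short", "long", "calf-length", "knee-length"]

# Rule 1: {IN_LIST(cuts)?, IN_LIST(lengths), WORD("dress")}
# Assumption: "?" means the element may be absent, modelled here with None.
rule_1 = [
    " ".join(filter(None, parts))
    for parts in product([None] + cuts, lengths, ["dress"])
]

# Rule 2: {IN_LIST(lengths), IN_LIST(cuts), WORD("dress")}
rule_2 = [" ".join(parts) for parts in product(lengths, cuts, ["dress"])]

phrase = "short wide-cut dress"
print(len(rule_1), len(rule_2))            # 12 8
print(phrase in rule_1, phrase in rule_2)  # False True
```

Under these assumptions, a phrase with the length before the cut is covered only by the second rule, which is why both orderings are written out.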
Loading in spaCy
import spacy
from rita.shortcuts import setup_spacy
nlp = spacy.load("en")
setup_spacy(nlp, rules_string=rules)
And using it:
>>> r = nlp("She was wearing a short wide-cut dress")
>>> [{"label": e.label_, "text": e.text} for e in r.ents]
[{'label': 'DRESS_TYPE', 'text': 'short wide-cut dress'}]
Loading using Regex (standalone)
import rita
patterns = rita.compile_string(rules, use_engine="standalone")
And using it:
>>> list(patterns.execute("She was wearing a short wide-cut dress"))
[{'end': 38, 'label': 'DRESS_TYPE', 'start': 18, 'text': 'short wide-cut dress'}]
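For intuition about the standalone engine's result shape, here is a hand-written regular expression that yields the same kind of dictionary for one of the rules. This is only a sketch using Python's re module; it is not the pattern RITA actually generates, and the helpers alternation and execute are hypothetical names.

```python
import re

cuts = ["fitted", "wide-cut"]
lengths = ["short", "long", "calf-length", "knee-length"]


def alternation(words):
    # Escape each word and try longer alternatives first.
    ordered = sorted(words, key=len, reverse=True)
    return "(?:" + "|".join(re.escape(w) for w in ordered) + ")"


# Hand-written stand-in for:
# {IN_LIST(lengths), IN_LIST(cuts), WORD("dress")}->MARK("DRESS_TYPE")
dress_type = re.compile(
    r"\b" + alternation(lengths) + r"\s+" + alternation(cuts) + r"\s+dress\b"
)


def execute(text):
    # Yield one dictionary per match, with character offsets into the text.
    for m in dress_type.finditer(text):
        yield {
            "start": m.start(),
            "end": m.end(),
            "label": "DRESS_TYPE",
            "text": m.group(),
        }


results = list(execute("She was wearing a short wide-cut dress"))
print(results)
```

The start and end values are character offsets into the input string, so text[start:end] recovers the matched phrase.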
Changelog
0.4.0 (2020-01-25)
Features
- Support for deaccent. In general, if an accented version of a word is given, both the deaccented and accented forms will be used to match. To turn it off, use !CONFIG("deaccent", "N") #38
- Added shortcuts module to simplify injecting into spaCy #42
Fix
- Fix issue regarding spaCy rules with IN_LIST when using case-sensitive mode. It was creating a Regex pattern, which is not a valid spaCy pattern #40
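As a rough illustration of the deaccent behaviour described in the 0.4.0 notes, the sketch below derives both forms of a word using Python's standard unicodedata module. The helper names deaccent and variants are hypothetical and are not part of RITA's API; RITA's own implementation may differ.

```python
import unicodedata


def deaccent(word):
    # Decompose characters (NFKD), then drop the combining accent marks.
    decomposed = unicodedata.normalize("NFKD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def variants(word):
    # Keep both forms so that either spelling in the text can match.
    return sorted({word, deaccent(word)})


print(variants("caf\u00e9"))  # ['cafe', 'café']
print(variants("dress"))      # ['dress']
```

A word without accents produces a single variant, so unaccented rules are unaffected in this sketch.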
0.3.2 (2019-12-19)
Features
- Introduced towncrier to track changes
- Added linter flake8
- Refactored code to match pep8 #32
Fix
- Fix WORD split by -
- Split by
- Coverage score increase #35