
DSL for building language rules

Project description

RITA DSL

RITA is a language, loosely based on Apache UIMA RUTA, for writing language rules by hand. The rules compile into spaCy-compatible patterns, which can be used for rule-based NER as well as for other tasks such as retokenizing and pure matching.

Live Demo

Demo Page

Documentation

Quick Start

Install it via pip install rita-dsl

You can start defining rules by creating a file with the extension *.rita

Below is a complete example which can be used as a reference point

cars = LOAD("examples/cars.txt") # Load items from file
colors = {"red", "green", "blue", "white", "black"} # Declare items inline

{IN_LIST(colors), WORD("car")} -> MARK("CAR_COLOR") # If first token is in list `colors` and second one is word `car`, label it

{IN_LIST(cars), WORD+} -> MARK("CAR_MODEL") # If first token is in list `cars` and is followed by 1..N words, label it

{ENTITY("PERSON"), LEMMA("like"), WORD} -> MARK("LIKED_ACTION") # If first token is a PERSON entity, followed by a word with lemma `like` and then any word, label it

Now you can compile these rules: rita -f <your-file>.rita output.jsonl
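The compiled file is meant to be loaded by spaCy's EntityRuler, which reads JSONL pattern files: one JSON object per line, each with a `label` and a `pattern`. As a rough sketch of how such a file could be inspected without spaCy, the snippet below writes a hand-written sample and counts patterns per label. The sample entries are an assumption for illustration, not actual rita output, and the helper name `labels_in` is hypothetical.

```python
import json
import os
import tempfile
from collections import Counter


def labels_in(path):
    """Count how many patterns each label has in a JSONL patterns file."""
    counts = Counter()
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            entry = json.loads(line)
            counts[entry["label"]] += 1
    return counts


# Hand-written sample in the EntityRuler JSONL layout (NOT real rita output).
sample = [
    {"label": "CAR_COLOR",
     "pattern": [{"LOWER": {"IN": ["red", "green"]}}, {"LOWER": "car"}]},
    {"label": "CAR_MODEL",
     "pattern": [{"LOWER": "bmw"}, {"IS_ALPHA": True, "OP": "+"}]},
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "output.jsonl")
    with open(path, "w", encoding="utf-8") as fh:
        for entry in sample:
            fh.write(json.dumps(entry) + "\n")
    label_counts = labels_in(path)

print(dict(label_counts))
```

Running the same helper on a real output.jsonl is a quick way to check that every MARK label in the rules file produced at least one pattern.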

Using compiled rules

Standalone Version

While it is highly recommended to use RITA with spaCy as a base, there can be cases when pure Python regex is the only option.

For those cases you can pass the tree compilation function explicitly. The standalone function builds regular expressions and creates an executor which accepts raw text and returns a list of results.

Here's a test covering this case

import rita


def test_standalone_simple():
    # Pass the standalone (regex-based) tree compiler explicitly
    from rita.engine.translate_standalone import compile_tree
    patterns = rita.compile("examples/simple-match.rita", compile_fn=compile_tree)
    results = list(patterns.execute("Donald Trump was elected President in 2016 defeating Hilary Clinton."))
    assert len(results) == 2
    # Each result is a mapping with at least "text" and "label" keys
    entities = [(r["text"], r["label"]) for r in results]

    assert entities[0] == ("Donald Trump was elected", "WON_ELECTION")
    assert entities[1] == ("defeating Hilary Clinton", "LOST_ELECTION")
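
To give a feel for what a regex-based executor does, here is a minimal sketch using only Python's `re` module. It is not rita's implementation: the `RULES` table and the `execute` function are hypothetical, and the expressions are written by hand rather than generated from a .rita file. Only the shape of the results (dicts with `text` and `label`) mirrors the test above.

```python
import re

# Hypothetical label -> regex table; rita builds its own expressions from .rita rules.
RULES = {
    "CAR_COLOR": re.compile(r"\b(?:red|green|blue|white|black)\s+car\b", re.IGNORECASE),
    "LIKED_ACTION": re.compile(r"\b[A-Z]\w+\s+likes?\s+\w+"),
}


def execute(text):
    """Yield one dict per match, ordered by position in the text."""
    hits = []
    for label, regex in RULES.items():
        for m in regex.finditer(text):
            hits.append({
                "start": m.start(),
                "end": m.end(),
                "text": m.group(0),
                "label": label,
            })
    yield from sorted(hits, key=lambda h: h["start"])


results = list(execute("Johny Silver was driving a red car. Johny likes driving it very much."))
print([(r["text"], r["label"]) for r in results])
# [('red car', 'CAR_COLOR'), ('Johny likes driving', 'LIKED_ACTION')]
```

Plain regular expressions have no access to lemmas or named entities, so this sketch approximates `LEMMA("like")` with `likes?` and a person with a capitalized word. That gap is one reason the spaCy backend is the recommended one.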

spaCy backend

import spacy
from spacy.pipeline import EntityRuler

nlp = spacy.load("en")
ruler = EntityRuler(nlp, overwrite_ents=True)
ruler.from_disk("output.jsonl")
nlp.add_pipe(ruler)

Every time you parse text with spaCy, it will run its usual pipeline and then apply these rules

text = """
Johny Silver was driving a red car. It was BMW X6 Mclass. Johny likes driving it very much.
"""

doc = nlp(text)

entities = [(e.text, e.label_) for e in doc.ents]
print(entities)

assert entities[0] == ("Johny Silver", "PERSON")  # Normal NER
assert entities[1] == ("red car", "CAR_COLOR")  # Our first rule
assert entities[2] == ("BMW X6 Mclass", "CAR_MODEL")  # Our second rule
assert entities[3] == ("Johny likes driving", "LIKED_ACTION")  # Our third rule

Alternatively, if rita is used as a dependency in your project and you prefer to compile rules dynamically, you can do:

import rita
import spacy
from spacy.pipeline import EntityRuler

nlp = spacy.load("en")
ruler = EntityRuler(nlp, overwrite_ents=True)

patterns = rita.compile("examples/color-car.rita")

ruler.add_patterns(patterns)
nlp.add_pipe(ruler)


Files for rita-dsl, version 0.2.0
Filename (size) | File type | Python version
rita_dsl-0.2.0-py3-none-any.whl (12.5 kB) | Wheel | py3
rita-dsl-0.2.0.tar.gz (10.4 kB) | Source | None
