Slim, flexible and extendable NLP engine that can produce list of features from text based on provided condtions.
Project description
prosecco
Description
Slim, flexible and extendable NLP engine that can produce list of features from text based on provided condtions.
Features
- word categorisation
- feature extraction
Download
pip install prosecco
Usage
python example.py
or
text = """Chrząszcz brzmi w trzcinie w Szczebrzeszynie.
Ząb zupa zębowa, dąb zupa dębowa.
Gdzie Rzym, gdzie Krym. W Pacanowie kozy kują.
Tak, jeśli mam szczęśliwy być, to w Gdańsku muszę żyć!
"""
# 1. Create conditions based on city names
cities = ["szczebrzeszyn", "pacanow", "gdansk", "rzym", "krym"]
conditions = []
for city in cities:
conditions.append(Condition(lemma_type="city",
compare=city,
normalizer=CharsetNormalizer(Charset.PL_EN),
stemmer=WordStemmer(language="pl"),
lower=True))
# accept all words
conditions.append(Condition(compare=r".*"))
# 2. Create tokenizer for polish charset
tokenizer = LanguageTokenizer(Charset.PL)
# 3. Get list of tokens
tokens = tokenizer.tokenize(text)
# 4. Create visitor with conditions provided in step 1
visitor = Visitor(conditions=conditions)
# 5. Parse tokens based on visitor conditions
lexer = Lexer(tokens=tokens, visitor=visitor)
# 6. Get list of lemmas
lemmas = lexer.lex()
# 7. filter found cities
found_cities = filter(lambda l: l.type == 'city', lemmas)
# 8. print output
print(" ".join(map(str, found_cities)))
Output
Szczebrzeszynie[city] Rzym[city] Krym[city] Pacanowie[city] Gdańsku[city]
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
prosecco-0.0.2.tar.gz
(2.0 kB
view hashes)
Built Distributions
prosecco-0.0.2-py3.7.egg
(1.8 kB
view hashes)
prosecco-0.0.2-py2.7.egg
(1.8 kB
view hashes)