Slim, flexible and extensible NLP engine that produces a list of features from text based on provided conditions.

prosecco

Features

  • word categorisation
  • feature extraction
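
To make the two features concrete, here is a minimal, library-free sketch of condition-based word categorisation. It does not use prosecco and is not its implementation; `SimpleCondition`, `categorise`, and the `\w+` tokenizing regex are hypothetical choices made only for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SimpleCondition:
    """Hypothetical stand-in for a condition: a label plus the phrase to match."""
    lemma_type: str
    compare: str


def categorise(text, conditions):
    """Return (matched_text, lemma_type) pairs in order of appearance."""
    # Split into word tokens; \w+ is a simplification, not prosecco's tokenizer.
    tokens = re.findall(r"\w+", text)
    lowered = [t.lower() for t in tokens]
    found = []
    i = 0
    while i < len(tokens):
        for cond in conditions:
            words = cond.compare.lower().split()
            # Compare a window of tokens so multi-word phrases can match.
            if words and lowered[i:i + len(words)] == words:
                found.append((" ".join(tokens[i:i + len(words)]), cond.lemma_type))
                i += len(words) - 1  # skip the rest of the matched phrase
                break
        i += 1
    return found


conditions = [SimpleCondition("hero", name) for name in ("batman", "captain marvel")]
features = categorise("Batman met Captain Marvel in Gotham.", conditions)
print(features)
```

Each match keeps the original spelling of the text and carries the category label, which is the same shape as the `Batman[hero]` entries in the example output further down.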

Install

pip install prosecco

Usage

python example.py
python example_basic.py

Examples

Basic

from prosecco import Prosecco, Condition

# Load the text of the Wikipedia article https://en.wikipedia.org/wiki/Superhero
with open('sample/superhero.txt') as f:
    text = f.read()

# 1. Create conditions based on superhero names
superheroes = ["batman", "spiderman", "superman", "captain marvel", "black panther"]
conditions = [Condition(lemma_type="hero", compare=hero, lower=True) for hero in superheroes]
# 2. Create prosecco
p = Prosecco(conditions=conditions)
# 3. Let's drink and print output
p.drink(text, progress=True)
# Deduplicate with a set; sets are unordered, so the printed order may vary
lemmas = set(p.get_lemmas(type='hero'))
print(" ".join(map(str, lemmas)))

Output

Batman[hero] Black Panther[hero] Superman[hero] Captain Marvel[hero]

Advanced

from prosecco import (Charset, CharsetNormalizer, Condition,
                      LanguageTokenizer, Lexer, Visitor, WordStemmer)

text = """Chrząszcz brzmi w trzcinie w Szczebrzeszynie.
Ząb zupa zębowa, dąb zupa dębowa.
Gdzie Rzym, gdzie Krym. W Pacanowie kozy kują.
Tak, jeśli mam szczęśliwy być, to w Gdańsku muszę żyć! 
"""

# 1. Create conditions based on city names
cities = ["szczebrzeszyn", "pacanow", "gdansk", "rzym", "krym"]
conditions = []
for city in cities:
    conditions.append(Condition(lemma_type="city",
                                compare=city,
                                normalizer=CharsetNormalizer(Charset.PL_EN),
                                stemmer=WordStemmer(language="pl"),
                                lower=True))
# 2. Create tokenizer for the Polish charset
tokenizer = LanguageTokenizer(Charset.PL)
# 3. Get list of tokens
tokens = tokenizer.tokenize(text)
# 4. Create visitor with conditions provided in step 1
visitor = Visitor(conditions=conditions)
# 5. Parse tokens based on visitor conditions
lexer = Lexer(tokens=tokens, visitor=visitor)
# 6. Get list of lemmas
lemmas = lexer.lex()
# 7. Filter found cities
found_cities = filter(lambda l: l.type == "city", lemmas)
# 8. Print output
print(" ".join(map(str, found_cities)))

Output

Szczebrzeszynie[city] Rzym[city] Krym[city] Pacanowie[city] Gdańsku[city]
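
The output above matches inflected, accented forms such as `Gdańsku` against the plain pattern `gdansk`. The sketch below shows one way such matching can work in principle: fold Polish diacritics to ASCII, lowercase, then compare by prefix as a crude stand-in for stemming. The source does not say how `CharsetNormalizer` or `WordStemmer` work internally, so this is an assumption-laden illustration; `PL_TO_ASCII`, `fold_polish`, and `matches` are hypothetical names.

```python
# Map Polish diacritics to ASCII look-alikes (an illustrative table only).
PL_TO_ASCII = str.maketrans("ąćęłńóśźżĄĆĘŁŃÓŚŹŻ", "acelnoszzACELNOSZZ")


def fold_polish(word):
    """Lowercase a word and replace Polish diacritics with ASCII letters."""
    return word.translate(PL_TO_ASCII).lower()


def matches(token, pattern):
    """Crude stem match: the folded token starts with the folded pattern."""
    return fold_polish(token).startswith(fold_polish(pattern))


for token, pattern in [("Gdańsku", "gdansk"), ("Pacanowie", "pacanow"), ("Rzym", "krym")]:
    print(token, pattern, matches(token, pattern))
```

Prefix comparison is deliberately naive: it would also accept unrelated words that merely begin with the pattern, which is one reason a real stemmer is preferable.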
