Classy Classification
Have you ever struggled with needing a spaCy TextCategorizer but didn't have the time to train one from scratch? Classy Classification is the way to go! For few-shot classification using sentence-transformers or spaCy models, provide a dictionary with labels and examples; for zero-shot classification with Hugging Face zero-shot classifiers, just provide a list of labels.
Install
pip install classy-classification
Quickstart
spaCy embeddings
import spacy
import classy_classification

data = {
    "furniture": ["This text is about chairs.",
                  "Couches, benches and televisions.",
                  "I really need to get a new sofa."],
    "kitchen": ["There also exist things like fridges.",
                "I hope to be getting a new stove today.",
                "Do you also have some ovens."]
}

nlp = spacy.load("en_core_web_md")
nlp.add_pipe(
    "text_categorizer",
    config={
        "data": data,
        "model": "spacy"
    }
)

print(nlp("I am looking for kitchen appliances.")._.cats)

# Output:
#
# [{"label": "furniture", "score": 0.21}, {"label": "kitchen", "score": 0.79}]
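The printed result is a list of label/score dictionaries, so downstream code usually has to pick one label from it. A minimal sketch of that step, assuming the output shape shown above; `top_label` and its `threshold` parameter are hypothetical helpers for illustration and are not part of classy-classification:

```python
def top_label(cats, threshold=0.5):
    """Return the highest-scoring label, or None if no score reaches the threshold."""
    if not cats:
        return None
    best = max(cats, key=lambda cat: cat["score"])
    return best["label"] if best["score"] >= threshold else None


# Same shape as the example output above.
cats = [{"label": "furniture", "score": 0.21}, {"label": "kitchen", "score": 0.79}]

print(top_label(cats))                 # kitchen
print(top_label(cats, threshold=0.9))  # None
```

Returning `None` below the threshold lets the caller treat low-confidence inputs as "no category" instead of forcing a label.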
Sentence-transformer embeddings
import spacy
import classy_classification

data = {
    "furniture": ["This text is about chairs.",
                  "Couches, benches and televisions.",
                  "I really need to get a new sofa."],
    "kitchen": ["There also exist things like fridges.",
                "I hope to be getting a new stove today.",
                "Do you also have some ovens."]
}

nlp = spacy.blank("en")
nlp.add_pipe(
    "text_categorizer",
    config={
        "data": data,
        "model": "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
    }
)

print(nlp("I am looking for kitchen appliances.")._.cats)

# Output:
#
# [{"label": "furniture", "score": 0.21}, {"label": "kitchen", "score": 0.79}]
Hugging Face zero-shot classifiers
import spacy
import classy_classification

data = ["furniture", "kitchen"]

nlp = spacy.blank("en")
nlp.add_pipe(
    "text_categorizer",
    config={
        "data": data,
        "model": "facebook/bart-large-mnli",
        "cat_type": "zero"
    }
)

print(nlp("I am looking for kitchen appliances.")._.cats)

# Output:
#
# [{"label": "furniture", "score": 0.21}, {"label": "kitchen", "score": 0.79}]
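For comparison, a Hugging Face `zero-shot-classification` pipeline returns a dictionary with `sequence`, `labels` and `scores` keys, with labels sorted by descending score. The sketch below shows how such a result could be reshaped into the list format printed above. Whether classy-classification performs this conversion internally is an assumption; `to_cats` is a hypothetical helper and `hf_result` is a hand-written stand-in, not real model output.

```python
def to_cats(result, label_order=None):
    """Reshape a zero-shot pipeline result into a list of label/score dicts."""
    scores = dict(zip(result["labels"], result["scores"]))
    # Keep the caller's label order if given, else the pipeline's (best first).
    order = label_order if label_order is not None else result["labels"]
    return [{"label": label, "score": round(scores[label], 2)} for label in order]


# Hand-written stand-in with the same keys a zero-shot pipeline returns.
hf_result = {
    "sequence": "I am looking for kitchen appliances.",
    "labels": ["kitchen", "furniture"],
    "scores": [0.79, 0.21],
}

print(to_cats(hf_result, label_order=["furniture", "kitchen"]))
```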
Credits
Inspiration Drawn From
Hugging Face offers some nice models for few/zero-shot classification, but these are not tailored to multilingual approaches. Rasa NLU has a nice approach for this, but it's too embedded in their codebase for easy usage outside of Rasa/chatbots. Additionally, it made sense to integrate sentence-transformers and Hugging Face zero-shot classifiers instead of default word embeddings. Finally, I decided to integrate with spaCy, since training a custom spaCy TextCategorizer seems like a lot of hassle if you want something quick and dirty.
Standalone usage without spaCy
from classy_classification import classyClassifier

data = {
    "furniture": ["This text is about chairs.",
                  "Couches, benches and televisions.",
                  "I really need to get a new sofa."],
    "kitchen": ["There also exist things like fridges.",
                "I hope to be getting a new stove today.",
                "Do you also have some ovens."]
}

classifier = classyClassifier(data=data)
classifier("I am looking for kitchen appliances.")
classifier.pipe(["I am looking for kitchen appliances."])

# overwrite training data
classifier.set_training_data(data=data)
classifier("I am looking for kitchen appliances.")

# overwrite [embedding model](https://www.sbert.net/docs/pretrained_models.html)
classifier.set_embedding_model(model="paraphrase-MiniLM-L3-v2")
classifier("I am looking for kitchen appliances.")

# overwrite SVC config
classifier.set_svc(
    config={
        "C": [1, 2, 5, 10, 20, 100],
        "kernels": ["linear"],
        "max_cross_validation_folds": 5
    }
)
classifier("I am looking for kitchen appliances.")
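The SVC config reads like a hyperparameter grid plus an upper bound on cross-validation folds. A rough sketch of what those settings could amount to, under assumptions: `expand_grid` and `usable_folds` are hypothetical helpers, and the rule for capping folds is a guess at sensible behaviour, not the library's documented logic.

```python
from itertools import product


def expand_grid(config):
    """List every C/kernel combination described by the config."""
    return [
        {"C": c, "kernel": kernel}
        for c, kernel in product(config["C"], config["kernels"])
    ]


def usable_folds(data, max_folds):
    """Cap the fold count by the smallest class, with a floor of two folds."""
    smallest_class = min(len(examples) for examples in data.values())
    # More folds than examples in the smallest class would leave some folds
    # without that class; cross-validation itself needs at least two folds.
    return max(2, min(max_folds, smallest_class))


config = {
    "C": [1, 2, 5, 10, 20, 100],
    "kernels": ["linear"],
    "max_cross_validation_folds": 5,
}
data = {
    "furniture": ["chairs", "couches", "sofa"],
    "kitchen": ["fridges", "stove", "ovens"],
}

grid = expand_grid(config)
folds = usable_folds(data, config["max_cross_validation_folds"])
print(len(grid), folds)  # 6 3
```

With three examples per label, as in the quickstart data, a five-fold maximum would in this sketch be reduced to three folds.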
Todo
[ ] look into a way to integrate spaCy trf models.
[ ] multiple classification datasets for a single input, e.g. emotions and topic.
Hashes for classy-classification-0.3.5.tar.gz
Algorithm | Hash digest
---|---
SHA256 | bf8b75b011d94554cf1cb3140e7d128f910d619a56b3583f430da929d708229f
MD5 | 568fba32f9119caa5f22c51a4d0ed790
BLAKE2b-256 | cc6370d283d540a1b8ba6ada75948fd7f1b8724c0108a29d4e58066a41a7459e

Hashes for classy_classification-0.3.5-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 1145f9501d933d9c6f017cb9c7f1b081f94e7e7091d74ec513b26a8b081d081c
MD5 | d17044899f1b94cd7b38542832c263fa
BLAKE2b-256 | cf051115e4b828830f37548e9117ba2c011904083385619bb0c952f12ff12f74