Classy Classification
Have you ever needed a spaCy TextCategorizer but didn't have the time to train one from scratch? Classy Classification is the way to go! For few-shot classification using sentence-transformers or spaCy models, provide a dictionary with labels and examples. For zero-shot classification with Hugging Face zero-shot classifiers, just provide a list of labels.
Install
pip install classy-classification
Quickstart
spaCy embeddings
import spacy
import classy_classification

data = {
    "furniture": ["This text is about chairs.",
                  "Couches, benches and televisions.",
                  "I really need to get a new sofa."],
    "kitchen": ["There also exist things like fridges.",
                "I hope to be getting a new stove today.",
                "Do you also have some ovens."]
}

nlp = spacy.load("en_core_web_md")
nlp.add_pipe(
    "text_categorizer",
    config={
        "data": data,
        "model": "spacy"
    }
)

print(nlp("I am looking for kitchen appliances.")._.cats)
# Output:
#
# [{"label": "furniture", "score": 0.21}, {"label": "kitchen", "score": 0.79}]
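The printed output above is a list of label/score dictionaries. Here is a minimal sketch for picking the best label from such a list, assuming `doc._.cats` has exactly the shape shown in the output comment. The helper name `top_label` and its `min_score` threshold are illustrative and not part of classy-classification.

```python
def top_label(cats, min_score=0.0):
    """Return the highest-scoring label, or None if nothing reaches min_score."""
    if not cats:
        return None
    best = max(cats, key=lambda c: c["score"])
    return best["label"] if best["score"] >= min_score else None


# Sample copied from the output comment above, not produced by running a model.
cats = [{"label": "furniture", "score": 0.21}, {"label": "kitchen", "score": 0.79}]

print(top_label(cats))                 # kitchen
print(top_label(cats, min_score=0.9))  # None
```

A threshold like `min_score` can be handy for routing low-confidence inputs to a fallback instead of forcing a label.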
Sentence-transformer embeddings
import spacy
import classy_classification

data = {
    "furniture": ["This text is about chairs.",
                  "Couches, benches and televisions.",
                  "I really need to get a new sofa."],
    "kitchen": ["There also exist things like fridges.",
                "I hope to be getting a new stove today.",
                "Do you also have some ovens."]
}

nlp = spacy.blank("en")
nlp.add_pipe(
    "text_categorizer",
    config={
        "data": data,
        "model": "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
        "device": "gpu"
    }
)

print(nlp("I am looking for kitchen appliances.")._.cats)
# Output:
#
# [{"label": "furniture", "score": 0.21}, {"label": "kitchen", "score": 0.79}]
Hugging Face zero-shot classifiers
import spacy
import classy_classification

data = ["furniture", "kitchen"]

nlp = spacy.blank("en")
nlp.add_pipe(
    "text_categorizer",
    config={
        "data": data,
        "model": "facebook/bart-large-mnli",
        "cat_type": "zero",
        "device": "gpu"
    }
)

print(nlp("I am looking for kitchen appliances.")._.cats)
# Output:
#
# [{"label": "furniture", "score": 0.21}, {"label": "kitchen", "score": 0.79}]
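The three quickstart configurations differ only in `data`, `model`, and (for zero-shot) `cat_type`. Below is a sketch of a helper that assembles the `config` dictionary from those keys. `build_config` is a hypothetical name, and the rule "a list of labels means zero-shot" is read off the examples above, not taken from the package's own validation.

```python
def build_config(data, model, device=None):
    """Assemble a config dict for nlp.add_pipe("text_categorizer", config=...)."""
    config = {"data": data, "model": model}
    if isinstance(data, list):
        # A bare list of labels is the zero-shot form shown in the quickstart.
        config["cat_type"] = "zero"
    elif not isinstance(data, dict):
        raise TypeError("data must be a dict (few-shot) or a list of labels (zero-shot)")
    if device is not None:
        config["device"] = device
    return config


zero_shot = build_config(["furniture", "kitchen"], "facebook/bart-large-mnli", device="gpu")
few_shot = build_config({"furniture": ["This text is about chairs."]}, "spacy")

print(zero_shot)
print(few_shot)
```

The resulting dictionary would then be passed as `config=` to `nlp.add_pipe("text_categorizer", ...)`, as in the quickstart examples.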
Credits
Inspiration Drawn From
Hugging Face offers some nice models for few/zero-shot classification, but these are not tailored to multilingual approaches. Rasa NLU has a nice approach for this, but it is too embedded in the Rasa codebase for easy use outside of Rasa/chatbots. Additionally, it made sense to integrate sentence-transformers and Hugging Face zero-shot classifiers instead of default word embeddings. Finally, I decided to integrate with spaCy, since training a custom spaCy TextCategorizer seems like a lot of hassle if you want something quick and dirty.
Standalone usage without spaCy
from classy_classification import classyClassifier

data = {
    "furniture": ["This text is about chairs.",
                  "Couches, benches and televisions.",
                  "I really need to get a new sofa."],
    "kitchen": ["There also exist things like fridges.",
                "I hope to be getting a new stove today.",
                "Do you also have some ovens."]
}

classifier = classyClassifier(data=data)
classifier("I am looking for kitchen appliances.")
classifier.pipe(["I am looking for kitchen appliances."])

# overwrite training data
classifier.set_training_data(data=data)
classifier("I am looking for kitchen appliances.")

# overwrite embedding model (see https://www.sbert.net/docs/pretrained_models.html)
classifier.set_embedding_model(model="paraphrase-MiniLM-L3-v2")
classifier("I am looking for kitchen appliances.")

# overwrite SVC config
classifier.set_svc(
    config={
        "C": [1, 2, 5, 10, 20, 100],
        "kernels": ["linear"],
        "max_cross_validation_folds": 5
    }
)
classifier("I am looking for kitchen appliances.")
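The SVC config reads like a hyperparameter search space: a list of candidate `C` values, a list of kernels, and an upper bound on cross-validation folds. The sketch below enumerates the candidate combinations such a config describes. It assumes a plain grid search over `C` and `kernels`, and it caps the fold count at the size of the smallest class. Both are guesses about the behavior, not taken from the package's source, and `expand_svc_grid` and `fold_cap` are hypothetical helper names.

```python
from itertools import product


def expand_svc_grid(config):
    """List every (C, kernel) pair described by the config (assumed grid search)."""
    return [{"C": c, "kernel": k} for c, k in product(config["C"], config["kernels"])]


def fold_cap(data, max_folds):
    """Assumed rule: never use more folds than the smallest class has examples."""
    smallest_class = min(len(examples) for examples in data.values())
    return min(max_folds, smallest_class)


config = {"C": [1, 2, 5, 10, 20, 100], "kernels": ["linear"], "max_cross_validation_folds": 5}
data = {
    "furniture": ["This text is about chairs.",
                  "Couches, benches and televisions.",
                  "I really need to get a new sofa."],
    "kitchen": ["There also exist things like fridges.",
                "I hope to be getting a new stove today.",
                "Do you also have some ovens."]
}

print(len(expand_svc_grid(config)))                          # 6
print(fold_cap(data, config["max_cross_validation_folds"]))  # 3
```

With only three examples per label, as in the quickstart data, a five-fold split would leave some folds without an example of each class, which is one plausible reason the setting is named as a maximum.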
Todo
[ ] look into a way to integrate spaCy trf models.
[ ] multiple classification datasets for a single input, e.g. emotions and topic.