A Python library for generating discrete paragraph labels from concept extraction, graph communities, and interpretable assignment rules.
paralabelgen
paralabelgen is a Python library for generating discrete multi-label annotations for text paragraphs.
Install
pip install paralabelgen
Optional extras:
pip install "paralabelgen[graph]"
pip install "paralabelgen[nlp]"
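To see which optional dependencies are importable in the current environment, a small check like the one below may help. It is only a sketch: the Notes say the nlp extra enables spaCy, so `spacy` is a safe module name, but this README does not name the package behind the graph extra, so `leidenalg` is an assumption. The helper name `available_backends` is hypothetical and not part of the library.

```python
from importlib.util import find_spec


def available_backends():
    """Report which optional dependencies can be imported (illustrative helper)."""
    return {
        # spaCy is named in the Notes as the backend for the nlp extra.
        "nlp": find_spec("spacy") is not None,
        # Assumption: the README does not say which Leiden package the graph
        # extra installs; leidenalg is a guess used only for illustration.
        "graph": find_spec("leidenalg") is not None,
    }


for extra, present in available_backends().items():
    print(f"{extra}: {'available' if present else 'missing, fallback expected'}")
```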
Example
from labelgen import LabelGenerator, LabelGeneratorConfig

paragraphs = [
    "OpenAI builds language models for developers.",
    "Developers use language models in production systems.",
]

# Learn concept communities and label the same paragraphs in one pass.
generator = LabelGenerator(LabelGeneratorConfig())
result = generator.fit_transform(paragraphs)

# Concepts extracted from the corpus.
print("Concepts:")
for concept in result.concepts:
    print(concept.normalized, concept.kind, concept.document_frequency)

# Labels assigned to each paragraph.
print("Labels:")
for assignment in result.paragraph_labels:
    print(assignment.paragraph_id, assignment.label_ids, assignment.label_scores)
Notes
- The distribution name is `paralabelgen`, while the Python import package is `labelgen`.
- `fit` learns concept communities from a corpus.
- `transform` applies previously learned communities to new paragraphs.
- `fit_transform` learns and labels the same input in one pass.
- The base package works with deterministic fallback implementations.
- Without `paralabelgen[nlp]`, concept extraction uses regex and heuristic rules: capitalized spans are treated as lightweight entities, and non-stopword token spans are treated as candidate noun phrases.
- Without `paralabelgen[graph]`, community detection falls back to deterministic connected components over the concept co-occurrence graph instead of Leiden.
- Install `paralabelgen[nlp]` to enable spaCy-based concept extraction.
- Install `paralabelgen[graph]` to enable Leiden community detection.
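
The fallback behaviour described in the notes above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's implementation: the stopword list, the regular expressions, and the function names `extract_concepts`, `cooccurrence_graph`, and `connected_components` are all invented for this example.

```python
import re
from collections import defaultdict
from itertools import combinations

# Illustrative stopword list (an assumption, not the library's actual list).
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}

# One or more consecutive capitalized words, e.g. "OpenAI" or "New York".
CAPITALIZED_SPAN = re.compile(r"\b[A-Z][A-Za-z0-9]*(?:\s+[A-Z][A-Za-z0-9]*)*")
TOKEN = re.compile(r"[A-Za-z0-9]+")


def extract_concepts(paragraph):
    """Heuristic extraction: capitalized spans plus runs of non-stopword tokens."""
    concepts = set()
    # Capitalized spans act as lightweight entities.
    for match in CAPITALIZED_SPAN.finditer(paragraph):
        concepts.add(" ".join(match.group(0).lower().split()))
    # Maximal runs of non-stopword tokens act as candidate noun phrases.
    # Punctuation ends a run so that phrases do not cross sentence boundaries.
    for segment in re.split(r"[.,;:!?]", paragraph):
        run = []
        for token in TOKEN.findall(segment.lower()):
            if token in STOPWORDS:
                if run:
                    concepts.add(" ".join(run))
                run = []
            else:
                run.append(token)
        if run:
            concepts.add(" ".join(run))
    return concepts


def cooccurrence_graph(concept_sets):
    """Link every pair of concepts that appear in the same paragraph."""
    graph = defaultdict(set)
    for concepts in concept_sets:
        for concept in concepts:
            graph[concept]  # touch the key so isolated concepts become nodes
        for a, b in combinations(sorted(concepts), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_components(graph):
    """Deterministic components: nodes and neighbors are visited in sorted order."""
    seen = set()
    components = []
    for start in sorted(graph):
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in sorted(graph[node]):
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(sorted(component))
    return components


paragraphs = [
    "OpenAI builds language models for developers.",
    "Developers use language models in production systems.",
    "Leiden finds communities in graphs.",
]
concept_sets = [extract_concepts(p) for p in paragraphs]
for index, community in enumerate(connected_components(cooccurrence_graph(concept_sets))):
    print(index, community)
```

In this sketch the first two paragraphs share the concept "developers" and therefore fall into one component, while the third paragraph forms its own. Sorting at every step is what keeps the result stable across runs, which is one plausible reading of "deterministic" in the notes.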
Download files
Source Distribution
paralabelgen-0.0.0.tar.gz (23.7 kB)
Built Distribution
File details
Details for the file paralabelgen-0.0.0.tar.gz.
File metadata
- Download URL: paralabelgen-0.0.0.tar.gz
- Upload date:
- Size: 23.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 874117fa7aa3755bb66176aee047c48cd18606bb21c8d4b41835b3398288b681 |
| MD5 | 758f161ca94de769c8120a131a147a1e |
| BLAKE2b-256 | ecca0806fb11f500c3ae2e7cbf328f1d01953239651718ac4764ee6e45d3039c |
File details
Details for the file paralabelgen-0.0.0-py3-none-any.whl.
File metadata
- Download URL: paralabelgen-0.0.0-py3-none-any.whl
- Upload date:
- Size: 20.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9fb967846f1c32ab9dda09b983071932a9295580cbaa3906c8be4196642dbf53 |
| MD5 | 878959ac227a980f92ca90515fec8b85 |
| BLAKE2b-256 | 9b0621ad7e432cde55570f1b9500602495f98f201462c383857f3a6cee069bb5 |