A Python library for generating discrete paragraph labels from concept extraction, graph communities, and interpretable assignment rules, with heuristic, spaCy, and provider-backed LLM extraction modes.

paralabelgen

paralabelgen is a Python library for generating discrete paragraph labels from concept extraction, graph communities, and interpretable assignment rules.

PyPI distribution: paralabelgen
Python import package: labelgen
Repository: https://github.com/HuRuilizhen/labelgen

Install

pip install paralabelgen

If you want to use the default spaCy extractor, install a compatible English pipeline such as:

python -m spacy download en_core_web_sm

en_core_web_sm is the recommended default model, but you can point spacy_model_name at another installed compatible spaCy pipeline.
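
Before constructing a generator, it can help to confirm that the configured pipeline is importable. The following is a minimal sketch and not part of labelgen: it relies on the fact that pipelines installed with spacy download are regular Python packages, and the helper name spacy_model_available is hypothetical.

```python
from importlib.util import find_spec


def spacy_model_available(model_name: str = "en_core_web_sm") -> bool:
    """Return True if the named pipeline package can be imported.

    Hypothetical helper, not part of labelgen. It only checks that the
    package exists; it does not load the pipeline or verify that it is
    compatible with the installed spaCy version.
    """
    return find_spec(model_name) is not None


if __name__ == "__main__":
    name = "en_core_web_sm"
    if not spacy_model_available(name):
        print(f"Missing spaCy pipeline; run: python -m spacy download {name}")
```

The same check works for any other pipeline name you intend to pass as spacy_model_name.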

Quick Start

Default spaCy pipeline

from labelgen import LabelGenerator, LabelGeneratorConfig

paragraphs = [
    "OpenAI builds language models for developers.",
    "Developers use language models in production systems.",
]

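# Default configuration uses the spaCy extractor.
generator = LabelGenerator(LabelGeneratorConfig())
# Learn concepts and communities, then label the same paragraphs in one pass.
result = generator.fit_transform(paragraphs)

# Inspect the extracted concepts.
for concept in result.concepts:
    print(concept.normalized, concept.kind, concept.document_frequency, sep=" | ")

# Inspect the labels assigned to each paragraph.
for assignment in result.paragraph_labels:
    print(assignment.paragraph_id, assignment.label_ids, assignment.label_scores)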

LLM extraction pipeline

from labelgen import LabelGenerator, LabelGeneratorConfig

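config = LabelGeneratorConfig(
    extractor_mode="llm",  # opt in to provider-backed extraction
    use_graph_community_detection=False,  # connected components instead of Leiden
)
# The API key is read from the provider's environment variable (OPENAI_API_KEY here).
config.extraction.llm.provider = "openai"
config.extraction.llm.model = "gpt-5-mini"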

generator = LabelGenerator(config)
result = generator.fit_transform(
    [
        "OpenAI builds language models and developer APIs for production systems.",
        "Production systems need monitoring and evaluation tooling.",
    ]
)

The LLM extractor supports openai-, mistral-, and qwen-style providers. Set the API key for the chosen provider in its expected environment variable:

openai: OPENAI_API_KEY
mistral: MISTRAL_API_KEY
qwen: DASHSCOPE_API_KEY
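
Because the LLM path depends on a key being present, a pre-flight check can fail fast with a clearer message than a provider error. This is a sketch under assumptions: the provider-to-variable pairing is read off the order of the names listed above, and require_api_key is a hypothetical helper, not a labelgen function.

```python
import os
from collections.abc import Mapping

# Assumed pairing, inferred from the order in which providers and
# environment variables are listed; not confirmed against the library.
PROVIDER_ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "mistral": "MISTRAL_API_KEY",
    "qwen": "DASHSCOPE_API_KEY",
}


def require_api_key(provider: str, env: Mapping[str, str] = os.environ) -> str:
    """Return the env var name for `provider`, raising if it is unset or empty."""
    try:
        var = PROVIDER_ENV_VARS[provider]
    except KeyError:
        raise ValueError(f"unknown provider: {provider!r}") from None
    if not env.get(var):
        raise RuntimeError(f"{var} must be set before using provider {provider!r}")
    return var
```

Calling require_api_key("openai") before building the generator surfaces a missing key before any paragraphs are processed. If you override api_key_env_var in the configuration, check that name instead.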

Extraction Modes

LabelGeneratorConfig.extractor_mode supports three modes:

spacy: default public extractor using spaCy noun chunks and entities
heuristic: deterministic fallback extractor using rule-based spans
llm: provider-backed concept extraction using structured JSON output

If extractor_mode is unset, the legacy use_nlp_extractor compatibility flag is still respected. New code should prefer extractor_mode.
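
The precedence described above can be sketched as a small function. This is illustrative only and not the library's code: resolve_extractor_mode is a hypothetical name, and the mapping of the legacy flag (True to spacy, False to heuristic) is an assumption that the source does not confirm.

```python
from typing import Optional

VALID_MODES = ("spacy", "heuristic", "llm")


def resolve_extractor_mode(
    extractor_mode: Optional[str] = None,
    use_nlp_extractor: bool = True,
) -> str:
    """Pick an extraction mode, preferring the explicit setting.

    Assumption (not confirmed by the source): the legacy flag maps
    True -> "spacy" and False -> "heuristic".
    """
    if extractor_mode is not None:
        if extractor_mode not in VALID_MODES:
            raise ValueError(f"unsupported extractor_mode: {extractor_mode!r}")
        return extractor_mode
    # extractor_mode is unset, so fall back to the legacy flag.
    return "spacy" if use_nlp_extractor else "heuristic"
```

The point of the sketch is the ordering: an explicit extractor_mode always wins, and the legacy flag is consulted only when no mode is given.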

LLM Configuration Notes

The LLM extraction path is opt-in and synchronous. Key settings live under config.extraction.llm:

provider
model
api_key_env_var
base_url
temperature
max_output_tokens
batch_size
max_concepts_per_paragraph
cache_enabled
cache_dir
record_extraction_artifacts
artifact_dir
prompt_version
prompt_template

Cache and artifact behavior:

cache_enabled=True stores parsed concept lists on disk and avoids repeated provider calls for the same effective request
record_extraction_artifacts=True writes structured per-batch extraction artifacts for audit and experiment analysis
both are optional and can be disabled independently
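
To make the caching idea concrete, here is a minimal sketch of an on-disk cache keyed by the request. It is not the library's implementation: the choice of fields that make up the "effective request" (provider, model, prompt version, paragraph text), the one-JSON-file-per-key layout, and the names request_key and cached_extract are all assumptions for illustration.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, List


def request_key(provider: str, model: str, prompt_version: str, text: str) -> str:
    """Hash the fields assumed to define the effective request."""
    payload = json.dumps([provider, model, prompt_version, text], ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def cached_extract(
    cache_dir: Path,
    key: str,
    call_provider: Callable[[], List[str]],
) -> List[str]:
    """Return cached concepts for `key`, calling the provider only on a miss."""
    path = cache_dir / f"{key}.json"
    if path.exists():
        # Cache hit: reuse the parsed concept list, no provider call.
        return json.loads(path.read_text(encoding="utf-8"))
    concepts = call_provider()
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(concepts, ensure_ascii=False), encoding="utf-8")
    return concepts
```

Including the model and prompt version in the key means that changing either one produces a fresh request rather than a stale hit, which is the usual reason to hash the whole request and not just the text.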

Public API

The main public entrypoints are:

LabelGenerator
LabelGeneratorConfig
Paragraph, Concept, ConceptMention, Community, ParagraphLabels
dump_result() and load_result()

Detailed API notes are available in docs/public_api.md.

Examples

Runnable examples are available in examples/.

Configuration Notes

fit() learns concepts and communities from a corpus
transform() applies previously learned communities to new paragraphs
fit_transform() learns and labels the same input in one pass
use_graph_community_detection=True uses Leiden community detection
use_graph_community_detection=False uses deterministic connected components
the default spaCy path requires the configured spaCy model to be installed
the LLM path does not silently fall back to spaCy or heuristic extraction
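
For intuition about the deterministic option, the sketch below groups concepts into connected components of a co-occurrence graph using union-find. It is a generic illustration and not labelgen's code: the assumption that edges link concepts appearing in the same paragraph, and the function name connected_components, are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence


def connected_components(paragraph_concepts: Sequence[Sequence[str]]) -> List[List[str]]:
    """Group concepts that co-occur in a paragraph, directly or transitively."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the lexicographically smaller root so results are stable.
            lo, hi = sorted((ra, rb))
            parent[hi] = lo

    for concepts in paragraph_concepts:
        for c in concepts:
            find(c)  # register concepts that never co-occur with anything
        for a, b in combinations(concepts, 2):
            union(a, b)

    groups: Dict[str, List[str]] = {}
    for c in list(parent):
        groups.setdefault(find(c), []).append(c)
    # Sort members and groups so the output does not depend on input order.
    return sorted(sorted(members) for members in groups.values())
```

Unlike Leiden, which optimizes a quality function and can split a densely connected graph into several communities, connected components only ever separate concepts that share no path, so the grouping is coarser but fully reproducible.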

Release history

0.2.3: Apr 18, 2026
0.2.2: Apr 14, 2026
0.2.1: Mar 31, 2026
0.2.0: Mar 25, 2026 (this version)
0.1.1: Mar 24, 2026
0.1.0: Mar 23, 2026
0.0.0: Mar 15, 2026

Download files

Source Distribution

paralabelgen-0.2.0.tar.gz (48.4 kB)

Uploaded Mar 25, 2026 Source

Built Distribution

paralabelgen-0.2.0-py3-none-any.whl (37.1 kB)

Uploaded Mar 25, 2026 Python 3

File details

Details for the file paralabelgen-0.2.0.tar.gz.

File metadata

Download URL: paralabelgen-0.2.0.tar.gz
Upload date: Mar 25, 2026
Size: 48.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for paralabelgen-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`d053b171536eb64593906a1a63fe75851b5fe39f739691094aa988fd46cd546b`
MD5	`977cefe5c557d7b3cff33f1d89d85cfd`
BLAKE2b-256	`288dd9dcd43c8928d7fc7bed5fed43ff9293a30c31cb63e8d79216bb9509946b`

File details

Details for the file paralabelgen-0.2.0-py3-none-any.whl.

File metadata

Download URL: paralabelgen-0.2.0-py3-none-any.whl
Upload date: Mar 25, 2026
Size: 37.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for paralabelgen-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ff66d4affae1fb96dd69ff5784b3b4a24037c807cef6cb61b3fc68dd390a7c99`
MD5	`c434b71ad3ff7a62394e3ee0169b6db2`
BLAKE2b-256	`46f1aca14418e8362c944ffd05122640c480d6c8b724c6752c8aa63272af4e55`
