
Granularity scoring for natural language



Granuscore

Granuscore is a Python library for measuring the semantic granularity of natural language text.

It provides an end-to-end pipeline that:

  1. splits text into referential units,
  2. assigns continuous granularity scores to each unit,
  3. aggregates these scores into document-level estimates.
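The three pipeline stages can be illustrated with a toy, self-contained sketch. This is not Granuscore's implementation. `split_into_units` is a naive regex sentence splitter standing in for the library's claim splitter, `toy_scorer` is a purely hypothetical unit scorer, and averaging is only an assumed aggregation rule. All three names are illustrative.

```python
import re
from statistics import mean
from typing import Callable, List, Optional


def split_into_units(text: str) -> List[str]:
    """Naive stand-in for claim splitting: split after sentence-final punctuation."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def score_document(text: str, unit_scorer: Callable[[str], float]) -> dict:
    """Split text into units, score each unit, then aggregate by the mean (assumption)."""
    units = split_into_units(text)
    unit_scores = [unit_scorer(u) for u in units]
    document_score: Optional[float] = mean(unit_scores) if unit_scores else None
    return {"units": units, "unit_scores": unit_scores, "document_score": document_score}


# Hypothetical scorer for illustration only: units with fewer words count as coarser.
def toy_scorer(unit: str) -> float:
    return 1.0 / len(unit.split())


result = score_document("Tony Hawk was born in San Diego. He skates.", toy_scorer)
print(result["units"])
print(result["document_score"])
```

Separating the unit scorer from the splitting and aggregation steps lets each stage be swapped independently, which mirrors the split/score/aggregate structure described above.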

Granuscore is designed to measure how fine- or coarse-grained textual expressions are, for applications such as question answering, educational dialogue, summarization, and scientific writing.


Installation

Install from PyPI:

pip install granuscore

Or install the latest development version locally:

git clone https://github.com/lukasellinger/granuscore.git
cd granuscore
pip install -e .

Optional development dependencies:

pip install -e ".[dev]"

Quick Start

from granuscore import GranuScore

scorer = GranuScore()

text = """
Tony Hawk was born in San Diego.
"""

score = scorer(text)

print(score)

By default, Granuscore returns percentile scores, where higher values correspond to coarser-grained expressions.
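Reference percentile distributions are among the artifacts Granuscore downloads. Assuming a percentile means the share of a reference sample whose raw scores are at or below a given raw score, the conversion can be sketched as follows. The function name and the reference values are hypothetical. If raw scores increase with coarseness, this mapping preserves that order, so higher percentiles still mean coarser expressions.

```python
from bisect import bisect_right
from typing import Sequence


def to_percentile(raw_score: float, reference: Sequence[float]) -> float:
    """Percentage (0-100) of reference scores that are <= raw_score."""
    ordered = sorted(reference)
    return 100.0 * bisect_right(ordered, raw_score) / len(ordered)


# Hypothetical reference distribution of raw granularity scores.
reference_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
print(to_percentile(0.65, reference_scores))  # 60.0
```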


Default Configuration

The default configuration reproduces the setup used in the paper.

scorer = GranuScore()

Equivalent to:

scorer = GranuScore(
    predictor_type="hit",
)

Default settings:

  • predictor_type="hit"
  • model_name="Hierarchy-Transformers/HiT-MiniLM-L12-WordNetNoun"
  • search_method="random_anchors"
  • random_anchors_k=999
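The default `search_method="random_anchors"` is not documented further here. One plausible reading, stated as an assumption: each unit's embedding is compared with `k` fixed, randomly chosen anchor vectors (the "anchor vectors" artifact), and the resulting similarities serve as LightGBM features, with `k` playing the role of `random_anchors_k`. The sketch below uses cosine similarity purely for illustration. HiT models embed into hyperbolic space, so the library may use a different distance. All function names are hypothetical.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity (illustrative choice; the real distance may differ)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def sample_anchors(pool: Sequence[Vector], k: int, seed: int = 0) -> List[Vector]:
    """Pick k anchor vectors from a pool of embeddings, reproducibly via the seed."""
    rng = random.Random(seed)
    return rng.sample(list(pool), k)


def anchor_features(embedding: Vector, anchors: Sequence[Vector]) -> List[float]:
    """Feature vector: similarity of the unit embedding to each anchor, in anchor order."""
    return [cosine(embedding, a) for a in anchors]


# Toy 3-dimensional "embeddings" standing in for real model outputs.
pool = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0],
]
anchors = sample_anchors(pool, k=3, seed=42)
features = anchor_features([0.5, 0.5, 0.0], anchors)
print(len(features))  # 3
```

Under this reading, a model trained on these features expects exactly this anchor set in this order. This is one reason the resources described in the compatibility note must be kept together.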

The following required artifacts are automatically downloaded and cached on first use:

  • FAISS indices
  • anchor vectors
  • LightGBM models
  • reference percentile distributions
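The download-on-first-use behaviour follows a common caching pattern, sketched below. The source does not describe Granuscore's cache location or download mechanism, so `get_cached_artifact`, the injected `fetch` callable, and the file name `anchors.bin` are all hypothetical. The file is written to a temporary name and then renamed into place, so a half-written download is never picked up as a cached artifact.

```python
import tempfile
from pathlib import Path
from typing import Callable


def get_cached_artifact(name: str, cache_dir: Path, fetch: Callable[[str], bytes]) -> Path:
    """Return the local path of an artifact, calling fetch only if it is not cached yet."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / name
    if not target.exists():
        data = fetch(name)  # in a real setup this would be an HTTP download
        tmp = target.with_suffix(target.suffix + ".part")
        tmp.write_bytes(data)
        tmp.replace(target)  # rename into place only after the write has finished
    return target


# Usage with a fake fetcher that records how often it is called.
calls = []


def fake_fetch(name: str) -> bytes:
    calls.append(name)
    return b"artifact bytes"


with tempfile.TemporaryDirectory() as d:
    p1 = get_cached_artifact("anchors.bin", Path(d), fake_fetch)
    p2 = get_cached_artifact("anchors.bin", Path(d), fake_fetch)  # served from cache

print(len(calls))  # 1
```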


Important Compatibility Note

The default configuration works out of the box and is the recommended setup.

If you customize any of the following components, you must ensure that all resources are compatible with each other:

  • the embedding model
  • the search method
  • the FAISS index
  • the anchor vectors
  • the LightGBM model

For example, a LightGBM model trained using:

search_method="random_anchors"

should not be combined with:

search_method="nearest_neighbor"

Similarly, FAISS indices, anchor vectors, percentile reference distributions, and LightGBM models must originate from the same embedding space and training configuration.

Compatibility between custom resources is not validated automatically.
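Because Granuscore does not check compatibility, you may want a lightweight check of your own. The sketch below assumes you record, for each custom resource, the embedding model and search method it was built with. This metadata format and `check_compatible` are hypothetical; the library does not ship or read such records.

```python
from typing import Mapping, Optional

# Hypothetical metadata fields that must agree across all custom resources.
REQUIRED_KEYS = ("model_name", "search_method")


def check_compatible(artifacts: Mapping[str, Mapping[str, str]]) -> None:
    """Raise ValueError if resources disagree on any required field (missing counts as different)."""
    for key in REQUIRED_KEYS:
        values = {name: meta.get(key) for name, meta in artifacts.items()}
        if len(set(values.values())) > 1:
            raise ValueError(f"Incompatible {key}: {values}")


MODEL = "Hierarchy-Transformers/HiT-MiniLM-L12-WordNetNoun"
matched = {
    "lgb_model": {"model_name": MODEL, "search_method": "random_anchors"},
    "percentile_reference": {"model_name": MODEL, "search_method": "random_anchors"},
}
mismatched = {
    "lgb_model": {"model_name": MODEL, "search_method": "random_anchors"},
    "percentile_reference": {"model_name": MODEL, "search_method": "nearest_neighbor"},
}

check_compatible(matched)  # passes silently
error: Optional[str] = None
try:
    check_compatible(mismatched)
except ValueError as err:
    error = str(err)
print(error)
```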


Notebook Tutorial

An interactive introduction is available in:

notebooks/getting_started.ipynb

Repository Structure

granuscore/
├── src/
│   └── granuscore/
│       ├── pipeline.py
│       ├── granularity_predictor.py
│       ├── claim_splitter.py
│       ├── bucket_output.py
│       ├── cache.py
│       └── artifacts.py
├── notebooks/
│   └── getting_started.ipynb
├── training_scripts/
├── evaluation/
├── assets/
├── data/ (needs to be externally downloaded)
├── pyproject.toml
├── LICENSE
└── README.md

Reproducing Paper Experiments

The datasets and precomputed resources required to reproduce the experiments from the paper are available here:

https://drive.google.com/drive/folders/1mJdUENOxHEiuYn-_f1KRQ1PZggXJDnb4?usp=sharing

Download the archive and extract it into the repository root:

unzip data.zip

This will create the expected directory structure used by the training and evaluation scripts.
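If `unzip` is not available (for example on Windows), the archive can be extracted with Python's standard `zipfile` module instead. This sketch assumes the archive contains a top-level `data/` directory, matching the repository layout shown above. `extract_data` is a hypothetical helper, not part of the package.

```python
import zipfile
from pathlib import Path


def extract_data(archive: Path, repo_root: Path) -> Path:
    """Extract the archive into the repository root and return the resulting data/ directory."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(repo_root)
    data_dir = repo_root / "data"
    if not data_dir.is_dir():
        raise FileNotFoundError(f"Expected {data_dir} after extracting {archive}")
    return data_dir


# Usage, from the repository root:
# extract_data(Path("data.zip"), Path("."))
```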


Training Pipeline

Training uses precomputed .pkl feature files.

  1. Generate the precomputed datasets using the scripts in:
training_scripts/build_precalc_data/
  2. Train the LightGBM models:
python training_scripts/train_lgb_models.py

Citation

Updated citation information will be added after publication.

@misc{ellinger2026granuscore,
  title={Granuscore: A Reference-Free Measure of Granularity for Text Analysis and Question Answering},
  author={Ellinger, Lukas and Fichtl, Alexander M. and Anschütz, Miriam and Groh, Georg},
  year={2026}
}



Download files


Source Distribution

granuscore-1.0.0.tar.gz (25.6 kB)

Uploaded Source

Built Distribution


granuscore-1.0.0-py3-none-any.whl (26.2 kB)

Uploaded Python 3

File details

Details for the file granuscore-1.0.0.tar.gz.

File metadata

  • Download URL: granuscore-1.0.0.tar.gz
  • Size: 25.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.6

File hashes

Hashes for granuscore-1.0.0.tar.gz
Algorithm     Hash digest
SHA256        6d502d7b77ef2f86f53e732f622a0353891a024bd6cc3bb50774d989dfdc909b
MD5           d5ca24560a74b9af04a3f53939d32b25
BLAKE2b-256   d2f4a2ded96ff5945939b9f0ce0fdca519d17da44b3aa5b29da90f7a704af8f4


File details

Details for the file granuscore-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: granuscore-1.0.0-py3-none-any.whl
  • Size: 26.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.6

File hashes

Hashes for granuscore-1.0.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        25ef461574424d0b5dc75b1d93e9b15c24dbc76960f883dc268411f6c22d9a93
MD5           ab630d1535e1e08317f00121ff633637
BLAKE2b-256   825ab9346dd18306893f41a3027cae2be19967750d4eb9b0fea58aca74ec8b22
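A downloaded file can be checked against the SHA256 digests listed above with Python's standard `hashlib`. This is a minimal sketch; `sha256_of`, `verify`, and the `EXPECTED` table are illustrative names, and the digests are copied from this page.

```python
import hashlib
from pathlib import Path

# SHA256 digests as listed for the granuscore 1.0.0 distributions.
EXPECTED = {
    "granuscore-1.0.0.tar.gz": "6d502d7b77ef2f86f53e732f622a0353891a024bd6cc3bb50774d989dfdc909b",
    "granuscore-1.0.0-py3-none-any.whl": "25ef461574424d0b5dc75b1d93e9b15c24dbc76960f883dc268411f6c22d9a93",
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute a file's SHA256 hex digest, reading it in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """True if the file's SHA256 matches the published digest for its file name."""
    return sha256_of(path) == EXPECTED[path.name]


# Usage after downloading:
# verify(Path("granuscore-1.0.0.tar.gz"))
```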

