SPAR: Semantic Projection with Active Retrieval

SPAR scores short text on bipolar concepts you define as positive_seeds - negative_seeds. No model training or fine-tuning required.

Reference: Yan, Bei, Feng Mai, Chaojiang Wu, Rui Chen, and Xiaolin Li (2024). "A Computational Framework for Understanding Firm Communication During Disasters." Information Systems Research 35(2): 590-608. https://doi.org/10.1287/isre.2022.0128
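
To make the "positive_seeds - negative_seeds" idea concrete, here is a minimal sketch of a generic semantic projection using toy 3-dimensional vectors in place of real sentence embeddings. The helper names (unit, bipolar_scores) and the exact arithmetic are assumptions for illustration; they are not taken from the spar_measure source, and the package may compute its scores differently.

```python
import numpy as np

def unit(v):
    """Scale a vector (or each row of a matrix) to unit length."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def bipolar_scores(doc_vecs, pos_seed_vecs, neg_seed_vecs):
    """Hypothetical helper: project documents onto a positive-minus-negative axis."""
    # Concept axis: mean positive seed direction minus mean negative seed direction.
    axis = unit(pos_seed_vecs).mean(axis=0) - unit(neg_seed_vecs).mean(axis=0)
    # Cosine similarity between each document and the concept axis.
    return unit(doc_vecs) @ unit(axis)

# Toy "embeddings" (assumed values, not produced by any real model).
pos = np.array([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]])   # positive-pole seeds
neg = np.array([[0.0, 1.0, 0.0], [0.1, 0.9, 0.0]])   # negative-pole seeds
docs = np.array([
    [1.0, 0.1, 0.0],   # close to the positive pole
    [0.1, 1.0, 0.0],   # close to the negative pole
    [0.0, 0.0, 1.0],   # unrelated to either pole
])

scores = bipolar_scores(docs, pos, neg)
print(scores)
```

In this sketch a document near the positive seeds gets a positive score, one near the negative seeds gets a negative score, and one orthogonal to both lands at zero.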


Install

pip install -U spar-measure

Optional extras:

pip install "spar-measure[vector]"   # ChromaDB persistence for large corpora
pip install "spar-measure[dev]"      # pytest + gradio_client for contributing

Python 3.10 or later.


GUI quickstart

Launch the browser-based app:

python -m spar_measure gui
# equivalently: spar gui   or   spar-measure gui

Open http://localhost:7860/ in your browser. The GUI walks you through the workflow in order: upload a CSV, embed, define dimension seeds, run active retrieval to refine seeds, define scales (positive pole minus negative pole), and score. When you click Save Scales, the GUI writes a scales.json file that the headless score() API accepts directly.

Run headless in Google Colab:

GUI in Colab


Headless score() quickstart

Once seeds are stable (exported from the GUI or written by hand), call score() directly without launching Gradio:

import pandas as pd
from spar_measure import score

docs = pd.DataFrame({
    "doc_id": [0, 1, 2],
    "text": [
        "We encourage new ways of thinking.",
        "Quarterly results exceeded analyst expectations.",
        "We honor the founders' commitment to quality.",
    ],
})

scales = {
    "dimensions": {
        "Innovation": {"queries": ["We constantly experiment with new ideas.",
                                   "Innovation drives everything we do."]},
        "Tradition":  {"queries": ["We honor the practices that built this company.",
                                   "Our heritage and craft define who we are."]},
    },
    "scales": {
        "Innovation-Tradition": {"pos_dims": ["Innovation"], "neg_dims": ["Tradition"]},
    },
}

out = score(docs, scales, text_col="text", id_col="doc_id")
print(out)
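
Hand-written scales are easy to get subtly wrong, for example by naming a dimension in pos_dims that was never defined. The sketch below round-trips a scales dict through a JSON file and checks it for internal consistency before scoring. It assumes that scales.json mirrors the dict layout shown above; check_scales is a hypothetical helper, not part of spar_measure.

```python
import json
import tempfile
from pathlib import Path

def check_scales(scales):
    """Hypothetical helper: return a list of problems in a scales dict (empty if none)."""
    problems = []
    dims = scales.get("dimensions", {})
    # Every dimension needs at least one seed query.
    for name, spec in dims.items():
        if not spec.get("queries"):
            problems.append(f"dimension {name!r} has no seed queries")
    # Every pole of every scale must point at a defined dimension.
    for scale_name, spec in scales.get("scales", {}).items():
        for key in ("pos_dims", "neg_dims"):
            for dim in spec.get(key, []):
                if dim not in dims:
                    problems.append(
                        f"scale {scale_name!r}: {key} references undefined dimension {dim!r}"
                    )
    return problems

scales = {
    "dimensions": {
        "Innovation": {"queries": ["We constantly experiment with new ideas."]},
        "Tradition": {"queries": ["We honor the practices that built this company."]},
    },
    "scales": {
        "Innovation-Tradition": {"pos_dims": ["Innovation"], "neg_dims": ["Tradition"]},
    },
}

# Write the dict to disk and read it back (file layout is an assumption).
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "scales.json"
    path.write_text(json.dumps(scales, indent=2), encoding="utf-8")
    loaded = json.loads(path.read_text(encoding="utf-8"))

problems = check_scales(loaded)
print(problems or "scales look consistent")
```

If the check returns an empty list, the loaded dict can be passed to score() in the same way as the inline dict in the quickstart.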

Headless Colab notebook (no API key required, runs on CPU in ~60 seconds):

Headless API in Colab


ChromaStore: persistent embeddings for large corpora

For corpora of 50,000 or more documents, install the [vector] extra and persist embeddings to disk so they are computed only once:

from spar_measure.vector_store import ChromaStore
from spar_measure import score

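# docs_df is your corpus DataFrame (with "text" and "doc_id" columns);
# scales is a scales dict like the one in the headless quickstart.
# Embed once.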
store = ChromaStore("my_corpus", persist_dir="/data/chroma")
store.embed_and_store(docs_df, text_col="text")

# Load and score on subsequent runs (no re-embedding).
store = ChromaStore.load("/data/chroma", "my_corpus")
out = score(docs_df, scales, text_col="text", id_col="doc_id",
            precomputed_embeddings=store.get_all_embeddings())

Citation

@article{yan2024spar,
  author  = {Yan, Bei and Mai, Feng and Wu, Chaojiang and Chen, Rui and Li, Xiaolin},
  title   = {A Computational Framework for Understanding Firm Communication During Disasters},
  journal = {Information Systems Research},
  volume  = {35},
  number  = {2},
  pages   = {590--608},
  year    = {2024},
  doi     = {10.1287/isre.2022.0128}
}

Source code and documentation: https://github.com/maifeng/SPAR_measure
