SPAR: Semantic Projection with Active Retrieval
SPAR scores short texts on bipolar concepts that you define as a set of positive seeds minus a set of negative seeds (positive_seeds - negative_seeds).
No model training or fine-tuning is required.
Reference: Yan, Bei, Feng Mai, Chaojiang Wu, Rui Chen, and Xiaolin Li (2024). "A Computational Framework for Understanding Firm Communication During Disasters." Information Systems Research 35(2): 590-608. https://doi.org/10.1287/isre.2022.0128
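A bipolar score can be illustrated in a few lines of pure Python. This is a minimal sketch under stated assumptions, not SPAR's implementation. It assumes the documents and seed sentences are already embedded (the toy 3-d vectors stand in for real sentence embeddings). It scores each pole as the cosine similarity between a document and the centroid of that pole's seeds, then subtracts the negative-pole similarity from the positive-pole similarity. The helper names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def bipolar_score(doc_vec, pos_seed_vecs, neg_seed_vecs):
    """Hypothetical helper: similarity to the positive pole minus similarity to the negative pole."""
    return (cosine(doc_vec, centroid(pos_seed_vecs))
            - cosine(doc_vec, centroid(neg_seed_vecs)))


# Toy embeddings (assumed): axis 0 loosely means "innovation", axis 1 "tradition".
pos_seeds = [[1.0, 0.0, 0.1], [0.9, 0.1, 0.0]]
neg_seeds = [[0.0, 1.0, 0.1], [0.1, 0.9, 0.0]]
doc_innovative = [0.8, 0.1, 0.2]
doc_traditional = [0.1, 0.8, 0.2]

print(round(bipolar_score(doc_innovative, pos_seeds, neg_seeds), 3))
print(round(bipolar_score(doc_traditional, pos_seeds, neg_seeds), 3))
```

Because the two toy documents mirror each other across the two poles, their scores have opposite signs and equal magnitude, which is the behavior a bipolar scale should have.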
Install
pip install -U spar-measure
Optional extras:
pip install "spar-measure[vector]" # ChromaDB persistence for large corpora
pip install "spar-measure[dev]" # pytest + gradio_client for contributing
Requires Python 3.10 or later.
GUI quickstart
Launch the browser-based app:
python -m spar_measure gui
# equivalently: spar gui or spar-measure gui
Open http://localhost:7860/ in your browser. The GUI walks you through the workflow in order:
upload a CSV, embed the texts, define dimension seeds, run active retrieval to refine the seeds,
define scales (positive pole minus negative pole), and score. Clicking Save Scales
writes a scales.json file that the headless score() API
accepts directly.
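Active retrieval can be pictured as a human-in-the-loop search. You retrieve the documents closest to a dimension's current seeds, a reviewer accepts good matches, and the search repeats with the enlarged seed set. The sketch below illustrates that loop in pure Python with toy embeddings. It is not the GUI's code. The ranking rule (cosine similarity to the seed centroid), the helper name `retrieve_candidates`, and the simulated "accept the top candidate" review step are all assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def retrieve_candidates(seed_vecs, corpus, k=2, exclude=()):
    """Hypothetical helper: return the k corpus ids most similar to the seed centroid.

    corpus maps doc_id -> embedding vector; ids in `exclude` (for example,
    documents already accepted as seeds) are skipped.
    """
    center = centroid(seed_vecs)
    ranked = sorted(
        (doc_id for doc_id in corpus if doc_id not in exclude),
        key=lambda doc_id: cosine(corpus[doc_id], center),
        reverse=True,
    )
    return ranked[:k]


# Toy corpus of precomputed 2-d embeddings (assumed values).
corpus = {
    "d1": [0.9, 0.1],
    "d2": [0.2, 0.8],
    "d3": [0.7, 0.3],
    "d4": [0.1, 0.9],
}
seeds = [[1.0, 0.0]]

# Round 1: retrieve candidates for review.
candidates = retrieve_candidates(seeds, corpus, k=2)

# Simulated review: accept the top candidate and add its vector as a new seed.
accepted = candidates[0]
seeds.append(corpus[accepted])

# Round 2: retrieve again with the enlarged seed set, skipping the accepted document.
next_round = retrieve_candidates(seeds, corpus, k=2, exclude={accepted})
print(candidates, next_round)
```

Excluding accepted documents keeps each round focused on new candidates rather than re-surfacing seeds the reviewer has already approved.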
Run headless in Google Colab:
Headless score() quickstart
Once seeds are stable (exported from the GUI or written by hand), call score()
directly without launching Gradio:
import pandas as pd
from spar_measure import score

# One row per document: an ID column and a text column.
docs = pd.DataFrame({
    "doc_id": [0, 1, 2],
    "text": [
        "We encourage new ways of thinking.",
        "Quarterly results exceeded analyst expectations.",
        "We honor the founders' commitment to quality.",
    ],
})

# Dimensions hold seed sentences ("queries"); each scale is its
# positive-pole dimensions minus its negative-pole dimensions.
scales = {
    "dimensions": {
        "Innovation": {"queries": ["We constantly experiment with new ideas.",
                                   "Innovation drives everything we do."]},
        "Tradition": {"queries": ["We honor the practices that built this company.",
                                  "Our heritage and craft define who we are."]},
    },
    "scales": {
        "Innovation-Tradition": {"pos_dims": ["Innovation"], "neg_dims": ["Tradition"]},
    },
}

out = score(docs, scales, text_col="text", id_col="doc_id")
print(out)
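When you edit seeds by hand, it is easy to introduce a scale that names a dimension you never defined. The checker below is a hypothetical helper and is not part of spar_measure. Its schema (`dimensions` with `queries`, and `scales` with `pos_dims` and `neg_dims`) is inferred only from the example above. A scales.json written by the GUI may contain additional keys, which this sketch ignores.

```python
import json


def validate_scales(scales):
    """Hypothetical checker for the scales layout shown in the quickstart.

    Returns a list of human-readable problems; an empty list means the
    dimension and scale references look consistent.
    """
    problems = []
    dims = scales.get("dimensions", {})
    # Every dimension needs at least one seed sentence to be scorable.
    for name, spec in dims.items():
        if not spec.get("queries"):
            problems.append(f"dimension {name!r} has no seed queries")
    # Every dimension named by a scale must be defined under "dimensions".
    for scale_name, spec in scales.get("scales", {}).items():
        for key in ("pos_dims", "neg_dims"):
            for dim in spec.get(key, []):
                if dim not in dims:
                    problems.append(
                        f"scale {scale_name!r}: {key} references unknown dimension {dim!r}"
                    )
    return problems


scales = {
    "dimensions": {
        "Innovation": {"queries": ["We constantly experiment with new ideas."]},
        "Tradition": {"queries": ["We honor the practices that built this company."]},
    },
    "scales": {
        "Innovation-Tradition": {"pos_dims": ["Innovation"], "neg_dims": ["Tradition"]},
        # Deliberately broken: "Heritage" is not a defined dimension.
        "Broken": {"pos_dims": ["Innovation"], "neg_dims": ["Heritage"]},
    },
}

# Round-trip through JSON, as if the dict had been saved to and loaded from scales.json.
restored = json.loads(json.dumps(scales))
for problem in validate_scales(restored):
    print(problem)
```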
Headless Colab notebook (no API key required, runs on CPU in ~60 seconds):
ChromaStore: persistent embeddings for large corpora
For corpora of 50k or more documents, install the [vector] extra and persist embeddings to disk so they are computed only once:
from spar_measure.vector_store import ChromaStore
from spar_measure import score

# docs_df: a DataFrame with "doc_id" and "text" columns (like docs above);
# scales: the same dimensions/scales dict used in the quickstart.

# Embed once and persist to disk.
store = ChromaStore("my_corpus", persist_dir="/data/chroma")
store.embed_and_store(docs_df, text_col="text")

# On subsequent runs, load the stored embeddings and score without re-embedding.
store = ChromaStore.load("/data/chroma", "my_corpus")
out = score(docs_df, scales, text_col="text", id_col="doc_id",
            precomputed_embeddings=store.get_all_embeddings())
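The embed-once, load-later pattern that ChromaStore provides can be illustrated without ChromaDB. The sketch below is a hypothetical stand-in that uses a JSON file and a toy embedding function. It does not reproduce ChromaStore's storage format or API, and the names `load_or_embed` and `toy_embed` are made up for illustration. In practice, the embedding function would be a sentence-embedding model.

```python
import json
import os
import tempfile


def load_or_embed(texts_by_id, cache_path, embed_fn):
    """Hypothetical cache: embed only documents whose ids are not already on disk.

    JSON object keys are strings, so document ids are expected as strings.
    """
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path, encoding="utf-8") as fh:
            cache = json.load(fh)
    missing = {doc_id: text for doc_id, text in texts_by_id.items() if doc_id not in cache}
    for doc_id, text in missing.items():
        cache[doc_id] = embed_fn(text)
    if missing:
        # Rewrite the cache file only when something new was embedded.
        with open(cache_path, "w", encoding="utf-8") as fh:
            json.dump(cache, fh)
    # Return vectors in the caller's id order so they line up with the input rows.
    return [cache[doc_id] for doc_id in texts_by_id]


calls = []


def toy_embed(text):
    """Stand-in for a real embedding model: records each call and returns a 2-d vector."""
    calls.append(text)
    return [float(len(text)), float(text.count(" "))]


docs = {"0": "We encourage new ways of thinking.",
        "1": "Quarterly results exceeded analyst expectations."}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "embeddings.json")
    first = load_or_embed(docs, path, toy_embed)   # embeds both documents
    second = load_or_embed(docs, path, toy_embed)  # served entirely from disk
    print(len(calls), first == second)
```

Returning vectors in the caller's id order matters for the same reason precomputed embeddings must line up with the rows being scored: a cache lookup that silently reorders documents would attach scores to the wrong texts.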
Citation
@article{yan2024spar,
  author  = {Yan, Bei and Mai, Feng and Wu, Chaojiang and Chen, Rui and Li, Xiaolin},
  title   = {A Computational Framework for Understanding Firm Communication During Disasters},
  journal = {Information Systems Research},
  volume  = {35},
  number  = {2},
  pages   = {590--608},
  year    = {2024},
  doi     = {10.1287/isre.2022.0128}
Source code and documentation: https://github.com/maifeng/SPAR_measure
Download files
File details: spar_measure-0.3.6.tar.gz
File metadata
- Download URL: spar_measure-0.3.6.tar.gz
- Upload date:
- Size: 3.3 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f4b31cde0c9d933009c115129a3b84a9ce0df75cd0a86bd93ff1480928116749 |
| MD5 | 0ce670cdaacf816ebd44e48570fb14cf |
| BLAKE2b-256 | 3b3b792417e72ac461dc4f5e2861e5abbe44fbfa21e9038b89a9f0a9ab8029c5 |
File details: spar_measure-0.3.6-py3-none-any.whl
File metadata
- Download URL: spar_measure-0.3.6-py3-none-any.whl
- Upload date:
- Size: 3.2 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 66f350d0c17c3377ad1a050f76d2689333db54c4b2993e23fab82957541fe90a |
| MD5 | 68b3c18753c4205a31fcf02de18f51af |
| BLAKE2b-256 | ba67fb9564efa4e64d0e361a7f6c026044f6453ab2d02467957e98a52e892bc3 |