SPAR: Semantic Projection with Active Retrieval

Overview

SPAR is a Python NLP package that facilitates interactive measurement of text using LLMs. With SPAR, you can quantify short documents (e.g., social media posts) based on theoretical concepts such as creativity and collaboration, by measuring their semantic similarity with a set of example (seed) documents.

How it works:

  1. Start with:
    i. a corpus of documents that you want to measure;
    ii. generic seed sentences that define theoretical concepts, e.g., Creativity: 'we should innovate' and Collaboration: 'we should collaborate'.
  2. Embed them into a semantic space using a pre-trained LLM.
  3. Use semantic search to find domain-specific exemplary documents in the corpus that reflect the theoretical concepts in context. For example: 'We encourage new ways of thinking', 'We should work together to weather the storm'.
  4. Compute the dot product between the document embeddings and the exemplary document embeddings.
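
The four steps above can be sketched in plain Python. This is an illustration only, not SPAR's actual implementation or API: the `embed` function is a toy bag-of-words stand-in for a pre-trained embedding model, the names `tokenize`, `embed`, `dot`, and `measure` are hypothetical, the seed sentences are lengthened so that the toy embedding has words to match on, and averaging the exemplar vectors before taking the dot product is an assumption about how multiple exemplars are combined.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def embed(text, vocab):
    """Toy stand-in for an LLM embedding: a unit-length bag-of-words vector."""
    counts = Counter(tokenize(text))
    vec = [float(counts[word]) for word in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def measure(corpus, seeds, k=1):
    """Return (exemplars, scores) keyed by concept name. Requires k >= 1."""
    # Step 2: embed corpus and seeds into one shared space.
    texts = list(corpus) + list(seeds.values())
    vocab = sorted({word for text in texts for word in tokenize(text)})
    doc_vecs = [embed(doc, vocab) for doc in corpus]

    exemplars, scores = {}, {}
    for concept, seed in seeds.items():
        seed_vec = embed(seed, vocab)
        # Step 3: semantic search, keeping the k documents closest to the seed.
        ranked = sorted(
            range(len(corpus)),
            key=lambda i: dot(doc_vecs[i], seed_vec),
            reverse=True,
        )
        top = ranked[:k]
        exemplars[concept] = [corpus[i] for i in top]
        # Step 4 (assumed aggregation): average the exemplar vectors, then
        # score every document by its dot product with that average.
        scale = [sum(col) / len(top) for col in zip(*(doc_vecs[i] for i in top))]
        scores[concept] = [dot(vec, scale) for vec in doc_vecs]
    return exemplars, scores


# Step 1: a small corpus and one seed sentence per concept.
corpus = [
    "We encourage new ways of thinking",
    "We should work together to weather the storm",
    "Our stores reopen on Monday",
]
seeds = {
    "creativity": "we should try new ways of thinking",
    "collaboration": "we should work together",
}

exemplars, scores = measure(corpus, seeds, k=1)
for concept in seeds:
    print(concept, exemplars[concept], [round(s, 2) for s in scores[concept]])
```

With a real embedding model in place of `embed`, the short generic seeds from step 1 would not need to share words with the corpus, since similarity would be computed in a learned semantic space rather than over word counts.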

Main features:

  • Enables domain-adaptive and few-shot measurements of theoretical concepts without requiring model training or fine-tuning.
  • Combines the idea of semantic projection with active semantic search, which allows users to find the most relevant, context-specific documents to define the theoretical scales.
  • Supports multiple state-of-the-art text embedding models, such as Sentence Transformers and the OpenAI Text Embeddings API.
  • Comes with a user-friendly web interface that makes defining theoretical scales and conducting measurements intuitive and accessible.
  • Reference:
    • Bei Yan, Feng Mai, Chaojiang Wu, Rui Chen, Xiaolin Li (2023). A Computational Framework for Understanding Firm Communication During Disasters. Information Systems Research. https://doi.org/10.1287/isre.2022.0128

SPAR is built on open source packages such as HuggingFace Transformers, SentenceTransformers, and Gradio.

Installation and Quick Start

To quickly launch SPAR in Google Colab, click the following button and run the notebook code:

Open In Colab

You can also install SPAR on your own machine. It is recommended to use a virtual environment and upgrade pip first with pip install -U pip. SPAR can be installed via pip:

pip install -U spar-measure

To launch SPAR on your own machine, use the following command in the terminal:

python -m spar_measure.gui

Then open the interactive app in your browser at http://localhost:7860/.

If a CUDA GPU is available, SPAR will use it to speed up embedding. If you choose not to use a GPU, you can set the CUDA_VISIBLE_DEVICES environment variable to an empty string:

CUDA_VISIBLE_DEVICES="" python -m spar_measure.gui
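
The inline `VAR=value command` form above is POSIX shell syntax. As a hedged alternative for scripts or for shells that lack it, the same CPU-only launch can be prepared from Python with the standard library. The helper name `cpu_only_launch` is hypothetical; only the `spar_measure.gui` module name and the `CUDA_VISIBLE_DEVICES` variable come from the instructions above.

```python
import os
import sys


def cpu_only_launch(module="spar_measure.gui"):
    """Build the command and environment for a CPU-only launch."""
    env = dict(os.environ)
    # An empty value hides all CUDA devices from the child process.
    env["CUDA_VISIBLE_DEVICES"] = ""
    # Use the current interpreter so the active virtual environment is reused.
    command = [sys.executable, "-m", module]
    return command, env


command, env = cpu_only_launch()
print(command)

# To actually start the app (this call blocks until the server stops):
# import subprocess
# subprocess.run(command, env=env, check=True)
```

The variable is set only in the copied environment passed to the child process, so the parent Python session keeps its own GPU visibility unchanged.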

Limitations

  • SPAR may not be suitable for longer or more complex documents since it represents a document using a single vector.
  • Sentence embeddings may not be suitable for theoretical constructs that rely primarily on syntactic features.
  • Pretrained LLMs may not have up-to-date world knowledge or new vocabularies.
  • Semantic projection is a linear operation, so it may not capture non-linear patterns in the data as well as fine-tuning approaches.

Additional Details

For additional details and source code, please refer to the project's GitHub Repository.
