SPAR: Semantic Projection with Active Retrieval

Overview

SPAR is a Python NLP package that enables interactive quantification of text. With SPAR, you can quantify short documents (e.g., social media posts) on latent, continuous scales such as creativity, collaboration, or danger by measuring their semantic similarity to a set of example (seed) documents, for example: 'encourage new ways of thinking', 'working together to weather the storm', 'we are facing a deadly virus.'

Main features:

  • conducts domain-adaptive and few-shot measurements, without requiring any model training or fine-tuning. It combines the idea of semantic projection (Grand et al. 2022, Authors 2023) with active semantic search, which allows users to find the most relevant context-specific documents to define the scales.
  • supports multiple state-of-the-art text embedding methods, such as Sentence Transformers or the OpenAI Text Embeddings API.
  • comes with a user-friendly web interface that makes defining scales and conducting measurements intuitive and accessible.
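
The active semantic search mentioned in the first feature can be pictured as ranking corpus sentences by their cosine similarity to a generic seed query. The sketch below is illustrative only: the three-dimensional vectors are made-up stand-ins for real embeddings, and `cosine` and `search` are hypothetical helper names, not part of SPAR's API.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def search(query_vec, corpus, top_k=2):
    """Return the top_k (sentence, score) pairs most similar to the query."""
    scored = [(text, cosine(query_vec, vec)) for text, vec in corpus.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]

# Toy embeddings (assumed values); a real run would use a sentence-embedding model.
corpus = {
    "How do you prevent the spread of a deadly virus?": [0.9, 0.1, 0.0],
    "The smiling faces say it all.": [0.0, 0.2, 0.9],
    "How do you adapt to these uncharted waters?": [0.3, 0.9, 0.1],
}
seed_query = [1.0, 0.0, 0.0]  # stand-in for the embedding of "It is dangerous."

results = search(seed_query, corpus)
for text, score in results:
    print(f"{score:.2f}  {text}")
```

The highest-ranked sentences are candidates for context-specific seeds; in SPAR itself this selection happens interactively in the web interface.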

SPAR is built on other open source packages such as HuggingFace Transformers, SentenceTransformers, and Gradio.

If you find SPAR useful in your work, please cite the following paper:

  • Blinded Authors (2023), A Computational Framework for Understanding Firm Communication During Disasters, Under Review at Information Systems Research.

Please note that the project is currently in a research preview (pre-alpha) stage. To view the planned features for the project, please see the Road Map.

Quick Start and Installation

Simply click the following button and run the code in the notebook to launch SPAR in Google Colab for quick testing:

Open In Colab

You can also install SPAR on your own machine. It is recommended to use a virtual environment and upgrade pip first with pip install -U pip. SPAR can be installed via pip:

pip install -U spar-measure

To launch SPAR on your own machine, use the following command in the terminal:

python -m spar_measure.gui

Then open the interactive app in your browser at http://localhost:7860/.

If a CUDA GPU is available, SPAR will use it to speed up embedding. If you choose not to use a GPU, you can set the CUDA_VISIBLE_DEVICES environment variable to an empty string:

CUDA_VISIBLE_DEVICES="" python -m spar_measure.gui
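
The same CPU-only launch can be scripted from Python, which may be convenient on platforms where the inline `VAR=value command` shell syntax is unavailable. This is a minimal sketch using only the standard library; `cpu_only_env` and `launch_spar_cpu` are hypothetical helper names, and the only SPAR-specific piece is the `spar_measure.gui` module name taken from the command above.

```python
import os
import subprocess
import sys

def cpu_only_env():
    """Copy the current environment and hide all CUDA devices."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = ""  # empty string: no GPU is visible
    return env

def launch_spar_cpu():
    """Start the SPAR web app as a child process without GPU access."""
    cmd = [sys.executable, "-m", "spar_measure.gui"]
    return subprocess.Popen(cmd, env=cpu_only_env())

# Uncomment to start the app and block until it exits:
# launch_spar_cpu().wait()
```

Because the variable is set only in the child's environment, the parent Python process and your shell are left unchanged.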

See full documentation for other usage options here.

Interface and Usage

Using SPAR involves four steps:

  1. Upload a CSV file with the text content to be measured and a document ID column. Select an embedding method and embed the documents.
  2. Define a set of dimensions and generic seed queries. For example:

    • Creative: "We should adapt and innovate."
    • Positive emotion: "We are happy."
    • Danger: "It is dangerous."

    Then, search for sentences in a corpus that are similar to the generic seed queries, and use the results to define dimensions in the context of the corpus. For example:

    • Creative:
      • "Digital technology will play a huge role going forward."
      • "How do you adapt to these uncharted waters?"
    • Positive emotion:
      • "The smiling faces say it all."
      • "A round of applause to all of our recent WaFd Foundation grant recipients!"
    • Danger: "How do you prevent the spread of a deadly virus?"

    Enter these new context-specific sentences into the query box and click the "Embed Queries and Save Dimensions" button.

  3. Define scales, each of which consists of one or more dimensions. For example:
    • Sentiment = Positive emotion - Negative emotion
    • Creativity = Creative
  4. Project the document embeddings onto the scale embeddings. A CSV file with the results can be downloaded.
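
To make the four steps concrete, here is a minimal end-to-end sketch in plain Python. It is not SPAR's implementation. The column names `doc_id` and `text`, the two sample documents, the "We are sad" seed, the keyword-count `toy_embed` function, and all helper names are assumptions chosen for illustration. The sketch also assumes that a dimension is the mean of its seed embeddings, that a scale is a signed sum of dimension vectors, and that a score is the cosine similarity between a document and the scale vector; SPAR's actual computation may differ.

```python
import csv
import io
import math

# Step 1: a CSV with a document ID column and a text column (names assumed).
RAW_CSV = """doc_id,text
d1,We are happy to innovate together
d2,This deadly virus makes us sad
"""

VOCAB = ["happy", "sad", "innovate", "deadly"]

def toy_embed(text):
    """Stand-in embedding: keyword counts. A real run would use an embedding model."""
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]

def mean_vector(vectors):
    """Element-wise mean of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

reader = csv.DictReader(io.StringIO(RAW_CSV))
docs = {row["doc_id"]: toy_embed(row["text"]) for row in reader}

# Step 2: each dimension is defined by one or more seed sentences.
dimensions = {
    "Positive emotion": mean_vector([toy_embed("We are happy")]),
    "Negative emotion": mean_vector([toy_embed("We are sad")]),  # assumed seed
    "Creative": mean_vector([toy_embed("We should adapt and innovate")]),
}

# Step 3: a scale combines dimensions with signed weights.
scales = {
    "Sentiment": {"Positive emotion": 1.0, "Negative emotion": -1.0},
    "Creativity": {"Creative": 1.0},
}

def scale_vector(weights):
    """Weighted sum of the dimension vectors that make up a scale."""
    return [
        sum(w * dimensions[name][i] for name, w in weights.items())
        for i in range(len(VOCAB))
    ]

# Step 4: project every document onto every scale.
scores = {
    doc_id: {name: cosine(vec, scale_vector(w)) for name, w in scales.items()}
    for doc_id, vec in docs.items()
}

for doc_id, row in scores.items():
    print(doc_id, {name: round(value, 2) for name, value in row.items()})
```

In this toy setup the upbeat document lands on the positive side of the Sentiment scale and the other on the negative side, which is the behavior the "Positive emotion - Negative emotion" definition is meant to capture.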

Download files

Source Distribution: spar_measure-0.1.0.tar.gz (3.2 MB)

Built Distribution: spar_measure-0.1.0-py3-none-any.whl (3.2 MB, Python 3)
