SPAR: Semantic Projection with Active Retrieval

Overview
Quick Start and Installation
Interface and Usage

Overview

SPAR is a Python NLP package that enables interactive quantification of text. With SPAR, you can quantify short documents (e.g., social media posts) on latent, continuous scales such as creativity, collaboration, or danger by measuring their semantic similarity to a set of example (seed) documents, for example: 'encourage new ways of thinking', 'working together to weather the storm', 'we are facing a deadly virus.'

Main features:

- Conducts domain-adaptive, few-shot measurements without requiring any model training or fine-tuning. It combines the idea of semantic projection (Grand et al. 2022, Authors 2023) with active semantic search, which allows users to find the most relevant context-specific documents to define the scales.
- Supports multiple state-of-the-art text embedding methods, such as Sentence Transformers or the OpenAI Text Embeddings API.
- Comes with a user-friendly web interface that makes defining scales and conducting measurements intuitive and accessible.

SPAR is built on other open source packages such as HuggingFace Transformers, SentenceTransformers, and Gradio.

If you find SPAR useful in your work, please cite the following paper:

Blinded Authors (2023), A Computational Framework for Understanding Firm Communication During Disasters, Under Review at Information Systems Research.

Please note that the project is currently in a research preview (pre-alpha) stage. To view the planned features for the project, please see the Road Map.

Quick Start and Installation

Simply click the following button and run the code in the notebook to launch SPAR in Google Colab for quick testing:

You can also install SPAR on your own machine. It is recommended to use a virtual environment and upgrade pip first with pip install -U pip. SPAR can be installed via pip:

pip install -U spar-measure

To launch SPAR on your own machine, use the following command in the terminal:

python -m spar_measure.gui

Then open the interactive app in your browser at http://localhost:7860/.

If a CUDA GPU is available, SPAR will use it to speed up embedding. If you choose not to use a GPU, you can set the CUDA_VISIBLE_DEVICES environment variable to an empty string:

CUDA_VISIBLE_DEVICES="" python -m spar_measure.gui
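
The same CPU-only launch can be scripted from Python, which may help on systems where the inline `VAR="" command` shell syntax is unavailable. This is a minimal sketch, not part of SPAR: the helper names `cpu_only_env` and `launch_spar_cpu_only` are hypothetical, and the only thing taken from SPAR is the `python -m spar_measure.gui` entry point shown above.

```python
import os
import subprocess
import sys


def cpu_only_env(base_env=None):
    """Return a copy of the environment with all CUDA devices hidden."""
    env = dict(os.environ if base_env is None else base_env)
    # An empty string hides every GPU from CUDA-aware libraries.
    env["CUDA_VISIBLE_DEVICES"] = ""
    return env


def launch_spar_cpu_only():
    """Start the SPAR web app as a child process without GPU access.

    Hypothetical helper: it runs the documented module entry point with
    the modified environment and returns the process handle.
    """
    command = [sys.executable, "-m", "spar_measure.gui"]
    return subprocess.Popen(command, env=cpu_only_env())
```

Calling `launch_spar_cpu_only()` starts the app in the background. The variable is set only for the child process, so the parent environment stays unchanged.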

See full documentation for other usage options here.

Interface and Usage

SPAR is based on the following 4 simple steps:

Upload a CSV file that contains the text to be measured and a document ID column. Select an embedding method and embed the documents.
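
Before uploading, it can be useful to check that the CSV file really has both columns. The sketch below uses only the Python standard library. The column names `doc_id` and `text` and the function `load_documents` are assumptions made for illustration. SPAR does not require these names.

```python
import csv
import io


def load_documents(csv_file, id_column="doc_id", text_column="text"):
    """Read (id, text) pairs from an open CSV file, skipping empty texts."""
    reader = csv.DictReader(csv_file)
    fieldnames = reader.fieldnames or []
    missing = [c for c in (id_column, text_column) if c not in fieldnames]
    if missing:
        raise ValueError(f"CSV is missing required column(s): {missing}")
    documents = []
    for row in reader:
        # Short rows yield None for absent fields, so fall back to "".
        text = (row[text_column] or "").strip()
        if text:
            documents.append((row[id_column], text))
    return documents


# Tiny in-memory example standing in for a real file on disk.
sample = io.StringIO(
    "doc_id,text\n1,We should adapt and innovate.\n2,\n3,We are happy.\n"
)
docs = load_documents(sample)
print(docs)
```

For a file on disk, pass `open(path, newline="", encoding="utf-8")` instead of the `io.StringIO` object.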

Define a set of dimensions and generic seed queries. For example:
- Creative: "We should adapt and innovate."
- Positive emotion: "We are happy."
- Danger: "It is dangerous."
Then, search for sentences in a corpus that are similar to the generic seed queries, and use the results to define dimensions in the context of the corpus. For example:
- Creative:
  - "Digital technology will play a huge role going forward."
  - "How do you adapt to these uncharted waters?"
- Positive emotion:
  - "The smiling faces say it all."
  - "A round of applause to all of our recent WaFd Foundation grant recipients!"
- Danger: "How do you prevent the spread of a deadly virus?"
Enter these new context-specific sentences into the query box and click the "Embed Queries and Save Dimensions" button.
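
The search in this step can be pictured as ranking corpus sentences by how close their embeddings are to the embedding of a seed query. The sketch below illustrates that idea with cosine similarity on hand-made 3-dimensional vectors. It is an assumption about the general technique, not SPAR's actual code: the function names are hypothetical, and a real run would use vectors produced by the selected embedding method.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query_vec, corpus, top_k=2):
    """Rank corpus sentences by similarity to the query, highest first."""
    scored = [
        (cosine_similarity(query_vec, vec), sentence)
        for sentence, vec in corpus
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy vectors; they only mimic what an embedding model would return.
corpus = [
    ("How do you prevent the spread of a deadly virus?", [0.9, 0.1, 0.0]),
    ("The smiling faces say it all.", [0.0, 0.2, 0.9]),
    ("How do you adapt to these uncharted waters?", [0.3, 0.9, 0.1]),
]
# Stands in for the embedding of the generic seed query "It is dangerous."
danger_query = [1.0, 0.0, 0.0]

for score, sentence in search(danger_query, corpus):
    print(f"{score:.2f}  {sentence}")
```

The top-ranked sentences are the candidates a user would review and, if they fit, adopt as context-specific seeds for the dimension.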

Define scales, each of which consists of one or more dimensions. For example:
- Sentiment = Positive emotion - Negative emotion
- Creativity = Creative

Project the document embeddings onto the scale embeddings. A CSV file with the results can be downloaded.
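
The last two steps can be sketched with plain vector arithmetic. The example below assumes that a dimension is represented by the mean of its seed embeddings, that a scale such as Sentiment is the difference between two dimension vectors, and that a document's score is its scalar projection onto the scale vector. These are illustrative assumptions in the spirit of semantic projection. SPAR's own computation may differ, for example in how vectors are normalized. All names and the 2-dimensional toy vectors are hypothetical.

```python
import csv
import io
import math


def mean_vector(vectors):
    """Average a list of equal-length vectors component by component."""
    count = len(vectors)
    return [sum(components) / count for components in zip(*vectors)]


def scalar_projection(vec, axis):
    """Signed length of vec along the direction of axis."""
    norm = math.sqrt(sum(a * a for a in axis))
    if norm == 0.0:
        raise ValueError("scale vector has zero length")
    return sum(v * a for v, a in zip(vec, axis)) / norm


# Toy embeddings of the seed sentences for each dimension.
positive = mean_vector([[1.0, 0.0], [0.8, 0.2]])
negative = mean_vector([[-1.0, 0.0], [-0.8, 0.2]])

# Sentiment = Positive emotion - Negative emotion
sentiment = [p - n for p, n in zip(positive, negative)]

# Toy document embeddings keyed by document ID.
documents = {"doc1": [0.5, 0.5], "doc2": [-0.7, 0.1]}

# Write one row per document, similar in spirit to a results CSV.
output = io.StringIO()
writer = csv.writer(output)
writer.writerow(["doc_id", "sentiment"])
for doc_id, vec in documents.items():
    writer.writerow([doc_id, round(scalar_projection(vec, sentiment), 4)])
print(output.getvalue())
```

A single-dimension scale such as Creativity = Creative would use the dimension vector directly as the axis.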
