MARSA (Multiple-Aspect Rule-Based Sentiment Annotator) is a lightweight assistant for aspect-sentiment pair extraction and correction.
MARSA
MARSA is a lightweight tool that streamlines aspect-based sentiment analysis (ABSA) by automating the extraction and pre-labeling of aspect-sentiment pairs from review-style text. It combines rule-based aspect extraction with sentiment analysis to shorten the slow process of manually labeling text data. It is especially useful for analyzing social media content, such as Reddit comments and Twitter posts, and for mining product reviews from platforms like Amazon and Yelp.
The tool identifies multiple aspects within a single sentence and automatically assigns initial sentiment scores using VADER. Users can define custom aspect terms and categories to tailor the analysis to their needs. Results can be exported as JSON or CSV for manual review or for use as model training data. MARSA is available both as a command-line tool and as a Python API.
Pipeline Architecture
MARSA uses a two-stage pipeline:
- Aspect Extraction: Rule-based matching using configurable phrase dictionaries
- Sentiment Analysis: VADER sentiment analyzer processes text within the specified context window around each detected aspect
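The first stage can be pictured with a minimal sketch of dictionary-based phrase matching. This is an illustration only, not MARSA's actual implementation: the `ASPECT_PHRASES` dictionary, the `find_aspects` helper, and the regex tokenizer are all assumptions made for this example.

```python
import re

# Hypothetical phrase dictionary: aspect name -> trigger phrases.
ASPECT_PHRASES = {
    "camera": ["camera", "photo"],
    "battery": ["battery", "battery life"],
}


def find_aspects(text, aspect_phrases):
    """Return sorted (aspect, token_index) pairs for every phrase match."""
    # Assumed tokenization: lowercase alphanumeric runs (not MARSA's tokenizer).
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    hits = set()
    for aspect, phrases in aspect_phrases.items():
        for phrase in phrases:
            words = phrase.split()
            # Slide over the tokens and compare each span to the phrase.
            for i in range(len(tokens) - len(words) + 1):
                if tokens[i:i + len(words)] == words:
                    hits.add((aspect, i))
    return sorted(hits, key=lambda hit: hit[1])


hits = find_aspects("Great camera but poor battery", ASPECT_PHRASES)
print(hits)  # [('camera', 1), ('battery', 4)]
```

The token index of each match is what the second stage would need in order to cut a context window around the aspect.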
Installation
Install with pip
pip install marsapy
Install with uv
uv add marsapy
For GPU support (optional):
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu129
uv add torch --index pytorch-cu129 --index-url https://download.pytorch.org/whl/cu129
Quick Start
Command Line
# Analyze a single text string
marsa analyze-text "Great camera but poor battery" --config config.yaml --output results.json --context-window 3
# Analyze a file of comments (short notation)
marsa analyze-file comments.txt -c config.yaml -o results.json -w 3
Python API
from marsa import AspectSentimentPipeline
pipeline = AspectSentimentPipeline(config_file="config.yaml", context_window=2)
results = pipeline.process_corpus(["I love the camera but don't like the battery life"])
Context Window
The --context-window (shorthand notation: -w) parameter controls how many words on each side of an aspect phrase are analyzed for sentiment. A larger context window (e.g., 5) captures more of the surrounding sentiment but may pull in text about other aspects, while a smaller window (e.g., 1-2) focuses on the immediately adjacent words but might miss important context.
Example:
- Text: "I love the sleek design but hate the poor performance"
- Aspect "design" with context window 1: Analyzes "sleek design but"
- Aspect "design" with context window 3: Analyzes "love the sleek design but hate the"
- Aspect "performance" with context window 1: Analyzes "poor performance"
- Aspect "performance" with context window 3: Analyzes "hate the poor performance"
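The windowing in the example above can be reproduced with a short sketch. It assumes whitespace tokenization and a single-word aspect that occurs once in the text; the `context_window` helper is hypothetical and is not part of the MARSA API.

```python
def context_window(text, aspect, window):
    """Return the aspect word plus up to `window` words on each side."""
    tokens = text.split()  # assumption: simple whitespace tokenization
    lowered = [token.lower() for token in tokens]
    i = lowered.index(aspect.lower())  # first occurrence of the aspect word
    start = max(0, i - window)  # clamp at the start of the text
    return " ".join(tokens[start:i + window + 1])  # slicing clamps the end


text = "I love the sleek design but hate the poor performance"
print(context_window(text, "design", 1))       # sleek design but
print(context_window(text, "design", 3))       # love the sleek design but hate the
print(context_window(text, "performance", 1))  # poor performance
print(context_window(text, "performance", 3))  # hate the poor performance
```

Because "performance" is the last word, its window is truncated on the right, which is why the last two results contain fewer words than the window size would otherwise allow.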
Configuration
The easiest way to configure MARSA is with a YAML file. Create a config.yaml file and define your aspects:
aspects:
  camera:
    phrases: ["camera", "photo", "picture", "pics", "photography", "image", "snap"]
    category: "hardware"
  battery:
    phrases: ["battery", "power", "charge", "charging", "juice", "drain", "life"]
    category: "hardware"
  screen:
    phrases: ["screen", "display", "resolution", "brightness", "monitor", "lcd", "oled"]
    category: "interface"
You can also define aspects in a config.json file:
{
  "aspects": {
    "camera": {
      "phrases": ["camera", "photo", "picture", "pics", "photography", "image", "snap"],
      "category": "hardware"
    },
    "battery": {
      "phrases": ["battery", "power", "charge", "charging", "juice", "drain", "life"],
      "category": "hardware"
    },
    "screen": {
      "phrases": ["screen", "display", "resolution", "brightness", "monitor", "lcd", "oled"],
      "category": "interface"
    }
  }
}
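To see how a configuration of this shape could be consumed, the sketch below parses a shortened JSON config with the standard library and inverts it into a phrase lookup. The `build_phrase_lookup` helper is an assumption for illustration and does not describe how MARSA loads its configuration internally.

```python
import json

# Shortened copy of the config structure shown above.
CONFIG_JSON = """
{
  "aspects": {
    "camera": {"phrases": ["camera", "photo"], "category": "hardware"},
    "screen": {"phrases": ["screen", "display"], "category": "interface"}
  }
}
"""


def build_phrase_lookup(config):
    """Map each lowercase phrase to its (aspect, category) pair."""
    lookup = {}
    for aspect, spec in config["aspects"].items():
        for phrase in spec["phrases"]:
            lookup[phrase.lower()] = (aspect, spec["category"])
    return lookup


lookup = build_phrase_lookup(json.loads(CONFIG_JSON))
print(lookup["display"])  # ('screen', 'interface')
```

An inverted lookup like this makes it easy to spot a phrase that was accidentally listed under two aspects, since the later entry would silently overwrite the earlier one.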
Sentiment Analysis
MARSA uses an ensemble approach combining VADER and BERT models for sentiment classification:
- VADER: Lexicon-based analyzer that handles social media text, slang, and emoticons well
- BERT: Twitter-RoBERTa transformer model for contextual sentiment understanding
- Ensemble Method: Weighted combination based on model confidence and agreement
- Output: Each aspect gets a sentiment label (positive/negative/neutral) and confidence score (0.0-1.0)
- Threshold: Scores within ±0.05 are classified as neutral
- Confidence Interpretation: Higher confidence scores indicate more reliable predictions; low confidence scores suggest the result may need manual review
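The neutral threshold and the idea of a weighted combination can be sketched as follows. Both functions are hypothetical: the source does not specify the exact weighting formula or how boundary values are handled, so this example assumes scores in the range -1 to 1, treats exactly ±0.05 as neutral, and weights the transformer score by its own confidence.

```python
def label_from_score(score, threshold=0.05):
    """Map a score in [-1, 1] to a label; scores within the threshold are neutral."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"  # assumption: boundary values count as neutral


def combine_scores(vader_score, model_score, model_confidence):
    """Assumed weighting: trust the model score in proportion to its confidence."""
    weight = max(0.0, min(1.0, model_confidence))  # clamp to [0, 1]
    return weight * model_score + (1.0 - weight) * vader_score


combined = combine_scores(vader_score=0.4, model_score=0.8, model_confidence=0.5)
print(label_from_score(combined))  # positive
print(label_from_score(0.02))      # neutral
```

With a confidence of 0.5 the two scores are averaged; as the confidence approaches 0, the result falls back to the VADER score alone.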
Output Format
MARSA outputs structured data showing detected aspects with their sentiment classifications:
[
  {
    "cleaned_text": "great camera but battery life is terrible",
    "aspect_sentiments": [
      {
        "aspect": "camera",
        "category": "hardware",
        "sentiment": "positive",
        "confidence": 0.85
      },
      {
        "aspect": "battery",
        "category": "hardware",
        "sentiment": "negative",
        "confidence": 0.92
      }
    ]
  },
  {
    "cleaned_text": "beautiful screen display",
    "aspect_sentiments": [
      {
        "aspect": "screen",
        "category": "interface",
        "sentiment": "positive",
        "confidence": 0.95
      }
    ]
  }
]
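Since the output is meant for manual review, one possible post-processing step is to flatten it into one row per aspect and flag low-confidence pairs. The sketch below assumes the JSON structure shown above; the `flatten` and `needs_review` helpers and the 0.9 cutoff are illustrative choices, not part of MARSA.

```python
import csv
import io
import json

# Sample data in the output structure shown above.
RESULTS_JSON = """
[
  {"cleaned_text": "great camera but battery life is terrible",
   "aspect_sentiments": [
     {"aspect": "camera", "category": "hardware", "sentiment": "positive", "confidence": 0.85},
     {"aspect": "battery", "category": "hardware", "sentiment": "negative", "confidence": 0.92}]},
  {"cleaned_text": "beautiful screen display",
   "aspect_sentiments": [
     {"aspect": "screen", "category": "interface", "sentiment": "positive", "confidence": 0.95}]}
]
"""


def flatten(results):
    """Produce one flat row per aspect-sentiment pair."""
    return [
        {"text": item["cleaned_text"], **pair}
        for item in results
        for pair in item["aspect_sentiments"]
    ]


def needs_review(rows, min_confidence=0.9):
    """Select rows whose confidence falls below an assumed review cutoff."""
    return [row for row in rows if row["confidence"] < min_confidence]


rows = flatten(json.loads(RESULTS_JSON))
flagged = needs_review(rows)

# Write the flagged rows as CSV to an in-memory buffer for inspection.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
writer.writeheader()
writer.writerows(flagged)
print(buffer.getvalue())
```

With the sample data, only the "camera" pair (confidence 0.85) falls below the cutoff and ends up in the review CSV.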