Advanced sentiment analysis toolkit with multi-provider LLM support and web scraping capabilities
Sentimatrix V2
Advanced sentiment analysis toolkit with multi-provider LLM support, comprehensive web scraping, and emotion detection.
Features
LLM Providers (19 Providers)
- Cloud Providers: OpenAI, Anthropic, Google Gemini, Groq, Mistral, Cohere
- Inference Providers: Together AI, Fireworks, OpenRouter, Cerebras, DeepSeek
- Local Providers: Ollama, LM Studio, vLLM, llama.cpp, text-generation-webui, ExLlamaV2
- Enterprise: Azure OpenAI, AWS Bedrock
Web Scraping
- Core Scrapers: HTTPX (async HTTP), Playwright (browser automation)
- Platform Scrapers: Amazon, Steam, YouTube, Reddit, IMDB, Yelp, Trustpilot, Google Reviews
- Commercial APIs: ScraperAPI, Apify, Bright Data, Oxylabs, Zyte, ScrapingBee, ScrapingAnt
Analysis
- Sentiment Analysis: 3-class and 5-class classification with transformer models
- Emotion Detection: GoEmotions (28 emotions), Ekman mapping (6 basic emotions)
- Multi-Modal: Audio transcription (Whisper), image captioning (GPT-4V, Claude Vision)
- Batch Processing: Efficient batch analysis with aggregate statistics
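The reduction from GoEmotions labels to Ekman's six basic emotions can be sketched as a label lookup followed by a sum. The grouping below follows the Ekman mapping distributed with the GoEmotions dataset. The helper name `to_ekman` and the score-dictionary input are assumptions for illustration, not Sentimatrix's API.

```python
from collections import defaultdict

# Grouping of the 27 non-neutral GoEmotions labels into Ekman's six basic
# emotions. "neutral" (the 28th label) is kept as its own category.
EKMAN_GROUPS = {
    "anger": ["anger", "annoyance", "disapproval"],
    "disgust": ["disgust"],
    "fear": ["fear", "nervousness"],
    "joy": ["joy", "amusement", "approval", "excitement", "gratitude", "love",
            "optimism", "relief", "pride", "admiration", "desire", "caring"],
    "sadness": ["sadness", "disappointment", "embarrassment", "grief", "remorse"],
    "surprise": ["surprise", "realization", "confusion", "curiosity"],
}

# Invert the grouping into a fine-grained label -> Ekman label lookup.
LABEL_TO_EKMAN = {
    fine: coarse for coarse, fines in EKMAN_GROUPS.items() for fine in fines
}


def to_ekman(scores):
    """Sum fine-grained scores per Ekman category (hypothetical helper).

    Labels missing from the lookup, including "neutral", are counted
    under "neutral".
    """
    totals = defaultdict(float)
    for label, score in scores.items():
        totals[LABEL_TO_EKMAN.get(label, "neutral")] += score
    return dict(totals)
```

Summing is only one way to aggregate; taking the maximum score per group is an equally plausible choice when the fine-grained scores are independent probabilities.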
Output & Export
- Formats: JSON, CSV, Excel, HTML reports, Markdown
- Visualizations: Bar charts, pie charts, histograms, time series, comparison charts
Installation
# Basic installation
pip install sentimatrix
# With all LLM providers (quotes keep shells such as zsh from expanding the brackets)
pip install "sentimatrix[llm]"
# With scraping support (includes Playwright)
pip install "sentimatrix[scraping]"
# With ML models (transformers, torch)
pip install "sentimatrix[models]"
# Full installation
pip install "sentimatrix[all]"
# Development installation
pip install -e ".[dev]"
Browser Dependencies (for Amazon/JS-heavy sites)
# Install Playwright browsers
playwright install chromium
# Install system dependencies (Linux)
sudo playwright install-deps
Quick Start
Basic Sentiment Analysis
import asyncio

from sentimatrix import Sentimatrix


async def main():
    async with Sentimatrix() as sm:
        # Quick sentiment analysis
        result = await sm.analyze_sentiment("This product is amazing!")
        print(f"Sentiment: {result.sentiment}")  # "positive"
        print(f"Confidence: {result.confidence:.2%}")  # 95.00%

        # Emotion detection
        emotions = await sm.detect_emotions("I'm so excited about this!")
        print(f"Primary emotion: {emotions.primary_emotion.label}")  # "joy"


asyncio.run(main())
Scraping and Analyzing Reviews
import asyncio

from sentimatrix import Sentimatrix, LLMConfig


async def main():
    # Configure with Groq for fast LLM inference
    llm_config = LLMConfig(
        provider="groq",
        api_key="gsk_your_api_key",
        model="llama-3.3-70b-versatile",
    )

    async with Sentimatrix(llm_config=llm_config) as sm:
        # Scrape Steam reviews (works without browser deps)
        reviews = await sm.scrape_steam("730", limit=50)  # Counter-Strike 2
        print(f"Scraped {len(reviews)} reviews")

        # Analyze sentiment and emotions
        analysis = await sm.analyze_reviews(reviews)
        print(f"Positive: {analysis.positive_ratio:.1%}")
        print(f"Negative: {analysis.negative_ratio:.1%}")

        # Generate LLM-powered insights
        insights = await sm.generate_insights(reviews)
        print(f"Summary: {insights.summary}")
        print(f"Pros: {insights.pros}")
        print(f"Cons: {insights.cons}")


asyncio.run(main())
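The `positive_ratio` and `negative_ratio` fields printed above are presumably the share of reviews assigned to each class. A minimal sketch of that arithmetic, assuming per-review labels are plain strings; `sentiment_ratios` is a hypothetical helper and not part of the library.

```python
from collections import Counter


def sentiment_ratios(labels):
    """Return the share of each label in `labels`.

    An empty input returns an empty dict rather than dividing by zero.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in counts.items()}


if __name__ == "__main__":
    ratios = sentiment_ratios(["positive", "positive", "negative", "neutral"])
    print(f"Positive: {ratios.get('positive', 0.0):.1%}")
    print(f"Negative: {ratios.get('negative', 0.0):.1%}")
```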
Using Different Scrapers
import asyncio

# Steam - Uses JSON API (no browser needed)
from sentimatrix.providers.scrapers.platforms import SteamScraper


async def scrape_steam():
    async with SteamScraper() as scraper:
        reviews = await scraper.scrape_reviews("730", limit=20)
        return reviews


# Amazon - Requires Playwright browser
from sentimatrix.providers.scrapers.platforms import AmazonScraper, AmazonConfig


async def scrape_amazon():
    config = AmazonConfig(country="us")
    async with AmazonScraper(config) as scraper:
        reviews = await scraper.scrape_reviews("B08N5WRWNW", limit=20)
        return reviews


# Commercial API - ScraperAPI (for anti-bot bypass)
from sentimatrix.providers.scrapers.commercial import ScraperAPIClient


async def scrape_with_api():
    async with ScraperAPIClient(api_key="your_key") as client:
        content = await client.scrape(
            "https://example.com",
            render_js=True,
            country_code="us",
        )
        return content
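The Steam scraper needs no browser because Steam exposes reviews as JSON. The internals of `SteamScraper` are not shown here, so the following is only a sketch of what such a request and its parsing might look like, assuming the public storefront endpoint `appreviews/<appid>?json=1` with its `cursor` and `num_per_page` parameters. The helper names `build_reviews_url` and `parse_reviews` are hypothetical, and no network call is made.

```python
from urllib.parse import urlencode

# Public Steam storefront reviews endpoint (assumed to be what a
# JSON-based scraper would call; not confirmed for SteamScraper).
STEAM_REVIEWS_URL = "https://store.steampowered.com/appreviews/{app_id}"


def build_reviews_url(app_id, cursor="*", num_per_page=20, language="english"):
    """Build the request URL for one page of reviews.

    `cursor="*"` requests the first page; later pages pass the cursor
    string returned in the previous response.
    """
    query = urlencode({
        "json": 1,
        "cursor": cursor,
        "num_per_page": num_per_page,
        "language": language,
    })
    return STEAM_REVIEWS_URL.format(app_id=app_id) + "?" + query


def parse_reviews(payload):
    """Extract review text and the recommendation flag from a decoded response."""
    return [
        {"text": item.get("review", ""), "recommended": bool(item.get("voted_up"))}
        for item in payload.get("reviews", [])
    ]
```

Fetching the URL with any HTTP client and passing the decoded JSON to `parse_reviews` would complete the loop; paging continues until a response returns no reviews.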
Using LLM Providers
import asyncio

from sentimatrix.providers.llm import GroqProvider, OpenAIProvider
from sentimatrix.core.config import LLMConfig


# Groq (fast, free tier available)
async def use_groq():
    config = LLMConfig(
        provider="groq",
        api_key="gsk_...",
        model="llama-3.3-70b-versatile",
    )
    async with GroqProvider(config) as provider:
        response = await provider.generate("Analyze this text...")
        print(response.content)


# OpenAI
async def use_openai():
    config = LLMConfig(
        provider="openai",
        api_key="sk-...",
        model="gpt-4o-mini",
    )
    async with OpenAIProvider(config) as provider:
        response = await provider.generate("Summarize these reviews...")
        print(response.content)
Configuration
YAML Configuration
# config.yaml
llm:
  provider: groq
  model: llama-3.3-70b-versatile
  api_key: ${GROQ_API_KEY}
  temperature: 0.7

scrapers:
  default_provider: playwright
  headless: true
  timeout: 30

cache:
  enabled: true
  backend: memory
  ttl: 3600

logging:
  level: INFO
  format: json
from sentimatrix import SentimatrixConfig, Sentimatrix
config = SentimatrixConfig.from_file("config.yaml")
sm = Sentimatrix(config)
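The `${GROQ_API_KEY}` placeholder in the YAML implies that values are filled in from the environment when the file is loaded. How `SentimatrixConfig.from_file` does this is not shown here; the following is a minimal sketch of one way to do it, assuming the YAML has already been parsed into nested dicts and lists. `expand_env` is a hypothetical helper, not a library function.

```python
import os
import re

# Matches ${NAME} where NAME is a conventional environment variable name.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value, env=None):
    """Recursively replace ${NAME} placeholders in strings.

    Unknown variables are left untouched so that a missing key is easy
    to spot later. Non-string scalars are returned unchanged.
    """
    env = os.environ if env is None else env
    if isinstance(value, str):
        return _PLACEHOLDER.sub(lambda m: env.get(m.group(1), m.group(0)), value)
    if isinstance(value, dict):
        return {key: expand_env(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [expand_env(item, env) for item in value]
    return value
```

Leaving unresolved placeholders in place is a design choice made for this sketch; raising an error on a missing variable would be the stricter alternative.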
Environment Variables
export GROQ_API_KEY="gsk_..."
export OPENAI_API_KEY="sk-..."
export SENTIMATRIX_LOG_LEVEL=INFO
export SENTIMATRIX_CACHE_ENABLED=true
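Environment variables are always strings, so a flag such as `SENTIMATRIX_CACHE_ENABLED=true` has to be converted before it can act as a boolean. Which spellings Sentimatrix accepts is not documented here; the sketch below assumes a common set (`true/false`, `1/0`, `yes/no`, `on/off`), and `env_bool` is a hypothetical helper.

```python
import os

# Accepted spellings are an assumption made for this sketch.
_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off"}


def env_bool(name, default=False, env=None):
    """Read a boolean-like environment variable.

    Returns `default` when the variable is unset and raises ValueError
    on an unrecognized value instead of silently guessing.
    """
    env = os.environ if env is None else env
    raw = env.get(name)
    if raw is None:
        return default
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"{name} must be a boolean-like value, got {raw!r}")
```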
Project Structure
sentimatrix/
├── core/
│   ├── config.py        # Configuration management
│   ├── logger.py        # Structured logging
│   ├── exceptions.py    # Exception hierarchy
│   ├── cache.py         # Memory & Redis caching
│   └── pipeline.py      # Pipeline orchestration
├── providers/
│   ├── llm/             # 19 LLM providers
│   ├── scrapers/
│   │   ├── platforms/   # Platform-specific scrapers
│   │   └── commercial/  # Commercial API clients
│   └── models/          # HuggingFace model providers
├── analysis/
│   ├── sentiment.py     # Sentiment analysis
│   ├── emotion.py       # Emotion detection
│   └── multimodal.py    # Audio/image/video analysis
├── input/               # Input handlers (audio, image, video)
├── output/              # Exporters, formatters, visualizers
├── cli.py               # Command-line interface
└── main.py              # Main Sentimatrix class
CLI Usage
# Analyze single text
sentimatrix analyze "This product is amazing!"
# Analyze from file
sentimatrix analyze-file reviews.txt --output results.json
# Scrape and analyze
sentimatrix scrape steam 730 --limit 50 --analyze
# Batch process CSV
sentimatrix batch input.csv --text-column review --output results.csv
# System info
sentimatrix info
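The `batch` command with `--text-column` reads one column of a CSV, classifies each row, and writes the results back out. The sketch below shows that flow with the standard library only. It assumes the output is the input rows plus one `sentiment` column, and it takes the classifier as a plain function so that no model is needed; `batch_classify` and the toy classifier are hypothetical and do not reflect the CLI's actual implementation.

```python
import csv
import io


def batch_classify(csv_text, text_column, classify):
    """Append a `sentiment` column to CSV text by classifying `text_column`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames or [])
    if text_column not in fieldnames:
        raise KeyError(f"column {text_column!r} not found")

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames + ["sentiment"],
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row["sentiment"] = classify(row[text_column])
        writer.writerow(row)
    return out.getvalue()


def toy_classifier(text):
    """Stand-in for a real model: keyword match only, for demonstration."""
    return "positive" if "good" in text.lower() else "negative"
```

Passing the classifier in as an argument keeps the CSV handling testable without loading a transformer model.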
Testing
# Run all tests
pytest
# Run with coverage
pytest --cov=sentimatrix
# Run specific test suite
pytest tests/unit/providers/llm/
pytest tests/unit/providers/scrapers/
# Run live tests (requires API keys)
python test_full_pipeline.py
Test Summary: 282+ tests passing across all modules.
Documentation
- Quickstart Guide
- API Reference
- Architecture Overview
- Provider Guide
- Scraper Guide
- Configuration Guide
- Examples
- Troubleshooting
Roadmap
See ROADMAP.md for the development roadmap.
Contributing
See CONTRIBUTING.md for contribution guidelines.
Changelog
See CHANGELOG.md for version history.
License
MIT License - see LICENSE file for details.
Version
Current version: 0.2.0 (Stage 14 - Commercial Scrapers Complete)
File details
Details for the file sentimatrix-0.2.0.tar.gz.
File metadata
- Download URL: sentimatrix-0.2.0.tar.gz
- Upload date:
- Size: 636.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 73181bccce9b3bdf771b57e5c3a64bb30ad9f6add1f41c8ab36ece3d8f394829 |
| MD5 | 873642a3a8007d15b9b230c08816b43e |
| BLAKE2b-256 | 163326a9752bb0e2ca0559090f99243f2dbfa2a826cc1e900d2795107f89d501 |
File details
Details for the file sentimatrix-0.2.0-py3-none-any.whl.
File metadata
- Download URL: sentimatrix-0.2.0-py3-none-any.whl
- Upload date:
- Size: 314.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 748603dc120572109c4a4cbc18e5e654adad5166141073f77cacd25c4f860b40 |
| MD5 | 667d2e5fc23e4be2282be86cf3c3c5ce |
| BLAKE2b-256 | 7f22877ef86cf6e31c298b2f4bdd172c0ecfcb42ffaf699963d750841ef41503 |