Sentimatrix V2

Advanced sentiment analysis toolkit with multi-provider LLM support, comprehensive web scraping, and emotion detection.

Requires Python 3.10+. License: MIT.

Features

LLM Providers (19 Providers)

  • Cloud Providers: OpenAI, Anthropic, Google Gemini, Groq, Mistral, Cohere
  • Inference Providers: Together AI, Fireworks, OpenRouter, Cerebras, DeepSeek
  • Local Providers: Ollama, LM Studio, vLLM, llama.cpp, text-generation-webui, ExLlamaV2
  • Enterprise: Azure OpenAI, AWS Bedrock

Web Scraping

  • Core Scrapers: HTTPX (async HTTP), Playwright (browser automation)
  • Platform Scrapers: Amazon, Steam, YouTube, Reddit, IMDB, Yelp, Trustpilot, Google Reviews
  • Commercial APIs: ScraperAPI, Apify, Bright Data, Oxylabs, Zyte, ScrapingBee, ScrapingAnt

Analysis

  • Sentiment Analysis: 3-class and 5-class classification with transformer models
  • Emotion Detection: GoEmotions (28 emotions), Ekman mapping (6 basic emotions)
  • Multi-Modal: Audio transcription (Whisper), image captioning (GPT-4V, Claude Vision)
  • Batch Processing: Efficient batch analysis with aggregate statistics
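
As a rough illustration of the Ekman mapping mentioned above, the sketch below folds the 27 GoEmotions labels (plus "neutral", the 28th) into six basic emotions by summing scores. The grouping follows the table published with the GoEmotions dataset; whether Sentimatrix uses this exact table or sums scores this way is an assumption, and `to_ekman` is a hypothetical helper, not part of the library.

```python
from collections import defaultdict

# Grouping published alongside the GoEmotions dataset. Assumption: Sentimatrix
# may use a different table or aggregation internally.
EKMAN_GROUPS = {
    "anger": ["anger", "annoyance", "disapproval"],
    "disgust": ["disgust"],
    "fear": ["fear", "nervousness"],
    "joy": ["joy", "amusement", "approval", "excitement", "gratitude", "love",
            "optimism", "relief", "pride", "admiration", "desire", "caring"],
    "sadness": ["sadness", "disappointment", "embarrassment", "grief", "remorse"],
    "surprise": ["surprise", "realization", "confusion", "curiosity"],
}

# Invert to a flat lookup: fine-grained label -> Ekman category.
GOEMOTIONS_TO_EKMAN = {
    fine: coarse for coarse, fines in EKMAN_GROUPS.items() for fine in fines
}
# "neutral" has no Ekman counterpart and is passed through unchanged.
GOEMOTIONS_TO_EKMAN["neutral"] = "neutral"


def to_ekman(scores: dict[str, float]) -> dict[str, float]:
    """Sum fine-grained GoEmotions scores into their Ekman categories."""
    totals: dict[str, float] = defaultdict(float)
    for label, score in scores.items():
        totals[GOEMOTIONS_TO_EKMAN[label]] += score
    return dict(totals)


fine_scores = {"excitement": 0.5, "joy": 0.25, "nervousness": 0.125}
print(to_ekman(fine_scores))
```

Summing is only one possible aggregation; taking the maximum score per category is a common alternative when the fine-grained scores are not mutually exclusive.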

Output & Export

  • Formats: JSON, CSV, Excel, HTML reports, Markdown
  • Visualizations: Bar charts, pie charts, histograms, time series, comparison charts
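
To give a sense of what the JSON and CSV formats above might contain, here is a minimal standard-library sketch. The record fields (`text`, `sentiment`, `confidence`) and the helpers `to_json` and `to_csv` are hypothetical; the library's own exporters may use different names and layouts.

```python
import csv
import io
import json

# Hypothetical result records; field names are illustrative only.
results = [
    {"text": "This product is amazing!", "sentiment": "positive", "confidence": 0.95},
    {"text": "Arrived broken.", "sentiment": "negative", "confidence": 0.88},
]


def to_json(records: list[dict]) -> str:
    """Serialize records as pretty-printed JSON, keeping non-ASCII text readable."""
    return json.dumps(records, indent=2, ensure_ascii=False)


def to_csv(records: list[dict]) -> str:
    """Serialize records as CSV, using the first record's keys as the header."""
    if not records:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0]))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


print(to_json(results))
print(to_csv(results))
```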

Installation

# Basic installation
pip install sentimatrix

# With all LLM providers (quotes keep shells such as zsh from expanding the brackets)
pip install "sentimatrix[llm]"

# With scraping support (includes Playwright)
pip install "sentimatrix[scraping]"

# With ML models (transformers, torch)
pip install "sentimatrix[models]"

# Full installation
pip install "sentimatrix[all]"

# Development installation
pip install -e ".[dev]"

Browser Dependencies (for Amazon/JS-heavy sites)

# Install Playwright browsers
playwright install chromium

# Install system dependencies (Linux)
sudo playwright install-deps

Quick Start

Basic Sentiment Analysis

import asyncio
from sentimatrix import Sentimatrix

async def main():
    async with Sentimatrix() as sm:
        # Quick sentiment analysis
        result = await sm.analyze_sentiment("This product is amazing!")
        print(f"Sentiment: {result.sentiment}")  # "positive"
        print(f"Confidence: {result.confidence:.2%}")  # 95.00%

        # Emotion detection
        emotions = await sm.detect_emotions("I'm so excited about this!")
        print(f"Primary emotion: {emotions.primary_emotion.label}")  # "joy"

asyncio.run(main())

Scraping and Analyzing Reviews

import asyncio
from sentimatrix import Sentimatrix, LLMConfig

async def main():
    # Configure with Groq for fast LLM inference
    llm_config = LLMConfig(
        provider="groq",
        api_key="gsk_your_api_key",
        model="llama-3.3-70b-versatile"
    )

    async with Sentimatrix(llm_config=llm_config) as sm:
        # Scrape Steam reviews (works without browser deps)
        reviews = await sm.scrape_steam("730", limit=50)  # Counter-Strike 2
        print(f"Scraped {len(reviews)} reviews")

        # Analyze sentiment and emotions
        analysis = await sm.analyze_reviews(reviews)
        print(f"Positive: {analysis.positive_ratio:.1%}")
        print(f"Negative: {analysis.negative_ratio:.1%}")

        # Generate LLM-powered insights
        insights = await sm.generate_insights(reviews)
        print(f"Summary: {insights.summary}")
        print(f"Pros: {insights.pros}")
        print(f"Cons: {insights.cons}")

asyncio.run(main())
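
The `positive_ratio` and `negative_ratio` fields above are aggregate statistics over the batch. A minimal sketch of how such ratios could be derived from per-review labels follows; `sentiment_ratios` is a hypothetical helper and the three label names are assumptions, not the library's implementation.

```python
from collections import Counter

LABELS = ("positive", "neutral", "negative")  # assumed 3-class label set


def sentiment_ratios(labels: list[str]) -> dict[str, float]:
    """Return the share of each sentiment label in a batch of predictions."""
    total = len(labels)
    if total == 0:
        # Avoid dividing by zero when nothing was scraped.
        return {name: 0.0 for name in LABELS}
    counts = Counter(labels)
    return {name: counts[name] / total for name in LABELS}


ratios = sentiment_ratios(["positive", "positive", "negative", "neutral"])
print(f"Positive: {ratios['positive']:.1%}")
print(f"Negative: {ratios['negative']:.1%}")
```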

Using Different Scrapers

import asyncio

# Steam - Uses JSON API (no browser needed)
from sentimatrix.providers.scrapers.platforms import SteamScraper

async def scrape_steam():
    async with SteamScraper() as scraper:
        reviews = await scraper.scrape_reviews("730", limit=20)
        return reviews

# Amazon - Requires Playwright browser
from sentimatrix.providers.scrapers.platforms import AmazonScraper, AmazonConfig

async def scrape_amazon():
    config = AmazonConfig(country="us")
    async with AmazonScraper(config) as scraper:
        reviews = await scraper.scrape_reviews("B08N5WRWNW", limit=20)
        return reviews

# Commercial API - ScraperAPI (for anti-bot bypass)
from sentimatrix.providers.scrapers.commercial import ScraperAPIClient

async def scrape_with_api():
    async with ScraperAPIClient(api_key="your_key") as client:
        content = await client.scrape(
            "https://example.com",
            render_js=True,
            country_code="us"
        )
        return content
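
Because each scraper above is a coroutine, several sources can be awaited together. The sketch below shows one way to do that with `asyncio.gather`, collecting failures (for example, a missing browser) instead of aborting the whole run. The `fake_steam` and `fake_amazon` coroutines are stand-ins so the example runs without network access; in practice they would be replaced by functions such as `scrape_steam` and `scrape_amazon`.

```python
import asyncio


async def fake_steam() -> list[str]:
    """Stand-in for a scraper that succeeds."""
    await asyncio.sleep(0)
    return ["steam review"]


async def fake_amazon() -> list[str]:
    """Stand-in for a scraper that fails, e.g. because no browser is installed."""
    raise RuntimeError("browser not installed")


async def gather_reviews(sources: dict) -> tuple[dict, dict]:
    """Run all scrapers concurrently; split outcomes into results and errors."""
    names = list(sources)
    outcomes = await asyncio.gather(
        *(sources[name]() for name in names),
        return_exceptions=True,  # keep going when one source fails
    )
    reviews, errors = {}, {}
    for name, outcome in zip(names, outcomes):
        if isinstance(outcome, Exception):
            errors[name] = outcome
        else:
            reviews[name] = outcome
    return reviews, errors


reviews, errors = asyncio.run(
    gather_reviews({"steam": fake_steam, "amazon": fake_amazon})
)
print(reviews)
print(errors)
```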

Using LLM Providers

import asyncio
from sentimatrix.providers.llm import GroqProvider, OpenAIProvider
from sentimatrix.core.config import LLMConfig

# Groq (fast, free tier available)
async def use_groq():
    config = LLMConfig(
        provider="groq",
        api_key="gsk_...",
        model="llama-3.3-70b-versatile"
    )
    async with GroqProvider(config) as provider:
        response = await provider.generate("Analyze this text...")
        print(response.content)

# OpenAI
async def use_openai():
    config = LLMConfig(
        provider="openai",
        api_key="sk-...",
        model="gpt-4o-mini"
    )
    async with OpenAIProvider(config) as provider:
        response = await provider.generate("Summarize these reviews...")
        print(response.content)

Configuration

YAML Configuration

# config.yaml
llm:
  provider: groq
  model: llama-3.3-70b-versatile
  api_key: ${GROQ_API_KEY}
  temperature: 0.7

scrapers:
  default_provider: playwright
  headless: true
  timeout: 30

cache:
  enabled: true
  backend: memory
  ttl: 3600

logging:
  level: INFO
  format: json

# Load config.yaml from Python
from sentimatrix import SentimatrixConfig, Sentimatrix

config = SentimatrixConfig.from_file("config.yaml")
sm = Sentimatrix(config)
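
The `${GROQ_API_KEY}` placeholder in the YAML implies that values are filled in from the environment at load time. How `SentimatrixConfig.from_file` does this is not documented here, so the following is only a sketch of one plausible approach. It works on an already-parsed dictionary, and `expand_env` is a hypothetical helper.

```python
import os
import re

# Matches ${NAME} where NAME is a conventional environment variable name.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value, env=None):
    """Recursively replace ${NAME} placeholders in strings, dicts, and lists."""
    env = os.environ if env is None else env
    if isinstance(value, dict):
        return {key: expand_env(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [expand_env(item, env) for item in value]
    if isinstance(value, str):
        def replace(match: re.Match) -> str:
            name = match.group(1)
            if name not in env:
                # Fail loudly rather than sending an empty API key.
                raise KeyError(f"environment variable {name} is not set")
            return env[name]
        return _PLACEHOLDER.sub(replace, value)
    return value  # numbers, booleans, None pass through untouched


parsed = {"llm": {"provider": "groq", "api_key": "${GROQ_API_KEY}", "temperature": 0.7}}
print(expand_env(parsed, {"GROQ_API_KEY": "gsk_example"}))
```

Raising on a missing variable is a design choice in this sketch: it surfaces a misconfigured environment at startup rather than at the first API call.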

Environment Variables

export GROQ_API_KEY="gsk_..."
export OPENAI_API_KEY="sk-..."
export SENTIMATRIX_LOG_LEVEL=INFO
export SENTIMATRIX_CACHE_ENABLED=true
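
Environment variables are always strings, so values such as `SENTIMATRIX_CACHE_ENABLED=true` have to be parsed before use. The sketch below shows one way that could look. The `EnvSettings` class, the defaults, and the accepted truthy spellings are all assumptions rather than the library's documented behavior.

```python
import os
from dataclasses import dataclass

# Assumed set of spellings treated as "true"; anything else counts as false.
_TRUTHY = {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class EnvSettings:
    log_level: str = "INFO"
    cache_enabled: bool = True


def load_env_settings(env=None) -> EnvSettings:
    """Build settings from SENTIMATRIX_* variables, falling back to defaults."""
    env = os.environ if env is None else env
    raw_cache = env.get("SENTIMATRIX_CACHE_ENABLED", "true")
    return EnvSettings(
        log_level=env.get("SENTIMATRIX_LOG_LEVEL", "INFO").upper(),
        cache_enabled=raw_cache.strip().lower() in _TRUTHY,
    )


print(load_env_settings({"SENTIMATRIX_LOG_LEVEL": "debug",
                         "SENTIMATRIX_CACHE_ENABLED": "false"}))
```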

Project Structure

sentimatrix/
├── core/
│   ├── config.py          # Configuration management
│   ├── logger.py          # Structured logging
│   ├── exceptions.py      # Exception hierarchy
│   ├── cache.py           # Memory & Redis caching
│   └── pipeline.py        # Pipeline orchestration
├── providers/
│   ├── llm/               # 19 LLM providers
│   ├── scrapers/
│   │   ├── platforms/     # Platform-specific scrapers
│   │   └── commercial/    # Commercial API clients
│   └── models/            # HuggingFace model providers
├── analysis/
│   ├── sentiment.py       # Sentiment analysis
│   ├── emotion.py         # Emotion detection
│   └── multimodal.py      # Audio/image/video analysis
├── input/                 # Input handlers (audio, image, video)
├── output/                # Exporters, formatters, visualizers
├── cli.py                 # Command-line interface
└── main.py                # Main Sentimatrix class

CLI Usage

# Analyze single text
sentimatrix analyze "This product is amazing!"

# Analyze from file
sentimatrix analyze-file reviews.txt --output results.json

# Scrape and analyze
sentimatrix scrape steam 730 --limit 50 --analyze

# Batch process CSV
sentimatrix batch input.csv --text-column review --output results.csv

# System info
sentimatrix info
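
For readers who want to see what the `batch` command with `--text-column` amounts to, here is a standard-library sketch: read a CSV, classify one column, and write the rows back with an added `sentiment` column. The `toy_classify` function is a keyword-based placeholder, not the library's model, and `batch_csv` is a hypothetical helper. The CLI's actual output columns may differ.

```python
import csv
import io


def toy_classify(text: str) -> str:
    """Placeholder classifier so the sketch runs without any model."""
    lowered = text.lower()
    if "great" in lowered:
        return "positive"
    if "terrible" in lowered:
        return "negative"
    return "neutral"


def batch_csv(source, text_column: str, classify=toy_classify) -> str:
    """Classify `text_column` of a CSV stream; return CSV with a sentiment column."""
    reader = csv.DictReader(source)
    if reader.fieldnames is None or text_column not in reader.fieldnames:
        raise ValueError(f"column {text_column!r} not found in input")
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=[*reader.fieldnames, "sentiment"])
    writer.writeheader()
    for row in reader:
        row["sentiment"] = classify(row[text_column])
        writer.writerow(row)
    return out.getvalue()


sample = io.StringIO("id,review\n1,great game\n2,terrible port\n3,it runs\n")
print(batch_csv(sample, "review"))
```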

Testing

# Run all tests
pytest

# Run with coverage
pytest --cov=sentimatrix

# Run specific test suite
pytest tests/unit/providers/llm/
pytest tests/unit/providers/scrapers/

# Run live tests (requires API keys)
python test_full_pipeline.py

Test Summary: 282+ tests passing across all modules.

Documentation

Roadmap

See ROADMAP.md for the development roadmap.

Contributing

See CONTRIBUTING.md for contribution guidelines.

Changelog

See CHANGELOG.md for version history.

License

MIT License - see LICENSE file for details.

Version

Current version: 0.2.0 (Stage 14 - Commercial Scrapers Complete)
