
Investigations in financial sentiment analysis


mood


To install: pip install mood

Note: Requires an OpenAI API token. How can I get one of those?

For more NLP work, see idiom and lexis.

Getting Started with Mood Models

The mood package provides a complete pipeline for training machine learning models to detect semantic attributes (or "moods") in text. Here's the typical workflow. If you already have your training data, you can skip steps 1 and 2.

1. Define Your Semantic Attributes

Start by defining the semantic attributes you want to model. These are the "moods" or characteristics you want to detect in text:

semantic_attributes = {
    "diplomatic_politeness": "Diplomatic Politeness (Degree of formality, civility, and ceremonial tone)",
    "military_intensity": "Military Reference Intensity (Mentions of soldiers, arms, tactics, or losses)",
    "sentiment_polarity": "Sentiment Polarity (Positive or negative affective tone)",
    # ... add more attributes as needed
}

2. Generate Training Data

Use the AI-powered dataset generation to create labeled training examples:

from mood.dataset_makers import make_semantic_attributes_dataset

# Generate training data for your semantic attributes
make_semantic_attributes_dataset(
    semantic_attributes=semantic_attributes,
    store="path/to/save/datasets",  # or use a MutableMapping
    n_examples=1000,  # total examples per attribute
    batch_size=100,   # examples per API call
    verbose=2         # show progress
)
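The `n_examples` and `batch_size` arguments together determine how many API calls are made. A minimal sketch of that arithmetic, assuming each attribute is generated independently with one API call per batch of `batch_size` examples (the helper `plan_batches` is hypothetical and not part of mood):

```python
import math


def plan_batches(n_attributes: int, n_examples: int, batch_size: int) -> tuple:
    """Return (calls per attribute, total calls), assuming one API call per batch."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # A partial final batch still costs one call, hence the ceiling
    calls_per_attribute = math.ceil(n_examples / batch_size)
    return calls_per_attribute, calls_per_attribute * n_attributes


# Three attributes, 1000 examples each, 100 examples per call
print(plan_batches(3, 1000, 100))  # (10, 30)
```

With the settings above, each attribute would need 10 calls under this assumption, so cost grows linearly with the number of attributes.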

This creates text segments scored from 0 to 5 for each semantic attribute and saves them as text files in the following format (a score followed by its text segment, one per line):

0 The meeting was dull and unproductive, leaving everyone feeling dissatisfied.
3 The negotiations showed moderate diplomatic courtesy between the parties.
5 Your Excellency, it is indeed our greatest honor to welcome you here today.
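Each line pairs an integer score with a text segment, separated by the first space. The sketch below parses that format with the standard library; it is a hypothetical stand-in for illustration (`parse_scored_lines` is not part of mood) and not the implementation of mood's own `parsed_lines`:

```python
from typing import Iterator


def parse_scored_lines(text: str) -> Iterator[dict]:
    """Yield {'score': int, 'segment': str} for each non-empty 'score segment' line."""
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split only on the first space so the segment keeps its own spaces
        score, _, segment = line.partition(" ")
        yield {"score": int(score), "segment": segment}


sample = """0 The meeting was dull and unproductive, leaving everyone feeling dissatisfied.
5 Your Excellency, it is indeed our greatest honor to welcome you here today."""
rows = list(parse_scored_lines(sample))
print(rows[1]["score"])  # 5
```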

3. Parse and Prepare Data

Convert the raw text data into structured format and compute embeddings:

from mood.dataset_makers import parsed_lines
import pandas as pd
from oa.batch_embeddings import compute_embeddings

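# Read the raw text of one generated dataset file
# (placeholder path: substitute a file produced in step 2)
with open("path/to/save/datasets/your_dataset_file.txt") as f:
    dataset_content = f.read()

# Parse the generated text
dataset = pd.DataFrame(list(parsed_lines(dataset_content)))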
# Results in: DataFrame with 'score' and 'segment' columns

# Compute embeddings for the text segments
embeddings = compute_embeddings(
    segments=dataset.segment.to_list(),
    batch_size=1000,
    verbosity=2
)

# Combine into final training dataset
training_data = pd.DataFrame({
    'score': dataset.score,
    'segment': dataset.segment,
    'embedding': embeddings
})
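Before training, it helps to confirm that every score has exactly one embedding and that all embeddings share one dimension. A minimal sketch, assuming `compute_embeddings` returns one vector per input segment in input order (the helper `to_xy` is hypothetical):

```python
def to_xy(scores, embeddings):
    """Return (X, y) as plain lists, checking row alignment and embedding dimension."""
    if len(scores) != len(embeddings):
        raise ValueError(f"{len(scores)} scores but {len(embeddings)} embeddings")
    dims = {len(vec) for vec in embeddings}
    if len(dims) > 1:
        raise ValueError(f"inconsistent embedding dimensions: {sorted(dims)}")
    X = [[float(v) for v in vec] for vec in embeddings]  # feature matrix, one row per segment
    y = [float(s) for s in scores]                        # target scores
    return X, y


X, y = to_xy([0, 5], [[0.1, 0.2], [0.3, 0.4]])
print(y)  # [0.0, 5.0]
```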

4. Train Mood Models

Use the MoodModelingManager to train and evaluate multiple types of models:

from mood.mood_modeling import MoodModelingManager

# Create manager with your training data
manager = MoodModelingManager(
    df=training_data,
    embedding_col='embedding',
    score_col='score',
    verbose=2  # Show detailed progress
)

# Train and evaluate models with single train/test split
results = manager.train_and_evaluate()

# For more stable metrics, run cross-validation
cv_results = manager.cross_validate_models(n_splits=10)

# Get performance summary
summary = manager.get_model_summary(use_cv=True)
print(summary)
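Cross-validation with `n_splits=10` partitions the rows into 10 folds and trains 10 times, each time holding out one fold for testing; averaging over folds is what makes the metrics more stable than a single split. A sketch of the index bookkeeping, assuming plain contiguous (unshuffled) folds; the manager's actual splitting strategy (shuffling, stratification) is not documented here, and `kfold_indices` is hypothetical:

```python
def kfold_indices(n_rows: int, n_splits: int):
    """Yield (train_idx, test_idx) lists for contiguous, unshuffled k-fold CV."""
    if not 2 <= n_splits <= n_rows:
        raise ValueError("need 2 <= n_splits <= n_rows")
    # The first n_rows % n_splits folds get one extra row
    fold_sizes = [n_rows // n_splits + (1 if i < n_rows % n_splits else 0)
                  for i in range(n_splits)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_rows))
        yield train_idx, test_idx
        start += size


for train_idx, test_idx in kfold_indices(10, 3):
    print(len(train_idx), test_idx)
```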

5. Select and Deploy Best Model

# Fit final models on all data
manager.fit_final_models()

# Get the best model based on performance metric
best_model_name, best_model = manager.get_best_model(
    metric='spearman',  # or 'accuracy', 'f1', etc.
    use_cv=True
)
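Spearman correlation, used as the selection metric above, is the Pearson correlation of the ranks, so it rewards a model whose predictions are ordered like the true scores even when their scale differs (for example 0-1 outputs against 0-5 labels). A self-contained sketch using average ranks for ties; in practice `scipy.stats.spearmanr` computes the same quantity:

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are tied
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("Spearman correlation is undefined for constant input")
    return cov / (sxx * syy) ** 0.5


true_scores = [0, 1, 3, 5]
predicted = [0.05, 0.2, 0.6, 0.9]  # different scale, same ordering
print(spearman(true_scores, predicted))  # 1.0
```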

print(f"Best model: {best_model_name}")

# Use the model to predict mood scores for new text
new_embeddings = compute_embeddings(["Your new text here"])
mood_scores = manager.predict_mood(new_embeddings)
print(f"Mood scores: {mood_scores}")  # Values between 0 and 1

Model Types Available

The system automatically trains several types of models:

  • Numerical Regression: Ridge regression, SVR for continuous score prediction
  • Binary Classification: Logistic regression, SVM for high/low classification
  • Ordinal Regression: Specialized models for ordered categories (requires the mord package)

Each model type handles the data differently:

  • Numerical models predict continuous scores
  • Binary models classify as high vs. low mood
  • Ordinal models respect the ordered nature of the scores
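To make the three views concrete, here is one way a single 0-5 score column could be turned into the target each model family trains on. The threshold of 3 for "high" and the helper `make_targets` are assumptions chosen for illustration; the manager's own binarization rule is not documented here:

```python
def make_targets(scores, high_threshold=3):
    """Derive numerical, binary, and ordinal targets from integer 0-5 scores."""
    numerical = [float(s) for s in scores]                      # continuous regression target
    binary = [1 if s >= high_threshold else 0 for s in scores]  # high (1) vs. low (0)
    levels = sorted(set(scores))
    ordinal = [levels.index(s) for s in scores]                 # ordered class labels 0..k-1
    return numerical, binary, ordinal


print(make_targets([0, 3, 5, 3]))
# ([0.0, 3.0, 5.0, 3.0], [0, 1, 1, 1], [0, 1, 2, 1])
```

The binary view discards everything except which side of the threshold a score falls on, while the ordinal view keeps the order of the levels without assuming they are evenly spaced.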

Output Interpretation

All final models output mood scores between 0 and 1, where:

  • 0 = completely lacks the semantic attribute
  • 1 = maximally expresses the semantic attribute
  • Values are ordinally aligned with the training scores (higher training scores correspond to higher outputs)
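One simple way to express a 0-5 training score on the 0-1 output scale while preserving its order is min-max scaling. This is an illustration of the idea, not necessarily the transform mood applies; `to_unit_interval` is hypothetical:

```python
def to_unit_interval(score, low=0, high=5):
    """Linearly rescale a score from [low, high] to [0, 1], clipping out-of-range values."""
    if high <= low:
        raise ValueError("high must exceed low")
    scaled = (score - low) / (high - low)
    return min(1.0, max(0.0, scaled))


print([to_unit_interval(s) for s in range(6)])  # [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
```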

Advanced Features

Custom Model Configuration:

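from sklearn.linear_model import Ridge
from mood.mood_modeling import MoodModelingManager

# Define custom models
custom_models = {
    "custom_ridge": {
        "data_type": "numerical",
        "model_class": Ridge,
        "model_params": {"alpha": 0.5},
        "max_dims": 200,
        "output_transform": "sigmoid"
    }
}

# Pass the training data built in step 3
manager = MoodModelingManager(df=training_data, models=custom_models)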
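The `"output_transform": "sigmoid"` setting suggests that raw regression outputs are squashed into the 0-1 range. A sketch of the logistic function itself; where mood centers or scales the raw prediction before applying it is not documented here, so the `center` and `scale` parameters are hypothetical:

```python
import math


def sigmoid(x: float, center: float = 0.0, scale: float = 1.0) -> float:
    """Logistic function 1 / (1 + e^-z) with z = (x - center) / scale, numerically stable."""
    z = (x - center) / scale
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow for large negative z
    return e / (1.0 + e)


print(sigmoid(2.5, center=2.5))  # 0.5
```

Because the logistic function is strictly increasing, it changes the scale of predictions but not their order, so rank-based metrics such as Spearman are unaffected by it.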

Performance Analysis:

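# Analyze model performance across different metrics
# (assumes compute_model_stats is also importable from mood.model_analysis,
#  and that model_stats holds the per-model evaluation results to summarize)
from mood.model_analysis import analyze_all, compute_model_stats

classifier_stats, regression_stats = compute_model_stats(model_stats)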
report = analyze_all(classifier_stats, regression_stats)
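For the binary (high/low) models, classifier statistics typically include accuracy and F1. A minimal sketch of both from lists of true and predicted labels; the helper `accuracy_and_f1` is hypothetical and does not reproduce the exact statistics that `compute_model_stats` reports:

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1 for the positive class) for two equal-length label lists."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty lists of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


print(accuracy_and_f1([1, 0, 1, 1], [1, 0, 0, 1]))
```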

Quick Example: Current Headlines Sentiment

For a quick start, try analyzing current headlines:

from mood import headlines_mood

# Get current headlines with sentiment scores
sentiment_scores = headlines_mood()
print(sentiment_scores)

This returns a dictionary mapping headlines to sentiment scores from -10 to +10.


Examples

Current headlines and their sentiment scores:

>>> from mood import headlines_mood
>>> headlines_mood()
{"Yaccarino shakes up X amid Musk's pressure on costs, FT says": -3,
 'Coup-hit Niger was betting on a China-backed oil pipeline as a lifeline. Then the troubles began': -7,
 'A Mexico City neighborhood keeps the iconic Volkswagen Beetle alive': 1,
 'Bitter political fight in Bolivia is paralyzing the government as unrest boils over economic crisis': -8,
 'Credit’s Strong Run Stumbles for First Time This Year': -4,
 'Former Stark Chairman in Thai Detention After Months on the Run': -5,
 '2 Millionaire-Maker Technology Stocks': 5,
 'Want to Earn More in the Stock Market With Less Effort? Try This Simple Strategy': 4,
 '3 Reasons to Buy Nvidia Stock Before June 26': 7,
 "Cathie Wood Says Software Is the Next Big AI Opportunity -- 2 Super Stocks You'll Wish You'd Bought Today if She's Right": 8,
 'Companies that give money to Democrats are beating Republican donors in the stock market': 2,
 'Is Buying Stocks When the S&P 500 Hits a New All-Time High a Smart Strategy? History Provides a Clear Answer.': 2,
 'Q1 Design Software Earnings: Unity (NYSE:U) Earns Top Marks': 4,
 'Social Networking Q1 Earnings: Snap (NYSE:SNAP) Simply the Best': 6,
 "A Look Back at Footwear Stocks' Q1 Earnings: Steven Madden (NASDAQ:SHOO) Vs The Rest Of The Pack": -1,
 "A Look Back at Heavy Machinery Stocks' Q1 Earnings: Oshkosh (NYSE:OSK) Vs The Rest Of The Pack": -1,
 'Unpacking Q1 Earnings: Keurig Dr Pepper (NASDAQ:KDP) In The Context Of Other Beverages and Alcohol Stocks': -2,
 'Q1 Earnings Outperformers: Agilysys (NASDAQ:AGYS) And The Rest Of The Vertical Software Stocks': 3,
 'What do homebuilders want? Immigration reform': 0,
 'Examining the steep rise in monthly auto loan payments': -2,
 'Cannes Lions 2024: What marketers are saying': 1,
 "Sports has to feed into a streamer's 'ecosystem': Kevin Mayer": 0,
 'Taylor Swift vs. Central Banks: One Swiftie calls ‘BS’': -2,
 "'There are bargains out there' as summer travel season heats up": 3,
 'This week in Bidenomics: Pile on the debt': -3,
 'Netflix flirts with all-time highs as investors cheer ad momentum, foray into live sports': 8,
 'Existing home sales decline in May as home prices reach record high': -4,
 "Regulators find weakness in 'living wills' from BofA, Citi, Goldman, and JPM": -6,
 'How to watch and listen to Yahoo Finance': 0,
 "Why Nvidia's 'gravy train' could come to 'screeching halt' after a volatile trading week": -7,
 'Trump’s campaign now has a cash advantage over Biden': 1,
 'The Anti-Altman’s Hail Mary Pitch to Investors': -3,
 "Tesla's Autonomous Strategy to be Key to EV Maker Reaching $1 Trillion-Plus Valuation, Wedbush Says": 7,
 'Alaska Airlines, Flight Attendants Union Reach Tentative Deal': 6,
 'China’s 618 online shopping event marks first-ever sales drop': -7,
 'Apple Throws Down the Gauntlet to European Regulators': -3,
 'Apple’s AI Features Won’t Be Offered in European Union Because of New Laws': -5,
 'Equity Markets Mostly Lower as Existing Home Sales Drop': -4,
 'Oil Rig Count Falls by Three This Week, Baker Hughes Says': -2,
 'Dubai Real Estate’s Resilience May Signal End of Boom-Bust Cycle': 5,
 'Montana aims to remind seniors about its ‘reverse annuity mortgage’': 0,
 'Desmarais-Backed Mortgage Firm Nesto Acquires Lender CMLS': 3,
 'Record high prices, rising mortgage rates depress US home sales': -8,
 'US home sales fall for the 3rd straight month in May amid rising mortgage rates, record-high prices': -8,
 "On wealthy Martha's Vineyard, costly housing is forcing workers out and threatening public safety": -7,
 "Mortgage rates decline for third consecutive week — 'bodes well for the housing market'": 4,
 'Congress should adopt mortgage interest tax credit: CHLA': 3,
 'Savings interest rates today, June 22, 2024 (best account provides 5.30% APY)': 4,
 'How to pay off your house faster with biweekly mortgage payments': 5,
 'The best credit cards for vacations for June 2024': 5,
 "Climate change makes India's monsoons erratic. Can farmers still find a way to prosper?": -2,
 'CDK Hackers Want Millions in Ransom to End Car Dealership Outage': -7,
 'S&P 500 Trading Volume Spikes at Wall Street Close: Markets Wrap': 0,
 'Bitcoin Could Hit $500,000 by October 2025, According to This Billionaire Investor': 7,
 'This Is Why Altcoin Investors Struggle Despite Bitcoin, Ether Sitting Near Yearly Highs': -3,
 'BitoGroup partners with Far Eastern International Bank to launch first crypto-friendly bank account': 2,
 'Cathie Wood sells $13 million of a struggling tech stock': -6,
 'Nvidia Stock Gets Hit With Bearish Reversal. If You Have Big Profits, This Is What You Should Do.': -5,
 "Single Mom With $1.3 Million From Divorce Can't Afford $8,000 Monthly Rent, Dave Ramsey Says It's Time To Move": -6,
 'Morningstar | A Weekly Summary of Stock Ideas and Developments in the Companies We Cover': 0,
 '5 Little-Known Perks of a Costco Membership': 4,
 'Billionaire David Tepper Goes Bargain Hunting: 6 Stocks He Just Bought': 5,
 'Palantir Inks Deal With Starlab. Is the Stock Ready to Head to the Stars?': 6,
 "Here's the Average Social Security Benefit at Age 62 -- and Why It's Not the Best News for Retirees": -5,
 'Analyst Report: Mitsubishi UFJ Financial Group, Inc.': 0,
 'Forget NextEra Energy. Buy This Magnificent Dividend King Instead': 6}
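Because `headlines_mood()` returns a plain dictionary, standard-library tools are enough to summarize it. A sketch using a few entries copied from the output above (a live call returns different headlines each time):

```python
from statistics import mean

scores = {
    "Record high prices, rising mortgage rates depress US home sales": -8,
    "Netflix flirts with all-time highs as investors cheer ad momentum, foray into live sports": 8,
    "How to watch and listen to Yahoo Finance": 0,
    "Alaska Airlines, Flight Attendants Union Reach Tentative Deal": 6,
}

average = mean(scores.values())                  # overall tone of the batch
most_negative = min(scores, key=scores.get)      # headline with the lowest score
positive_share = sum(v > 0 for v in scores.values()) / len(scores)

print(f"average score: {average:.2f}")
print(f"most negative: {most_negative}")
print(f"share positive: {positive_share:.0%}")
```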



Download files


Source Distribution

mood-0.0.16.tar.gz (51.7 MB)

Uploaded Source

Built Distribution


mood-0.0.16-py3-none-any.whl (42.4 kB)

Uploaded Python 3

File details

Details for the file mood-0.0.16.tar.gz.

File metadata

  • Download URL: mood-0.0.16.tar.gz
  • Upload date:
  • Size: 51.7 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.14 {"installer":{"name":"uv","version":"0.11.14","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for mood-0.0.16.tar.gz
Algorithm Hash digest
SHA256 ef2f85cce57ac28973c7f73b238c237e28576a6b35b1daee9ef02b6e4882e201
MD5 82a1d8983b3a277fcb9eff11169d7929
BLAKE2b-256 36a082295696def7fcdbe626f9346c7a8c6537407f0f52fb39825841b0e4eba1


File details

Details for the file mood-0.0.16-py3-none-any.whl.

File metadata

  • Download URL: mood-0.0.16-py3-none-any.whl
  • Upload date:
  • Size: 42.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.14 {"installer":{"name":"uv","version":"0.11.14","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for mood-0.0.16-py3-none-any.whl
Algorithm Hash digest
SHA256 3938a876daf28bffe4bc65a10edb48817a473c2a937bca373b315f0ce25f72bb
MD5 e8d3397752768373a8160610061b4323
BLAKE2b-256 83f2c68e20e5536bb41014f889b2145b5315155af7456830e82cc2bd6a87778e

