A Learning-to-Rank library with LambdaMART, BM25, and MovieLens support
# Learning-to-Rank from Scratch
A complete implementation of a Learning-to-Rank system using LambdaMART with LightGBM for query-document ranking on the MovieLens dataset.
## 🎯 Overview
This project implements a state-of-the-art ranking system that learns to rank movies for users based on:
- Features: TF-IDF similarity, document popularity, engagement signals
- Model: LambdaMART using LightGBM with pairwise preference learning
- Baseline: BM25 for comparison
- Evaluation: NDCG@10, MAP (Mean Average Precision), Precision@K
- Validation: 5-fold cross-validation with comprehensive metric comparison
## 📊 Dataset

**MovieLens 100K** contains 100,000 ratings from 943 users on 1,682 movies:
- Ratings converted to relevance labels (0-3 scale)
- Query-document-relevance triplets created from user-movie interactions
- Rich metadata including genres, titles, and user demographics
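
The rating-to-label conversion can be sketched as a simple thresholded mapping. The exact thresholds here (1-2 stars → 0, 3 → 1, 4 → 2, 5 → 3) are an assumption for illustration; the notebook's mapping may differ:

```python
def rating_to_relevance(rating: int) -> int:
    """Map a 1-5 star MovieLens rating to a graded 0-3 relevance label.

    The thresholds used here are one plausible choice, not necessarily
    the ones in the notebook: 1-2 stars -> 0, 3 -> 1, 4 -> 2, 5 -> 3.
    """
    return {1: 0, 2: 0, 3: 1, 4: 2, 5: 3}[rating]
```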
## 🚀 Quick Start

### Prerequisites

```bash
pip install -r requirements.txt
```

### Run the Notebook

```bash
jupyter notebook learning_to_rank.ipynb
```
The notebook will:
- Download the MovieLens dataset automatically
- Engineer features from movie metadata and user interactions
- Train LambdaMART model with cross-validation
- Compare against BM25 baseline
- Generate metric comparison charts
- Analyze feature importance
## 🔧 Feature Engineering

### 1. TF-IDF Similarity Features
- User profiles created from highly-rated movies
- Cosine similarity between user profile and candidate movies
- Captures content-based relevance
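
A minimal sketch of this idea with scikit-learn, using a hypothetical mini-corpus (the movie texts and user-profile construction here are invented for illustration; the notebook builds these from real metadata):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical mini-corpus: each "document" is a movie's title plus genres,
# and the user profile concatenates text from the user's highly-rated movies.
movies = [
    "toy story animation children comedy",
    "heat action crime thriller",
    "aladdin animation children musical",
]
user_profile = ["toy story aladdin animation children"]

vec = TfidfVectorizer()
movie_vecs = vec.fit_transform(movies)          # fit vocabulary on movie texts
profile_vec = vec.transform(user_profile)       # project profile into same space
sims = cosine_similarity(profile_vec, movie_vecs).ravel()  # one score per movie
```

Movies sharing vocabulary with the user's profile (the animated titles here) score higher than unrelated ones.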
### 2. Document Popularity Features
- Number of ratings per movie
- Average rating and standard deviation
- Number of unique users
- Popularity score (composite metric)
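
These per-movie statistics amount to a groupby-aggregate over the ratings table. A sketch with pandas, assuming columns named `user_id`, `movie_id`, and `rating`, and one plausible composite popularity score (the notebook's composite may be defined differently):

```python
import numpy as np
import pandas as pd

def popularity_features(ratings: pd.DataFrame) -> pd.DataFrame:
    """Per-movie popularity stats; assumes columns user_id, movie_id, rating."""
    stats = ratings.groupby("movie_id").agg(
        n_ratings=("rating", "count"),
        avg_rating=("rating", "mean"),
        rating_std=("rating", "std"),
        n_users=("user_id", "nunique"),
    )
    # One plausible composite: log-damped rating volume times mean rating.
    stats["popularity"] = np.log1p(stats["n_ratings"]) * stats["avg_rating"]
    return stats.reset_index()
```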
### 3. Engagement Signal Features
- User activity level (number of ratings)
- User rating patterns (mean, std)
- User demographics (age, gender)
- Movie genre indicators (18 genres)
## 📈 Model Architecture

### LambdaMART Configuration

```python
{
    'objective': 'lambdarank',
    'metric': 'ndcg',
    'ndcg_eval_at': [10],
    'learning_rate': 0.05,
    'num_leaves': 31,
    'max_depth': 6,
    'min_data_in_leaf': 20,
    'feature_fraction': 0.8,
    'bagging_fraction': 0.8,
    'bagging_freq': 5
}
```
### Training Strategy
- Objective: Pairwise preference learning (lambdarank)
- Optimization: Directly optimizes NDCG
- Cross-validation: 5-fold GroupKFold (groups by user)
- Comparison: BM25 baseline on same splits
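
The grouping logic can be sketched as follows. The toy arrays stand in for the real feature matrix and per-row user ids, and the rows are assumed to be sorted by user (LightGBM's ranking objective needs per-query group sizes in row order); the LightGBM calls are shown only as comments:

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Toy stand-ins for the real feature matrix and per-row user ids.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
users = np.repeat(np.arange(20), 5)   # 20 users, 5 candidate movies each

gkf = GroupKFold(n_splits=5)
for train_idx, test_idx in gkf.split(X, groups=users):
    # No user may appear on both sides of a split.
    assert set(users[train_idx]).isdisjoint(users[test_idx])
    # LightGBM's lambdarank objective needs the number of rows per query
    # (here: per user), in the order rows appear in the fold.
    _, group_sizes = np.unique(users[train_idx], return_counts=True)
    # train_set = lgb.Dataset(X[train_idx], label=y[train_idx], group=group_sizes)
    # booster = lgb.train(params, train_set, num_boost_round=500)
```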
## 📊 Evaluation Metrics

### NDCG@10 (Normalized Discounted Cumulative Gain)
- Measures ranking quality with position-based discounting
- Considers graded relevance labels
- Primary metric for ranking evaluation
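
A minimal NDCG@k implementation for one query, using the common exponential-gain form (the notebook may use a library implementation instead):

```python
import numpy as np

def ndcg_at_k(labels_in_ranked_order, k=10):
    """NDCG@k for one query.

    `labels_in_ranked_order` are graded relevance labels ordered by the
    model's predicted score, best-scored document first.
    """
    rel = np.asarray(labels_in_ranked_order, dtype=float)[:k]
    discounts = 1.0 / np.log2(np.arange(2, rel.size + 2))   # 1/log2(rank+1)
    dcg = np.sum((2.0 ** rel - 1.0) * discounts)
    ideal = np.sort(np.asarray(labels_in_ranked_order, dtype=float))[::-1][:k]
    idcg = np.sum((2.0 ** ideal - 1.0) * discounts)
    return dcg / idcg if idcg > 0 else 0.0
```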
### MAP (Mean Average Precision)
- Evaluates precision across all relevant items
- Emphasizes finding all relevant documents
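
Average precision for a single query can be computed from binary relevance flags in ranked order; MAP is simply the mean of this over all queries:

```python
def average_precision(relevant_flags):
    """AP for one ranked list of binary relevance flags (1 = relevant)."""
    hits, score = 0, 0.0
    for rank, rel in enumerate(relevant_flags, start=1):
        if rel:
            hits += 1
            score += hits / rank   # precision at each relevant position
    return score / hits if hits else 0.0
```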
### Precision@K
- Measures fraction of relevant items in top-K results
- Simple interpretable metric
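
For completeness, Precision@K is a one-liner over the same binary flags:

```python
def precision_at_k(relevant_flags, k=10):
    """Fraction of the top-k ranked results that are relevant."""
    top = relevant_flags[:k]
    return sum(top) / len(top) if top else 0.0
```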
## 📁 Project Structure

```
learning-to-rank-from-scratch/
├── learning_to_rank.ipynb   # Main notebook with complete implementation
├── requirements.txt         # Python dependencies
├── README.md                # This file
├── .gitignore               # Git ignore rules
└── ml-100k/                 # MovieLens dataset (auto-downloaded)
```
## 📸 Visualizations
The notebook generates three key visualizations:
- Metric Comparison by Fold - Shows LambdaMART vs BM25 for each CV fold
- Average Metric Comparison - Mean performance with error bars
- Feature Importance - Top contributing features to ranking quality
## 🎓 Key Concepts

### Learning-to-Rank
Learning-to-Rank treats ranking as a supervised machine learning problem:
- Input: Query-document pairs with features
- Output: Relevance scores for ranking
- Approaches: Pointwise, Pairwise (this project), Listwise
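
The pairwise view can be made concrete: within one query, every pair of documents with different labels yields a (winner, loser) training preference. A minimal sketch:

```python
from itertools import combinations

def preference_pairs(labels):
    """Enumerate (winner, loser) index pairs for one query's graded labels.

    A pair is emitted whenever one document is strictly more relevant
    than the other; equally-labeled documents yield no preference.
    """
    pairs = []
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] > labels[j]:
            pairs.append((i, j))
        elif labels[j] > labels[i]:
            pairs.append((j, i))
    return pairs
```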
### LambdaMART
LambdaMART combines:
- LambdaRank: Uses lambda gradients from pairwise preferences
- MART (Multiple Additive Regression Trees): Gradient boosted decision trees
- Direct NDCG optimization: Optimizes the actual ranking metric
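
The lambda gradient for a pair where document *i* is more relevant than document *j* is commonly written as λᵢⱼ = −σ·|ΔNDCG| / (1 + exp(σ·(sᵢ − sⱼ))), where sᵢ, sⱼ are the current model scores and |ΔNDCG| is the NDCG change from swapping the two documents. A direct transcription:

```python
import math

def lambda_ij(s_i, s_j, delta_ndcg, sigma=1.0):
    """LambdaRank gradient for a pair where doc i should rank above doc j.

    Misordered pairs (s_i < s_j) and pairs whose swap would move NDCG a lot
    (large |delta_ndcg|) receive gradients of larger magnitude.
    """
    return -sigma * abs(delta_ndcg) / (1.0 + math.exp(sigma * (s_i - s_j)))
```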
### Why Pairwise Learning?
- More data efficient than pointwise approaches
- Captures relative ordering directly
- Better suited for ranking tasks than regression
## 🔬 Expected Results

LambdaMART typically outperforms the BM25 baseline by:
- NDCG@10: 10-30% improvement
- MAP: 15-25% improvement
- Precision@10: 10-20% improvement
Results may vary based on:
- Train/test split
- Feature engineering quality
- Hyperparameter tuning
- Dataset characteristics
## 🛠️ Customization

### Adding New Features

Edit the feature engineering section in the notebook:

```python
feature_columns = [
    'your_new_feature',
    # ... existing features
]
```
### Tuning Hyperparameters

Modify the LightGBM parameters:

```python
params = {
    'objective': 'lambdarank',
    'learning_rate': 0.1,  # Adjust
    'num_leaves': 63,      # Adjust
    # ...
}
```
### Using Different Datasets
Replace the MovieLens loading code with your dataset:
- Ensure query-document-relevance triplet format
- Adapt feature engineering to your domain
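
The expected triplet layout can be sketched as a small table: one row per (query, document) pair, with a graded label and feature columns alongside. All names below (`query_id`, `doc_id`, `relevance`, `feat_tfidf`) are illustrative, not a fixed schema:

```python
import pandas as pd

# Minimal query-document-relevance layout (hypothetical column names):
triplets = pd.DataFrame({
    "query_id":   [1, 1, 2],         # e.g. user id
    "doc_id":     [10, 20, 10],      # e.g. movie id
    "relevance":  [3, 0, 2],         # graded label, 0-3
    "feat_tfidf": [0.8, 0.1, 0.5],   # one of potentially many feature columns
})
```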
## 📝 License
This project is licensed under the MIT License - see the LICENSE file for details.
## 🤝 Contributing
Contributions are welcome! Feel free to:
- Report bugs
- Suggest features
- Submit pull requests
- Improve documentation
## ⭐ Acknowledgments
- GroupLens Research for the MovieLens dataset
- Microsoft Research for LambdaMART algorithm
- LightGBM team for the excellent gradient boosting framework
## File details

Details for the file `ltr_lib-0.1.0.tar.gz`.

### File metadata

- Download URL: ltr_lib-0.1.0.tar.gz
- Upload date:
- Size: 48.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5

### File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `d95a6d8dfa32be6418a6bdefdf582e70683835f9f6f26ae63050055a3b73e62a` |
| MD5 | `52a4a9d0697301ceb3636d16e1c95ff9` |
| BLAKE2b-256 | `d663afe8f945cf0a6a8f94192fd039acf3f32a72b73df6c30e523ca877e0b7d8` |
## File details

Details for the file `ltr_lib-0.1.0-py3-none-any.whl`.

### File metadata

- Download URL: ltr_lib-0.1.0-py3-none-any.whl
- Upload date:
- Size: 25.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5

### File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `5af452d0bd72a3c3d7fe1b6153542903509d2f7a4bfea5aff408ac8512c0faba` |
| MD5 | `cb78a57ed504abe3d56ef752c0ae854b` |
| BLAKE2b-256 | `0aae0cec11e1b319fd9cfb4d71b6e387e6ad5ce1acc8794b7c4f3ee9b89dd4b1` |