ML utilities for the DSRP Machine Learning Engineering course (movie recommendation pipeline)
DSRP ML Utils
Utility library for ML pipelines in the DSRP Machine Learning Engineering course.
Installation
pip install dsrp-ml-utils
With Azure storage support:
pip install "dsrp-ml-utils[azure]"
Quick Start
from dsrp_ml_utils import (
    load_imdb_database,
    add_derived_features,
    extract_top_genres,
    normalize_embeddings,
)
# Load and prepare data
movies = load_imdb_database("data/movies_base.parquet", "data/omdb_raw.jsonl")
movies = add_derived_features(movies)
# Extract metadata
genres = extract_top_genres(movies, top_n=10)
Features
Data Loading
- load_imdb_database(): Load and combine IMDB data with OMDB enrichment
- add_derived_features(): Add computed features (log votes, normalized year, etc.)
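To make "log votes" and "normalized year" concrete, here is a minimal sketch of how such derived columns could be computed with pandas. It is not the library's implementation: the function name `add_derived_features_sketch` and the column names `num_votes`, `year`, `log_votes`, and `year_norm` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def add_derived_features_sketch(movies: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `movies` with two illustrative derived columns.

    Column names are hypothetical; the real library may use different ones.
    """
    out = movies.copy()
    # log1p compresses the heavy-tailed vote counts and is defined at zero.
    out["log_votes"] = np.log1p(out["num_votes"])
    # Min-max scale the release year into [0, 1]; guard against a zero span.
    year_min, year_max = out["year"].min(), out["year"].max()
    span = max(year_max - year_min, 1)
    out["year_norm"] = (out["year"] - year_min) / span
    return out


movies = pd.DataFrame(
    {
        "title": ["A", "B", "C"],
        "num_votes": [0, 99, 9999],
        "year": [1980, 2000, 2020],
    }
)
features = add_derived_features_sketch(movies)
```

Working on a copy keeps the input frame unchanged, which makes the step safe to re-run in a notebook.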
Metadata Extraction
- extract_top_genres(): Get most frequent genres
- extract_decades(): Get decades present in dataset
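The two extraction steps can be sketched with the standard library alone. This is an illustration under assumptions, not the package's code: it assumes genres arrive as comma-separated strings and years as integers, and the names `top_genres_sketch` and `decades_sketch` are hypothetical.

```python
from collections import Counter


def top_genres_sketch(genre_strings, top_n=10):
    """Count genres across comma-separated strings and return the most frequent."""
    counts = Counter()
    for raw in genre_strings:
        # Skip missing values, then split on commas and trim whitespace.
        if not raw:
            continue
        counts.update(g.strip() for g in raw.split(",") if g.strip())
    return [genre for genre, _ in counts.most_common(top_n)]


def decades_sketch(years):
    """Return the sorted set of decades (e.g. 1990) covered by `years`."""
    return sorted({(year // 10) * 10 for year in years if year is not None})


genres = top_genres_sketch(
    ["Drama, Crime", "Drama", "Comedy, Drama", None, "Crime"], top_n=2
)
decades = decades_sketch([1994, 1999, 2008, None, 1972])
```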
Query Generation
- generate_template_queries(): Generate synthetic queries for learning-to-rank (LTR) training
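One common way to build synthetic queries is to fill text templates with metadata values such as genres and decades. The sketch below shows that idea only; the templates, the function name `template_queries_sketch`, and the output fields are assumptions, and the library's actual templates may differ.

```python
from itertools import product

# Hypothetical templates; placeholders are filled from dataset metadata.
TEMPLATES = [
    "{genre} movies from the {decade}s",
    "best {genre} films of the {decade}s",
]


def template_queries_sketch(genres, decades, templates=TEMPLATES):
    """Build one query per (template, genre, decade) combination."""
    queries = []
    for template, genre, decade in product(templates, genres, decades):
        queries.append(
            {
                "text": template.format(genre=genre.lower(), decade=decade),
                # Keep the slot values so relevance can be scored against them later.
                "genre": genre,
                "decade": decade,
            }
        )
    return queries


queries = template_queries_sketch(["Drama", "Comedy"], [1990, 2000])
```

Keeping the slot values next to the query text means each query carries its own ground truth, which is what makes template queries usable as training data.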
Candidate Retrieval
- normalize_embeddings(): L2 normalize embeddings for cosine similarity
- get_candidates_for_query(): Retrieve top-K candidate movies
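L2 normalization matters here because the dot product of two unit-length vectors equals their cosine similarity, so top-K retrieval reduces to a matrix-vector product followed by a sort. The following NumPy sketch illustrates that reasoning; the function names and the `eps` guard are assumptions, not the package's signatures.

```python
import numpy as np


def l2_normalize(embeddings, eps=1e-12):
    """Scale each row to unit L2 norm; all-zero rows stay zero."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    return embeddings / np.maximum(norms, eps)


def top_k_candidates(query_vec, movie_matrix, k=2):
    """Return indices and cosine scores of the k most similar rows."""
    q = l2_normalize(query_vec.reshape(1, -1))[0]
    # With unit-norm rows, the dot product is the cosine similarity.
    scores = l2_normalize(movie_matrix) @ q
    order = np.argsort(-scores)[:k]
    return order, scores[order]


movie_matrix = np.array([[3.0, 4.0], [1.0, 0.0], [0.0, 2.0], [0.0, 0.0]])
query = np.array([0.0, 5.0])
indices, scores = top_k_candidates(query, movie_matrix, k=2)
```

In a real pipeline the movie matrix would be normalized once and cached, so that each query costs only one normalization and one product.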
Relevance Scoring
- compute_relevance_score(): Calculate relevance scores with adjustable emphasis
- assign_relevance_labels(): Convert continuous scores to discrete labels
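The README does not specify the scoring formula, so the sketch below only illustrates one plausible reading: "adjustable emphasis" as a weight that blends query similarity with a quality signal, and labels obtained by bucketing the score at fixed thresholds. The formula, the `emphasis` parameter, the thresholds, and both function names are assumptions.

```python
import numpy as np


def relevance_score_sketch(similarity, quality, emphasis=0.7):
    """Blend two signals in [0, 1]; `emphasis` is the weight on similarity."""
    similarity = np.asarray(similarity, dtype=float)
    quality = np.asarray(quality, dtype=float)
    return emphasis * similarity + (1.0 - emphasis) * quality


def relevance_labels_sketch(scores, thresholds=(0.25, 0.5, 0.75)):
    """Map continuous scores to integer grades 0..len(thresholds)."""
    # np.digitize returns, for each score, how many thresholds it has reached.
    return np.digitize(scores, thresholds)


scores = relevance_score_sketch([1.0, 0.5, 0.0], [0.8, 0.5, 0.2], emphasis=0.5)
labels = relevance_labels_sketch(scores)
```

Graded integer labels of this kind are the usual target format for learning-to-rank objectives, which is why the continuous score is discretized at all.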
MLflow Integration
- search_best_model(): Search for best run by metric
- get_artifact_uri_production(): Get production model artifact URI
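Finding the best run by metric can be done with MLflow's own `mlflow.search_runs`, ordering by the metric and keeping the first row. The sketch below shows that pattern under assumptions: the helper names, the experiment name, and the metric name are hypothetical, and this is not necessarily how `search_best_model()` is implemented.

```python
def build_order_by(metric, higher_is_better=True):
    """Build the `order_by` clause that mlflow.search_runs expects."""
    direction = "DESC" if higher_is_better else "ASC"
    return [f"metrics.{metric} {direction}"]


def search_best_run_sketch(experiment_name, metric, higher_is_better=True):
    """Return run id, artifact URI, and metric value of the best run, or None."""
    # Imported lazily so build_order_by works without MLflow installed.
    import mlflow

    runs = mlflow.search_runs(
        experiment_names=[experiment_name],
        order_by=build_order_by(metric, higher_is_better),
        max_results=1,
    )
    if runs.empty:
        return None
    best = runs.iloc[0]
    return {
        "run_id": best["run_id"],
        "artifact_uri": best["artifact_uri"],
        metric: best[f"metrics.{metric}"],
    }


# Hypothetical metric names, shown only to illustrate the clause format.
order_for_ndcg = build_order_by("ndcg_at_10")
order_for_rmse = build_order_by("rmse", higher_is_better=False)
```

Calling `search_best_run_sketch("movie-ltr", "ndcg_at_10")` would require a reachable MLflow tracking server and an experiment with that name, both of which are placeholders here.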
Azure Storage (optional)
- upload_to_blob() / download_from_blob(): File operations
- sync_to_azure() / sync_from_azure(): Batch sync operations
- blob_exists() / list_blobs(): Storage queries
License
MIT
Download files
Source Distribution
dsrp_ml_utils-0.1.0.tar.gz (8.7 kB)
Built Distribution
dsrp_ml_utils-0.1.0-py3-none-any.whl (8.0 kB)
File details
Details for the file dsrp_ml_utils-0.1.0.tar.gz.
File metadata
- Download URL: dsrp_ml_utils-0.1.0.tar.gz
- Upload date:
- Size: 8.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 86f859d4d510bad28eb75a72f062d45945ff298728ee17067b7380ac050b8229 |
| MD5 | 077eedcc14e1534af375085ce02c221c |
| BLAKE2b-256 | f01e7a67782045350c601c95763c4c777c66def968609604e3143605c87729e7 |
File details
Details for the file dsrp_ml_utils-0.1.0-py3-none-any.whl.
File metadata
- Download URL: dsrp_ml_utils-0.1.0-py3-none-any.whl
- Upload date:
- Size: 8.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c463e66bea038c2ca5e0541517f00c6d4109035293663454d14c09c20f22751e |
| MD5 | 971f27ef5f816c559a74e6acb4160112 |
| BLAKE2b-256 | 2ec186eeeea9f5a2cd15879b87dfd32e8309cf85d7d66a8f1d58d7e5c8ac483e |