ML utilities for DSRP Machine Learning Engineering course - movie recommendation pipeline

DSRP ML Utils

Utility library for ML pipelines in the DSRP Machine Learning Engineering course.

Installation

pip install dsrp-ml-utils

With Azure storage support:

pip install "dsrp-ml-utils[azure]"

Quick Start

from dsrp_ml_utils import (
    load_imdb_database,
    add_derived_features,
    extract_top_genres,
    normalize_embeddings,
)

# Load and prepare data
movies = load_imdb_database("data/movies_base.parquet", "data/omdb_raw.jsonl")
movies = add_derived_features(movies)

# Extract metadata
genres = extract_top_genres(movies, top_n=10)

Features

Data Loading

  • load_imdb_database() - Load IMDb data and combine it with OMDb enrichment
  • add_derived_features() - Add computed features (log votes, normalized year, etc.)
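
The exact feature definitions are not documented above, so the following is only a minimal sketch of what "log votes" and "normalized year" could look like. The field names `num_votes` and `year`, the output names, and the choice of min-max scaling are all assumptions for illustration, not the library's actual behavior.

```python
import math


def derive_features_sketch(rows):
    """Return copies of rows with illustrative log-vote and scaled-year fields.

    Hypothetical stand-in for add_derived_features(); field names are assumed.
    """
    if not rows:
        return []
    years = [r["year"] for r in rows]
    lo, hi = min(years), max(years)
    span = (hi - lo) or 1  # avoid division by zero when all years are equal
    enriched = []
    for r in rows:
        row = dict(r)  # copy so the input rows are left untouched
        row["log_votes"] = math.log1p(r["num_votes"])  # log(1 + votes) handles 0
        row["year_norm"] = (r["year"] - lo) / span     # min-max scale to [0, 1]
        enriched.append(row)
    return enriched


rows = [
    {"title": "A", "year": 1990, "num_votes": 0},
    {"title": "B", "year": 2000, "num_votes": 999},
    {"title": "C", "year": 2010, "num_votes": 99},
]
features = derive_features_sketch(rows)
```

Using `log1p` rather than a plain logarithm keeps movies with zero votes at a finite value, which is one common reason to prefer it for count-like features.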

Metadata Extraction

  • extract_top_genres() - Get most frequent genres
  • extract_decades() - Get decades present in dataset
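
As a rough illustration of the two extraction steps, the sketch below counts genres and collects decades from plain dictionaries. It assumes a comma-separated `genres` string and an integer `year` per movie; both field names and the helper names are hypothetical and may not match what the library expects.

```python
from collections import Counter


def top_genres_sketch(movies, top_n=10):
    """Return the top_n most frequent genres (illustrative only)."""
    counts = Counter()
    for movie in movies:
        # Assumed format: "Drama, Crime" (comma-separated genre names)
        counts.update(g.strip() for g in movie["genres"].split(",") if g.strip())
    return [genre for genre, _ in counts.most_common(top_n)]


def decades_sketch(movies):
    """Return the sorted set of decades present, e.g. 1994 maps to 1990."""
    return sorted({(movie["year"] // 10) * 10 for movie in movies})


movies = [
    {"genres": "Drama, Crime", "year": 1994},
    {"genres": "Drama", "year": 1999},
    {"genres": "Comedy, Drama", "year": 2003},
    {"genres": "Crime", "year": 2011},
]
genres = top_genres_sketch(movies, top_n=2)
decades = decades_sketch(movies)
```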

Query Generation

  • generate_template_queries() - Generate synthetic queries for LTR training
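
Template-based query generation usually means filling a few fixed sentence patterns with metadata values. The sketch below shows one way that might work, assuming genre and decade slots; the templates, the function name, and the output structure are made up for illustration and are not taken from the library.

```python
from itertools import product

# Hypothetical templates; the library's real templates are not documented here.
TEMPLATES = (
    "{genre} movies from the {decade}s",
    "best {genre} films of the {decade}s",
)


def template_queries_sketch(genres, decades, templates=TEMPLATES):
    """Build one synthetic query per (genre, decade, template) combination."""
    return [
        {
            "query": template.format(genre=genre.lower(), decade=decade),
            "genre": genre,
            "decade": decade,
        }
        for genre, decade, template in product(genres, decades, templates)
    ]


queries = template_queries_sketch(["Drama", "Comedy"], [1990, 2000])
```

Keeping the genre and decade alongside each query text makes it possible to score candidates against the intent that produced the query, which is useful when the queries feed learning-to-rank training.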

Candidate Retrieval

  • normalize_embeddings() - L2 normalize embeddings for cosine similarity
  • get_candidates_for_query() - Retrieve top-K candidate movies
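
L2 normalization matters here because the dot product of two unit-length vectors equals their cosine similarity. The following pure-Python sketch shows that idea together with a simple top-K selection. The function names and signatures are hypothetical; the library's own functions presumably operate on arrays and may differ.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def top_k_sketch(query_vec, movie_vecs, k=2):
    """Return (index, cosine similarity) pairs for the k most similar movies."""
    q = l2_normalize(query_vec)
    scored = []
    for idx, vec in enumerate(movie_vecs):
        v = l2_normalize(vec)
        # Dot product of unit vectors is the cosine similarity
        scored.append((sum(a * b for a, b in zip(q, v)), idx))
    # Highest similarity first; ties broken by lower index
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(idx, score) for score, idx in scored[:k]]


movie_vecs = [[1.0, 0.0], [0.0, 1.0], [3.0, 3.0]]
candidates = top_k_sketch([2.0, 0.0], movie_vecs, k=2)
```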

Relevance Scoring

  • compute_relevance_score() - Calculate relevance scores with adjustable emphasis
  • assign_relevance_labels() - Convert continuous scores to discrete labels
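
What "adjustable emphasis" means is not spelled out above. One plausible reading, sketched below, is a weighted blend of two signals followed by threshold-based bucketing into integer labels. The signal names, the `emphasis` weight, and the threshold values are assumptions chosen for illustration only.

```python
from bisect import bisect_right


def relevance_score_sketch(similarity, quality, emphasis=0.7):
    """Blend two signals in [0, 1]; emphasis is the weight given to similarity."""
    return emphasis * similarity + (1.0 - emphasis) * quality


def labels_sketch(scores, thresholds=(0.25, 0.5, 0.75)):
    """Map each score to an integer label: the number of thresholds it reaches."""
    return [bisect_right(thresholds, score) for score in scores]


score = relevance_score_sketch(similarity=1.0, quality=0.0, emphasis=0.7)
labels = labels_sketch([0.1, 0.25, 0.6, 0.9])
```

With three ascending thresholds the labels range from 0 to 3, which matches the graded-relevance format that learning-to-rank trainers commonly expect.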

MLflow Integration

  • search_best_model() - Search for best run by metric
  • get_artifact_uri_production() - Get production model artifact URI

Azure Storage (optional)

  • upload_to_blob() / download_from_blob() - File operations
  • sync_to_azure() / sync_from_azure() - Batch sync operations
  • blob_exists() / list_blobs() - Storage queries

License

MIT

Download files

Source Distribution

dsrp_ml_utils-0.1.0.tar.gz (8.7 kB)

Uploaded Source

Built Distribution

dsrp_ml_utils-0.1.0-py3-none-any.whl (8.0 kB)

Uploaded Python 3

File details

Details for the file dsrp_ml_utils-0.1.0.tar.gz.

File metadata

  • Download URL: dsrp_ml_utils-0.1.0.tar.gz
  • Upload date:
  • Size: 8.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for dsrp_ml_utils-0.1.0.tar.gz
Algorithm    Hash digest
SHA256       86f859d4d510bad28eb75a72f062d45945ff298728ee17067b7380ac050b8229
MD5          077eedcc14e1534af375085ce02c221c
BLAKE2b-256  f01e7a67782045350c601c95763c4c777c66def968609604e3143605c87729e7

File details

Details for the file dsrp_ml_utils-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: dsrp_ml_utils-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 8.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for dsrp_ml_utils-0.1.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       c463e66bea038c2ca5e0541517f00c6d4109035293663454d14c09c20f22751e
MD5          971f27ef5f816c559a74e6acb4160112
BLAKE2b-256  2ec186eeeea9f5a2cd15879b87dfd32e8309cf85d7d66a8f1d58d7e5c8ac483e
