Tools for ML projects and data management

ML Analytics Tools

Utilities for common analytics and machine learning workflows: Redshift, S3, Google Sheets, Slack, MLflow, model evaluation, and SQL pipelines.

The package is intentionally infrastructure-neutral. Buckets, credentials, MLflow hosts, and tokens are provided by your environment or by explicit arguments.

What Is Included

  • DataConnector: run Redshift SQL, load SQL files, unload/load data through S3, and create Redshift tables from DataFrames.
  • S3Connector: read, write, list, and delete S3 objects, and query S3 data with DuckDB.
  • GSheet: read, write, share, and export Google Sheets data.
  • SlackConnector: send messages, upload files, and manage simple Slack interactions.
  • ModelManager: create MLflow experiments, log models, register versions, manage aliases, and handle permissions.
  • model_tools: classification, regression, survival analysis, CatBoost helpers, plotting, and reporting utilities.
  • utils: project-root discovery, SQL file loading, logging, credentials, and YAML SQL pipelines.

Install

From PyPI, after a release is available:

uv add ml-analytics-tools

Directly from GitHub:

uv add git+https://github.com/sdaza/ml-analytics-tools

For local development:

uv sync --all-groups

Configuration

On import, the package loads a .env file from the project root. Configure only the services you use; variables for other services can be left out.

# Redshift
BI_REDSHIFT_HOST=redshift-cluster.example.com
BI_REDSHIFT_DB=analytics
BI_REDSHIFT_USER=analytics_user
BI_REDSHIFT_PASSWORD=secret
BI_REDSHIFT_PORT=5439

# S3
ML_ANALYTICS_S3_BUCKET=my-analytics-bucket

# MLflow
MLFLOW_TRACKING_URI=https://mlflow.example.com
MLFLOW_TRACKING_USERNAME=user@example.com
MLFLOW_TRACKING_PASSWORD=secret

# Google Sheets
GSHEET_SPREADSHEET_ID=optional-default-sheet-id
GOOGLE_CREDENTIALS='{"type":"service_account", ...}'

# Slack
SLACK_BOT_TOKEN=xoxb-your-token
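
As a rough illustration of the import-time .env loading described in this section, here is a minimal standard-library sketch. The helper names `find_project_root` and `load_env_file`, and the use of `.env` or `pyproject.toml` as project-root markers, are assumptions made for this example; the package's own loader may work differently (for instance, it may rely on python-dotenv).

```python
import os
from pathlib import Path


def find_project_root(start: Path, markers=(".env", "pyproject.toml")) -> Path:
    """Walk upward from `start` until a directory contains one of `markers`.

    Hypothetical helper: the marker files are an assumption for this sketch.
    """
    start = start.resolve()
    for directory in (start, *start.parents):
        if any((directory / marker).exists() for marker in markers):
            return directory
    return start  # nothing found: fall back to the starting directory


def load_env_file(path: Path, override: bool = False) -> dict:
    """Parse simple KEY=VALUE lines and copy them into os.environ."""
    loaded = {}
    for raw_line in path.read_text().splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        # Drop one pair of matching surrounding quotes, as in GOOGLE_CREDENTIALS='...'
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        loaded[key] = value
        # Existing environment variables win unless override=True.
        if override or key not in os.environ:
            os.environ[key] = value
    return loaded
```

With this sketch, variables already present in the environment take precedence over the file unless `override=True` is passed, which keeps CI or shell-provided secrets from being silently replaced.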

S3 buckets are never hard-coded. Pass bucket=... or s3_bucket=..., or set ML_ANALYTICS_S3_BUCKET.
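
The bucket lookup order stated above (explicit argument first, then the environment variable) could be sketched as follows. The function name `resolve_bucket` is hypothetical and not part of the package; only the `ML_ANALYTICS_S3_BUCKET` variable name comes from the configuration above.

```python
import os


def resolve_bucket(bucket=None, env_var="ML_ANALYTICS_S3_BUCKET"):
    """Return the explicit bucket if given, else the environment default.

    Hypothetical helper illustrating the "no hard-coded bucket" rule.
    """
    resolved = bucket or os.environ.get(env_var)
    if not resolved:
        # Fail loudly instead of falling back to a built-in bucket name.
        raise ValueError(
            f"No S3 bucket configured: pass bucket=... or set {env_var}."
        )
    return resolved
```

Raising an error when neither source is set, rather than guessing a default, matches the package's stated goal of staying infrastructure-neutral.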

AWS Authentication

Use the CLI helper for AWS SSO:

ml-analytics-auth

You can also call it from Python:

from ml_analytics import ensure_aws_authenticated

ensure_aws_authenticated()

See AWS Authentication and CLI Commands for details.
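
For readers curious what an SSO check of this kind might involve, the sketch below shows one plausible approach using two AWS CLI commands, `aws sts get-caller-identity` and `aws sso login`. It is not the implementation of `ensure_aws_authenticated`; the function name `ensure_sso_session` and its `profile` and `runner` parameters are assumptions for illustration.

```python
import subprocess


def ensure_sso_session(profile=None, runner=subprocess.run):
    """Return True if credentials were already valid, False if a login was triggered.

    Hypothetical sketch; `runner` is injectable so the flow can be tested
    without the AWS CLI installed.
    """
    profile_args = ["--profile", profile] if profile else []
    # `aws sts get-caller-identity` exits non-zero when credentials are missing or expired.
    check = runner(
        ["aws", "sts", "get-caller-identity", *profile_args],
        capture_output=True,
    )
    if check.returncode == 0:
        return True
    # Start the interactive SSO flow; check=True raises if the login fails.
    runner(["aws", "sso", "login", *profile_args], check=True)
    return False
```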

Quick Examples

Query Redshift

from ml_analytics import DataConnector

dc = DataConnector()

df = dc.sql("SELECT * FROM analytics.customer_features LIMIT 100")
df_polars = dc.sql("queries/features.sql", format="polars", country="es")
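
The second call above passes a .sql path together with a keyword argument, which suggests the file is read and parameterized before execution. A minimal sketch of that idea follows. The function `load_sql` and the `$name` placeholder style (via `string.Template`) are assumptions; the package may use a different templating syntax.

```python
from pathlib import Path
from string import Template


def load_sql(query_or_path: str, **params) -> str:
    """Return SQL text, reading it from disk when given a .sql path.

    Hypothetical helper; placeholders use the $name style of string.Template.
    """
    if query_or_path.strip().lower().endswith(".sql"):
        text = Path(query_or_path).read_text()
    else:
        text = query_or_path
    # substitute() raises KeyError if the SQL references a parameter that was not passed.
    return Template(text).substitute(**params)
```

Plain text substitution like this is only appropriate for trusted values such as fixed country codes; user-supplied input should go through the database driver's bound parameters instead.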

Create A Redshift Table From A DataFrame

dc.create_table_from_dataframe(
    df,
    table="model_scores",
    schema="analytics",
    drop_existing_table=True,
)
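
To make the table-creation step more concrete, here is a sketch of how a CREATE TABLE statement could be derived from DataFrame column types. The type mapping, the `VARCHAR(256)` fallback, and the function `build_create_table` are assumptions for illustration, not the package's actual behavior. The `dtypes` argument is a plain dict of column name to dtype name, which with pandas could be obtained from `df.dtypes.astype(str).to_dict()`.

```python
# Assumed mapping from pandas dtype names to Redshift column types.
REDSHIFT_TYPES = {
    "int64": "BIGINT",
    "float64": "DOUBLE PRECISION",
    "bool": "BOOLEAN",
    "datetime64[ns]": "TIMESTAMP",
    "object": "VARCHAR(256)",
}


def build_create_table(table, schema, dtypes, drop_existing_table=False):
    """Build DDL for `schema.table` from a {column: dtype_name} mapping."""
    columns = ",\n".join(
        # Unknown dtypes fall back to a generic text column.
        f'    "{name}" {REDSHIFT_TYPES.get(dtype, "VARCHAR(256)")}'
        for name, dtype in dtypes.items()
    )
    statements = []
    if drop_existing_table:
        statements.append(f"DROP TABLE IF EXISTS {schema}.{table};")
    statements.append(f"CREATE TABLE {schema}.{table} (\n{columns}\n);")
    return "\n".join(statements)
```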

Work With S3

from ml_analytics import S3Connector

s3 = S3Connector(bucket="my-analytics-bucket", s3_root="projects/churn")

s3.save_dataframe(df, directory="outputs", file_name="scores")

summary = s3.query(
    """
    SELECT segment, count(*) AS rows
    FROM read_parquet('s3://my-analytics-bucket/projects/churn/outputs/*.parquet')
    GROUP BY segment
    """
)

Read And Write Google Sheets

from ml_analytics import GSheet

gsheet = GSheet(credentials_path="gsheet_credentials.json")

df = gsheet.read_sheet(spreadsheet_id="...", sheet_name="Input")
gsheet.write_sheet(df, spreadsheet_id="...", sheet_name="Results")

Log To MLflow

from ml_analytics import ModelManager

manager = ModelManager(model_name="churn-model", user="user@example.com")

manager.start_run("training")
manager.log_metric("auc", 0.91)
manager.end_run()
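
If training code raises between `start_run` and `end_run`, the run may be left open. One way to guard against that is a small context manager, sketched below. `tracked_run` is a hypothetical helper, not part of the package, and it relies only on the `start_run(name)` and `end_run()` calls shown above.

```python
from contextlib import contextmanager


@contextmanager
def tracked_run(manager, run_name):
    """Start a run on `manager` and always end it, even if the body raises.

    Hypothetical wrapper around the start_run/end_run pair.
    """
    manager.start_run(run_name)
    try:
        yield manager
    finally:
        manager.end_run()
```

Usage would then look like `with tracked_run(manager, "training") as run: run.log_metric("auc", 0.91)`.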

Send A Slack Message

from ml_analytics import SlackConnector

slack = SlackConnector()
slack.send_message(channel="#ml-alerts", text="Training finished")
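
At the HTTP level, sending a message with a bot token comes down to a POST to Slack's `chat.postMessage` Web API method. The standard-library sketch below builds such a request without sending it. It is an illustration only: `build_slack_request` is a hypothetical name, and `SlackConnector` may well use the official Slack SDK internally.

```python
import json
import os
import urllib.request


def build_slack_request(channel, text, token=None):
    """Build (but do not send) a chat.postMessage request.

    Hypothetical helper; the token falls back to SLACK_BOT_TOKEN.
    """
    token = token or os.environ["SLACK_BOT_TOKEN"]
    payload = json.dumps({"channel": channel, "text": text}).encode("utf-8")
    return urllib.request.Request(
        "https://slack.com/api/chat.postMessage",
        data=payload,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send it; Slack replies with a JSON body whose `ok` field indicates success.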

Detailed Guides

| Guide | Use It For |
| --- | --- |
| AWS Authentication | AWS SSO setup and Python helpers |
| CLI Commands | Available console commands |
| Google Sheets | Sheets setup, sharing, exports, and examples |
| Slack | Slack token setup and message/file examples |
| Tunnel Manager | SSH tunnel configuration and CLI usage |

Development

Run the standard checks before opening a PR:

uv run ruff check
uv run pytest

CI runs Ruff and pytest on Python 3.11 and 3.12.

Releases

This repository uses Release Please. Conventional commits on main create or update a release PR with the next version and changelog. When that PR is merged, the release workflow builds the package and publishes it to PyPI through Trusted Publishing using the pypi GitHub environment.
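
To show how commit messages drive the next version, the sketch below applies the mapping from the Conventional Commits specification: `fix` implies a patch bump, `feat` a minor bump, and a `!` marker or `BREAKING CHANGE:` footer a major bump. The function `bump_for` is hypothetical, and Release Please's real behavior depends on its configuration (for example, pre-1.0 handling), so treat this as an approximation.

```python
import re

# Matches headers such as "feat: ...", "fix(s3): ...", or "feat!: ...".
HEADER = re.compile(r"^(?P<type>[a-z]+)(\([^)]*\))?(?P<breaking>!)?: ")


def bump_for(messages):
    """Return 'major', 'minor', 'patch', or None for a list of commit messages."""
    level = None
    for message in messages:
        match = HEADER.match(message)
        if match is None:
            continue  # not a conventional commit; ignored in this sketch
        if match["breaking"] or "BREAKING CHANGE:" in message:
            return "major"
        if match["type"] == "feat":
            level = "minor"
        elif match["type"] == "fix" and level is None:
            level = "patch"
    return level
```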

Contributing

Keep changes small, covered by tests when behavior changes, and free of environment-specific defaults. Prefer explicit configuration over hidden infrastructure assumptions.
