
Minimal Experimentation Platform for AI/ML Apps


xplat

This CLI tool provides configuration management capabilities with a clean, modern interface built using Typer and Rich.

Get Started

See the Docs page: https://gpadpoll.github.io/fixed-income-fund-recsys/


The sections below cover specific features and workflows in detail.

Features

  • Configuration Management: Set, get, list, and reset configuration values
  • Type-aware Values: Automatic conversion of boolean, numeric, and string values
  • Rich Formatting: Beautiful table output and colorized messages
  • Interactive Prompts: Set values interactively when not provided
  • Environment Support: Custom config file location via XP_CONFIG_PATH
  • Comprehensive Testing: Full test suite with pytest
  • Professional Documentation: Auto-generated docs with MkDocs

Important: Poetry Version

To avoid compatibility errors (such as TypeError related to canonicalize_version), ensure you are using an up-to-date version of Poetry:

pip install --upgrade poetry

If you encounter installation issues, upgrading Poetry usually resolves them.

Quick Start

  1. Installation: Install the package in development mode

    cd xplat
    make install
    
  2. Basic Usage: Try the configuration commands

    xp config list
    xp config set theme dark
    xp config get theme
    

CLI Commands

Data, feature and model commands 🔧

This project includes end-to-end commands to fetch raw datasets, compute fund-month features, and compute normalized scores driven by YAML configuration files.

Data commands (fetching and ingestion)

  • xp data fetch MANIFEST_YAML -d/--output-dir PATH [--ref-date YYYY-MM-DD]
    • Fetch multiple datasets defined in a manifest YAML file.
    • The manifest describes base_url, periods, and filename_template for each dataset.
    • The command will download archives (ZIP), extract CSVs, concatenate by dataset and period, and write partitioned dataset files to output_dir/<dataset>/period=<period>/data.parquet.
  • If no Parquet engine is available (neither pyarrow nor fastparquet), the command falls back to writing data.csv files instead.
    • The fetch adds a reference_date column (ISO date) to rows indicating when the fetch occurred.

Programmatic helper: xplat.commands.data.fetch_manifest(manifest_dict, output_dir, reference_date=None) returns a dict[str, pandas.DataFrame] mapping dataset names to DataFrames for further programmatic processing.

Example manifest snippet:

fetch:
  cda:
    base_url: "https://dados.cvm.gov.br/dados/FI/DOC/CDA/DADOS/"
    periods: ["202501", "202502"]
    filename_template: "cda_fi_{period}.zip"
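To illustrate how the manifest fields fit together, here is a small self-contained sketch of how `base_url`, `periods`, and `filename_template` combine into download URLs. This is illustrative only, not the actual implementation: the real `fetch_manifest` also downloads the archives, extracts CSVs, and writes partitioned output.

```python
# Sketch: how a manifest entry expands into download URLs.
# Mirrors the documented fields (base_url, periods, filename_template);
# the real fetch_manifest also downloads, extracts, and partitions the data.
manifest = {
    "fetch": {
        "cda": {
            "base_url": "https://dados.cvm.gov.br/dados/FI/DOC/CDA/DADOS/",
            "periods": ["202501", "202502"],
            "filename_template": "cda_fi_{period}.zip",
        }
    }
}

def expand_urls(manifest: dict) -> dict:
    """Return dataset name -> list of archive URLs implied by the manifest."""
    urls = {}
    for name, spec in manifest["fetch"].items():
        urls[name] = [
            spec["base_url"] + spec["filename_template"].format(period=p)
            for p in spec["periods"]
        ]
    return urls

print(expand_urls(manifest))
```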

Feature commands (feature engineering) 🔧

  • xp feature build -i INPUT_DIR -d DATASET -o OUTPUT_PATH
    • Loads partitioned datasets from INPUT_DIR (supports Parquet partitions and CSV fallbacks).
    • Computes fund-month features according to a feature registry (defined in YAML or programmatically) using the project's FEATURE_ENGINE.
    • Writes the feature table to OUTPUT_PATH (Parquet preferred, CSV fallback if Parquet engine missing).

Programmatic helper: compute_all_features(data_sources_d, config_d, FEATURE_ENGINE) returns a DataFrame with computed features.

Feature registry YAML example (simplified):

feature:
  group_keys:
    - CNPJ_FUNDO_CLASSE
    - DENOM_SOCIAL
    - competencia
  feature_registry:
    cda:
      patrimonio_liq:
        description: "Maximum reported net asset value per fund-month."
        method: max
        args:
          - VL_PATRIM_LIQ

      log_aum:
        description: "Log-transformed AUM (for size comparisons)."
        method: max
        args:
          - VL_PATRIM_LIQ
        adjustment:
          - log

      credito_share:
        description: "Weighted share of credit-linked assets in the portfolio."
        method: credito_share_feature_fn
        args:
          - ["Debêntures", "Cédula de Crédito", "CRI", "CRA", "Notas Promissórias"]
        adjustment:
          - clip

Notes:

  • Methods can be built-in aggregations (e.g., sum, max, nunique) or custom feature functions (e.g., credito_share_feature_fn, hhi_feature_fn).
  • Adjustments (e.g., log, clip, coalesce) are applied after aggregation to normalize or clean values for scoring.
  • Ensure the group_keys reflect how you want to aggregate fund-month rows.
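The aggregation-then-adjustment pattern from the registry can be sketched with plain pandas. This uses the column and group-key names from the YAML example above; the project's FEATURE_ENGINE may implement the same steps differently.

```python
import numpy as np
import pandas as pd

# Illustrative only: "method: max" aggregation per group_keys, followed by
# an "adjustment: log" applied after aggregation (as described in the notes).
rows = pd.DataFrame({
    "CNPJ_FUNDO_CLASSE": ["A", "A", "B"],
    "competencia": ["2025-01", "2025-01", "2025-01"],
    "VL_PATRIM_LIQ": [100.0, 120.0, 50.0],
})

group_keys = ["CNPJ_FUNDO_CLASSE", "competencia"]

# method: max -> patrimonio_liq (maximum reported NAV per fund-month)
features = rows.groupby(group_keys, as_index=False)["VL_PATRIM_LIQ"].max()
features = features.rename(columns={"VL_PATRIM_LIQ": "patrimonio_liq"})

# adjustment: log -> log_aum (applied after aggregation)
features["log_aum"] = np.log(features["patrimonio_liq"])

print(features)
```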

Model commands (scoring) 🎯

  • xp model score -i INPUT_PATH -c CONFIG_YAML -o OUTPUT_PATH
    • Reads the feature table and scoring YAML config.
    • Computes normalized scores (currently zscore) and applies adjustments such as invert or coalesce.
    • Appends score columns (e.g., size_score) to the table and writes the scored table to OUTPUT_PATH.

Programmatic helper: compute_scores_from_yaml(features_df, config_d) returns a DataFrame with added score columns.

Scoring YAML example (simplified):

score:
  size_score:
    type: zscore
    description: "Z-score of `log_aum` to capture fund size (bigger → better)."
    args:
      feature: log_aum

  credit_risk_score:
    type: zscore
    description: "Credit exposure inverted (higher credit โ†’ lower score)."
    args:
      feature: credito_share
    adjustment:
      - invert

Notes:

  • type currently supports zscore (standardized values). Additional score types can be added as required.
  • args.feature points to the feature column to be scored (e.g., log_aum, n_ativos).
  • adjustment entries are applied after computing the raw score (e.g., invert flips the sign so higher raw means lower score).
  • After scoring, you can compute profile-level aggregations with xp policy profile-score (see below).
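The zscore-plus-invert combination can be sketched in a few lines of pandas. This is a conceptual illustration of the documented steps, not the package's compute_scores_from_yaml implementation.

```python
import pandas as pd

# Illustrative sketch: standardize a feature column (type: zscore), then
# apply the "invert" adjustment by flipping the sign, so higher raw
# credit exposure maps to a lower score.
df = pd.DataFrame({"credito_share": [0.1, 0.2, 0.3, 0.4]})

z = (df["credito_share"] - df["credito_share"].mean()) / df["credito_share"].std()
df["credit_risk_score"] = -z  # adjustment: invert

print(df)
```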

Policy commands (profile scoring & ranking)

  • xp policy profile-score -i INPUT_PATH -c CONFIG_YAML -o OUTPUT_PATH
    • Loads a scored feature table and a YAML file containing profile definitions (either top-level profile: or profiles:).
    • Computes weighted profile scores by summing feature score columns multiplied by weights defined in each profile.
    • Appends score_<profile> and rank_<profile> columns (dense ranking: best score → rank 1) to the table and writes the result to OUTPUT_PATH (Parquet preferred, CSV fallback available).

Programmatic helpers: compute_profile_scores_from_yaml(features_df, config_d) and compute_profile_scores_from_df(features_df, profiles) are available for programmatic usage in notebooks and scripts.

Example profile YAML snippet:

profile:
  conservative:
    size_score: 0.25
    diversification_score: 0.20
  balanced:
    size_score: 0.20
    diversification_score: 0.15
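The weighted-sum-and-rank behavior described above can be sketched with pandas directly. Column and profile names follow the YAML example; the project's helpers may differ in detail.

```python
import pandas as pd

# Sketch of profile scoring: score_<profile> is the weighted sum of feature
# score columns, and rank_<profile> is a dense rank (best score -> rank 1).
scored = pd.DataFrame({
    "fund": ["F1", "F2", "F3"],
    "size_score": [1.0, 0.0, -1.0],
    "diversification_score": [0.5, 1.0, 0.0],
})

profiles = {"conservative": {"size_score": 0.25, "diversification_score": 0.20}}

for name, weights in profiles.items():
    scored[f"score_{name}"] = sum(scored[col] * w for col, w in weights.items())
    scored[f"rank_{name}"] = (
        scored[f"score_{name}"].rank(method="dense", ascending=False).astype(int)
    )

print(scored[["fund", "score_conservative", "rank_conservative"]])
```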

Notes & tips 💡

  • Dependencies: pandas is required; pyarrow or fastparquet are optional but recommended for efficient Parquet IO.
  • Tests: The test suite uses CSV fallbacks to avoid requiring Parquet dependencies in CI.
  • Reproducibility: Use --ref-date (or pass reference_date programmatically) to produce deterministic fetch outputs.

Development

Prerequisites

This project uses Poetry for dependency management. Install Poetry first:

curl -sSL https://install.python-poetry.org | python3 -

Setup

  1. Install dependencies:

    make install
    

    This installs the package and all development dependencies using Poetry.

  2. Install pre-commit hooks:

    make pre-commit
    

Testing

Run the comprehensive test suite:

make test

Or run tests directly with Poetry:

poetry run pytest -vvv

Tests cover:

  • All config subcommands (set, get, list, reset)
  • Type conversion (boolean, numeric, string)
  • Error handling (missing keys, corrupted files)
  • Interactive prompts and confirmations
  • Environment variable configuration
  • File creation and management

Documentation

  1. Install docs dependencies:

    make docs
    
  2. Serve docs locally:

    make serve-docs
    

    Or run directly with Poetry:

    poetry run mkdocs serve -f docs/mkdocs.yml
    
  3. View documentation: Open http://localhost:8000

Code Quality

  • Format code: make format or poetry run black .
  • Check formatting: make check or poetry run black --check --diff .
  • Run linting: poetry run flake8
  • Type checking: poetry run mypy .
  • Clean artifacts: make clean

Docker Testing

Test the CLI in a clean container environment:

  1. Build image:

    make docker-image
    
  2. Run commands (example: show help or run specific CLI command):

    docker run --rm xp --help
    docker run --rm xp xp --help
    
  3. Run the full pipeline using a manifest (fetch → feature → score → profile ranking):

    Mount your manifest.yaml and an output directory, and run the pipeline entrypoint. For example:

    mkdir -p /tmp/xp_data
    docker run --rm \
      -v "$(pwd)/manifest.yaml:/manifest.yaml" \
      -v "/tmp/xp_data:/data" \
      xp pipeline /manifest.yaml /data
    

    The container will execute the following steps in order:

    • xp data fetch --manifest /manifest.yaml --output-dir /data
    • xp feature build --input-dir /data --config /manifest.yaml --output /data/features.parquet
    • xp model score --input /data/features.parquet --config /manifest.yaml --output /data/features_scored.parquet
    • xp policy profile-score --input /data/features_scored.parquet --config /manifest.yaml --output /data/features_profile_scored.parquet

    Output files will be written into the mounted /data directory on the host.

    Note: If the container can't write Parquet because pyarrow isn't installed, CSV fallbacks will be written instead (e.g., features.csv).

Configuration Storage

  • Default location: ~/.xp_config.json
  • Custom location: Set XP_CONFIG_PATH environment variable
  • Format: JSON with automatic type preservation
  • Default values: Includes theme, output_format, auto_save, and debug settings

Distribution

PyPI Publishing

NOTE: Ensure you have a PyPI account before publishing.

  1. Create distributions:

    make distributions
    

    This builds the package using Poetry.

  2. Upload to PyPI:

    poetry publish
    

    Or use twine:

    twine upload dist/*
    

GitHub Actions / Secrets

  • Secret name: PYPI_API_TOKEN. Add it in your repository under Settings → Secrets → Actions.
  • Create token: Generate a PyPI API token at https://pypi.org/manage/account/token/ (recommended: project-scoped token).
  • How it's used: The GitHub Actions workflow reads PYPI_API_TOKEN and passes it to twine as the password (username __token__).

When you push a tag like v1.2.3 or publish a GitHub Release, the workflow will build the distributions and upload them to PyPI using the token.

Package Structure

The generated package includes:

  • Clean CLI interface with professional help text
  • Comprehensive test coverage for all functionality
  • Type-safe configuration handling with automatic conversions
  • Rich formatting for beautiful output
  • Professional documentation ready for deployment
  • Docker support for containerized usage

Project layout (directory tree)

A concise view of the repository layout (truncated) to help you locate commands, modules, and tests:

.
├── core.py                 # high-level package helpers and CLI entrypoints
├── Dockerfile              # container image build steps for testing/deployment
├── Makefile                # convenience commands (install, test, docs, etc.)
├── pyproject.toml          # project metadata and dependencies (Poetry)
├── README.md               # this file
├── docs/                   # MkDocs site and notebook resources
│   ├── mkdocs.yml          # docs configuration
│   └── docs/
│       └── notebooks/      # example notebooks and tutorials
├── notebooks/              # interactive notebooks (examples, experiments)
│   └── example.ipynb
├── etc/                    # auxiliary scripts and sample artifacts
│   ├── artifact.py
│   └── dump.py
├── xplat/                  # main package code
│   ├── __init__.py
│   ├── constants.py        # global constants and settings
│   ├── main.py             # top-level Typer app and command registration
│   ├── utils.py            # reusable helpers (I/O, parsing, small utilities)
│   └── commands/           # CLI command implementations (Typer)
│       ├── __init__.py
│       ├── config.py       # configuration management commands
│       ├── data.py         # data download & ingestion (fetch/manifest)
│       ├── feature.py      # feature engineering pipeline and registry
│       ├── model.py        # scoring logic and CLI commands
│       └── policy.py       # profile scoring and ranking commands
└── tests/                  # test suite (pytest)
    ├── test_config.py
    ├── test_feature.py
    ├── test_fetch.py
    ├── test_model.py
    └── test_policy.py

Use this tree as a quick reference: command implementations live in xplat/commands/*, reusable logic in xplat/ top-level modules, and tests in tests/.

Architecture

Built with modern Python CLI best practices:

  • Poetry - Modern dependency management
  • Typer - Type-based CLI framework
  • Rich - Beautiful terminal output
  • Pytest - Reliable testing framework
  • MkDocs - Professional documentation
  • Black - Code formatting
  • Pre-commit - Git hooks for quality

Help

View all available make commands:

make help

Get CLI help:

xp --help
xp config --help
