Minimal Experimentation Platform for AI/ML Apps
xplat
This CLI tool provides configuration management capabilities with a clean, modern interface built using Typer and Rich.
Get Started
See the docs site: https://gpadpoll.github.io/fixed-income-fund-recsys/
Command-specific documentation follows below.
Features
- Configuration Management: Set, get, list, and reset configuration values
- Type-aware Values: Automatic conversion of boolean, numeric, and string values
- Rich Formatting: Beautiful table output and colorized messages
- Interactive Prompts: Set values interactively when not provided
- Environment Support: Custom config file location via `XP_CONFIG_PATH`
- Comprehensive Testing: Full test suite with pytest
- Professional Documentation: Auto-generated docs with MkDocs
Important: Poetry Version
To avoid compatibility errors (such as a `TypeError` related to `canonicalize_version`), ensure you are using an up-to-date version of Poetry:

```
pip install --upgrade poetry
```

If you encounter installation issues, upgrading Poetry usually resolves them.
Quick Start

- Installation: Install the package in development mode:

  ```
  cd xplat
  make install
  ```

- Basic Usage: Try the configuration commands:

  ```
  xp config list
  xp config set theme dark
  xp config get theme
  ```
CLI Commands
Data, feature, and model commands 🚧
This project includes end-to-end commands to fetch raw datasets, compute fund-month features, and compute normalized scores driven by YAML configuration files.
Data commands (fetching and ingestion)
`xp data fetch MANIFEST_YAML -d/--output-dir PATH [--ref-date YYYY-MM-DD]`

- Fetch multiple datasets defined in a manifest YAML file.
- The manifest describes `base_url`, `periods`, and `filename_template` for each dataset.
- The command downloads ZIP archives, extracts CSVs, concatenates by dataset and period, and writes partitioned dataset files to `output_dir/<dataset>/period=<period>/data.parquet`.
- If a Parquet engine is not available (no `pyarrow`/`fastparquet`), the command falls back to writing `data.csv` files instead.
- The fetch adds a `reference_date` column (ISO date) to rows, indicating when the fetch occurred.
Programmatic helper: `xplat.commands.data.fetch_manifest(manifest_dict, output_dir, reference_date=None)` returns a `dict[str, pandas.DataFrame]` mapping dataset names to DataFrames for further programmatic processing.
Example manifest snippet:
```yaml
fetch:
  cda:
    base_url: "https://dados.cvm.gov.br/dados/FI/DOC/CDA/DADOS/"
    periods: ["202501", "202502"]
    filename_template: "cda_fi_{period}.zip"
```
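To make the template mechanics concrete, the sketch below (hypothetical `expand_urls` helper, not part of xplat) shows how `base_url`, `periods`, and `filename_template` combine into the archive URLs a fetch would download:

```python
# Sketch only: expand_urls is a hypothetical illustration, not a package API.
manifest = {
    "cda": {
        "base_url": "https://dados.cvm.gov.br/dados/FI/DOC/CDA/DADOS/",
        "periods": ["202501", "202502"],
        "filename_template": "cda_fi_{period}.zip",
    }
}

def expand_urls(manifest: dict) -> dict:
    """Map each dataset name to its list of archive URLs."""
    return {
        name: [
            spec["base_url"] + spec["filename_template"].format(period=p)
            for p in spec["periods"]
        ]
        for name, spec in manifest.items()
    }

urls = expand_urls(manifest)
# urls["cda"] -> one ZIP URL per period
```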
Feature commands (feature engineering) 🚧
`xp feature build -i INPUT_DIR -d DATASET -o OUTPUT_PATH`

- Loads partitioned datasets from `INPUT_DIR` (supports Parquet partitions and CSV fallbacks).
- Computes fund-month features according to a feature registry (defined in YAML or programmatically) using the project's `FEATURE_ENGINE`.
- Writes the feature table to `OUTPUT_PATH` (Parquet preferred, CSV fallback if the Parquet engine is missing).
Programmatic helper: `compute_all_features(data_sources_d, config_d, FEATURE_ENGINE)` returns a DataFrame with computed features.
Feature registry YAML example (simplified):
```yaml
feature:
  group_keys:
    - CNPJ_FUNDO_CLASSE
    - DENOM_SOCIAL
    - competencia
  feature_registry:
    cda:
      patrimonio_liq:
        description: "Maximum reported net asset value per fund-month."
        method: max
        args:
          - VL_PATRIM_LIQ
      log_aum:
        description: "Log-transformed AUM (for size comparisons)."
        method: max
        args:
          - VL_PATRIM_LIQ
        adjustment:
          - log
      credito_share:
        description: "Weighted share of credit-linked assets in the portfolio."
        method: credito_share_feature_fn
        args:
          - ["Debêntures", "Cédula de Crédito", "CRI", "CRA", "Notas Promissórias"]
        adjustment:
          - clip
```
Notes:
- Methods can be built-in aggregations (e.g., `sum`, `max`, `nunique`) or custom feature functions (e.g., `credito_share_feature_fn`, `hhi_feature_fn`).
- Adjustments (e.g., `log`, `clip`, `coalesce`) are applied after aggregation to normalize or clean values for scoring.
- Ensure the `group_keys` reflect how you want to aggregate fund-month rows.
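As an illustration of these semantics, here is a minimal pandas sketch (toy data; column and key names borrowed from the registry example above) of a `max` aggregation followed by a `log` adjustment:

```python
import numpy as np
import pandas as pd

# Toy fund-month rows; column names borrowed from the registry example above.
df = pd.DataFrame({
    "CNPJ_FUNDO_CLASSE": ["A", "A", "B"],
    "competencia": ["202501", "202501", "202501"],
    "VL_PATRIM_LIQ": [100.0, 150.0, 50.0],
})

# method: max -> one aggregated value per (fund, month) group
agg = df.groupby(["CNPJ_FUNDO_CLASSE", "competencia"])["VL_PATRIM_LIQ"].max()

# adjustment: log -> applied after aggregation (this is the log_aum idea)
log_aum = np.log(agg)
```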
Model commands (scoring) 🎯
`xp model score -i INPUT_PATH -c CONFIG_YAML -o OUTPUT_PATH`

- Reads the feature table and scoring YAML config.
- Computes normalized scores (currently `zscore`) and applies adjustments such as `invert` or `coalesce`.
- Appends score columns (e.g., `size_score`) to the table and writes the scored table to `OUTPUT_PATH`.
Programmatic helper: `compute_scores_from_yaml(features_df, config_d)` returns a DataFrame with added score columns.
Scoring YAML example (simplified):
```yaml
score:
  size_score:
    type: zscore
    description: "Z-score of `log_aum` to capture fund size (bigger → better)."
    args:
      feature: log_aum
  credit_risk_score:
    type: zscore
    description: "Credit exposure inverted (higher credit → lower score)."
    args:
      feature: credito_share
    adjustment:
      - invert
```
Notes:
- `type` currently supports `zscore` (standardized values). Additional score types can be added as required.
- `args.feature` points to the feature column to be scored (e.g., `log_aum`, `n_ativos`).
- `adjustment` entries are applied after computing the raw score (e.g., `invert` flips the sign so a higher raw value means a lower score).
- After scoring, you can compute profile-level aggregations with `xp policy profile-score` (see below).
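A minimal sketch of how `zscore` plus an `invert` adjustment could behave (toy data; assumes the standard (x − mean) / std definition, so the project's actual implementation may differ in details such as the std denominator):

```python
import pandas as pd

# Toy feature table with one scoreable column.
df = pd.DataFrame({"credito_share": [0.1, 0.2, 0.3]})

raw = df["credito_share"]
z = (raw - raw.mean()) / raw.std()  # zscore (pandas uses sample std)
df["credit_risk_score"] = -z        # invert: higher credit -> lower score
```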
Policy commands (profile scoring & ranking)
`xp policy profile-score -i INPUT_PATH -c CONFIG_YAML -o OUTPUT_PATH`

- Loads a scored feature table and a YAML file containing profile definitions (either top-level `profile:` or `profiles:`).
- Computes weighted profile scores by summing feature score columns multiplied by the weights defined in each profile.
- Appends `score_<profile>` and `rank_<profile>` columns (dense ranking: best score → rank 1) to the table and writes the result to `OUTPUT_PATH` (Parquet preferred, CSV fallback available).
Programmatic helpers: `compute_profile_scores_from_yaml(features_df, config_d)` and `compute_profile_scores_from_df(features_df, profiles)` are available for programmatic usage in notebooks and scripts.
Example profile YAML snippet:
```yaml
profile:
  conservative:
    size_score: 0.25
    diversification_score: 0.20
  balanced:
    size_score: 0.20
    diversification_score: 0.15
```
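The weighted sum and dense ranking can be sketched as follows (toy score columns; weights taken from the `conservative` profile example):

```python
import pandas as pd

# Toy scored feature table.
df = pd.DataFrame({
    "size_score": [1.0, -0.5, 0.2],
    "diversification_score": [0.0, 1.0, -1.0],
})

weights = {"size_score": 0.25, "diversification_score": 0.20}

# score_<profile>: weighted sum of the listed score columns
df["score_conservative"] = sum(df[col] * w for col, w in weights.items())

# rank_<profile>: dense ranking, best score -> rank 1
df["rank_conservative"] = (
    df["score_conservative"].rank(method="dense", ascending=False).astype(int)
)
```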
Notes & tips 💡
- Dependencies: `pandas` is required; `pyarrow` or `fastparquet` are optional but recommended for efficient Parquet I/O.
- Tests: The test suite uses CSV fallbacks to avoid requiring Parquet dependencies in CI.
- Reproducibility: Use `--ref-date` (or pass `reference_date` programmatically) to produce deterministic fetch outputs.
Development
Prerequisites
This project uses Poetry for dependency management. Install Poetry first:
```
curl -sSL https://install.python-poetry.org | python3 -
```
Setup

- Install dependencies:

  ```
  make install
  ```

  This installs the package and all development dependencies using Poetry.

- Install pre-commit hooks:

  ```
  make pre-commit
  ```
Testing
Run the comprehensive test suite:

```
make test
```

Or run tests directly with Poetry:

```
poetry run pytest -vvv
```
Tests cover:
- All config subcommands (set, get, list, reset)
- Type conversion (boolean, numeric, string)
- Error handling (missing keys, corrupted files)
- Interactive prompts and confirmations
- Environment variable configuration
- File creation and management
Documentation
- Install docs dependencies: `make docs`
- Serve docs locally: `make serve-docs`, or run directly with Poetry:

  ```
  poetry run mkdocs serve -f docs/mkdocs.yml
  ```

- View documentation: Open http://localhost:8000
Code Quality
- Format code: `make format` or `poetry run black .`
- Check formatting: `make check` or `poetry run black --check --diff .`
- Run linting: `poetry run flake8`
- Type checking: `poetry run mypy .`
- Clean artifacts: `make clean`
Docker Testing
Test the CLI in a clean container environment:
- Build image:

  ```
  make docker-image
  ```

- Run commands (example: show help or run a specific CLI command):

  ```
  docker run --rm xp --help
  docker run --rm xp xp --help
  ```

- Run the full pipeline using a manifest (fetch → feature → score → profile ranking). Mount your `manifest.yaml` and an output directory, and run the `pipeline` entrypoint. For example:

  ```
  mkdir -p /tmp/xp_data
  docker run --rm \
    -v "$(pwd)/manifest.yaml:/manifest.yaml" \
    -v "/tmp/xp_data:/data" \
    xp pipeline /manifest.yaml /data
  ```

  The container will execute the following steps in order:

  1. `xp data fetch --manifest /manifest.yaml --output-dir /data`
  2. `xp feature build --input-dir /data --config /manifest.yaml --output /data/features.parquet`
  3. `xp model score --input /data/features.parquet --config /manifest.yaml --output /data/features_scored.parquet`
  4. `xp policy profile-score --input /data/features_scored.parquet --config /manifest.yaml --output /data/features_profile_scored.parquet`

  Output files will be written into the mounted `/data` directory on the host.

  Note: If the container can't write Parquet because `pyarrow` isn't installed, CSV fallbacks will be written instead (e.g., `features.csv`).
Configuration Storage
- Default location: `~/.xp_config.json`
- Custom location: Set the `XP_CONFIG_PATH` environment variable
- Format: JSON with automatic type preservation
- Default values: Includes theme, output_format, auto_save, and debug settings
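Because the store is plain JSON, booleans and numbers round-trip with their types intact. A small sketch (hypothetical config values; the actual defaults may differ):

```python
import json
import tempfile
from pathlib import Path

# Hypothetical config values; actual defaults may differ.
config = {"theme": "dark", "output_format": "table", "auto_save": True, "debug": False}

# Write to a throwaway path standing in for ~/.xp_config.json.
path = Path(tempfile.mkdtemp()) / ".xp_config.json"
path.write_text(json.dumps(config, indent=2))

# Types survive the round trip: bools stay bools, strings stay strings.
loaded = json.loads(path.read_text())
```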
Distribution
PyPI Publishing
NOTE: Ensure you have a PyPI account before publishing.
- Create distributions:

  ```
  make distributions
  ```

  This builds the package using Poetry.

- Upload to PyPI:

  ```
  poetry publish
  ```

  Or use twine:

  ```
  twine upload dist/*
  ```
GitHub Actions / Secrets
- Secret name: `PYPI_API_TOKEN`. Add this in your repository Settings → Secrets → Actions.
- Create token: Generate a PyPI API token at https://pypi.org/manage/account/token/ (recommended: project-scoped token).
- How it's used: The GitHub Actions workflow reads `PYPI_API_TOKEN` and passes it to `twine` as the password (username `__token__`).
When you push a tag like v1.2.3 or publish a GitHub Release, the workflow will build the distributions and upload them to PyPI using the token.
Package Structure
The generated package includes:
- Clean CLI interface with professional help text
- Comprehensive test coverage for all functionality
- Type-safe configuration handling with automatic conversions
- Rich formatting for beautiful output
- Professional documentation ready for deployment
- Docker support for containerized usage
Project layout (directory tree)
A concise view of the repository layout (truncated) to help you locate commands, modules, and tests:
```
.
├── core.py            # high-level package helpers and CLI entrypoints
├── Dockerfile         # container image build steps for testing/deployment
├── Makefile           # convenience commands (install, test, docs, etc.)
├── pyproject.toml     # project metadata and dependencies (Poetry)
├── README.md          # this file
├── docs/              # MkDocs site and notebook resources
│   ├── mkdocs.yml     # docs configuration
│   └── docs/
│       └── notebooks/ # example notebooks and tutorials
├── notebooks/         # interactive notebooks (examples, experiments)
│   └── example.ipynb
├── etc/               # auxiliary scripts and sample artifacts
│   ├── artifact.py
│   └── dump.py
├── xplat/             # main package code
│   ├── __init__.py
│   ├── constants.py   # global constants and settings
│   ├── main.py        # top-level Typer app and command registration
│   ├── utils.py       # reusable helpers (I/O, parsing, small utilities)
│   └── commands/      # CLI command implementations (Typer)
│       ├── __init__.py
│       ├── config.py  # configuration management commands
│       ├── data.py    # data download & ingestion (fetch/manifest)
│       ├── feature.py # feature engineering pipeline and registry
│       ├── model.py   # scoring logic and CLI commands
│       └── policy.py  # profile scoring and ranking commands
└── tests/             # test suite (pytest)
    ├── test_config.py
    ├── test_feature.py
    ├── test_fetch.py
    ├── test_model.py
    └── test_policy.py
```
Use this tree as a quick reference: command implementations live in `xplat/commands/*`, reusable logic in the top-level `xplat/` modules, and tests in `tests/`.
Architecture
Built with modern Python CLI best practices:
- Poetry - Modern dependency management
- Typer - Type-based CLI framework
- Rich - Beautiful terminal output
- Pytest - Reliable testing framework
- MkDocs - Professional documentation
- Black - Code formatting
- Pre-commit - Git hooks for quality
Help
View all available make commands:

```
make help
```

Get CLI help:

```
xp --help
xp config --help
```