Skip to main content

Predictive modeling with BlackSwanClassifier

Project description

Algorithme.ai — BlackSwanClassifier 🦢 (Library API)

Predictive modeling toolkit that talks to Algorithme.ai’s hosted engine through a thin Python client. This repository includes a minimal Python package (algorithmeai) and a quickstart showing how to build, evaluate, improve, and export a model using CSV files.

Heads‑up (privacy/IO): Most calls send your CSV content or individual items to a hosted API (AWS Lambda) and receive results back. Do not send sensitive or personally identifiable data unless you have approval to do so and the data is anonymized.


Contents

BlackSwanClassifier/
├─ algorithmeai/                 # Python client package
│  ├─ __init__.py
│  └─ algorithmeai.py            # BlackSwanClassifier client
├─ quickstart/
│  ├─ quickstart.ipynb           # End-to-end example
│  ├─ train.csv                  # Example training set (binary target in col 0)
│  ├─ backtest.csv               # Example evaluation set
│  └─ blackswan-api.json         # Example saved model handle (hash + log)
├─ pyproject.toml
└─ algorithmeai.egg-info/

Installation

You can install the package locally in editable mode. The client depends on requests.

# from the repo root (this folder)
pip install -e .
pip install requests  # if not already present

Python 3.8+ recommended.


Quickstart (from quickstart/quickstart.ipynb)

Below mirrors the notebook and covers the main workflow:

from algorithmeai import BlackSwanClassifier

# 1) Build a remote model from a CSV (target is at index 0 by default)
model = BlackSwanClassifier("quickstart/train.csv", target_index=0)

# 2) Rehydrate/ping an existing remote model by its 64-char hash
new_model = BlackSwanClassifier("bc0ad6c0d46f32551bda63fb70e6186bdad6cb66bd39958d40a99beee4ae5bde")

# 3) Evaluate AUC on a backtest set
auc = new_model.get_auc("quickstart/backtest.csv")

# 4) Optimize the model (server-side)
new_model.improvePrecision()   # bias toward precision
new_model.improveRecall()      # bias toward recall
new_model.improve()            # balanced improvement

# 5) Re-evaluate
auc = new_model.get_auc("quickstart/backtest.csv")

# 6) Find the optimal threshold and its AUC
auc, opt = new_model.get_auc_opt("quickstart/backtest.csv")

# 7) Inspect global feature importance on a dataset
gfi = new_model.get_global_feature_importance("quickstart/backtest.csv")

# 8) Get per-row confidence scores and filter indexes above a threshold
conf = new_model.get_confidence("quickstart/backtest.csv")
idx = new_model.filter("quickstart/backtest.csv", opt)  # opt from step 6

# 9) Work with a "population" of sample items for item-level exploration
population = new_model.make_population("quickstart/backtest.csv")
item = population[0]  # pick one item

# Item-level introspection
fi_item = new_model.get_feature_importance(item)   # feature contributions for this item
conf_item = new_model.get_item_confidence(item)    # confidence for this item
audit = new_model.get_audit(item)                  # returns a dictionary audit with lookalikes csv confidence and feature importance for this item

# 10) Export a portable handle and re-load later
new_model.to_json("quickstart/blackswan-api.json")
final_model = BlackSwanClassifier("quickstart/blackswan-api.json")
print(final_model.log)  # server log / trace from last call

Notes:

  • Most methods update both self.hash (the remote model handle) and self.log (a textual server log).

Data format

  • CSV files with a header row.
  • Binary target column by default at index 0 (0/1). You can choose a different target via the target_index argument.
  • Other columns are treated as features. Numeric floats/ints are supported; 0/1 columns work as booleans.
  • You can exclude features by their positional index: excluded_features_index=[...] when constructing the classifier.

Example (from quickstart/train.csv and backtest.csv):

Diagnosis,Age,Gender,BMI,Smoking,GeneticRisk,PhysicalActivity,AlcoholIntake,CancerHistory
1,58,1,16.0853,0,1,8.1463,4.1482,1
0,71,0,30.8288,0,1,9.3616,3.1983,0
...

Python API

All calls below contact the hosted service and may take time depending on data size and network.

Constructor

BlackSwanClassifier(filepath, target_index=0, excluded_features_index=[])
  • filepath accepts one of:
    • Path to a CSV file → builds/initializes a remote model.
    • A 64-character hash string → attaches to an existing remote model.
    • Path to a JSON file created by to_json → reloads a saved handle.
  • Side-effects: sets/updates .hash and .log.

Model lifecycle

  • improve() → server-side balanced improvement; updates .hash, .log.
  • improvePrecision() → bias toward precision; updates .hash, .log.
  • improveRecall() → bias toward recall; updates .hash, .log.

Evaluation

  • get_auc(csv_path) -> float
    Returns AUC on the provided dataset.
  • get_auc_opt(csv_path) -> (float auc, float opt)
    Returns the AUC and the optimal decision threshold used to achieve it.

Introspection (dataset-level)

  • get_global_feature_importance(csv_path) -> dict[str, float]
    Global feature importance computed over the given dataset.
  • get_confidence(csv_path) -> dict[int, float]
    Per-row confidence (keyed by row index).
  • filter(csv_path, opt=0.5) -> list[int]
    Convenience wrapper that returns indexes whose confidence ≥ opt.

Introspection (item-level)

  • make_population(csv_path) -> list[item]
    Returns a collection of item payloads suitable for per-item analysis.
  • get_feature_importance(item) -> dict[str, float]
    Feature contributions for a specific item.
  • get_item_confidence(item) -> float
    Confidence for a specific item.
  • get_audit(item) -> None
    Retrieves and logs an audit trail for the item (printed, and appended to self.log).

Persistence

  • to_json(fileout="blackswan-api.json") -> None
    Writes a minimal JSON with the current hash and log. You can later rehydrate with BlackSwanClassifier(fileout).

Networking, privacy & security

  • The client sends your CSV text and/or item payloads to a hosted endpoint (AWS Lambda) and receives JSON responses.
  • Do not include PII or sensitive data unless it is anonymized and you have permission to process it off-device.
  • Keep your blackswan-api.json and model hash private if your project requires access control—possession of the hash re-attaches to the remote model.

Troubleshooting

  • ModuleNotFoundError: No module named 'requests'
    Install it: pip install requests.

  • Network/HTTP errors (non-200, timeouts):
    Check connectivity, retry, and inspect print output and final_model.log for server-side messages.

  • Unexpected results / wrong target column:
    Make sure the target is 0/1 and pass the correct target_index when building the model.

  • CSV parsing issues:
    Ensure UTF-8 encoding and a single header row. Avoid stray commas/quotes.


Development

  • Code style is simple and dependency-light. Contributions that improve robustness (typing, retries, docstrings) are welcome.
  • Before opening PRs that change network protocols or add telemetry, please open an issue to discuss.

License

MIT Licence. Assume all rights reserved © Algorithme.ai / Charles Dana (2025). For commercial use, contact the author.


Contact

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

algorithmeai-1.0.0.tar.gz (6.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

algorithmeai-1.0.0-py3-none-any.whl (6.9 kB view details)

Uploaded Python 3

File details

Details for the file algorithmeai-1.0.0.tar.gz.

File metadata

  • Download URL: algorithmeai-1.0.0.tar.gz
  • Upload date:
  • Size: 6.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.11

File hashes

Hashes for algorithmeai-1.0.0.tar.gz
Algorithm Hash digest
SHA256 fba32fc88d690fa557e91b271027d708ac3ba14de8282cd5f993ac88d99263a9
MD5 ecb95166990ff1a29676c6edc4e2983e
BLAKE2b-256 d53b24f09fe96c11499004044a522bfe935c34691d090f808ad4bc5f91c5b3ed

See more details on using hashes here.

File details

Details for the file algorithmeai-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: algorithmeai-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 6.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.11

File hashes

Hashes for algorithmeai-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 211e94295be2a3310476a081d726c582314ec379635b747b44a5524d233a5b1f
MD5 35c4c438d4a68abebe34f912ef03e5dd
BLAKE2b-256 abdee77331cc8b1560f86f2d4256c2020ced02bd34a81565affac0552b60161b

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page