Predictive modeling with BlackSwanClassifier
Algorithme.ai — BlackSwanClassifier 🦢 (Library API)
Predictive modeling toolkit that talks to Algorithme.ai’s hosted engine through a thin Python client. This repository includes a minimal Python package (algorithmeai) and a quickstart showing how to build, evaluate, improve, and export a model using CSV files.
Heads‑up (privacy/IO): Most calls send your CSV content or individual items to a hosted API (AWS Lambda) and receive results back. Do not send sensitive or personally identifiable data unless you have approval to do so and the data is anonymized.
Contents
BlackSwanClassifier/
├─ algorithmeai/ # Python client package
│ ├─ __init__.py
│ └─ algorithmeai.py # BlackSwanClassifier client
├─ quickstart/
│ ├─ quickstart.ipynb # End-to-end example
│ ├─ train.csv # Example training set (binary target in col 0)
│ ├─ backtest.csv # Example evaluation set
│ └─ blackswan-api.json # Example saved model handle (hash + log)
├─ pyproject.toml
└─ algorithmeai.egg-info/
Installation
You can install the package locally in editable mode. The client depends on requests.
# from the repo root (this folder)
pip install -e .
pip install requests # if not already present
Python 3.8+ recommended.
Quickstart (from quickstart/quickstart.ipynb)
The code below mirrors the notebook and covers the main workflow:
from algorithmeai import BlackSwanClassifier
# 1) Build a remote model from a CSV (target is at index 0 by default)
model = BlackSwanClassifier("quickstart/train.csv", target_index=0)
# 2) Rehydrate/ping an existing remote model by its 64-char hash
new_model = BlackSwanClassifier("bc0ad6c0d46f32551bda63fb70e6186bdad6cb66bd39958d40a99beee4ae5bde")
# 3) Evaluate AUC on a backtest set
auc = new_model.get_auc("quickstart/backtest.csv")
# 4) Optimize the model (server-side)
new_model.improvePrecision() # bias toward precision
new_model.improveRecall() # bias toward recall
new_model.improve() # balanced improvement
# 5) Re-evaluate
auc = new_model.get_auc("quickstart/backtest.csv")
# 6) Find the optimal threshold and its AUC
auc, opt = new_model.get_auc_opt("quickstart/backtest.csv")
# 7) Inspect global feature importance on a dataset
gfi = new_model.get_global_feature_importance("quickstart/backtest.csv")
# 8) Get per-row confidence scores and filter indexes above a threshold
conf = new_model.get_confidence("quickstart/backtest.csv")
idx = new_model.filter("quickstart/backtest.csv", opt) # opt from step 6
# 9) Work with a "population" of sample items for item-level exploration
population = new_model.make_population("quickstart/backtest.csv")
item = population[0] # pick one item
# Item-level introspection
fi_item = new_model.get_feature_importance(item) # feature contributions for this item
conf_item = new_model.get_item_confidence(item) # confidence for this item
audit = new_model.get_audit(item) # returns a dictionary audit with lookalikes csv confidence and feature importance for this item
# 10) Export a portable handle and re-load later
new_model.to_json("quickstart/blackswan-api.json")
final_model = BlackSwanClassifier("quickstart/blackswan-api.json")
print(final_model.log) # server log / trace from last call
Notes:
- Most methods update both self.hash (the remote model handle) and self.log (a textual server log).
Data format
- CSV files with a header row.
- Binary target column (0/1), by default at index 0. You can choose a different target via the target_index argument.
- Other columns are treated as features. Numeric floats/ints are supported; 0/1 columns work as booleans.
- You can exclude features by their positional index: excluded_features_index=[...] when constructing the classifier.
Example (from quickstart/train.csv and backtest.csv):
Diagnosis,Age,Gender,BMI,Smoking,GeneticRisk,PhysicalActivity,AlcoholIntake,CancerHistory
1,58,1,16.0853,0,1,8.1463,4.1482,1
0,71,0,30.8288,0,1,9.3616,3.1983,0
...
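The format rules above can be checked locally before any data leaves your machine. The following is a minimal sketch using only the standard library; check_training_csv and feature_indexes are hypothetical helpers written for this example and are not part of the algorithmeai package.

```python
import csv
import io


def check_training_csv(text, target_index=0):
    """Return the header if `text` follows the documented format, else raise ValueError."""
    rows = list(csv.reader(io.StringIO(text)))
    if len(rows) < 2:
        raise ValueError("expected a header row plus at least one data row")
    header, data = rows[0], rows[1:]
    if not 0 <= target_index < len(header):
        raise ValueError(f"target_index {target_index} is outside the header")
    for line_no, row in enumerate(data, start=2):
        if len(row) != len(header):
            raise ValueError(f"line {line_no}: expected {len(header)} fields, got {len(row)}")
        if row[target_index].strip() not in ("0", "1"):
            raise ValueError(f"line {line_no}: target must be 0 or 1, got {row[target_index]!r}")
    return header


def feature_indexes(header, names):
    """Map column names to positional indexes, e.g. for excluded_features_index."""
    return [header.index(name) for name in names]


# Two rows taken from the example above.
sample = (
    "Diagnosis,Age,Gender,BMI,Smoking,GeneticRisk,PhysicalActivity,AlcoholIntake,CancerHistory\n"
    "1,58,1,16.0853,0,1,8.1463,4.1482,1\n"
    "0,71,0,30.8288,0,1,9.3616,3.1983,0\n"
)
header = check_training_csv(sample, target_index=0)
excluded = feature_indexes(header, ["Gender", "AlcoholIntake"])
print(excluded)  # [2, 7]
```

The resulting list could then be passed as excluded_features_index when constructing the classifier.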
Python API
All calls below contact the hosted service and may take time depending on data size and network.
Constructor
BlackSwanClassifier(filepath, target_index=0, excluded_features_index=[])
- filepath accepts one of:
  - Path to a CSV file → builds/initializes a remote model.
  - A 64-character hash string → attaches to an existing remote model.
  - Path to a JSON file created by to_json → reloads a saved handle.
- Side-effects: sets/updates .hash and .log.
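Because one argument carries three different meanings, it can help to check which kind of input you are about to pass. The sketch below is an illustration only: classify_source is a hypothetical helper, and the rules it uses (lowercase hex for hashes, file extensions for CSV and JSON) are assumptions, not the client's actual dispatch logic.

```python
import re

# Assumption: model hashes are 64 lowercase hexadecimal characters,
# as in the quickstart example.
HASH_RE = re.compile(r"[0-9a-f]{64}")


def classify_source(source):
    """Guess which of the three documented input kinds `source` is."""
    if HASH_RE.fullmatch(source):
        return "hash"
    lowered = source.lower()
    if lowered.endswith(".json"):
        return "json"
    if lowered.endswith(".csv"):
        return "csv"
    raise ValueError(f"unrecognized source: {source!r}")


for candidate in (
    "quickstart/train.csv",
    "quickstart/blackswan-api.json",
    "bc0ad6c0d46f32551bda63fb70e6186bdad6cb66bd39958d40a99beee4ae5bde",
):
    print(candidate[:24], "->", classify_source(candidate))
```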
Model lifecycle
- improve() → server-side balanced improvement; updates .hash, .log.
- improvePrecision() → bias toward precision; updates .hash, .log.
- improveRecall() → bias toward recall; updates .hash, .log.
Evaluation
- get_auc(csv_path) -> float
  Returns AUC on the provided dataset.
- get_auc_opt(csv_path) -> (float auc, float opt)
  Returns the AUC and the optimal decision threshold used to achieve it.
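For readers who want to sanity-check a reported AUC locally, here is a minimal sketch of the standard rank-based definition: the probability that a randomly chosen positive row scores higher than a randomly chosen negative row, with ties counted as one half. It is not the service's implementation, and auc_from_scores is a hypothetical helper. How the service picks the optimal threshold is not described here, so that part is not sketched.

```python
def auc_from_scores(labels, scores):
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative row")
    # A correctly ordered pair counts 1, a tie counts 0.5, a wrong order counts 0.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [1, 0, 1, 0]
scores = [0.9, 0.2, 0.4, 0.6]
print(auc_from_scores(labels, scores))  # 0.75
```

The pairwise loop is quadratic in the number of rows, which is fine for a spot check on a small backtest file.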
Introspection (dataset-level)
- get_global_feature_importance(csv_path) -> dict[str, float]
  Global feature importance computed over the given dataset.
- get_confidence(csv_path) -> dict[int, float]
  Per-row confidence (keyed by row index).
- filter(csv_path, opt=0.5) -> list[int]
  Convenience wrapper that returns indexes whose confidence ≥ opt.
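The relationship between get_confidence and filter can be pictured with a few lines of plain Python. This is a sketch of the documented behavior applied to an in-memory dictionary, assuming filter does nothing beyond thresholding; filter_by_confidence is a hypothetical name and the confidence values are made up for the example.

```python
def filter_by_confidence(confidence, opt=0.5):
    """Return the row indexes whose confidence is at least `opt`, in ascending order."""
    return sorted(i for i, c in confidence.items() if c >= opt)


# Shape of a get_confidence result: row index -> confidence (values invented here).
conf = {0: 0.91, 1: 0.12, 2: 0.50, 3: 0.49}
print(filter_by_confidence(conf))       # [0, 2]
print(filter_by_confidence(conf, 0.9))  # [0]
```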
Introspection (item-level)
- make_population(csv_path) -> list[item]
  Returns a collection of item payloads suitable for per-item analysis.
- get_feature_importance(item) -> dict[str, float]
  Feature contributions for a specific item.
- get_item_confidence(item) -> float
  Confidence for a specific item.
- get_audit(item) -> None
  Retrieves and logs an audit trail for the item (printed, and appended to self.log).
Persistence
- to_json(fileout="blackswan-api.json") -> None
  Writes a minimal JSON with the current hash and log. You can later rehydrate with BlackSwanClassifier(fileout).
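To make the handle file concrete, the sketch below writes and reads a JSON document holding a hash and a log. The key names "hash" and "log" are assumptions based on the description above; check a file produced by to_json for the real layout. save_handle and load_handle are hypothetical helpers, not library functions.

```python
import json
import os
import tempfile


def save_handle(path, model_hash, log):
    """Write a handle file; the key names are assumed, not confirmed."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump({"hash": model_hash, "log": log}, fh)


def load_handle(path):
    """Read a handle file back and return (hash, log)."""
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    missing = {"hash", "log"} - set(data)
    if missing:
        raise ValueError(f"handle file is missing keys: {sorted(missing)}")
    return data["hash"], data["log"]


# Round trip in a temporary directory with placeholder values.
with tempfile.TemporaryDirectory() as tmp:
    handle_path = os.path.join(tmp, "blackswan-api.json")
    save_handle(handle_path, "a" * 64, "placeholder log")
    restored_hash, restored_log = load_handle(handle_path)

print(len(restored_hash), restored_log)
```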
Networking, privacy & security
- The client sends your CSV text and/or item payloads to a hosted endpoint (AWS Lambda) and receives JSON responses.
- Do not include PII or sensitive data unless it is anonymized and you have permission to process it off-device.
- Keep your blackswan-api.json and model hash private if your project requires access control: anyone who holds the hash can re-attach to the remote model.
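One practical precaution is to strip identifying columns from a CSV before it is sent. The sketch below shows the idea with the standard library; drop_columns is a hypothetical helper and the column names are invented. Removing columns alone does not guarantee anonymization, so treat it as one step among several.

```python
import csv
import io


def drop_columns(csv_text, names_to_drop):
    """Return `csv_text` without the named columns, keeping the header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    keep = [i for i, name in enumerate(header) if name not in names_to_drop]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([header[i] for i in keep])
    for row in reader:
        writer.writerow([row[i] for i in keep])
    return out.getvalue()


raw = "Diagnosis,Name,Age\n1,Alice,58\n0,Bob,71\n"
cleaned = drop_columns(raw, {"Name"})
print(cleaned)
```

Dropping a column shifts the positional index of every column after it, so recompute target_index and excluded_features_index against the cleaned header.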
Troubleshooting
- ModuleNotFoundError: No module named 'requests'
  Install it: pip install requests.
- Network/HTTP errors (non-200, timeouts):
  Check connectivity, retry, and inspect print output and final_model.log for server-side messages.
- Unexpected results / wrong target column:
  Make sure the target is 0/1 and pass the correct target_index when building the model.
- CSV parsing issues:
  Ensure UTF-8 encoding and a single header row. Avoid stray commas/quotes.
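The "retry" advice for network errors can be wrapped in a small helper with exponential backoff. This is a generic sketch, not a feature of the client: with_retries is a hypothetical name, and it assumes failures surface as Python exceptions. If the client only prints an error and returns, check the return value or the log instead.

```python
import time


def with_retries(call, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run `call()` up to `attempts` times, doubling the delay after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except Exception:  # assumption: the failure is raised, not just printed
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)


# Demonstration with a stand-in that fails twice before succeeding.
state = {"calls": 0}


def flaky_call():
    state["calls"] += 1
    if state["calls"] < 3:
        raise RuntimeError("simulated network error")
    return "ok"


recorded_delays = []
result = with_retries(flaky_call, sleep=recorded_delays.append)  # record delays instead of sleeping
print(result, recorded_delays)  # ok [1.0, 2.0]

# With a real model, a read-only call could be wrapped like this:
# auc = with_retries(lambda: model.get_auc("quickstart/backtest.csv"))
```

Retrying suits read-style calls such as get_auc. For calls that change the remote model, such as improve(), a blind retry may apply the change twice, so inspect the log before repeating them.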
Development
- Code style is simple and dependency-light. Contributions that improve robustness (typing, retries, docstrings) are welcome.
- Before opening PRs that change network protocols or add telemetry, please open an issue to discuss.
License
MIT Licence. Assume all rights reserved © Algorithme.ai / Charles Dana (2025). For commercial use, contact the author.
Contact
- Author: Charles Dana — charles@algorithme.ai
- Product: BlackSwanClassifier (Algorithme.ai)
File details
Details for the file algorithmeai-1.0.0.tar.gz.
File metadata
- Download URL: algorithmeai-1.0.0.tar.gz
- Upload date:
- Size: 6.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fba32fc88d690fa557e91b271027d708ac3ba14de8282cd5f993ac88d99263a9 |
| MD5 | ecb95166990ff1a29676c6edc4e2983e |
| BLAKE2b-256 | d53b24f09fe96c11499004044a522bfe935c34691d090f808ad4bc5f91c5b3ed |
File details
Details for the file algorithmeai-1.0.0-py3-none-any.whl.
File metadata
- Download URL: algorithmeai-1.0.0-py3-none-any.whl
- Upload date:
- Size: 6.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 211e94295be2a3310476a081d726c582314ec379635b747b44a5524d233a5b1f |
| MD5 | 35c4c438d4a68abebe34f912ef03e5dd |
| BLAKE2b-256 | abdee77331cc8b1560f86f2d4256c2020ced02bd34a81565affac0552b60161b |