AI models based on AIRCHECK data
Project description
aircheck_model
aircheck_model is a Python package for training and screening machine learning models on chemical compound datasets.
It provides a Python API (simple train and screen functions) and a Command-Line Interface (CLI) for easy integration in pipelines or local workflows.
The package is designed to work with molecular fingerprints (e.g., ECFP) and chemical structure data in formats such as CSV or Parquet.
✨ Features
- Train ML models with training and optional test datasets
- Save trained models to a specified directory
- Evaluate models on test datasets
- Screen new compounds using trained models
- Simple CLI powered by Typer
📦 Installation
Install from PyPI (once published):
pip install aircheck-test-model
Or install locally for development:
git clone <your-repo-url>
cd aircheck_model
pip install -e '.[dev]'
🐍 Python API Usage
After installation, you can import the top-level functions train and screen:
from aircheck_model import train, screen

# --- Train models ---
train_result, test_result = train(
    train_file="location of parquet file",
    train_column="ECFP6",
    label="LABEL",
    model_dir="aircheck_model/new_model",
    # test_file is optional (default=None)
)
The train function accepts training and test datasets in Parquet format, given as file paths. Datasets can be downloaded from the AIRCHECK website.
The train function returns two outputs, train_result and test_result:
- train_result: a DataFrame containing model metrics and fold information.
- test_result: a DataFrame if a test_file is provided; otherwise, an empty DataFrame.

# --- Screen compounds ---
result_df = screen(
    screen_file="data/ScreenData1.csv",
    smile_column="SMILES",
    fingerprint_type="ECFP6",
    model_directory="aircheck_model/new_model"
)
print(result_df.head())
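The screen call above reads a file with a column of SMILES strings. The following is a minimal sketch of preparing such a CSV using only the standard library. The helper name `write_screen_csv`, the example molecules, and the assumption that a single `SMILES` column is sufficient are illustrative; this README does not state whether `screen` requires additional columns.

```python
import csv
from pathlib import Path


def write_screen_csv(path, smiles_list, column="SMILES"):
    """Write one SMILES string per row under a single header column.

    Hypothetical helper, not part of aircheck_model.
    """
    path = Path(path)
    # Create the parent directory (e.g. "data/") if it does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow([column])
        for smiles in smiles_list:
            writer.writerow([smiles])
    return path


# Illustrative molecules: ethanol, benzene, acetic acid.
example_smiles = ["CCO", "c1ccccc1", "CC(=O)O"]
```

Calling `write_screen_csv("data/ScreenData1.csv", example_smiles)` would produce a file whose path can then be passed as `screen_file`.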
💻 CLI Usage
The package also provides a command-line tool:
aircheck_model --help
🔹 Check Version
aircheck_model version
🔹 Train Models
aircheck_model train \
  --train-data data/WDR91.parquet \
  --column ECFP6 \
  --label LABEL \
  --model-dir aircheck_model/new_model \
  --test-data data/sampled_data_test_1.parquet
Arguments:
- --train-data, -t (required): Path to training data (CSV/Parquet)
- --test-data, -e: Optional path to test data
- --column, -c (required): Feature column (e.g., fingerprint type such as ECFP4, ECFP6)
- --label, -l (required): Label column name
- --model-dir, -m: Directory to save trained models (default: ~/model)
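When the train command is launched from a Python pipeline rather than a shell, it can help to assemble the arguments programmatically. The sketch below only builds the argument list from the flags documented above and does not run anything. The function name `build_train_command` is a hypothetical helper, not part of the package.

```python
import shlex


def build_train_command(train_data, column, label, model_dir=None, test_data=None):
    """Return the argument list for `aircheck_model train` (does not run it).

    Hypothetical helper; it uses only the flags listed in this README.
    """
    command = [
        "aircheck_model", "train",
        "--train-data", str(train_data),
        "--column", column,
        "--label", label,
    ]
    # Optional flags are added only when a value is supplied.
    if model_dir is not None:
        command += ["--model-dir", str(model_dir)]
    if test_data is not None:
        command += ["--test-data", str(test_data)]
    return command


command = build_train_command(
    "data/WDR91.parquet", "ECFP6", "LABEL",
    model_dir="aircheck_model/new_model",
)
# Show the command as a single shell-quoted string for inspection.
print(shlex.join(command))
```

With the package installed, the resulting list could be handed to `subprocess.run(command, check=True)`; passing a list rather than a string avoids shell-quoting problems with paths that contain spaces.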
🔹 Screen Compounds
aircheck_model screen \
  --screen-data data/ScreenData1.csv \
  --column SMILES \
  --fingerprints-column ECFP6 \
  --model-dir aircheck_model/new_model
Arguments:
- --screen-data, -s (required): Path to compound data file
- --column, -c (required): Column containing SMILES strings
- --fingerprints-column, -l (required): Fingerprint column name
- --model-dir, -m: Directory where trained models are stored
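For screening several compound files against the same trained models, one option is to generate a screen command per file. The sketch below assumes the files are CSVs in a single directory and uses only the flags listed above. `build_screen_commands` is a hypothetical helper name, and the default column and fingerprint values simply mirror the example command.

```python
from pathlib import Path


def build_screen_commands(data_dir, model_dir, smiles_column="SMILES", fingerprint="ECFP6"):
    """Build one `aircheck_model screen` argument list per CSV file in data_dir.

    Hypothetical helper; nothing is executed here.
    """
    commands = []
    # Sort so the commands come out in a stable, reproducible order.
    for csv_path in sorted(Path(data_dir).glob("*.csv")):
        commands.append([
            "aircheck_model", "screen",
            "--screen-data", str(csv_path),
            "--column", smiles_column,
            "--fingerprints-column", fingerprint,
            "--model-dir", str(model_dir),
        ])
    return commands
```

Each returned list can be run in turn with `subprocess.run`, which keeps a failure on one file from being hidden inside a long shell loop.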
🛠 Development
Run tests and linting locally:
pytest
ruff check .
File details
Details for the file aircheck_test_model-1.1.3.tar.gz.
File metadata
- Download URL: aircheck_test_model-1.1.3.tar.gz
- Upload date:
- Size: 331.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e824996232e493f62805aa1044293e61b6adf1c000b26a4315aa659b8101e4f7 |
| MD5 | 6072e8b78b759e76cc3d1a645d39fa28 |
| BLAKE2b-256 | a2431e9ce3bda3578f27c0f3b0db381373385e7bbb6fe699873b4af2bcf9a540 |
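A downloaded archive can be checked against the SHA256 digest listed above. The following is a minimal sketch using Python's standard `hashlib`. The function names and the chunk size are illustrative choices, and the local file path is whatever location the file was saved to.

```python
import hashlib

# SHA256 digest of aircheck_test_model-1.1.3.tar.gz as listed on this page.
EXPECTED_SHA256 = "e824996232e493f62805aa1044293e61b6adf1c000b26a4315aa659b8101e4f7"


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals the expected value."""
    return sha256_of_file(path) == expected.lower()
```

Reading in chunks keeps memory use flat regardless of file size; for the wheel file, the same function applies with that file's own SHA256 digest passed as `expected`.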
Provenance
The following attestation bundles were made for aircheck_test_model-1.1.3.tar.gz:
Publisher: pypi.yaml on nabinelnino/aircheck-model
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: aircheck_test_model-1.1.3.tar.gz
  - Subject digest: e824996232e493f62805aa1044293e61b6adf1c000b26a4315aa659b8101e4f7
- Sigstore transparency entry: 552339818
- Sigstore integration time:
- Permalink: nabinelnino/aircheck-model@143f0d0f038ffca082d0d1e0a4408780895f8f46
- Branch / Tag: refs/tags/v1.1.3
- Owner: https://github.com/nabinelnino
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi.yaml@143f0d0f038ffca082d0d1e0a4408780895f8f46
- Trigger Event: push
File details
Details for the file aircheck_test_model-1.1.3-py3-none-any.whl.
File metadata
- Download URL: aircheck_test_model-1.1.3-py3-none-any.whl
- Upload date:
- Size: 333.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7f43d65fb703eb7bfaa58b363197df2d24658dfe4eda93fdab688a6ae37abd7b |
| MD5 | 0eaec7c13b92b12ef7b38f9f38998aa4 |
| BLAKE2b-256 | 54833671ea2d387ff10233a3b8b3af6929ec100e38d11cb015b654bccaff3cf3 |
Provenance
The following attestation bundles were made for aircheck_test_model-1.1.3-py3-none-any.whl:
Publisher: pypi.yaml on nabinelnino/aircheck-model
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: aircheck_test_model-1.1.3-py3-none-any.whl
  - Subject digest: 7f43d65fb703eb7bfaa58b363197df2d24658dfe4eda93fdab688a6ae37abd7b
- Sigstore transparency entry: 552339837
- Sigstore integration time:
- Permalink: nabinelnino/aircheck-model@143f0d0f038ffca082d0d1e0a4408780895f8f46
- Branch / Tag: refs/tags/v1.1.3
- Owner: https://github.com/nabinelnino
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi.yaml@143f0d0f038ffca082d0d1e0a4408780895f8f46
- Trigger Event: push