This package aims to offer helper functions that simplify model building and evaluation
aiqclib
aiqclib is a Python library that provides a configuration-driven workflow for machine learning, simplifying dataset preparation, model training, and data classification. It is a core component of the AIQC project that aims to enhance anomaly detection in CTD (Conductivity, Temperature, Depth) data.
ML Algorithms Supported by aiqclib
| Category | Algorithm | Short Name | Method |
|---|---|---|---|
| Tree-Based & Ensemble | XGBoost | XGB | Ensemble (Boosting) |
| Tree-Based & Ensemble | Random Forest | RF | Ensemble (Bagging) |
| Tree-Based & Ensemble | Decision Tree | DT | Tree |
| Linear & Geometric | Logistic Regression | Logit | Linear |
| Linear & Geometric | Linear Discriminant Analysis | LDA | Linear / Statistical |
| Linear & Geometric | Support Vector Machine | SVM | Geometric |
| Instance-Based | K-Nearest Neighbors | KNN | Distance-based |
| Probabilistic | Gaussian Naive Bayes | GNB | Probabilistic |
| Neural Network | Multilayer Perceptron | MLP | Neural Network |
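The short names above are convenient keys for referring to algorithms programmatically. As a plain illustration (this mapping is not part of aiqclib's public API), a lookup table might look like:

```python
# Illustrative lookup of the short names from the table above;
# this mapping is hypothetical, not part of aiqclib's API.
ALGORITHMS = {
    "XGB": "XGBoost",
    "RF": "Random Forest",
    "DT": "Decision Tree",
    "Logit": "Logistic Regression",
    "LDA": "Linear Discriminant Analysis",
    "SVM": "Support Vector Machine",
    "KNN": "K-Nearest Neighbors",
    "GNB": "Gaussian Naive Bayes",
    "MLP": "Multilayer Perceptron",
}

def full_name(short_name):
    """Resolve a short name to the algorithm's full name (raises KeyError if unknown)."""
    return ALGORITHMS[short_name]
```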
Installation
The package is available on PyPI and conda-forge.
Using pip:

```shell
pip install aiqclib
```

Using conda:

```shell
conda install -c conda-forge aiqclib
```
Documentation
Project documentation is hosted on Read the Docs.
Core Concepts
The library is designed around a three-stage workflow:
- Dataset Preparation: Prepare feature datasets from raw data and generate training, validation, and test data sets.
- Training & Evaluation: Train machine learning models and evaluate their performance using cross-validation.
- Classification: Apply a trained model to classify new, unseen data.
Each stage is controlled by a YAML configuration file, allowing you to define and reproduce your entire workflow with ease.
Usage
The general workflow for any task in aiqclib follows these steps:
- Generate a Configuration Template: Create a starter YAML file for the task (e.g., `prepare`, `train`, `classify`).
- Customize the Configuration: Edit the YAML file to specify paths, dataset names, and other parameters.
- Run the Task: Load the configuration and execute the main function for the task.
1. Dataset Preparation
This workflow processes your input data and creates training, validation, and test sets.
Step 1: Generate a configuration template.
```python
import aiqclib as aq

aq.write_config_template(file_name="/path/to/prepare_config.yaml", stage="prepare")
```
Step 2: Customize prepare_config.yaml.
You must edit the file to set the correct input/output paths and define your dataset. See the Configuration section for details.
Step 3: Run the preparation process.
```python
import aiqclib as aq

config = aq.read_config("/path/to/prepare_config.yaml")
aq.create_training_dataset(config)
```
This generates the following output folders:
- summary: Statistics of input data used for normalization.
- select: Profiles with bad observation flags (positive samples) and good profiles (negative samples).
- locate: Observation records for both positive and negative profiles.
- extract: Features extracted from the observation records.
- training: The final training, validation, and test datasets.
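After a preparation run, it can be handy to verify that all five stage folders were created. A minimal stdlib sketch (the helper below is hypothetical; only the folder names come from the list above):

```python
from pathlib import Path

# Stage folders produced by the preparation workflow (names from the docs).
STAGE_FOLDERS = ["summary", "select", "locate", "extract", "training"]

def missing_stage_folders(base_path):
    """Return the names of stage folders absent under base_path."""
    base = Path(base_path)
    return [name for name in STAGE_FOLDERS if not (base / name).is_dir()]
```

For example, `missing_stage_folders("/path/to/data")` returns an empty list when a run completed and lists the absent stages otherwise.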
2. Model Training and Evaluation
This workflow uses the prepared dataset to train a model and evaluate its performance.
Step 1: Generate a training configuration template.
```python
import aiqclib as aq

aq.write_config_template(file_name="/path/to/training_config.yaml", stage="train")
```
Step 2: Customize training_config.yaml.
Edit the file to point to your prepared dataset and define training parameters.
Step 3: Train and evaluate the model.
```python
import aiqclib as aq

config = aq.read_config("/path/to/training_config.yaml")
aq.train_and_evaluate(config)
```
This generates the following output folders:
- validate: Results from the cross-validation process.
- build: The final trained models and their evaluation results on the test dataset.
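Cross-validation, used in the validate step, partitions the training data into k folds and repeatedly trains on k−1 folds while validating on the remaining one. A generic stdlib sketch of fold-index generation (illustrating the technique, not aiqclib's internal implementation):

```python
def kfold_indices(n_samples, k):
    """Yield (train_indices, val_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    fold_size, remainder = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # Spread the remainder over the first folds so sizes differ by at most 1.
        stop = start + fold_size + (1 if fold < remainder else 0)
        val = indices[start:stop]
        train = indices[:start] + indices[stop:]
        yield train, val
        start = stop
```

Every sample appears in exactly one validation fold, so the k validation scores together cover the whole training set.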
3. Data Classification
This workflow applies a trained model to classify all observations in a dataset.
Step 1: Generate a classification configuration template.
```python
import aiqclib as aq

aq.write_config_template(file_name="/path/to/classification_config.yaml", stage="classify")
```
Step 2: Customize classification_config.yaml.
Edit the file to point to the input data and the trained model.
Step 3: Run classification.
```python
import aiqclib as aq

config = aq.read_config("/path/to/classification_config.yaml")
aq.classify_dataset(config)
```
This workflow processes a dataset using a trained model and generates:
- classify: The final classification results and a summary report.
Configuration
Configuration is managed via YAML files. The write_config_template function provides a starting point that you must customize for each module.
1. Dataset Preparation (stage="prepare")
The preparation config requires you to modify two key sections:
- `path_info_sets`: Defines the location of input and output data.

  ```yaml
  path_info_sets:
    - name: data_set_1
      common:
        base_path: /path/to/data    # EDIT: Root output directory
      input:
        base_path: /path/to/input   # EDIT: Directory with input files
        step_folder_name: ""
      split:
        step_folder_name: training
  ```
- `data_sets`: Defines a specific dataset to be processed.

  ```yaml
  data_sets:
    - name: dataset_0001                      # EDIT: Your data set name
      dataset_folder_name: dataset_0001       # EDIT: Your output folder
      input_file_name: nrt_cora_bo_4.parquet  # EDIT: Your input filename
  ```
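Once loaded with `aq.read_config`, a prepare config like the one above becomes a nested mapping. The exact object type returned by `read_config` is not documented here, so the plain-dict stand-in below is an assumption for illustration only:

```python
# Hand-written stand-in for a parsed prepare config (assumed structure,
# mirroring the YAML keys shown in this section).
config = {
    "path_info_sets": [
        {
            "name": "data_set_1",
            "common": {"base_path": "/path/to/data"},
            "input": {"base_path": "/path/to/input", "step_folder_name": ""},
            "split": {"step_folder_name": "training"},
        }
    ],
    "data_sets": [
        {
            "name": "dataset_0001",
            "dataset_folder_name": "dataset_0001",
            "input_file_name": "nrt_cora_bo_4.parquet",
        }
    ],
}

# Fields can then be read with ordinary indexing.
input_base = config["path_info_sets"][0]["input"]["base_path"]
```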
2. Training and Evaluation (stage="train")
The training config links the prepared data to the model training process.
- `path_info_sets`: Defines where to find the prepared dataset and where to save model artifacts.

  ```yaml
  path_info_sets:
    - name: data_set_1
      common:
        base_path: /path/to/data  # EDIT: Root output directory
      input:
        step_folder_name: training
  ```
- `training_sets`: Links to a dataset prepared in the previous workflow.

  ```yaml
  training_sets:
    - name: training_0001                # EDIT: Your training name
      dataset_folder_name: dataset_0001  # EDIT: Your output folder
  ```
3. Classification (stage="classify")
The classification config uses a trained model to classify new data.
- `path_info_sets`: Defines paths for raw data, models, and classification results.

  ```yaml
  path_info_sets:
    - name: data_set_1
      common:
        base_path: /path/to/data   # EDIT: Root output directory
      input:
        base_path: /path/to/input  # EDIT: Directory with input files
        step_folder_name: ""
      model:
        base_path: /path/to/model  # EDIT: Directory with model files
        step_folder_name: model
      concat:
        step_folder_name: classification  # EDIT: Directory with classification results
  ```
- `classification_sets`: Defines a specific dataset to be classified.

  ```yaml
  classification_sets:
    - name: classification_0001               # EDIT: Your classification name
      dataset_folder_name: dataset_0001       # EDIT: Your output folder
      input_file_name: nrt_cora_bo_4.parquet  # EDIT: Your input filename
  ```
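The output location for a classification run plausibly composes from `base_path`, `step_folder_name`, and `dataset_folder_name`. The join convention below is an assumption, shown only to make the roles of these fields concrete:

```python
from pathlib import Path

def output_folder(base_path, step_folder_name, dataset_folder_name):
    """Assumed layout: <base_path>/<step_folder_name>/<dataset_folder_name>."""
    return Path(base_path) / step_folder_name / dataset_folder_name

# With the example values from the config above:
results_dir = output_folder("/path/to/data", "classification", "dataset_0001")
```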
Contributing & Development
We welcome contributions! Please use the following guidelines for development.
Environment Setup
We recommend using uv for managing the development environment.
- Install `uv`. We recommend installing `uv` into your base conda/mamba environment so the `uv` command is available globally. If you don't use conda/mamba, you can install it with pip instead.
  ```shell
  # Using mamba (recommended)
  mamba activate base
  mamba install -n base -c conda-forge uv

  # Or using conda
  conda activate base
  conda install -n base -c conda-forge uv

  # Or using pip
  pip install uv
  ```
Alternatively, the [standalone installer](https://docs.astral.sh/uv/getting-started/installation/) from Astral works on any platform without needing Python or conda preinstalled.
- Create and activate the project's virtual environment. From the project's root directory, run the following:
  ```shell
  # Create the virtual environment in a .venv folder
  uv venv

  # Activate the virtual environment
  source .venv/bin/activate
  ```
- Install the project and its dependencies.
`uv sync` resolves and installs all dependencies from `pyproject.toml`, and `uv pip install -e .` installs the library itself in "editable" mode (`-e`).

```shell
uv sync
uv pip install -e .
```
- Download the test data. The test fixtures (~15 MB of parquet, joblib, and YAML files) are not stored in the repository. They live as a GitHub release asset and need to be downloaded once before tests can run:
```shell
bash scripts/fetch_test_data.sh
```
This places the fixtures under `tests/data/`. The script requires the [`gh` CLI](https://cli.github.com) (authenticated via `gh auth login`) and `unzip`. To pin a specific data version or pull from a fork, override the defaults via environment variables:
```shell
TEST_DATA_VERSION=test-data-v1.0.1 bash scripts/fetch_test_data.sh
```
You only need to re-run this when the test data version changes.
Running Tests
With your environment activated and test data downloaded, you can run the test suite using pytest.
```shell
uv run pytest -v
```
Code Style (Linting & Formatting)
We use Ruff for linting and formatting.
Linting: Check the library and test code for style issues.
```shell
# Lint the library source code
uv run ruff check src

# Lint the test code
uv run ruff check tests
```
Formatting: Automatically format the code to match the project's style.
```shell
# Format the library source code
uv run ruff format src

# Format the test code
uv run ruff format tests
```
Documentation (for Maintainers)
Building Docs Locally
- Update Docstrings (requires a Google Gemini API key):

  ```shell
  # Update docstrings for source files
  python ./docs/scripts/update_docstrings.py src docs/scripts/prompt_main.txt

  # Update docstrings for test files
  python ./docs/scripts/update_docstrings.py tests docs/scripts/prompt_unittest.txt
  ```
- Review Docstrings: Manually review all modified files. Remove generated headers/footers and correct any sections marked with "Issues:".
- Update API Documents: From the project root, run:

  ```shell
  uv run sphinx-apidoc -f --remove-old --module-first -o docs/source/api src/aiqclib
  ```
- Build HTML: From the project root, run:

  ```shell
  cd docs; uv run make html; cd ..
  ```

  You can view the generated site by opening `docs/build/html/index.html` in a browser.
Deployment (for Maintainers)
PyPI
The package is published to PyPI automatically via a GitHub Action whenever a new release is created on GitHub.
conda-forge (Automatic)
The conda-forge bot automatically creates a pull request and merges it into the main branch when a new version of the package is published on PyPI.
conda-forge (Manual)
Bump version with new dependencies
When runtime dependencies change, the automated PR from the conda-forge bot may fail. In that case, you must manually update the feedstock by opening a pull request against the conda-forge/aiqclib-feedstock repository.
- Install build tools:

  ```shell
  mamba install -c conda-forge conda-build conda-smithy grayskull
  ```
- Fork and clone the `aiqclib-feedstock` repository.
- Sync with upstream (e.g., add `conda-forge/aiqclib-feedstock` as a remote named `upstream` and `git rebase upstream/main`).
- Update the forked repo:

  ```shell
  git checkout main             # Go to your local main branch
  git fetch upstream            # Get the latest changes from the original repo
  git rebase upstream/main      # Keep your local main linear with the original
  git push origin main --force  # Update your GitHub fork's main (optional but good practice)
  ```

- Create a new branch (e.g., `git checkout -b update_vX.Y.Z`).
- Generate a strict recipe (e.g., `grayskull pypi aiqclib --strict-conda-forge`).
- Review `recipes/meta.yaml` and ensure it meets conda-forge standards.
- Rerender the feedstock (e.g., `conda smithy rerender -c auto`).
- Commit, push, and open a pull request to the `aiqclib-feedstock` repository.
- Merge it after passing CI.
Initial upload
Submitting the package on conda-forge involves creating a pull request to the conda-forge/staged-recipes repository.
- Fork and clone the `staged-recipes` repository.
- Configure upstream: `git remote add upstream https://github.com/conda-forge/staged-recipes.git`
- Create a new branch (e.g., `git checkout -b aiqclib-recipe`).
- Generate a strict recipe: `grayskull pypi aiqclib --strict-conda-forge`.
- Review `recipes/aiqclib/meta.yaml` and ensure it meets conda-forge standards.
- Commit, push, and open a pull request to the `staged-recipes` repository.
Anaconda.org (Manual)
Publishing to the <username> channel on Anaconda.org is a manual process.
- Install build tools:

  ```shell
  mamba install -c conda-forge conda-build anaconda-client grayskull
  ```

- Generate Recipe: From the project root, run `grayskull pypi aiqclib`. This creates `aiqclib/meta.yaml`.
- Build Package:

  ```shell
  conda build aiqclib
  ```

- Upload Package:

  ```shell
  anaconda login
  anaconda upload /path/to/your/conda-bld/noarch/aiqclib-*.conda
  ```

- Cleanup: Copy `aiqclib/meta.yaml` to `conda/meta.yaml` for version control and remove the temporary `aiqclib` directory.
File details
Details for the file aiqclib-0.2.0.tar.gz.
File metadata
- Download URL: aiqclib-0.2.0.tar.gz
- Upload date:
- Size: 287.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `240b7589bdc09aaf5fc6016f31d7b57329c5cba3e8fac3df6f1475cf25674581` |
| MD5 | `856b7a260166c7787ccdd6cccdc58c11` |
| BLAKE2b-256 | `010e231ce915949326c4d333285f6ab85230821f2561a649adc26862460a3adc` |
Provenance

The following attestation bundles were made for aiqclib-0.2.0.tar.gz:

Publisher: publish_to_pypi.yml on AIQC-Hub/aiqclib

- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: aiqclib-0.2.0.tar.gz
  - Subject digest: 240b7589bdc09aaf5fc6016f31d7b57329c5cba3e8fac3df6f1475cf25674581
- Sigstore transparency entry: 1510026270
- Sigstore integration time:
- Permalink: AIQC-Hub/aiqclib@1cd3e4d0d49f162b7e5a1ac683a8bef80d6ed200
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/AIQC-Hub
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish_to_pypi.yml@1cd3e4d0d49f162b7e5a1ac683a8bef80d6ed200
- Trigger Event: release
File details
Details for the file aiqclib-0.2.0-py3-none-any.whl.
File metadata
- Download URL: aiqclib-0.2.0-py3-none-any.whl
- Upload date:
- Size: 152.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `687fa7b388648a2a2c996eb5d4ea6075cc03dba40d4f98d999ba32394bcc972d` |
| MD5 | `80f956192835a6decc98519677197e7c` |
| BLAKE2b-256 | `aef5e046c9d6a25d17a00ccc647b81875b4e15aa789cb7faaa5833640c9127f3` |
Provenance

The following attestation bundles were made for aiqclib-0.2.0-py3-none-any.whl:

Publisher: publish_to_pypi.yml on AIQC-Hub/aiqclib

- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: aiqclib-0.2.0-py3-none-any.whl
  - Subject digest: 687fa7b388648a2a2c996eb5d4ea6075cc03dba40d4f98d999ba32394bcc972d
- Sigstore transparency entry: 1510026327
- Sigstore integration time:
- Permalink: AIQC-Hub/aiqclib@1cd3e4d0d49f162b7e5a1ac683a8bef80d6ed200
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/AIQC-Hub
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish_to_pypi.yml@1cd3e4d0d49f162b7e5a1ac683a8bef80d6ed200
- Trigger Event: release