An active learning package for experimental design in chemistry and materials science.

ActiveSampler: An Active Learning Package for Experimental Design in Chemistry and Materials Science

ActiveSampler is a Python package designed to facilitate active learning workflows specifically tailored for experimental design in chemistry and materials science. By intelligently selecting the most informative data points for labeling, ActiveSampler aims to optimize experiments, reduce costs, and accelerate discovery in these fields.

Features

Model Training and Prediction: Supports both classification and regression tasks using models like Logistic Regression, Random Forest, and XGBoost.
Uncertainty Calculation: Computes uncertainty for classification using entropy and for regression using variance.
Objective Function Evaluation: Allows custom objective functions to guide the selection of samples.
Diversity and Acquisition: Incorporates diversity measures and acquisition functions to balance exploration and exploitation.
Grid Sampling and Constraints: Generates sampling grids and applies constraints to ensure valid experimental designs.
Active Learning Selection: Selects the most informative samples to enhance model performance with customizable weights for objective, uncertainty, and diversity.
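
The features above can be illustrated with a small, standard-library sketch of how entropy-based uncertainty and a weighted objective/uncertainty/diversity score might be combined. This is not the package's implementation: the function names, the min-max normalization step, and the sample numbers are assumptions made for illustration only.

```python
import math
from statistics import pvariance


def entropy(probabilities):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probabilities if p > 0)


def ensemble_variance(predictions):
    """Population variance of per-model predictions for one candidate."""
    return pvariance(predictions)


def min_max(values):
    """Rescale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def acquisition_scores(objective, uncertainty, diversity, alpha, beta, gamma):
    """Weighted sum of the three normalized terms for every candidate."""
    obj, unc, div = min_max(objective), min_max(uncertainty), min_max(diversity)
    return [alpha * o + beta * u + gamma * d for o, u, d in zip(obj, unc, div)]


# Three hypothetical candidates with predicted class probabilities.
class_probs = [[0.9, 0.05, 0.05], [0.34, 0.33, 0.33], [0.1, 0.1, 0.8]]
objective = [p[2] for p in class_probs]          # probability of class 2
uncertainty = [entropy(p) for p in class_probs]  # classification uncertainty
diversity = [0.2, 0.5, 1.0]                      # hypothetical distances to known points

scores = acquisition_scores(objective, uncertainty, diversity, 0.25, 0.25, 0.5)
best = max(range(len(scores)), key=scores.__getitem__)
print(best)  # 2

# For a regression target, the spread of an ensemble's predictions plays the same role.
spread = ensemble_variance([4.1, 3.9, 4.3])
```

The third candidate wins here because it has the highest class-2 probability and the largest diversity term; raising beta would shift the choice toward the most uncertain candidate instead.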

Installation

To install ActiveSampler from source, clone the repository and install its dependencies (release 0.1.0 is also published on PyPI as active-sampler):

git clone https://github.com/yourusername/active_sampler.git  # Replace with your repository URL
cd active_sampler
pip install -r requirements.txt

Usage

Example

The following excerpt shows example input data for selecting new data points in a LARP synthesis; the full data set is in examples/example1_LARP/input.csv:

ligand_quantity,ligand_ii_quantity,halogen_alloy_quantity,antisolvent_quantity,structural_response
10.0,300,0,3000,1
5.0,300,0,3000,1
...

Here is the code to sample these new points:

from active_sampler import active_sampling, load_and_preprocess_data

# Define the path to your data file
filepath = 'input.csv'

# Specify target columns and their types
target_columns = ['structural_response']
target_types = {
    'structural_response': 'classification',
}
num_classes_dict = {
    'structural_response': 3
}

# Define the objective function as a string
obj_fn_str = 'structural_response_class_2'

# Load and preprocess data
X, y_dict = load_and_preprocess_data(
    filepath,
    target_columns,
    target_types,
)

# Start active learning selection
active_sampling(
    X,
    y_dict,
    target_types,
    obj_fn_str,
    num_classes_dict=num_classes_dict,
    num_sampling=25,
    alpha=0.25,  # Objective weight
    beta=0.25,  # Uncertainty weight
    gamma=0.5,  # Diversity weight
    sufix='LARP',
)
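
The objective string in the example, 'structural_response_class_2', names the predicted probability of class 2 for the structural_response target. How the package parses this string is not documented here; the sketch below shows one way such a string could be resolved against per-candidate predictions. The helper names build_namespace and evaluate_objective and the sample probabilities are hypothetical.

```python
def build_namespace(class_probabilities, target_name):
    """Map 'target_class_i' names to the predicted probability of class i."""
    return {
        f"{target_name}_class_{i}": p for i, p in enumerate(class_probabilities)
    }


def evaluate_objective(obj_fn_str, namespace):
    """Evaluate the objective expression with only the given names in scope."""
    # Builtins are removed so the expression can only use the provided names.
    # This limits accidents; it is not a security sandbox.
    return eval(obj_fn_str, {"__builtins__": {}}, dict(namespace))


obj_fn_str = "structural_response_class_2"
candidates = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]  # hypothetical predictions

values = [
    evaluate_objective(obj_fn_str, build_namespace(p, "structural_response"))
    for p in candidates
]
print(values)  # [0.1, 0.6]
```

Because the string is treated as an expression over named quantities, the same approach would extend to formulas that combine several targets.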

Input Data Format

The input data should be in CSV format, with a header row naming each feature and target column:

ligand_quantity,ligand_ii_quantity,halogen_alloy_quantity,antisolvent_quantity,structural_response
10.0,300,0,3000,1
5.0,300,0,3000,1
...
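
As a quick check that a file follows this layout, the rows can be parsed and split into features and targets with the standard library. This is only a sketch: split_features_and_targets is a hypothetical helper, not part of the package, and it assumes all feature columns are numeric and the target holds integer class labels.

```python
import csv
import io

CSV_TEXT = """\
ligand_quantity,ligand_ii_quantity,halogen_alloy_quantity,antisolvent_quantity,structural_response
10.0,300,0,3000,1
5.0,300,0,3000,1
"""


def split_features_and_targets(csv_text, target_columns):
    """Return (feature rows, {target: values}) from CSV text with a header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in target_columns if c not in reader.fieldnames]
    if missing:
        raise ValueError(f"missing target columns: {missing}")
    features, targets = [], {name: [] for name in target_columns}
    for row in reader:
        for name in target_columns:
            targets[name].append(int(row[name]))  # class labels as integers
        features.append(
            {k: float(v) for k, v in row.items() if k not in target_columns}
        )
    return features, targets


X_rows, y = split_features_and_targets(CSV_TEXT, ["structural_response"])
print(len(X_rows), y)  # 2 {'structural_response': [1, 1]}
```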

`load_and_preprocess_data` Function

The load_and_preprocess_data function loads, cleans, and prepares your data. It handles column renaming, missing values, row and column removal, and splitting the data into features (X) and targets (y_dict). See the examples for detailed usage.

Parameters: filepath, target_columns, target_types, column_mapping (optional), categorical_cols (optional), missing_value_strategy (optional), imputation_values (optional), rows_to_remove (optional), columns_to_remove (optional), regex_columns_to_remove (optional).

`active_sampling` Function Parameters

X: Feature DataFrame.
y_dict: Dictionary mapping target names to their Series.
target_types: Dictionary mapping target names to 'classification' or 'regression'.
obj_fn_str: String defining the objective function. It can reference:
- Classification: target_class_i (e.g., 'structure_type_class_2').
- Regression: target (e.g., 'contact_angle').
- Normalized regression: norm_target (e.g., 'norm_contact_angle').
sufix: Suffix for output file names (selected_samples_[sufix].csv and selected_samples_[sufix].txt). Note that the parameter is spelled sufix.
categorical_cols: List of categorical columns.
num_classes_dict: Dictionary mapping classification targets to number of classes.
initial_train_size: Initial training set size (or None for all data).
num_sampling: Number of samples to select.
alpha, beta, gamma: Weights for objective, uncertainty, and diversity.
user_num_grid_points: Custom grid points per numerical variable (int, 'unique', or dict).
variable_constraints: Constraints to filter the sampling grid (list of dicts). Each dict has conditions, assignments, and optional mutual_constraint.
unc_fn_str: Custom formula for combining uncertainties. References: target_unc, norm_target_unc.
diversity_settings: Settings for diversity: neighbor_distance_metric (default: 'euclidean'), same_cluster_penalty (default: 0.5), number_of_clusters (default: 'num_sampling').
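
To make the grid and constraint parameters more concrete, here is a minimal sketch of building a sampling grid and filtering it with a rule that has conditions and assignments. The exact schema accepted by variable_constraints is not specified above, so the dictionary layout, the equality-only matching, and the helper names build_grid and satisfies are assumptions for illustration.

```python
from itertools import product


def build_grid(levels):
    """Cartesian product of per-variable levels, as a list of dicts."""
    names = list(levels)
    return [dict(zip(names, combo)) for combo in product(*levels.values())]


def satisfies(point, constraint):
    """A point passes if the conditions do not apply or the assignments hold."""
    applies = all(point[k] == v for k, v in constraint["conditions"].items())
    if not applies:
        return True
    return all(point[k] == v for k, v in constraint["assignments"].items())


levels = {
    "ligand_quantity": [5.0, 10.0],
    "halogen_alloy_quantity": [0, 50],
    "antisolvent_quantity": [1000, 3000],
}
# Hypothetical rule: with no halogen alloy, antisolvent is fixed at 3000.
constraints = [
    {
        "conditions": {"halogen_alloy_quantity": 0},
        "assignments": {"antisolvent_quantity": 3000},
    }
]

grid = build_grid(levels)
valid = [p for p in grid if all(satisfies(p, c) for c in constraints)]
print(len(grid), len(valid))  # 8 6
```

Filtering after generation keeps the sketch simple; for large grids it may be preferable to prune while generating so that invalid combinations are never materialized.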

Output

The active_sampling function generates a .txt file and a .csv file containing the coordinates of the selected samples, sorted by all columns. See the examples folder for detailed output formats.

Examples

The package includes several examples demonstrating different use cases, located in the examples folder. The structure is as follows:

├── README.md
├── active_sampler
│   ├── __init__.py
│   ├── core.py
│   └── utils.py
├── examples
│   ├── example1_LARP
│   │   ├── example1.py
│   │   ├── input.csv
│   │   ├── selected_samples_LARP.csv
│   │   └── selected_samples_LARP.txt
│   ├── example2_PhobicSurfaces
│   │   ├── example2.py
│   │   ├── input.csv
│   │   ├── selected_samples_PhobicSurfaces.csv
│   │   └── selected_samples_PhobicSurfaces.txt
│   ├── example3_BatteryOptimization
│   │   ├── example3.py
│   │   ├── input.csv
│   │   ├── selected_samples_BatteryOptimization.csv
│   │   └── selected_samples_BatteryOptimization.txt
│   └── example4_ProcessingAndConstraints
│       ├── example4.py
│       ├── input.csv
│       ├── selected_samples_LARP_advanced_features.csv
│       └── selected_samples_LARP_advanced_features.txt

Each example folder contains:

example[N].py: The Python script implementing the active learning workflow.
input.csv: The input data used for the example.

Pre-generated output files are provided for each example:

selected_samples_[sufix].csv: The CSV file with the selected samples.
selected_samples_[sufix].txt: The text file with the selected samples and run information.

Here's a breakdown of each example:

example1_LARP: A basic example focused on optimizing a LARP (Ligand-Assisted Reprecipitation) synthesis. It uses a single classification target (structural_response) to predict the structural outcome of the synthesis.
example2_PhobicSurfaces: This example deals with predicting the contact angle of surfaces, a regression problem. It also demonstrates the use of categorical features (metal_precursor, surface_coating_material).
example3_BatteryOptimization: A more complex, multi-output example focused on battery material optimization. It involves multiple regression targets (specific capacity, capacity retention, etc.) and uses custom objective and uncertainty functions to guide the selection process. It also uses categorical features.
example4_ProcessingAndConstraints: This example showcases advanced features like custom grid points (restricting the sampling space for certain variables), variable constraints (ensuring logical relationships between variables), and more detailed data preprocessing options. It uses a combination of classification and regression targets.

Run them directly (e.g., python example1_LARP/example1.py) after ensuring the active_sampler package is installed and the input.csv files are present.

Contributing

Contributions are welcome! Please submit a Pull Request.

License

This project is licensed under the MIT License.

Contact

For questions or issues, please contact rogeriog.em@gmail.com.

Project details

Version 0.1.0, released Feb 8, 2025.

Download files

Source Distribution

active_sampler-0.1.0.tar.gz (21.6 kB)

Uploaded Feb 8, 2025 Source

Built Distribution

active_sampler-0.1.0-py3-none-any.whl (18.5 kB)

Uploaded Feb 8, 2025 Python 3

File details

Details for the file active_sampler-0.1.0.tar.gz.

File metadata

Download URL: active_sampler-0.1.0.tar.gz
Upload date: Feb 8, 2025
Size: 21.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.9.12

File hashes

Hashes for active_sampler-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`730a5e62f274c3a53c38c979930b882bdbfb50c7e400f89d5925bbf4d0e0878e`
MD5	`a20679531eb278c0d7cc0a3cafd49fb1`
BLAKE2b-256	`58af91e81cb178150fad3869d6457da0bc14d508a00a7b96be3582090d9595f2`

File details

Details for the file active_sampler-0.1.0-py3-none-any.whl.

File metadata

Download URL: active_sampler-0.1.0-py3-none-any.whl
Upload date: Feb 8, 2025
Size: 18.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.9.12

File hashes

Hashes for active_sampler-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c3e1f0d485bf686d928bccadb32d619cd88353aa67c9637012e03500cbb3d67e`
MD5	`65929e9da2f1b74be7a606f7a26be020`
BLAKE2b-256	`9d4fbdc34073ab38732380862c25fea7285afc0c642b8cc4bcf22562a4c2f77f`
