Data Curation in Polaris
Auroris
Auroris is a Python library designed to assist researchers and scientists in managing, cleaning, and preparing data relevant to drug discovery. Auroris will implement a range of techniques to handle, transform, filter, analyze, or visualize the diverse data types commonly encountered in drug discovery.
Currently, Auroris supports curation for small molecules, with plans to extend to other modalities in drug discovery. The curation module for small molecules includes:
- 🗄️ Molecule Standardization: Ensures that each molecule is represented in a uniform and unambiguous form.
- 🏷️ Detection of Duplicate Molecules with Contradictory Labels: Identifies and resolves inconsistencies in activity data for each molecule.
- ⛰️ Detection of Activity Cliffs Between Stereoisomers: Identifies significant differences in activity between stereoisomers.
- 🔍 Outlier Detection and Visualization: Detects and visualizes outliers in molecular activity data.
- 📽️ Visualization of Molecular Distribution in Chemical Space: Provides graphical representations of molecular distributions.
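To make the duplicate-detection idea concrete, here is a minimal, library-independent sketch. It is not Auroris's implementation: the function name `find_contradictory_duplicates`, the `max_spread` parameter, and the sample records are hypothetical, and the molecule keys are assumed to be already-standardized identifiers (for example canonical SMILES).

```python
from collections import defaultdict


def find_contradictory_duplicates(records, max_spread=1.0):
    """Group records by molecule key and flag groups whose labels disagree.

    `records` is an iterable of (molecule_key, activity) pairs. The key is
    assumed to be an already-standardized identifier (hypothetical setup).
    Returns {molecule_key: [activities]} for every key that occurs more than
    once and whose label range (max - min) exceeds `max_spread`.
    """
    groups = defaultdict(list)
    for key, activity in records:
        groups[key].append(activity)
    return {
        key: values
        for key, values in groups.items()
        if len(values) > 1 and max(values) - min(values) > max_spread
    }


records = [
    ("CCO", 5.1),
    ("CCO", 5.3),        # duplicate, labels agree
    ("c1ccccc1", 4.0),
    ("c1ccccc1", 7.5),   # duplicate, labels disagree
    ("CC(=O)O", 6.2),    # unique molecule
]
flagged = find_contradictory_duplicates(records, max_spread=1.0)
print(flagged)
```

Only the second duplicated molecule is flagged, because its two measurements differ by more than the chosen spread. A real pipeline would then decide whether to drop, average, or manually review the flagged groups.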
Reproducibility and transparency are core to the mission of Polaris. Auroris can therefore automatically generate a detailed report summarizing the changes made to a dataset during curation. Curation workflows are defined through the Python API. Once defined, a workflow is serializable and thus reproducible, so you can transparently share how you curated a dataset.
Getting started
```python
from auroris.curation import Curator
from auroris.curation.actions import MoleculeCuration, OutlierDetection, Discretization

# Define the curation workflow
curator = Curator(
    steps=[
        MoleculeCuration(input_column="smiles"),
        OutlierDetection(method="zscore", columns=["SOL"]),
        Discretization(input_column="SOL", thresholds=[-3]),
    ],
    parallelized_kwargs={"n_jobs": -1},
)

# Run the curation
dataset, report = curator(dataset)
```
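For readers unfamiliar with the two numeric steps in this workflow, the following standalone sketch shows what a z-score outlier check and a threshold-based discretization compute conceptually. It uses only the standard library and does not reproduce Auroris's internals: the helper names, the default z-score cutoff of 3, the bin-edge convention, and the sample `sol` values are all assumptions made for illustration.

```python
from bisect import bisect_right
from statistics import mean, pstdev


def zscore_outliers(values, threshold=3.0):
    """Return one boolean per value, True where |z-score| > threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        # All values identical: nothing can be an outlier.
        return [False] * len(values)
    return [abs((v - mu) / sigma) > threshold for v in values]


def discretize(values, thresholds):
    """Map each value to a bin index; values >= a threshold fall in the upper bin."""
    cuts = sorted(thresholds)
    return [bisect_right(cuts, v) for v in values]


# Hypothetical solubility-like values with one extreme measurement at the end.
sol = [-2.1, -2.4, -1.9, -2.2, -2.0, -2.3, -2.5, -1.8, -2.1, -2.2, -9.0]

flags = zscore_outliers(sol, threshold=3.0)
classes = discretize(sol, thresholds=[-3])

print(flags)
print(classes)
```

With a single threshold of `-3`, the discretization yields two classes: `0` for values below `-3` and `1` for values at or above it. Whether the library treats a value exactly on a threshold as the upper or lower class is not stated here, so check the documentation before relying on it.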
Run curation with command line
A `Curator` object is serializable, so you can save it to and load it from a JSON file to reproduce the curation.

```shell
auroris [config_file] [destination] --dataset-path [data_path]
```
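The idea behind a serializable workflow can be illustrated with a small round trip through JSON. This is a generic sketch, not the file format or API that Auroris uses: the `Step` dataclass, the step names, and the two helper functions are hypothetical stand-ins.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Step:
    """Hypothetical stand-in for a curation action: a name plus its parameters."""
    name: str
    params: dict = field(default_factory=dict)


def workflow_to_json(steps):
    """Serialize an ordered list of steps to a JSON string."""
    return json.dumps({"steps": [asdict(s) for s in steps]}, indent=2, sort_keys=True)


def workflow_from_json(text):
    """Rebuild the ordered list of steps from a JSON string."""
    return [Step(**raw) for raw in json.loads(text)["steps"]]


steps = [
    Step("molecule_curation", {"input_column": "smiles"}),
    Step("outlier_detection", {"method": "zscore", "columns": ["SOL"]}),
    Step("discretization", {"input_column": "SOL", "thresholds": [-3]}),
]

config = workflow_to_json(steps)
restored = workflow_from_json(config)
print(restored == steps)
```

Because the configuration captures every step and its parameters in order, anyone holding the JSON file can rebuild the same workflow and apply it to the same raw data.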
Documentation
Please refer to the documentation, which contains tutorials for getting started with `auroris` and detailed descriptions of the functions provided.
Installation
You can install `auroris` using conda/mamba/micromamba:

```shell
conda install -c conda-forge auroris
```

You can also use pip:

```shell
pip install auroris
```
Development lifecycle
Setup dev environment
```shell
conda env create -n auroris -f env.yml
conda activate auroris
pip install --no-deps -e .
```
Other installation options
Alternatively, using [uv](https://github.com/astral-sh/uv):
```shell
uv venv -p 3.12 .venv/auroris
source .venv/auroris/bin/activate
uv pip compile pyproject.toml -o requirements.txt --all-extras
uv pip install -r requirements.txt
```
Tests
You can run tests locally with:
```shell
pytest
```
License
Auroris is released under the Apache-2.0 license. See LICENSE.