Data Curation in Polaris

Auroris

Auroris is a Python library that helps researchers and scientists manage, clean, and prepare data for drug discovery. It aims to implement a range of techniques to handle, transform, filter, analyze, and visualize the diverse data types commonly encountered in drug discovery.

Currently, Auroris supports curation for small molecules, with plans to extend to other modalities in drug discovery. The curation module for small molecules includes:

  • 🗄️ Molecule Standardization: Ensures that each molecule is represented in a uniform and unambiguous form.

  • 🏷️ Detection of Duplicate Molecules with Contradictory Labels: Identifies and resolves inconsistencies in activity data for each molecule.

  • ⛰️ Detection of Activity Cliffs Between Stereoisomers: Identifies significant differences in activity between stereoisomers.

  • 🔍 Outlier Detection and Visualization: Detects and visualizes outliers in molecular activity data.

  • 📽️ Visualization of Molecular Distribution in Chemical Space: Provides graphical representations of molecular distributions.
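To make the duplicate-detection idea above more concrete, here is a minimal, library-free sketch. It is not Auroris's implementation: the function name `find_contradictory_duplicates`, the `tolerance` parameter, and the sample values are hypothetical, and it assumes molecules have already been standardized so that identical molecules share the same key string.

```python
from collections import defaultdict


def find_contradictory_duplicates(records, tolerance=1.0):
    """Flag molecule keys whose repeated measurements disagree.

    records: iterable of (molecule_key, activity) pairs.
    tolerance: largest allowed spread (max - min) within one molecule.
    Both names are illustrative assumptions, not part of Auroris.
    """
    # Collect all activity values measured for each molecule key.
    groups = defaultdict(list)
    for key, activity in records:
        groups[key].append(activity)

    # Keep only molecules measured more than once with a spread above tolerance.
    flagged = {}
    for key, values in groups.items():
        if len(values) > 1 and max(values) - min(values) > tolerance:
            flagged[key] = values
    return flagged


# Made-up example data: (standardized SMILES, activity value).
records = [
    ("CCO", 5.1),
    ("CCO", 5.3),        # duplicate with a consistent label
    ("c1ccccc1", 4.0),
    ("c1ccccc1", 7.5),   # duplicate with a contradictory label
    ("CC(=O)O", 6.2),
]

flagged = find_contradictory_duplicates(records, tolerance=1.0)
print(flagged)
```

How a flagged group should be resolved (dropping it, averaging it, or keeping one measurement) depends on the dataset, so the sketch only reports the conflicts.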

Reproducibility and transparency are core to the mission of Polaris. For that reason, Auroris can also automatically generate detailed reports summarizing how a dataset changed during curation. An intuitive API lets you define complex curation workflows. Once defined, a workflow is serializable and therefore reproducible, so you can transparently share how you curated a dataset.

Getting started

from auroris.curation import Curator
from auroris.curation.actions import MoleculeCuration, OutlierDetection, Discretization

# Define the curation workflow
curator = Curator(
    steps=[
        MoleculeCuration(input_column="smiles"),
        OutlierDetection(method="zscore", columns=["SOL"]),
        Discretization(input_column="SOL", thresholds=[-3]),
    ],
    parallelized_kwargs={"n_jobs": -1},
)

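# Run the curation. `dataset` is assumed to be a pandas DataFrame
# that contains the "smiles" and "SOL" columns used above.
dataset, report = curator(dataset)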
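The last two steps of the workflow above can be pictured with a small, library-free sketch. This is an illustration of the general techniques (z-score outlier flagging and threshold-based discretization), not Auroris's implementation: the helper names, the `cutoff` value, the sample values, and the choice to put values equal to a threshold into the upper class are all assumptions.

```python
from bisect import bisect_right
from statistics import mean, pstdev


def zscore_outliers(values, cutoff=3.0):
    """Return one boolean per value: True if |z-score| exceeds cutoff."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        # All values are identical, so nothing can be an outlier.
        return [False] * len(values)
    return [abs(v - mu) / sigma > cutoff for v in values]


def discretize(values, thresholds):
    """Map each value to a class index given sorted cut points.

    With a single threshold t, values below t get class 0 and
    values at or above t get class 1 (an assumed convention).
    """
    ordered = sorted(thresholds)
    return [bisect_right(ordered, v) for v in values]


# Made-up solubility-like values standing in for the "SOL" column.
sol = [-2.0, -2.5, -3.5, -1.8, -2.2, -9.0]

outlier_flags = zscore_outliers(sol, cutoff=2.0)
classes = discretize(sol, thresholds=[-3])

print(outlier_flags)
print(classes)
```

A lower cutoff is used here only because the sample is tiny; with six points, no population z-score can exceed roughly 2.24.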

Run curation from the command line

A Curator object is serializable, so you can save it to and load it from a JSON file to reproduce the curation.

auroris [config_file] [destination] --dataset-path [data_path]
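As a rough picture of why a serializable workflow is reproducible, the sketch below round-trips a workflow definition through JSON using only the standard library. The `Step` and `Workflow` classes, the step names, and the JSON layout are hypothetical stand-ins; they do not reflect the Curator class or the file format Auroris actually uses.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Step:
    """One curation step: a name plus its keyword parameters (hypothetical)."""
    name: str
    params: dict = field(default_factory=dict)


@dataclass
class Workflow:
    """An ordered list of steps that can be saved and restored (hypothetical)."""
    steps: list

    def to_json(self) -> str:
        # Convert each step to a plain dict so json can encode it.
        return json.dumps({"steps": [asdict(s) for s in self.steps]}, indent=2)

    @classmethod
    def from_json(cls, text: str) -> "Workflow":
        data = json.loads(text)
        return cls(steps=[Step(**s) for s in data["steps"]])


workflow = Workflow(
    steps=[
        Step("molecule_curation", {"input_column": "smiles"}),
        Step("outlier_detection", {"method": "zscore", "columns": ["SOL"]}),
        Step("discretization", {"input_column": "SOL", "thresholds": [-3]}),
    ]
)

text = workflow.to_json()
restored = Workflow.from_json(text)

# The restored workflow is equal to the original, step by step.
print(restored == workflow)
```

Because the definition is plain data, the JSON text can be committed next to the dataset and shared, which is the property the command-line interface relies on.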

Documentation

Please refer to the documentation, which contains tutorials for getting started with auroris and detailed descriptions of the functions provided.

Installation

You can install auroris using conda/mamba/micromamba:

conda install -c conda-forge auroris

You can also use pip:

pip install auroris

Development lifecycle

Setup dev environment

conda env create -n auroris -f env.yml
conda activate auroris

pip install --no-deps -e .

Other installation options

Alternatively, using [uv](https://github.com/astral-sh/uv):

```shell
# Create a Python 3.12 virtual environment in ./auroris and activate it
uv venv -p 3.12 auroris
source auroris/bin/activate
# Pin all dependencies (including extras), then install them
uv pip compile pyproject.toml -o requirements.txt --all-extras
uv pip install -r requirements.txt
```

Tests

You can run tests locally with:

pytest

License

Auroris is released under the Apache-2.0 license. See LICENSE.
