A Python package for scaling and automating pre-processing, visualization, classification, and feature selection of generic data sets.

orthrus

A Python package for scaling and automating pre-processing, visualization, classification, and feature selection of generic data sets. Read the docs!

Installing the conda environment

To ensure that Python classes and functions behave consistently across platforms, we recommend installing an isolated conda environment with the dependencies listed in environment.yml. To create a new environment with these dependencies, run the following from the shell:

conda env create -f environment.yml

This creates a conda environment named orthrus and installs the dependencies required by the orthrus module. If you do not have a graphics card compatible with CUDA 11 or later, replace environment.yml with environment_nocuda.yml. You can also use your own environment and install the packages listed in either environment.yml or environment_nocuda.yml.
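The choice between the two environment files can be sketched in Python. This is only an illustration: the helper name pick_environment_file is hypothetical, and treating the presence of nvidia-smi on the PATH as a sign of a usable NVIDIA GPU is an assumption. It does not confirm that the card supports CUDA 11 or later, so check the result before relying on it.

```python
import shutil


def pick_environment_file(which=shutil.which):
    """Return the environment file to pass to `conda env create -f`.

    Assumption: finding `nvidia-smi` on the PATH is used as a rough proxy
    for an NVIDIA GPU; it does not verify CUDA >= 11 support.
    """
    if which("nvidia-smi") is not None:
        return "environment.yml"
    return "environment_nocuda.yml"


if __name__ == "__main__":
    # Print the shell command suggested for this machine.
    print(f"conda env create -f {pick_environment_file()}")
```

The which argument exists only so the lookup can be replaced in a test; by default it uses shutil.which from the standard library.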

Installing the orthrus package

orthrus is available on PyPI; just run

pip install orthrus

to install the orthrus package. To install orthrus from this repo instead, first activate the orthrus environment and then navigate to your local orthrus directory:

conda activate orthrus
cd /path/to/orthrus/

Install the package with pip in editable mode:

pip install -e .

Finally, add ORTHRUS_PATH=/path/to/orthrus/ to your environment variables (the procedure differs for each OS).
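After installing, it can help to confirm that the package is importable and that ORTHRUS_PATH is set. The following is a minimal sketch using only the standard library. The function names orthrus_is_installed and orthrus_path are hypothetical helpers for illustration and are not part of the orthrus package.

```python
import importlib.util
import os
from pathlib import Path


def orthrus_is_installed(module_name="orthrus"):
    """Return True if the named package can be found by the import system."""
    return importlib.util.find_spec(module_name) is not None


def orthrus_path(env=None):
    """Return ORTHRUS_PATH as a Path, or None if it is unset or empty."""
    env = os.environ if env is None else env
    value = env.get("ORTHRUS_PATH")
    return Path(value) if value else None


if __name__ == "__main__":
    print("orthrus importable:", orthrus_is_installed())
    print("ORTHRUS_PATH:", orthrus_path())
```

If the second line prints None, the variable is not visible to the current Python process. A new shell session may be needed after changing environment variables.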

Basic Usage

The fundamental object in the orthrus package is the DataSet class. The following example, run from within the orthrus directory, loads the iris dataset and creates a DataSet instance:

# imports
from orthrus.core.dataset import DataSet as DS
import pandas as pd

# load data and metadata
data = pd.read_csv("test_data/Iris/Data/iris_data.csv", index_col=0)
metadata = pd.read_csv("test_data/Iris/Data/iris_metadata.csv", index_col=0)

# create DataSet instance
ds = DS(name='iris', path='./test_data', data=data, metadata=metadata)

# save dataset
ds.save()

Here, path indicates where ds will save the figures and results output by the class methods.
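As a quick check after saving, the sketch below builds the file location that the next section of this README works with (./test_data/iris.ds) and reports whether the file exists. The naming rule path/name.ds is an assumption inferred from that later example, not a documented guarantee, and expected_dataset_file is a hypothetical helper.

```python
from pathlib import Path


def expected_dataset_file(name, path):
    """Build <path>/<name>.ds.

    Assumption: the naming rule is inferred from the README example that
    later moves ./test_data/iris.ds; it is not taken from the orthrus API.
    """
    return Path(path) / f"{name}.ds"


saved = expected_dataset_file("iris", "./test_data")
print(saved, "exists" if saved.exists() else "not found")
```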

Creating a Project Environment

To improve organization and reproducibility of results, the orthrus package includes helper functions for generating a project directory and experiment subdirectories. The following example creates a project directory called Iris and then generates an experiment directory called setosa_versicolor_classify_species_svm, in which we intend to classify the setosa and versicolor species with an SVM classifier.

# imports
from orthrus.core.helper import generate_project
from orthrus.core.helper import generate_experiment
from orthrus.core.dataset import load_dataset
import shutil

# Create a project directory structure in the test path
file_path = './test_data/'
generate_project('Iris', file_path)

# move data into Data directory of Iris project directory
shutil.move('./test_data/iris.ds', './test_data/Iris/Data/iris.ds')

# create experiment directory in the Experiments directory of the Iris directory
proj_dir = './test_data/Iris/'
generate_experiment('setosa_versicolor_classify_species_svm', proj_dir)

Once the setosa_versicolor_classify_species_svm directory is created, there will be a file setosa_versicolor_classify_species_svm_params.py containing a template for experimental parameters that the user can change or extend. The Scripts directory in the Iris directory should contain general-purpose scripts that take in the parameters of your different experiments, so you can change an experiment on the fly with minimal code changes. Take a look at the Iris directory for an example of this workflow.
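To illustrate the idea of a general-purpose script driven by a parameters file, here is a minimal sketch that imports a *_params.py file as a module using the standard library. It is not the mechanism orthrus itself uses, which may provide its own helper. The names load_params and run_experiment, and the parameter names class_attr and classifier_name, are hypothetical and chosen only for this example.

```python
import importlib.util
from pathlib import Path


def load_params(params_file):
    """Import a parameters file as a module and return the module object."""
    params_file = Path(params_file)
    spec = importlib.util.spec_from_file_location(params_file.stem, params_file)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load parameters from {params_file}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file; its variables become attributes
    return module


def run_experiment(params):
    """Stand-in for a general-purpose script body.

    It only reads two hypothetical parameters and reports what it would do.
    """
    return f"classify {params.class_attr} with {params.classifier_name}"
```

With this pattern the same script can be pointed at a different experiment by passing the path of that experiment's parameters file to load_params, leaving the script itself unchanged.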
