
orthrus

A Python package for scaling and automating pre-processing, visualization, classification, and feature selection of generic data sets. Read the docs!

Installing the conda environment

To ensure consistent behavior of orthrus classes and functions across platforms, we recommend installing an isolated conda environment with the dependencies listed in environment.yml. To create a new environment with these dependencies, run the following from the shell:

conda env create -f environment.yml

This creates a conda environment named orthrus and installs the dependencies required by the orthrus module. If you do not have a graphics card compatible with CUDA 11 or later, use environment_nocuda.yml in place of environment.yml. You can also use your own environment and install the packages listed in either environment.yml or environment_nocuda.yml.
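If you are unsure which environment file applies to your machine, the sketch below picks one automatically. It assumes the NVIDIA driver utility nvidia-smi is on your PATH and treats the "CUDA Version" it reports (the highest CUDA version the driver supports, not necessarily an installed toolkit) as a rough proxy for CUDA 11 compatibility. The helper names are hypothetical and are not part of orthrus.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_cuda_version(nvidia_smi_output: str) -> Optional[str]:
    """Extract the 'CUDA Version: X.Y' field printed by nvidia-smi, if present."""
    match = re.search(r"CUDA Version:\s*(\d+\.\d+)", nvidia_smi_output)
    return match.group(1) if match else None


def detect_cuda_version() -> Optional[str]:
    """Return the CUDA version reported by the NVIDIA driver, or None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None  # no NVIDIA driver tooling found on PATH
    try:
        result = subprocess.run(["nvidia-smi"], capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_cuda_version(result.stdout)


def pick_env_file(cuda_version: Optional[str]) -> str:
    """Choose environment.yml for CUDA >= 11, otherwise environment_nocuda.yml."""
    if cuda_version is not None and int(cuda_version.split(".")[0]) >= 11:
        return "environment.yml"
    return "environment_nocuda.yml"


if __name__ == "__main__":
    # Print the conda command to run for this machine.
    print(f"conda env create -f {pick_env_file(detect_cuda_version())}")
```

When in doubt, environment_nocuda.yml is the safer choice, since it does not depend on a GPU.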

Installing the orthrus package

orthrus is available on PyPI. To install it, run

pip install orthrus

To install the orthrus package from this repository instead, first activate the orthrus environment and then navigate to your local orthrus directory:

conda activate orthrus
cd /path/to/orthrus/

Then install the package with pip in editable mode:

pip install -e .

Finally, add ORTHRUS_PATH=/path/to/orthrus/ to your environment variables (the procedure differs by operating system).
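To confirm that the installation steps took effect, you can check from Python that orthrus is importable and that ORTHRUS_PATH points at an existing directory. This is a minimal sketch using only the standard library; check_orthrus_setup is a hypothetical helper, not part of the orthrus API.

```python
import importlib.util
import os
from pathlib import Path
from typing import List


def check_orthrus_setup() -> List[str]:
    """Return a list of setup problems; an empty list means all checks passed."""
    problems = []
    # Is the orthrus package importable in the active environment?
    if importlib.util.find_spec("orthrus") is None:
        problems.append("orthrus is not importable; run `pip install orthrus` or `pip install -e .`")
    # Is ORTHRUS_PATH set, and does it point at an existing directory?
    orthrus_path = os.environ.get("ORTHRUS_PATH")
    if not orthrus_path:
        problems.append("ORTHRUS_PATH is not set")
    elif not Path(orthrus_path).is_dir():
        problems.append(f"ORTHRUS_PATH={orthrus_path!r} is not an existing directory")
    return problems


if __name__ == "__main__":
    for problem in check_orthrus_setup():
        print("WARNING:", problem)
```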

Basic Usage

The fundamental object in the orthrus package is the DataSet class. The following example, run from within the orthrus directory, loads the Iris data set into a new DataSet instance:

# imports
from orthrus.core.dataset import DataSet as DS
import pandas as pd

# load data and metadata
data = pd.read_csv("test_data/Iris/Data/iris_data.csv", index_col=0)
metadata = pd.read_csv("test_data/Iris/Data/iris_metadata.csv", index_col=0)

# create DataSet instance
ds = DS(name='iris', path='./test_data', data=data, metadata=metadata)

# save dataset
ds.save()

Here, path is the directory where ds saves the figures and results produced by its class methods.
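Both CSV files above are read with index_col=0, which suggests that DataSet pairs rows of data with rows of metadata by their sample index. Assuming that is the case (this README does not state it explicitly), a short pandas check like the one below can catch mismatched or duplicated sample IDs before the DataSet is created. check_alignment and the toy column names are hypothetical.

```python
import pandas as pd


def check_alignment(data: pd.DataFrame, metadata: pd.DataFrame) -> None:
    """Raise ValueError if data and metadata do not describe the same samples."""
    missing_meta = data.index.difference(metadata.index)  # samples without metadata
    missing_data = metadata.index.difference(data.index)  # metadata without samples
    if len(missing_meta) or len(missing_data):
        raise ValueError(
            f"{len(missing_meta)} samples lack metadata; "
            f"{len(missing_data)} metadata rows lack data"
        )
    if data.index.has_duplicates or metadata.index.has_duplicates:
        raise ValueError("duplicate sample IDs found in the index")


# Toy frames for illustration only (not the Iris files).
data = pd.DataFrame({"sepal_length": [5.1, 7.0]}, index=["s1", "s2"])
metadata = pd.DataFrame({"species": ["setosa", "versicolor"]}, index=["s1", "s2"])
check_alignment(data, metadata)  # passes silently when the indices match
```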

Creating a Project Environment

To improve the organization and reproducibility of results, the orthrus package includes helper functions for generating a project directory and experiment subdirectories. In the following example, we create a project directory called Iris and then generate an experiment directory called setosa_versicolor_classify_species_svm, in which we intend to classify the setosa and versicolor species with an SVM classifier.

# imports
from orthrus.core.helper import generate_project
from orthrus.core.helper import generate_experiment
from orthrus.core.dataset import load_dataset
import shutil

# Create a project directory structure in the test path
file_path = './test_data/'
generate_project('Iris', file_path)

# move the dataset saved by ds.save() into the Data directory of the Iris project
shutil.move('./test_data/iris.ds', './test_data/Iris/Data/iris.ds')

# create experiment directory in the Experiments directory of the Iris directory
proj_dir = './test_data/Iris/'
generate_experiment('setosa_versicolor_classify_species_svm', proj_dir)
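This README refers to Data, Experiments, and Scripts subdirectories of a project. The sketch below checks that they exist and builds the path where an experiment's params template is expected, assuming it lives at Experiments/<experiment>/<experiment>_params.py. Both helpers are hypothetical, and generate_project may create additional directories not listed here.

```python
from pathlib import Path
from typing import List

# Subdirectories mentioned in this README; generate_project may create others.
EXPECTED_SUBDIRS = ("Data", "Experiments", "Scripts")


def missing_project_dirs(proj_dir: str) -> List[str]:
    """Return the expected subdirectories that are absent from proj_dir."""
    root = Path(proj_dir)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]


def experiment_params_file(proj_dir: str, experiment: str) -> Path:
    """Assumed location of an experiment's params template."""
    return Path(proj_dir) / "Experiments" / experiment / f"{experiment}_params.py"


# Example: missing_project_dirs('./test_data/Iris/') should return [] once the project exists.
```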

Creating the setosa_versicolor_classify_species_svm directory also produces a file, setosa_versicolor_classify_species_svm_params.py, containing a template of experimental parameters that you can modify or extend. The Scripts directory in the Iris directory is meant to hold general-purpose scripts that read the parameters of a specific experiment, so you can change an experiment on the fly with minimal code changes. Take a look at the Iris directory for an example of this workflow.
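One way for a general-purpose script in Scripts to read an experiment's parameters is to import the params file by path with Python's importlib. This is a sketch under that assumption; orthrus may provide its own mechanism, and load_params and the classifier parameter are hypothetical.

```python
import importlib.util
from pathlib import Path
from types import ModuleType


def load_params(params_path: str) -> ModuleType:
    """Import an experiment's *_params.py file as a module, given its file path."""
    path = Path(params_path)
    spec = importlib.util.spec_from_file_location(path.stem, str(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load parameters from {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the params file, defining its variables
    return module


# Hypothetical usage inside a script in Scripts/:
# params = load_params("Experiments/setosa_versicolor_classify_species_svm/"
#                      "setosa_versicolor_classify_species_svm_params.py")
# classifier_name = getattr(params, "classifier", "svm")  # 'classifier' is hypothetical
```

Because each experiment's parameters live in their own file, the same script can be pointed at a different params file to run a different experiment.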

Download files

Source Distribution

orthrus-1.0.2.tar.gz (15.6 kB)

Built Distribution

orthrus-1.0.2-py3-none-any.whl (15.4 kB)

File details

Details for the file orthrus-1.0.2.tar.gz.

File metadata

  • Download URL: orthrus-1.0.2.tar.gz
  • Size: 15.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.7.1 importlib_metadata/4.10.1 pkginfo/1.8.2 requests/2.24.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.12

File hashes

Hashes for orthrus-1.0.2.tar.gz
  • SHA256: 6708d42db9c9a5692b2cb8b2acdc4374b2ae68dcc505071ea42b122232fdf33b
  • MD5: b97f48146871cf38a8281c19592861c1
  • BLAKE2b-256: 20979f78c54c5a49647aa9ce40f986eaabde3d22ac54618c9439ba7ec788ae2e

File details

Details for the file orthrus-1.0.2-py3-none-any.whl.

File metadata

  • Download URL: orthrus-1.0.2-py3-none-any.whl
  • Size: 15.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.7.1 importlib_metadata/4.10.1 pkginfo/1.8.2 requests/2.24.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.12

File hashes

Hashes for orthrus-1.0.2-py3-none-any.whl
  • SHA256: d5ad3aa7fad60bd19016964249b3f9b251bc246081f0f04260a55ce984f78c04
  • MD5: 6f7ae66d91f4770cc89a01f2b08da5c9
  • BLAKE2b-256: 08879c21236cec773518c1e811669105d38959f6d09fb94b5f0d98429b4f2f08
