A plotting and clustering package

## This package allows you to:

- Create a scatterplot
- Plot the distribution of your data
- Group or cluster samples in your data

### 1. Creating a scatterplot:

To create a scatterplot, you have to run the **scattergraph** function from the **plotting.py** file, which takes the following arguments:

- x: this is the column that you want to plot on your x-axis. This should be a numpy array.
- y: this is the column that you want to plot on your y-axis. This should be a numpy array.
- xtitle: this will be the title of your x-axis. This should be a string.
- ytitle: this will be the title of your y-axis. This should be a string.
- graphtitle: this will be the title of your graph. This should be a string.
- outlier_treatment: this tells the graph how to visually differentiate outliers on your plot. This should be a string, and you can choose one of the options below:
  - "color": plots the outliers in a different color
  - "shape": plots the outliers with a different marker
  - "size": plots the outliers with a different size

  *Note: this argument defaults to "size" if any other string is passed.*

- outlier_sensitivity: this is a multiplier in a customized IQR calculation, which ultimately generates a sub-array of outliers. This should be a float (recommended between 0 and 2). If outlier_sensitivity is zero, then your outliers are the points in the first and fourth quartiles of your data, that is, everything outside the interquartile range. The higher the outlier_sensitivity, the fewer the outliers.
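The package's exact IQR calculation is not spelled out here, so the following is only a minimal sketch of how a multiplier like outlier_sensitivity could behave. It assumes the common rule that a point is an outlier when it falls below Q1 - k * IQR or above Q3 + k * IQR; the helper name `iqr_outlier_mask` is hypothetical and not part of this package.

```python
import numpy as np


def iqr_outlier_mask(values, outlier_sensitivity):
    """Return a boolean mask marking outliers (hypothetical helper).

    Assumption: outliers lie outside
    [Q1 - k * IQR, Q3 + k * IQR], where k is outlier_sensitivity.
    """
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower = q1 - outlier_sensitivity * iqr
    upper = q3 + outlier_sensitivity * iqr
    return (values < lower) | (values > upper)


data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 30], dtype=float)

# With a multiplier of 0, everything below Q1 or above Q3 is flagged.
mask_zero = iqr_outlier_mask(data, 0.0)

# A larger multiplier widens the accepted band, so fewer points are flagged.
mask_wide = iqr_outlier_mask(data, 1.5)

print("sensitivity 0.0:", data[mask_zero])
print("sensitivity 1.5:", data[mask_wide])
```

Under this assumption, a sensitivity of zero flags every point outside the middle 50% of the data, which matches the description above, and raising the value shrinks the outlier sub-array.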

### 2. Plotting your distributions:

To plot your distribution, you have to run the **plotdistribution** function from the **plotting.py** file, which takes the following arguments:

- y: this is the column that has your target data. This should be a numpy array.
- numberofbins: choose the number of bins for the histogram. The larger the data set, the more likely you'll want a large number of bins. This should be an int.
- plottitle: this will be the title of your graph. This should be a string.
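To get a feel for how numberofbins changes the histogram, the sketch below counts samples per bin with NumPy's `np.histogram`. This is an illustration only: the helper name `bin_counts` is hypothetical, and the source does not say how **plotdistribution** computes its bins internally.

```python
import numpy as np


def bin_counts(y, numberofbins):
    """Split y into equal-width bins and count samples per bin (hypothetical helper)."""
    counts, edges = np.histogram(y, bins=numberofbins)
    return counts, edges


# Synthetic target data, generated with a fixed seed for repeatability.
rng = np.random.default_rng(0)
y = rng.normal(size=1000)

for numberofbins in (5, 50):
    counts, edges = bin_counts(y, numberofbins)
    # Fewer bins give a coarse shape; more bins show finer detail
    # but leave fewer samples in each bin.
    print(numberofbins, "bins, largest bin holds", counts.max(), "samples")
```

With only a handful of bins the shape of the distribution is coarse; with many bins on a small data set, individual bins become sparse and noisy, which is why larger data sets tolerate larger bin counts.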

### 3. Grouping/creating clusters:

To create clusters, you have to run the **create_clusters** function from the **clustering.py** file, which takes the following arguments:

- x: the columns that you want to use as a basis for clustering. This should be a numpy array.
- y: this is the column that has your target data. This should be a numpy array.
- numberofclusters: the number of clusters to form as well as the number of centroids to generate. This should be an int.
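The source does not state which clustering algorithm **create_clusters** uses or how it uses y. The wording of numberofclusters resembles the `n_clusters` parameter of scikit-learn's `KMeans`, so the sketch below assumes a k-means style grouping and implements a bare-bones version in NumPy. The function `kmeans_sketch` and its `n_iter` and `seed` parameters are hypothetical, and y is left out because its role is not described.

```python
import numpy as np


def kmeans_sketch(x, numberofclusters, n_iter=50, seed=0):
    """Minimal k-means (Lloyd's algorithm); an illustration, not this package's code."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from distinct samples chosen at random as the initial centroids.
    start = rng.choice(len(x), size=numberofclusters, replace=False)
    centroids = x[start]
    for _ in range(n_iter):
        # Distance from every sample to every centroid.
        distances = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Move each centroid to the mean of its assigned samples;
        # keep the old centroid if a cluster ends up empty.
        new_centroids = np.array([
            x[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]
            for k in range(numberofclusters)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# Two well-separated groups of samples, two feature columns each.
x = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
labels, centroids = kmeans_sketch(x, numberofclusters=2)
print(labels)
print(centroids)
```

Each sample receives the label of its nearest centroid, so the number of distinct labels never exceeds numberofclusters.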

## Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

| Filename | Size | File type | Python version |
|---|---|---|---|
| claraw10-0.0.1-py3-none-any.whl | 4.4 kB | Wheel | py3 |
| claraw10-0.0.1.tar.gz | 3.0 kB | Source | None |