Skip to main content

A plotting and clustering package

Project description

This package allows you to:

  1. Create a scatterplot
  2. Plot the distribution of your data
  3. Group or cluster samples in your data

1. Creating a scatterplot:

To create a scatterplot, you have to run the scattergraph function from the plotting.py file, which takes the following arguments:

  1. x: this is the column that you want to plot on your x-axis. This should be a numpy array.
  2. y: this is the column that you want to plot on your y-axis. This should be a numpy array.
  3. xtitle: this will be the title of your x-axis. This should be a string.
  4. ytitle: this will be the title of your y-axis. This should be a string.
  5. graphtitle: this will be the title of your graph. This should be a string.
  6. outlier_treatment: this tells the graph how to visually differentiate outliers on your plot. You can choose one of the below options. This should be a string.
  • "color": plots the outliers in a different color
  • "shape": plots the outliers with a different marker
  • "size": plots the outliers with a different size
  • Note: this argument will default to size if any other string is passed
  1. outlier_sensitivity: this is a multiplier in a customized IQR calculation, which ultiamately generates a sub-array of outliers. This should be a float (recommended between 0 and 2). If outlier_sensitivity is zero, then your outliers are in the 1st and 4th quartile of your data. The higher the outlier_sensitivity, the fewer the outliers.

2. Plotting your distributions:

To plot your distribution, you have to run the plotdistribution function from the plotting.py file, which takes the following arguments:

  1. y: this is the column that has your target data. This should be a numpy array.
  2. numberofbins: choose the number of bins for the histogram. The larger the data set, the more likely you�ll want a large number of bins. This should be an int.
  3. plottitle: this will be the title of your graph. This should be a string.

3. Grouping/creating clusters:

To create clusters, you have to run the create_clusters function from the clustering.py file, which takes the following arguments:

  1. x: the columns that you want to use as a basis for clustering. This should be a numpy array.
  2. y: this is the column that has your target data. This should be a numpy array.
  3. numberofclusters: the number of clusters to form as well as the number of centroids to generate. This should be an int.

Project details


Release history Release notifications

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Files for claraw10, version 0.0.1
Filename, size File type Python version Upload date Hashes
Filename, size claraw10-0.0.1-py3-none-any.whl (4.4 kB) File type Wheel Python version py3 Upload date Hashes View hashes
Filename, size claraw10-0.0.1.tar.gz (3.0 kB) File type Source Python version None Upload date Hashes View hashes

Supported by

Elastic Elastic Search Pingdom Pingdom Monitoring Google Google BigQuery Sentry Sentry Error logging AWS AWS Cloud computing DataDog DataDog Monitoring Fastly Fastly CDN SignalFx SignalFx Supporter DigiCert DigiCert EV certificate StatusPage StatusPage Status page