Skip to main content

A plotting and clustering package

Project description

This package allows you to:

  1. Create a scatterplot
  2. Plot the distribution of your data
  3. Group or cluster samples in your data

1. Creating a scatterplot:

To create a scatterplot, you have to run the scattergraph function from the plotting.py file, which takes the following arguments:

  1. x: this is the column that you want to plot on your x-axis. This should be a numpy array.
  2. y: this is the column that you want to plot on your y-axis. This should be a numpy array.
  3. xtitle: this will be the title of your x-axis. This should be a string.
  4. ytitle: this will be the title of your y-axis. This should be a string.
  5. graphtitle: this will be the title of your graph. This should be a string.
  6. outlier_treatment: this tells the graph how to visually differentiate outliers on your plot. You can choose one of the below options. This should be a string.
  • "color": plots the outliers in a different color
  • "shape": plots the outliers with a different marker
  • "size": plots the outliers with a different size
  • Note: this argument will default to size if any other string is passed
  1. outlier_sensitivity: this is a multiplier in a customized IQR calculation, which ultiamately generates a sub-array of outliers. This should be a float (recommended between 0 and 2). If outlier_sensitivity is zero, then your outliers are in the 1st and 4th quartile of your data. The higher the outlier_sensitivity, the fewer the outliers.

2. Plotting your distributions:

To plot your distribution, you have to run the plotdistribution function from the plotting.py file, which takes the following arguments:

  1. y: this is the column that has your target data. This should be a numpy array.
  2. numberofbins: choose the number of bins for the histogram. The larger the data set, the more likely you�ll want a large number of bins. This should be an int.
  3. plottitle: this will be the title of your graph. This should be a string.

3. Grouping/creating clusters:

To create clusters, you have to run the create_clusters function from the clustering.py file, which takes the following arguments:

  1. x: the columns that you want to use as a basis for clustering. This should be a numpy array.
  2. y: this is the column that has your target data. This should be a numpy array.
  3. numberofclusters: the number of clusters to form as well as the number of centroids to generate. This should be an int.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

claraw10-0.0.1.tar.gz (3.0 kB view hashes)

Uploaded Source

Built Distribution

claraw10-0.0.1-py3-none-any.whl (4.4 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page