Useful Bamboo stuff.

BambooTools

BambooTools is a Python library designed to enhance your data analysis workflows. Built as an extension to the widely used pandas library, BambooTools provides one-line methods for outlier detection and completeness summaries of pandas datasets.

With BambooTools, you can easily identify and handle outliers in your data, enabling more accurate analyses and predictions. The library also offers a completeness summary feature, which provides a quick and efficient way to assess the completeness of your dataset.

Installation

pip install git+https://github.com/KwstasMCPU/BambooTools

Or simply download the project and run:

pip install . 

Usage

Completeness summary

completeness() returns a completeness summary table, stating the percentage and count of complete (not NULL) values:

from bambootools import bambootools
import pandas as pd
import numpy as np

df = pd.DataFrame({'Animal': ['Falcon', 'Falcon',
                              'Parrot', 'Parrot',
                              'Lama', 'Falcon'],
                   'Max Speed': [380, 370,
                                 24, 26,
                                 np.nan, np.nan],
                   'Weight': [np.nan, 2,
                              1.5, np.nan,
                              80, 2.2]
                   })
# check the completeness of the dataset per column
print(df.bbt.completeness())
                         perc  count  total
Animal                    1.0      6      6
Max Speed  0.6666666666666666      4      6
Weight     0.6666666666666666      4      6

Specifying a list of categorical columns returns the completeness per category:

# check the completeness of the dataset per category
print(df.bbt.completeness(by=['Animal']))
        Max Speed                  Weight
               perc  count  total         perc  count  total
Animal
Falcon  0.666666667      2      3  0.666666667      2      3
Lama              0      0      1            1      1      1
Parrot            1      2      2          0.5      1      2
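
For readers who want to see what these numbers mean, the following is a minimal sketch that reproduces the same counts with plain pandas. It is not BambooTools' implementation; the helper name completeness_sketch is hypothetical and exists only for this illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Animal": ["Falcon", "Falcon", "Parrot", "Parrot", "Lama", "Falcon"],
    "Max Speed": [380, 370, 24, 26, np.nan, np.nan],
    "Weight": [np.nan, 2, 1.5, np.nan, 80, 2.2],
})


def completeness_sketch(frame):
    """Hypothetical helper: share and count of non-null values per column."""
    count = frame.notna().sum()                          # non-null values per column
    total = pd.Series(len(frame), index=frame.columns)   # number of rows per column
    return pd.DataFrame({"perc": count / total, "count": count, "total": total})


# Completeness per column
per_column = completeness_sketch(df)

# Completeness per category: the same summary inside each 'Animal' group
per_category = {
    name: completeness_sketch(group.drop(columns="Animal"))
    for name, group in df.groupby("Animal")
}

print(per_column)
print(per_category["Falcon"])
```

The per-category result is kept as a dictionary of small tables here for simplicity; the library output above arranges the same figures in a single table.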

Outlier summary

outlier_summary() returns a summary of the outliers found in the dataset based on a specific method (e.g. IQR). It reports the number of outliers below and above the boundaries calculated by that method.

import seaborn as sns  # provides the example penguins dataset

penguins = sns.load_dataset("penguins")
# identify outliers using the interquartile range (IQR) approach
print(penguins.bbt.outlier_summary('iqr', factor=1))
                   n_outliers_upper  n_outliers_lower  n_non_outliers  n_total_outliers  total_records
bill_depth_mm                     0                 0             342                 0            342
bill_length_mm                    2                 0             340                 2            342
body_mass_g                       4                 0             338                 4            342
flipper_length_mm                 0                 0             342                 0            342
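
As a rough illustration of what such a summary counts, here is a sketch of the conventional IQR rule in plain pandas: values below Q1 - factor * IQR or above Q3 + factor * IQR are treated as outliers. This is an assumption about the general technique, not a copy of the library's code; the function names iqr_bounds and outlier_counts and the sample values are made up for the example.

```python
import pandas as pd


def iqr_bounds(values, factor):
    """Lower and upper bounds under the conventional IQR rule (assumed here)."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return q1 - factor * iqr, q3 + factor * iqr


def outlier_counts(values, factor):
    """Hypothetical helper: count values below and above the IQR bounds."""
    values = values.dropna()  # missing values are ignored in this sketch
    lower, upper = iqr_bounds(values, factor)
    n_lower = int((values < lower).sum())
    n_upper = int((values > upper).sum())
    return {
        "n_outliers_upper": n_upper,
        "n_outliers_lower": n_lower,
        "n_non_outliers": len(values) - n_lower - n_upper,
        "n_total_outliers": n_lower + n_upper,
        "total_records": len(values),
    }


# Made-up sample: one clearly large value and one missing value
values = pd.Series([10, 11, 12, 13, 14, 15, 40, None])
summary = outlier_counts(values, factor=1.0)
print(summary)
```

A smaller factor narrows the bounds and flags more values; a larger factor does the opposite.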

You can also get the summary per group:

# outliers per category
print(penguins.bbt.outlier_summary(method='iqr', by=['sex', 'species'], factor=1))
                                            n_non_outliers  n_outliers_lower  n_outliers_upper  n_total_outliers  total_records
('Female', 'Adelie')     bill_depth_mm                  71                 1                 1                 2             73
('Female', 'Adelie')     bill_length_mm                 71                 1                 1                 2             73
('Female', 'Adelie')     body_mass_g                    73                 0                 0                 0             73
('Female', 'Adelie')     flipper_length_mm              65                 5                 3                 8             73
('Female', 'Chinstrap')  bill_depth_mm                  33                 0                 1                 1             34
('Female', 'Chinstrap')  bill_length_mm                 23                 5                 6                11             34
('Female', 'Chinstrap')  body_mass_g                    31                 2                 1                 3             34
('Female', 'Chinstrap')  flipper_length_mm              33                 1                 0                 1             34
('Female', 'Gentoo')     bill_depth_mm                  57                 0                 1                 1             58
('Female', 'Gentoo')     bill_length_mm                 57                 0                 1                 1             58
('Female', 'Gentoo')     body_mass_g                    57                 1                 0                 1             58
('Female', 'Gentoo')     flipper_length_mm              56                 1                 1                 2             58
('Male', 'Adelie')       bill_depth_mm                  64                 3                 6                 9             73
('Male', 'Adelie')       bill_length_mm                 65                 3                 5                 8             73
('Male', 'Adelie')       body_mass_g                    73                 0                 0                 0             73
('Male', 'Adelie')       flipper_length_mm              67                 4                 2                 6             73
('Male', 'Chinstrap')    bill_depth_mm                  33                 1                 0                 1             34
('Male', 'Chinstrap')    bill_length_mm                 32                 0                 2                 2             34
('Male', 'Chinstrap')    body_mass_g                    29                 2                 3                 5             34
('Male', 'Chinstrap')    flipper_length_mm              32                 1                 1                 2             34
('Male', 'Gentoo')       bill_depth_mm                  56                 2                 3                 5             61
('Male', 'Gentoo')       bill_length_mm                 51                 5                 5                10             61
('Male', 'Gentoo')       body_mass_g                    59                 1                 1                 2             61
('Male', 'Gentoo')       flipper_length_mm              59                 2                 0                 2             61

Outlier boundaries

outlier_bounds() returns the lower and upper boundary values; any value below the lower boundary or above the upper boundary is considered an outlier:

print(penguins.bbt.outlier_bounds(method='iqr',
                                  by=['sex', 'species'],
                                  factor=1))
                       bill_length_mm       bill_depth_mm   flipper_length_mm         body_mass_g
                      lower     upper     lower     upper     lower     upper     lower     upper
sex     species
Female  Adelie           33      41.7      15.7      19.6       179       197      2800      3925
Female  Chinstrap    43.475    49.325     15.95      19.1    178.75    204.25   3031.25      4025
Female  Gentoo       40.825      49.9        13      15.4       205       220      4050    5287.5
Male    Adelie         36.5        44      17.4      20.7       181       205      3300      4800
Male    Chinstrap    48.125      53.9      17.8      20.8       189       210    3362.5   4468.75
Male    Gentoo         45.7      52.9      14.3        17       211       232      4900      6100
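
Per-group bounds like these can be approximated with a pandas groupby. The sketch below assumes the conventional IQR rule (Q1 - factor * IQR, Q3 + factor * IQR) and uses a made-up two-group dataset; the column names group and value are illustrative, and the library may compute its bounds differently.

```python
import pandas as pd

# Made-up data: group "a" contains one clearly extreme value (100)
df = pd.DataFrame({
    "group": ["a"] * 5 + ["b"] * 5,
    "value": [1.0, 2.0, 3.0, 4.0, 100.0, 10.0, 20.0, 30.0, 40.0, 50.0],
})

factor = 1.0
grouped = df.groupby("group")["value"]
q1 = grouped.quantile(0.25)  # first quartile per group
q3 = grouped.quantile(0.75)  # third quartile per group
iqr = q3 - q1

# One row per group with its own lower and upper bound
bounds = pd.DataFrame({"lower": q1 - factor * iqr, "upper": q3 + factor * iqr})

# Attach each row's group bounds and flag values that fall outside them
flagged = df.join(bounds, on="group")
flagged["is_outlier"] = ~flagged["value"].between(flagged["lower"], flagged["upper"])

print(bounds)
print(flagged[flagged["is_outlier"]])
```

Computing the bounds per group matters because a value that is ordinary for one group can be extreme for another.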

Contributing

Contributions are welcome! Contribution guidelines are pending.

Credits

Special thanks to danikavu for the code reviews.
