Skip to main content

Data Science Library

Project description

polar

polar is a Python module that contains simple to use data science functions. It is built on top of SciPy, scikit-learn, seaborn and pandas.

polar build

python setup.py sdist twine upload dist/*

Installation

If you already have a working installation of numpy and scipy, the easiest way to install parkitny is using pip:

pip install polar seaborn pandas scikit-learn scipy matplotlib numpy nltk -U

Dependencies

polar requires:

  • Python (>= 3.5)
  • NumPy (>= 1.11.0)
  • SciPy (>= 0.17.0)
  • Seaborn (>= 0.9.0)
  • scikit-learn (>= 0.21.3)
  • nltk (>= 3.4.5)
  • python-pptx (>= 0.6.18)
  • cryptography (> 2.8)
  • imblearn

Jupyter Notebook Examples

Here is the link to the jupyter notebook with all the exmples that are described below Polar-Examples

ACA (Automated Cohort Analysis) Example

The ACA creates three heatmaps for each feature in the data set.

  • Conversion heatmap - conversion per feature value
  • Distribution heatmap - distribution per feature value
  • Size heatmap - total samples per feature value

Data File: ACA_date.csv

Final Result Power Point: ACA.pptx

import pandas as pd
import polar as pl
from pptx import Presentation
%matplotlib inline

url = "https://raw.githubusercontent.com/pparkitn/imagehost/master/ACA_date.csv"
data_df=pd.read_csv(url)

prs = Presentation()    
pl.create_title(prs,'ACA')
for chart in pl.ACA_create_graphs(data_df,'date','label'):
    pl.add_chart_slide(prs,chart[0],chart[1])
pl.save_presentation(prs,filename = 'ACA')

Conversion: Image

Distribution: Image

Samples: Image

EDA Example

import pandas as pd
import openml
import polar as pl

dataset = openml.datasets.get_dataset(31)
X, y, categorical_indicator, attribute_names = \
dataset.get_data(target=dataset.default_target_attribute,dataset_format='dataframe')

openml_df = pd.DataFrame(X)
openml_df['target'] = y

data_df = pl.analyze_correlation(openml_df,'target')
pl.get_heatmap(data_df,'correlation_heat_map.png',1.1,14,'0.1f',0,100,5,5)

Image

data_df = pl.analyze_association(openml_df,'target',verbose=0)
pl.get_heatmap(data_df,'association_heat_map.png',1.1,12,'0.1f',0,100,10,10)

Image

print(pl.analyze_df(openml_df, 'target',10))

Image

data_df = pl.get_important_features(openml_df,'target')
pl.get_bar(data_df,'bar.png','Importance','Feature_Name')

Image

NLP Example

import nltk
nltk.download('wordnet')
import pandas as pd
import polar as pl
from cryptography.fernet import Fernet

url = "https://raw.githubusercontent.com/pparkitn/imagehost/master/test_real_or_not_from_kaggle.csv"
data_df=pd.read_csv(url)

data_df.drop(columns=['id','keyword','location'], inplace=True)
data_df.head(3)

Image

key = Fernet.generate_key()
data_df['text_encrypted'] =  data_df['text'].apply(pl.encrypt_df,args=(key,))
data_df['text_decrypted'] =  data_df['text_encrypted'].apply(pl.decrypt_df,args=(key,))

data_df['text_stem'] = data_df['text_decrypted'].apply(pl.nlp_text_process,args=('stem',))
data_df['text_stem_lem'] = data_df['text_stem'].apply(pl.nlp_text_process,args=('lem',))

data_df.head(3)

Image

cluster_df = pl.nlp_cluster(data_df, 'text_stem_lem',  10, 'text_cluster',1.0,1,100,1,'KMeans',(1,2))[0]
cluster_df.groupby(['text_cluster']).count()

Image

cluster_df[cluster_df['text_cluster']==9]['text_stem_lem']

Image

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

polar-0.0.124.tar.gz (11.6 kB view details)

Uploaded Source

File details

Details for the file polar-0.0.124.tar.gz.

File metadata

  • Download URL: polar-0.0.124.tar.gz
  • Upload date:
  • Size: 11.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.13

File hashes

Hashes for polar-0.0.124.tar.gz
Algorithm Hash digest
SHA256 fd92b031901ae1a60c2a81150d3fc72354a98631e9f0e247a56cb91c243ff565
MD5 e7b2c212c9fde1f05baae25470a7f72f
BLAKE2b-256 2f143cdfbc1c7755370a3e3b2e23b9920eac0c11f43d5cd2a6467ce0f5057f3d

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page