Data Science Library

Project description

polar

polar is a Python module that provides simple-to-use data science functions. It is built on top of SciPy, scikit-learn, seaborn, and pandas.

polar build

python setup.py sdist
twine upload dist/*

Installation

If you already have a working installation of NumPy and SciPy, the easiest way to install polar is with pip:

pip install polar seaborn pandas scikit-learn scipy matplotlib numpy nltk -U

Dependencies

polar requires:

  • Python (>= 3.5)
  • NumPy (>= 1.11.0)
  • SciPy (>= 0.17.0)
  • Seaborn (>= 0.9.0)
  • scikit-learn (>= 0.21.3)
  • nltk (>= 3.4.5)
  • python-pptx (>= 0.6.18)
  • cryptography (> 2.8)
  • imblearn
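To confirm that an environment satisfies the minimums listed above, a small standard-library check can help. This is a minimal sketch, not part of polar: the helper names (`release_tuple`, `meets_minimum`, `check_dependencies`) are hypothetical, `importlib.metadata` needs Python 3.8 or newer, and `imbalanced-learn` is assumed to be the distribution that provides `imblearn`.

```python
import re
from importlib import metadata

# Minimum versions copied from the dependency list above.
# Assumption: "imbalanced-learn" is the distribution behind "imblearn".
# Note: cryptography is listed as "> 2.8"; this sketch only checks ">=".
MINIMUMS = {
    "numpy": "1.11.0",
    "scipy": "0.17.0",
    "seaborn": "0.9.0",
    "scikit-learn": "0.21.3",
    "nltk": "3.4.5",
    "python-pptx": "0.6.18",
    "cryptography": "2.8",
    "imbalanced-learn": None,  # no minimum version is stated
}


def release_tuple(version):
    """Return the leading numeric part of a version string as a tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    return tuple(int(part) for part in match.group().split(".")) if match else ()


def meets_minimum(installed, minimum):
    """Compare two version strings on their numeric release parts only."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that "1.11" and "1.11.0" compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_dependencies(minimums=MINIMUMS):
    """Return a {distribution name: status string} report."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = minimum is None or meets_minimum(installed, minimum)
        report[name] = "ok" if ok else f"too old ({installed} < {minimum})"
    return report


if __name__ == "__main__":
    for name, status in check_dependencies().items():
        print(f"{name}: {status}")
```

Pre-release suffixes such as `rc1` are ignored by this comparison, so it is only a rough check.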

Jupyter Notebook Examples

The Jupyter notebook with all the examples described below is available here: Polar-Examples

ACA (Automated Cohort Analysis) Example

The ACA creates three heatmaps for each feature in the data set.

  • Conversion heatmap - conversion per feature value
  • Distribution heatmap - distribution per feature value
  • Size heatmap - total samples per feature value

import pandas as pd
import polar as pl
from pptx import Presentation
# Jupyter/IPython magic; remove the next line when running as a plain script
%matplotlib inline

# Load the sample data set
url = "https://raw.githubusercontent.com/pparkitn/imagehost/master/ACA_date.csv"
data_df = pd.read_csv(url)

# Build a PowerPoint deck and add one slide per generated chart
prs = Presentation()
pl.create_title(prs, 'ACA')
for chart in pl.ACA_create_graphs(data_df, 'date', 'label'):
    pl.add_chart_slide(prs, chart[0], chart[1])
pl.save_presentation(prs, filename='ACA')

[Images: conversion, distribution, and samples heatmaps]
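The three tables behind the heatmaps described in this section can be approximated in plain pandas. The sketch below is not polar's implementation. It assumes that the label is 0/1, that "conversion" means the mean label per cohort, that "distribution" means each feature value's share of the samples in a date, and that "size" means the row count. The function name `cohort_tables`, the `plan` column, and the toy data are hypothetical.

```python
import pandas as pd


def cohort_tables(df, date_col, label_col, feature):
    """Return (conversion, distribution, size) tables for one feature.

    Rows are feature values and columns are dates.
    """
    grouped = df.groupby([feature, date_col])[label_col]
    # Size: number of samples per (feature value, date) cell.
    size = grouped.size().unstack(date_col, fill_value=0)
    # Conversion: mean of the 0/1 label per cell (NaN where a cell is empty).
    conversion = grouped.mean().unstack(date_col)
    # Distribution: share of each date's samples held by each feature value.
    distribution = size / size.sum(axis=0)
    return conversion, distribution, size


# Toy data for illustration only.
toy_df = pd.DataFrame({
    "date": ["2020-01", "2020-01", "2020-01", "2020-02"],
    "plan": ["a", "a", "b", "a"],
    "label": [1, 0, 1, 1],
})
conversion, distribution, size = cohort_tables(toy_df, "date", "label", "plan")
print(size)
```

Each table can then be passed to a heatmap function such as `seaborn.heatmap`.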

EDA Example

import pandas as pd
import openml
import polar as pl

dataset = openml.datasets.get_dataset(31)
X, y, categorical_indicator, attribute_names = dataset.get_data(
    target=dataset.default_target_attribute, dataset_format='dataframe'
)

openml_df = pd.DataFrame(X)
openml_df['target'] = y

data_df = pl.analyze_correlation(openml_df,'target')
pl.get_heatmap(data_df,'correlation_heat_map.png',1.1,14,'0.1f',0,100,5,5)

data_df = pl.analyze_association(openml_df,'target',verbose=0)
pl.get_heatmap(data_df,'association_heat_map.png',1.1,12,'0.1f',0,100,10,10)

print(pl.analyze_df(openml_df, 'target',10))

data_df = pl.get_important_features(openml_df,'target')
pl.get_bar(data_df,'bar.png','Importance','Feature_Name')

NLP Example

import nltk
nltk.download('wordnet')
import pandas as pd
import polar as pl
from cryptography.fernet import Fernet

url = "https://raw.githubusercontent.com/pparkitn/imagehost/master/test_real_or_not_from_kaggle.csv"
data_df = pd.read_csv(url)

data_df.drop(columns=['id','keyword','location'], inplace=True)
data_df.head(3)

key = Fernet.generate_key()
data_df['text_encrypted'] = data_df['text'].apply(pl.encrypt_df, args=(key,))
data_df['text_decrypted'] = data_df['text_encrypted'].apply(pl.decrypt_df, args=(key,))

data_df['text_stem'] = data_df['text_decrypted'].apply(pl.nlp_text_process, args=('stem',))
data_df['text_stem_lem'] = data_df['text_stem'].apply(pl.nlp_text_process, args=('lem',))

data_df.head(3)
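For readers curious how a column can be encrypted and decrypted with a Fernet key, here is a minimal sketch using `cryptography.fernet` directly. It is not polar's `encrypt_df` or `decrypt_df`; the helper names `encrypt_text` and `decrypt_text` are hypothetical and only mirror the `apply(..., args=(key,))` calling pattern used above.

```python
import pandas as pd
from cryptography.fernet import Fernet


def encrypt_text(text, key):
    """Encrypt one string and return the Fernet token as text."""
    return Fernet(key).encrypt(text.encode("utf-8")).decode("ascii")


def decrypt_text(token, key):
    """Reverse encrypt_text using the same key."""
    return Fernet(key).decrypt(token.encode("ascii")).decode("utf-8")


# Toy data for illustration only.
key = Fernet.generate_key()
texts = pd.Series(["first message", "second message"])
encrypted = texts.apply(encrypt_text, args=(key,))
decrypted = encrypted.apply(decrypt_text, args=(key,))
print(decrypted.tolist())
```

The key must be stored safely: without it the encrypted column cannot be recovered.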

cluster_df = pl.nlp_cluster(data_df, 'text_stem_lem', 10, 'text_cluster', 1.0, 1, 100, 1, 'KMeans', (1, 2))[0]
cluster_df.groupby(['text_cluster']).count()
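A common way to cluster short texts with KMeans is to vectorize them with TF-IDF first. The sketch below uses scikit-learn directly and is only an assumption about the general approach, not a description of what `pl.nlp_cluster` does internally. The function name `cluster_texts`, its parameters, and the toy sentences are hypothetical, and treating `(1, 2)` as an n-gram range is likewise an assumption.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer


def cluster_texts(texts, n_clusters, ngram_range=(1, 2), random_state=0):
    """Assign each text to a KMeans cluster based on TF-IDF features."""
    # Unigrams and bigrams are used when ngram_range is (1, 2).
    vectorizer = TfidfVectorizer(ngram_range=ngram_range)
    features = vectorizer.fit_transform(texts)
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    return model.fit_predict(features)


# Toy data for illustration only.
toy_texts = [
    "forest fire near the town",
    "fire spreads through the forest",
    "flood warning for the river",
    "river flood closes the road",
]
labels = cluster_texts(toy_texts, n_clusters=2)
print(labels)
```

The returned labels can be stored in a new DataFrame column and inspected with `groupby`, as in the example above.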

cluster_df[cluster_df['text_cluster']==9]['text_stem_lem']

Source Distribution

polar-0.0.127.tar.gz (11.5 kB)

