Skip to main content

Data Science Library

Project description

polar

polar is a Python module that contains simple to use data science functions. It is built on top of SciPy, scikit-learn, seaborn and pandas.

Installation

If you already have a working installation of numpy and scipy, the easiest way to install parkitny is using pip:

pip install polar seaborn pandas scikit-learn scipy matplotlib numpy nltk -U

Dependencies

polar requires:

  • Python (>= 3.5)
  • NumPy (>= 1.11.0)
  • SciPy (>= 0.17.0)
  • Seaborn (>= 0.9.0)
  • scikit-learn (>= 0.21.3)
  • nltk (>= 3.4.5)
  • python-pptx (>= 0.6.18)
  • cryptography (> 2.8)
  • imblearn

Jupyter Notebook Examples

Here is the link to the jupyter notebook with all the exmples that are described below Polar-Examples

ACA (Automated Cohort Analysis) Example

The ACA creates three heatmaps for each feature in the data set.

  • Conversion heatmap - conversion per feature value
  • Distribution heatmap - distribution per feature value
  • Size heatmap - total samples per feature value

Data File: ACA_date.csv

Final Result Power Point: ACA.pptx

import pandas as pd
import polar as pl
from pptx import Presentation
%matplotlib inline

url = "https://raw.githubusercontent.com/pparkitn/imagehost/master/ACA_date.csv"
data_df=pd.read_csv(url)

prs = Presentation()    
pl.create_title(prs,'ACA')
for chart in pl.ACA_create_graphs(data_df,'date','label'):
    pl.add_chart_slide(prs,chart[0],chart[1])
pl.save_presentation(prs,filename = 'ACA')

Conversion: Image

Distribution: Image

Samples: Image

EDA Example

import pandas as pd
import openml
import polar as pl

dataset = openml.datasets.get_dataset(31)
X, y, categorical_indicator, attribute_names = \
dataset.get_data(target=dataset.default_target_attribute,dataset_format='dataframe')

openml_df = pd.DataFrame(X)
openml_df['target'] = y

data_df = pl.analyze_correlation(openml_df,'target')
pl.get_heatmap(data_df,'correlation_heat_map.png',1.1,14,'0.1f',0,100,5,5)

Image

data_df = pl.analyze_association(openml_df,'target',verbose=0)
pl.get_heatmap(data_df,'association_heat_map.png',1.1,12,'0.1f',0,100,10,10)

Image

print(pl.analyze_df(openml_df, 'target',10))

Image

data_df = pl.get_important_features(openml_df,'target')
pl.get_bar(data_df,'bar.png','Importance','Feature_Name')

Image

NLP Example

import nltk
nltk.download('wordnet')
import pandas as pd
import polar as pl
from cryptography.fernet import Fernet

url = "https://raw.githubusercontent.com/pparkitn/imagehost/master/test_real_or_not_from_kaggle.csv"
data_df=pd.read_csv(url)

data_df.drop(columns=['id','keyword','location'], inplace=True)
data_df.head(3)

Image

key = Fernet.generate_key()
data_df['text_encrypted'] =  data_df['text'].apply(pl.encrypt_df,args=(key,))
data_df['text_decrypted'] =  data_df['text_encrypted'].apply(pl.decrypt_df,args=(key,))

data_df['text_stem'] = data_df['text_decrypted'].apply(pl.nlp_text_process,args=('stem',))
data_df['text_stem_lem'] = data_df['text_stem'].apply(pl.nlp_text_process,args=('lem',))

data_df.head(3)

Image

cluster_df = pl.nlp_cluster(data_df, 'text_stem_lem',  10, 'text_cluster',1.0,1,100,1,'KMeans',(1,2))[0]
cluster_df.groupby(['text_cluster']).count()

Image

cluster_df[cluster_df['text_cluster']==9]['text_stem_lem']

Image

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

polar-0.0.119.tar.gz (11.5 kB view details)

Uploaded Source

File details

Details for the file polar-0.0.119.tar.gz.

File metadata

  • Download URL: polar-0.0.119.tar.gz
  • Upload date:
  • Size: 11.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.13

File hashes

Hashes for polar-0.0.119.tar.gz
Algorithm Hash digest
SHA256 4b6a8debb127f5a921a5af2512cfba0bd8b5ef07f9b4ea1c15259eaefcc3d23d
MD5 0060cfb0ef942b1dd65d6f5742c4966c
BLAKE2b-256 d6c989edb0b7cd7e8de1c8dd68f3eed42b9332a818c19acb74140791f1bba5b1

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page