
Prda contains packages for data processing, analysis and visualization. The ultimate goal is to fill the “last mile” between analysts and packages.

Project description

prda

Prda contains packages for data processing, analysis and visualization.

Prda's ultimate goal is to fill the “last mile” between analysts and packages. During my research practice, I have found that “learning a package before using it” can be time-consuming and exhausting. The resulting inefficiency led to the creation of prda.

Usage

pip install prda

See details in: https://pypi.org/project/prda/

You are welcome to clone prda for personal use, and pull requests with your modifications are strongly encouraged.


To use prda, you only need to be familiar with pandas, as most inputs are pd.DataFrame objects.

Currently, with the help of ChatGPT, you can simply tailor the input of the demonstration code below to your data, even if you are not familiar with pandas or Python.

Examples of Usage

  1. For Visualization

import prda
import pandas as pd
import numpy as np
df = pd.DataFrame(data=np.array([np.arange(100) for i in range(5)]).T,columns=['a', 'b', 'c', 'd', 'e'])
prda.graphic.scatter_3d_html(df, x='a', y='b', z='c', color_hue='d', size_hue='e', title='demo_3d_scatter', filepath='demo_3d_scatter.html')

The above code will produce an interactive HTML figure that looks like this:

Image.png

demo_3d_scatter.html


import prda
import pandas as pd
import numpy as np
datalen = 500
indices = np.arange(datalen)
col_a = np.arange(0, 10, 10/datalen)
col_b = np.random.randint(3, 8, datalen)
data = np.array([indices, col_a, col_b]).T
df = pd.DataFrame(data=data, columns=['idx', 'a', 'b'])

# draw
import random
point_markers = {
    'a': [(indices[i], col_a[i]) for i in random.sample(list(indices), 20)]
}
prda.graphic.lineplot_html(df, x='idx', y=['a', 'b'], markpoints=point_markers, filepath='demo_lineplot.html')

The DataFrame built above looks like this (column b is random, so its values will vary):

       idx     a    b
0      0.0  0.00  6.0
1      1.0  0.02  3.0
2      2.0  0.04  4.0
...    ...   ...  ...
498  498.0  9.96  6.0
499  499.0  9.98  5.0

Plotting the above DataFrame with this code draws another figure that looks like this:

lineplot_screenshot.png

demo_lineplot.html


  2. For Data Preparation

Code for filtering the continuous variables in data with a unique-value threshold of 5:

from prda import prep
prep.select_continuous_variables(data, unique_threshold=5)
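
For intuition, the sketch below shows one way a unique-value threshold can separate continuous columns from categorical ones using plain pandas. It is not prda's implementation: the helper name columns_above_unique_threshold and the rule "keep numeric columns with more than unique_threshold distinct values" are assumptions made for illustration.

```python
import pandas as pd


def columns_above_unique_threshold(df, unique_threshold=5):
    """Return the numeric columns with more than `unique_threshold` distinct values.

    Hypothetical helper for illustration only; prda's own rule may differ.
    """
    numeric = df.select_dtypes(include="number")
    # nunique() counts the distinct non-null values in a column.
    keep = [col for col in numeric.columns if numeric[col].nunique() > unique_threshold]
    return df[keep]


demo = pd.DataFrame({
    "height": [1.62, 1.75, 1.80, 1.68, 1.91, 1.55, 1.73],  # 7 distinct values
    "grade": [1, 2, 3, 1, 2, 3, 1],                        # 3 distinct values
    "name": list("abcdefg"),                               # non-numeric column
})
continuous = columns_above_unique_threshold(demo, unique_threshold=5)
print(list(continuous.columns))
```

Under this assumed rule, a numeric column with only a handful of distinct values (such as grade) is treated as categorical and dropped.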

  3. For Machine Learning

Code for evaluating hyperparameter combinations for a given algorithm using a user-specified cross-validation method:

from prda.ml import evaluations
param_grid = {'k': [4,5,6,7]}
evaluations.evaluate_param_combinations(X, y, knn_algorithm, param_grid=param_grid, cv=10, visualize_results=True)
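
To make the idea of "evaluating hyperparameter combinations" concrete, here is a minimal standard-library sketch of the exhaustive enumeration step. It does not reproduce prda's code: iter_param_combinations, best_combination and toy_score are hypothetical names, and toy_score merely stands in for a real cross-validated score (for example one computed with scikit-learn's cross_val_score).

```python
from itertools import product


def iter_param_combinations(param_grid):
    """Yield one dict per combination of the values listed in `param_grid`."""
    names = sorted(param_grid)
    for values in product(*(param_grid[name] for name in names)):
        yield dict(zip(names, values))


def best_combination(param_grid, score_fn):
    """Score every combination with `score_fn`; return (best_params, best_score)."""
    scored = [(score_fn(params), params) for params in iter_param_combinations(param_grid)]
    best_score, best_params = max(scored, key=lambda pair: pair[0])
    return best_params, best_score


def toy_score(params):
    # Placeholder for a cross-validated score (assumption): peaks at k == 6.
    return -abs(params["k"] - 6)


param_grid = {"k": [4, 5, 6, 7]}
combos = list(iter_param_combinations(param_grid))
best_params, best_score = best_combination(param_grid, toy_score)
print(best_params, best_score)
```

With several parameters in the grid, the number of combinations is the product of the list lengths, so the cost of an exhaustive search grows quickly.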

  4. For IO

A common task during my research practice is creating well-structured folders to save experimental results. With the following function, you only need to think about how you want your files to be structured. All related folders will be created automatically:

from prda import iostream
iostream.create_dirs([
    'results/experiment1/f1_score.csv',
    'results/experiment1/accuracy.csv',
    'results/experiment2/',
    'results/experiment10/accuracy/',
    'results/experiment10/f1_score/r1.txt',
    ])

The single call above creates all the folders for you, following the structure below, so you can store your results without worrying about the file structure.

results/
├── experiment1/
│   ├── f1_score.csv
│   └── accuracy.csv
├── experiment2/
└── experiment10/
    ├── accuracy/
    └── f1_score/
        └── r1.txt
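
If you are curious how such a helper can work, the following is a rough standard-library sketch and not prda's actual code. The function name create_parent_dirs is hypothetical, and the convention is an assumption: a path ending in '/' is treated as a folder, anything else as a file whose parent folder is created (the file itself is not).

```python
import os
import tempfile


def create_parent_dirs(paths, root="."):
    """Create the folders implied by `paths` under `root`.

    Assumed convention: a trailing '/' marks a folder; any other path is a
    file, for which only the parent folder is created.
    """
    created = []
    for path in paths:
        folder = path if path.endswith("/") else os.path.dirname(path)
        if not folder:
            continue  # a bare file name implies no folder
        full = os.path.join(root, folder)
        os.makedirs(full, exist_ok=True)  # no error if it already exists
        created.append(full)
    return created


with tempfile.TemporaryDirectory() as tmp:
    create_parent_dirs([
        'results/experiment1/f1_score.csv',
        'results/experiment1/accuracy.csv',
        'results/experiment2/',
        'results/experiment10/accuracy/',
        'results/experiment10/f1_score/r1.txt',
    ], root=tmp)
    # Collect every folder that now exists, relative to the temporary root.
    made = sorted(
        os.path.relpath(os.path.join(dirpath, name), tmp).replace(os.sep, "/")
        for dirpath, dirnames, _ in os.walk(tmp)
        for name in dirnames
    )
print(made)
```

Passing exist_ok=True makes the call safe to repeat, which is convenient when the same experiment script runs many times.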

Prda's methods are quite self-explanatory, so we think the demonstrations above suffice for the moment. The current prda is far from complete, let alone perfect, but it is improved regularly.


Updates

2023.5.3 Major Updates

Added several easy-to-use functions, including prep::pca, select_continuous_variables, handle_missing_data, apply_linear_func (row-wise), and ml::match_clusters, evaluate_param_combinations (optimal parameter search, with base class sklearn.base.BaseEstimator), etc.

2023.11.10 Major Updates

  1. Added a variant of kNN, ml::neighbors::VariableKNN, which lets you assign a customized k (a K sequence) to each sample. The algorithm behaves like a scikit-learn classifier, which means you can use it directly via fit(·) and predict(·).
  2. Added functions, e.g. iostream::create_dirs.
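
The idea of a per-sample k can be illustrated with a small pure-Python sketch. This is not VariableKNN itself: the function predict_variable_k is hypothetical, and it assumes one k per query sample, Euclidean distance, and a simple majority vote.

```python
from collections import Counter
import math


def predict_variable_k(train_X, train_y, query_X, ks):
    """Classify each query point by majority vote among its own k nearest neighbors."""
    predictions = []
    for query, k in zip(query_X, ks):
        # Sort training indices by Euclidean distance to the query point.
        order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
        votes = Counter(train_y[i] for i in order[:k])
        predictions.append(votes.most_common(1)[0][0])
    return predictions


train_X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0)]
train_y = ["low", "low", "low", "high", "high"]
query_X = [(0.2, 0.1), (4.0, 4.0)]

# The second query is labeled differently depending on the k assigned to it.
preds_small_k = predict_variable_k(train_X, train_y, query_X, ks=[1, 3])
preds_large_k = predict_variable_k(train_X, train_y, query_X, ks=[1, 5])
print(preds_small_k, preds_large_k)
```

The second query point shows why the choice matters: with k=3 its two closest neighbors dominate the vote, while with k=5 the more numerous distant class wins.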
