Initial release of the preprocessing package from PGP-CADS-FTEL
Reason this release was yanked:
Unable to load data file
Project description
preprocessing_pgp
preprocessing_pgp, a preprocessing library for any kind of data, is a suite of open source Python modules and preprocessing techniques supporting research and development in Machine Learning. preprocessing_pgp requires Python 3.6, 3.7, 3.8, 3.9, or 3.10.
Installation
To install the current release:
$ pip install preprocessing-pgp
Example
1. Preprocessing Name
$ python
>>> import preprocessing_pgp as pgp
>>> pgp.preprocess.basic_preprocess_name('Phan Thị Thúy Hằng *$%!@#')
Phan Thị Thúy Hằng
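The README does not show how basic_preprocess_name works internally. As a rough illustration of the kind of cleaning the example suggests (dropping symbols while keeping Vietnamese letters), here is a minimal sketch using only the standard library. The function name clean_name_sketch is hypothetical and is not part of preprocessing_pgp; the library's real rules may differ.

```python
import unicodedata


def clean_name_sketch(name: str) -> str:
    """Hypothetical helper, NOT the library's implementation."""
    # Normalize to NFC so accented Vietnamese letters become single
    # code points that str.isalpha() recognizes as letters.
    name = unicodedata.normalize("NFC", name)
    # Keep letters and whitespace; turn every other character into a space.
    kept = "".join(ch if ch.isalpha() or ch.isspace() else " " for ch in name)
    # Collapse runs of whitespace and trim both ends.
    return " ".join(kept.split())


print(clean_name_sketch("Phan Thị Thúy Hằng *$%!@#"))
```

This sketch only removes non-letter characters; it makes no attempt at casing fixes or other name-specific rules the library may apply.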
2. Extracting Phones
$ python
>>> import pandas as pd
>>> from preprocessing_pgp.phone.extractor import extract_valid_phone
>>> data = pd.read_parquet('/path/to/data.parquet')
>>> extract_valid_phone(phones=data, phone_col='col_contains_phone')
# OF PHONE CLEAN : 0
Sample of non-clean phones:
Empty DataFrame
Columns: [id, phone, clean_phone]
Index: []
100%|██████████| ####/#### [00:00<00:00, ####it/s]
# OF PHONE 10 NUM VALID : ####
# OF PHONE 11 NUM VALID : ####
0it [00:00, ?it/s]
# OF OLD PHONE CONVERTED : ####
# OF OLD REGION PHONE : ####
100%|██████████| ####/#### [00:00<00:00, ####it/s]
# OF VALID PHONE : ####
# OF INVALID PHONE : ####
Sample of invalid phones:
+--------+---------+-------------+---------------+------------------+--+
| | id | phone |clean_phone | is_phone_valid | phone_convert |
+========+=========+=============+===============+==================+==+
| 0 | #### | 090#### | #### | False | |
+--------+---------+-------------+---------------+------------------+--+
| 1 | #### | 091#### | #### | False | |
+--------+---------+-------------+---------------+------------------+--+
| 2 | #### | 009#### | #### | False | |
+--------+---------+-------------+---------------+------------------+--+
| 3 | #### | 080#### | #### | False | |
+--------+---------+-------------+---------------+------------------+--+
| 4 | #### | 012#### | #### | False | |
+--------+---------+-------------+---------------+------------------+--+
| 5 | #### | 023#### | #### | False | |
+--------+---------+-------------+---------------+------------------+--+
| 6 | #### | 023#### | #### | False | |
+--------+---------+-------------+---------------+------------------+--+
| 7 | #### | 023#### | #### | False | |
+--------+---------+-------------+---------------+------------------+--+
| 8 | #### | 023#### | #### | False | |
+--------+---------+-------------+---------------+------------------+--+
| 9 | #### | 023#### | #### | False | |
+--------+---------+-------------+---------------+------------------+--+
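The log above mentions 10-digit and 11-digit numbers and the conversion of old phones. The README does not document the rules extract_valid_phone applies, so the following is only a sketch of how such a step could look, under stated assumptions: the helper name normalize_phone_sketch and the small prefix table are hypothetical, the table is an illustrative subset rather than the library's mapping, and the validity check is deliberately simplistic.

```python
import re

# Assumption: an illustrative subset of old 11-digit mobile prefixes and
# their 10-digit replacements. The library's own table may differ.
OLD_TO_NEW_PREFIX = {
    "0162": "032",
    "0163": "033",
    "0168": "038",
    "0169": "039",
}


def normalize_phone_sketch(raw):
    """Hypothetical helper: return a 10-digit string, or None if invalid."""
    # Strip everything that is not a digit (spaces, dots, dashes, ...).
    digits = re.sub(r"\D", "", str(raw))
    # Convert an old 11-digit number when its prefix is in the table.
    if len(digits) == 11 and digits[:4] in OLD_TO_NEW_PREFIX:
        digits = OLD_TO_NEW_PREFIX[digits[:4]] + digits[4:]
    # Simplistic validity rule: exactly 10 digits with a leading zero.
    if len(digits) == 10 and digits.startswith("0"):
        return digits
    return None


for sample in ["0168 123 4567", "0901234567", "12345"]:
    print(sample, "->", normalize_phone_sketch(sample))
```

A real validator would also check the operator or region prefix, which this sketch skips; that is likely why some 10-digit-looking numbers in the sample output above are still reported as invalid.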
Source Distribution
preprocessing-pgp-0.1.0.tar.gz (14.9 kB)
Built Distribution
Hashes for preprocessing_pgp-0.1.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 20168c01232531ffe74af5cdeed1186691c6028fc66fb97db4e3e1b3071fe775
MD5 | 69777a97b0993cc3cc63daff4c9578fa
BLAKE2b-256 | 9d4e03093e3da289d311805e75684e333489a00f508e93bc533570278aec9bee