This is a sub modular package for encoding
Project description
Deep Air Encoder
This package is used for encoding data fields for machine compliant dataframes.
Package structure
deepair_encoder
├── encoder.py ├── init.py └── utils ├── encoder_tools.py ├── init.py ├── logger.py └── splitters.py
1 directory, 6 files
Dependencies
Note: The following python3 packages are necessary for this package to run:
- numpy
- scipy
- pandas
- sklearn
- tabulate
- tqdm
Function Declarations
Here are the signatures for the functions in the package that can be used for deepair-dev.
encoder.py
Below are the functions that can be accessed by importing this module as from deepair_encoder.encoder import <function_name>.
encode_username:
def encode_username(df, drop_field=True):
'''
Username one-hot encoder.
inputs:
df: Dataframe which have username column (pandas df series)
drop_field: a flag if the usename column should be dropped or not after encoding (bool)
return:
df: Dataframe which have only username indicator [0/1] (pandas df series)
'''
encode_discounts:
def encode_discounts(df, drop_field=True):
'''
Discounts encoder.
inputs:
df: dataframe which has discounts column (pandas df series)
drop_field: a flag if the discounts column should be dropped or not after encoding (bool)
return:
df: a dataframe with 3 new columns 'PROMOCODE', 'RES', 'LFG' and discounts droped if drop_field = True
'''
minmax_score:
def minmax_score(df, fields, keyval=['requestid', 'direction_onward'], drop_field=True, verbose=True):
'''
scorer function to normalize from 0-1 inversely to the value.
inputs:
df: Dataframe which contains fields (pandas df series)
fields: Dataframe columns on which the score is calculated (list)
keyval: Group the data by this key values (list)
drop_field: Indicator to replace fields by new scored fields (bool)
verbose: Progress bar indicator (bool)
return:
field: Dataframe with a score for totalprice relative to whole requestid-direction, by index, by faregroup (pandas df series)
'''
minmax_normalize:
def minmax_normalize(df, fields, keyval=['requestid', 'direction_onward'], drop_field=True, verbose=True):
'''
normalizer function to normalize from 0-1 proportional to the value.
inputs:
df: Dataframe which contains fields (pandas df series)
fields: Dataframe columns on which the normalication is calculated (list)
keyval: Group the data by this key values (list)
drop_field: Indicator to replace fields by normalized new fields (bool)
verbose: Progress bar indicator (bool)
return:
field: Dataframe with a score for totalprice relative to whole requestid-direction, by index, by faregroup (pandas df series)
'''
encode_totalprice:
def encode_totalprice(df):
'''
total price encoder.
inputs:
df: Dataframe which have totalprice and totaltaxes column (pandas df series)
return:
df: Dataframe with a score for totalprice relative to whole requestid-direction, by index, by faregroup (pandas df series)
'''
encode_bookingid:
def encode_bookingid(df):
'''
Bookingid flag one-hot encoder.
inputs:
df: Dataframe which have Bookingid column (pandas df series)
return:
df: Dataframe which have only Bookingid indicator [0/1] (pandas df series)
'''
encode_faregroup:
def encode_faregroup(raw_file, df, verbose=False):
'''
Faregroup encoder.
inputs:
raw_file: raw_file location where the faregroup_definition is available
df: Dataframe which have faregroup column (pandas df series)
verbose: Optional, if more detailed log is needed
return:
field: Dataframe with extra fields with faregroup attributes [-1/0/1] (pandas df series)
Note: -1 = not available, 0 = available for a fee, 1 = available for free (at no charge)
'''
encode_datetime:
def encode_datetime(df, field, verbose=False):
'''
Datetime encoder.
inputs:
df: Dataframe which have 'field' column (pandas df series)
field: the field that has to be encoded
return:
df: Dataframe with 6 ecoded values capturing TOD, DOW and WOY
'''
encode_advanced_purchase:
def encode_advanced_purchase(df, dptr_field='departuredate', sales_field='utctimestamp'):
'''
advanced purchase encoder.
inputs:
df: Dataframe which have dptr_field and sales_field column (pandas df series)
return:
df: Dataframe advanced purchase column
'''
encode_los:
def encode_los(df, verbose=False):
'''
los and trip_type encoder.
inputs:
df: Dataframe which have departuredate, requestid and direction column (pandas df series)
return:
df: Dataframe los and trip_type encodes column
'''
encode_airports:
def encode_airports(df, fields, reference_file_path, verbose=True):
'''
encode the airports.
inputs:
df : Dataframe which have airports (pandas df series)
fields : list of fields that you want to run this
function on (list)
reference_file_path : path to the codes file (string)
verbose : Indicator for progress bar (bool)
return:
df: Dataframe encoded with airports values
'''
encode_city:
def encode_city(df, fields, reference_file_path, verbose=True):
'''
encode the airports.
inputs:
df : Dataframe which have city (pandas df series)
fields : list of fields that you want to run this
function on (list)
reference_file_path : path to the codes file (string)
verbose : Indicator for progress bar (bool)
return:
df: Dataframe encoded with airports values
'''
utils
This subpackage contains tools a level lower than encoder module i.e. those modules that are used by encoder.
encoder_tools
Below are the functions that can be accessed by importing this module as from deepair_encoder.utils.encoder_tools import <function_name>.
one_hot_encoder:
def one_hot_encoder(df, fields, fields_drop=True, verbose=True, classes=None):
'''
Converts the fields into one hot encoding.
inputs:
df: Dataframe containing all those fields (pandas df)
fields: Dataframe columns that you want to convert to one hot (string)
verbose: Indicator for progress bar (bool)
classes: Dictionary for the desired columns (dict)
return:
df: updated dataframe (pandas df)
'''
integer_encoder:
def integer_encoder(df, fields, verbose=True):
'''
Converts the fields into integer encoding.
inputs:
df: Dataframe containing all those fields (pandas df)
fields: Dataframe columns that you want to convert to one hot (string)
verbose: Indicator for progress bar (bool)
returns:
df: updated dataframe (pandas df)
'''
get_wom:
def get_wom(field):
'''
Function for week of the month.
inputs:
field: Dataframe column which have timestamp (pandas df series)
return:
field: Dataframe column which have wom (pandas df series)
'''
get_dow:
def get_dow(field):
'''
Function for day of the week.
inputs:
field: Dataframe column which have timestamp (pandas df series)
return:
field: Dataframe column which have dow (pandas df series)
'''
get_month:
def get_month(field):
'''
Function for month.
inputs:
field: Dataframe column which have timestamp (pandas df series)
return:
field: Dataframe column which have only month (pandas df series)
'''
get_year:
def get_year(field):
'''
Function for year.
inputs:
field: Dataframe column which have timestamp (pandas df series)
return:
field: Dataframe column which have only year (pandas df series)
'''
obj2num:
def obj2num(df, fields, verbose=True):
'''
Converts the fields into numeric from obj data type.
inputs:
df: Dataframe containing all those fields (pandas df)
fields: Dataframe columns that you want to convert to numeric (string)
verbose: Indicator for progress bar (bool)
returns:
df: updated dataframe (pandas df)
'''
logger
Below are the functions that can be accessed by importing this module as from deepair_encoder.utils.logger import <function_name>.
unique_stats:
def unique_stats(df):
_log_with_timestamp:
def _log_with_timestamp(message):
'''
prints message on console
input :
message : msg to print (string)
'''
splitters
Below are the functions that can be accessed by importing this module as from deepair_encoder.utils.splitters import <function_name>.
type_splitter:
def type_splitter(data, keys=['passengertypes'], newheadings=[['ADT', 'CHD', 'INF']]):
'''
Split the column (need to make it more generic).
inputs:
data: dataframe which contain this column (pd dataframe)
keys: list of column fields to split(list)
newheadings: list of list for header columns in keys (list)
returns:
data: updated dataframe (pandas df)
'''
split_datewise:
def split_datewise(df, directory='', postfix='', mode='w'):
'''
Split a Dataframe according to dates in UTC-Timestamp:
input :-
DF : dataframe which contain this column (pd dataframe)
directory : path to the target directory (string)
postfix : any postfix you want to add (string)
mode : writing mode ('w':writing, 'a': append)
'''
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file deepair_encoder-0.0.9.tar.gz.
File metadata
- Download URL: deepair_encoder-0.0.9.tar.gz
- Upload date:
- Size: 10.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.4
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
d37c0d7352bbf050dcf1c8e5b40e8cfd1ba1089dc61985cf4e0fd7b21141200b
|
|
| MD5 |
9d4a24fea4e2ea6bc7f8f95ae21faf5f
|
|
| BLAKE2b-256 |
0effa205eb3ac657fb942851ab622f19ccdf8ec22c1ce5ae34bfb4cbdbc5f308
|
File details
Details for the file deepair_encoder-0.0.9-py3-none-any.whl.
File metadata
- Download URL: deepair_encoder-0.0.9-py3-none-any.whl
- Upload date:
- Size: 13.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.4
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
c1840318313ef8e3c905babf58fe880eb690dae4667ca80ae6fd754ebe405433
|
|
| MD5 |
22e062670699c1079432a8b45be796e2
|
|
| BLAKE2b-256 |
7e4696a660323418a18ffd9050b1e82049f59fb11ae194c73a750559bdad4aef
|