
A module designed to make your data preprocessing experience easier.

Project description

irdatacleaning

This Python package is designed to make artificial intelligence more accessible, starting with the data cleaning stage.

DataCorrelation:

This module lets you view the correlation values of your dataset so that you can prevent simple errors.

DataCorrelation(df = pandas dataframe)

df: the dataset you would like to evaluate.

Correlationmatrix(): call this method to see which columns have correlation relationships.

LookingAtCorr(): this method is where the changes to your dataset are actually made. It returns a pandas dataframe.

Check(): this method calls both LookingAtCorr and Correlationmatrix for you. It also returns a pandas dataframe.
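To get a feel for what a correlation check looks for, here is a minimal sketch in plain Python. It is not the package's implementation: the helper names pearson and highly_correlated_pairs, the sample data, and the 0.9 threshold are all assumptions made for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def highly_correlated_pairs(columns, threshold=0.9):
    """Return (name_a, name_b, r) for column pairs with |r| >= threshold.

    The 0.9 default is an assumption for this sketch, not a package setting.
    """
    names = list(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= threshold:
                pairs.append((a, b, round(r, 3)))
    return pairs


# Hypothetical dataset: two columns carry the same information in different units.
data = {
    "height_cm": [150, 160, 170, 180],
    "height_in": [59.1, 63.0, 66.9, 70.9],
    "shoe_size": [9, 7, 10, 8],
}
pairs = highly_correlated_pairs(data)
print(pairs)
```

A pair that shows up here, such as the two height columns, is the kind of redundancy you would usually want to review before training a model.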

DataDiscovery:

This class lets you evaluate your data so that you can get an idea of what you need to change in the dataset. The best way to use this class is to create an instance of it, which automates everything.

DataDiscovery(df)

df: any pandas dataframe you wish to evaluate.

Encoder

This class is designed to make encoding your data simple. The input variables for this class are:

df: a pandas dataframe.

type: by default this variable is set to ONEHOTENCODER. If you wish to use OrdinalEncoder, set type to ordinalencoder.

Then call the check method to make the corrections. This method returns a pandas dataframe. If you wish to compare the returned value to the original dataset, you can call copy.
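If the difference between the two encoder types is new to you, the sketch below shows both ideas in plain Python. It only illustrates the techniques and is not how the Encoder class is written; the function names and the sorted category order are assumptions.

```python
def ordinal_encode(values):
    """Map each category to an integer, using the sorted order of the categories."""
    mapping = {cat: i for i, cat in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def one_hot_encode(values):
    """Return one 0/1 column per category, keyed by category name."""
    categories = sorted(set(values))
    return {cat: [1 if v == cat else 0 for v in values] for cat in categories}


colors = ["red", "green", "red", "blue"]  # hypothetical column

codes, mapping = ordinal_encode(colors)
one_hot = one_hot_encode(colors)

print(codes)    # one integer per row
print(one_hot)  # one 0/1 column per category
```

Ordinal encoding keeps a single column but implies an order between categories, while one-hot encoding avoids that implied order at the cost of one extra column per category.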

FeatureScaler:

This class is designed to make feature scaling simple and beginner friendly. It has 2 input arguments.

FeatureScaler(df, checker=2)

df: the dataset that the standard scaler will be applied to.

checker: the threshold that your columns will be evaluated at. By default this variable is set to 2, but you can change it depending on what you need.
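Standard scaling itself is a short calculation: subtract the column mean and divide by the column standard deviation. The sketch below shows that step in plain Python, assuming the population standard deviation is used. The standardize helper is hypothetical, and the checker threshold is left out because the description above does not say what it is compared against.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale a numeric column to mean 0 and standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption here)
    if sigma == 0:
        # A constant column has no spread, so every value maps to 0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


incomes = [10, 20, 30, 40, 50]  # hypothetical column
scaled = standardize(incomes)
print(scaled)
```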

InconsistentData

This class is designed to help you correct inconsistent data. You can use the following methods:

seperatingwords(origin, change): makes sure that all column names with more than one word are separated correctly. origin is the original format used to separate the words, and change is the format you would like to be used instead.

changeing_column_cases(case = "title"): corrects the column names so that they are all in full caps, full lower, or title case. case tells the method which case you would like. By default case is set to title; setting case = lower puts the column names in full lower, and setting case = upper puts them in full caps.

column_names_white_space(): corrects white space in the column names.

data_white_space(): corrects white space in the dataset.

correcting(column_name, corrections): helps you make the needed changes to the data in the cells so that your data is more consistent. column_name identifies which column will get the corrections, and corrections is the dictionary with the corrected values.

check(seperatingwords = False, origin = "", corrections = "", change_case = False, case = "title", correcting = False, column_name = "", cell_corrections = None): automates all of the steps above, although you will have to provide some input arguments. The corrected pandas dataframe is returned.

seperatingwords: False by default. When you set this to True you are calling the seperatingwords method, so you will also have to set origin and corrections. Both of these are string values.

change_case: False by default. Set it to True to have all your column names changed to the same case.

case: "title" by default. You can change this depending on how you would like to format your column names.

correcting: set this to True when you want to correct specific values in the data. You also set column_name to the name of the column that will get the corrections, and cell_corrections to a dictionary.

autocheck(): does the same as check, but walks you through the process of making all the changes.

resources(): gives you links to more information on the class.
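The clean-up steps described for this class are mostly string operations, so they can be sketched in plain Python. This is only an illustration under stated assumptions and not the class's own code: clean_column_name and apply_corrections are hypothetical helpers, the default separators are made up, and the corrections dictionary is assumed to map each incorrect value to its replacement.

```python
def clean_column_name(name, origin="_", change=" ", case="title"):
    """Trim white space, swap the word separator, and apply one case."""
    name = name.strip().replace(origin, change)
    if case == "title":
        return name.title()
    if case == "upper":
        return name.upper()
    if case == "lower":
        return name.lower()
    raise ValueError(f"unknown case: {case!r}")


def apply_corrections(values, corrections):
    """Replace each cell found in the corrections dict and keep the rest."""
    return [corrections.get(value, value) for value in values]


# Hypothetical column names with mixed separators, cases, and stray spaces.
columns = ["  first_name", "LAST_NAME ", "home_city"]
cleaned = [clean_column_name(c) for c in columns]

# Hypothetical cell values that should all read the same.
countries = ["usa", "U.S.A.", "Canada"]
fixed = apply_corrections(countries, {"usa": "USA", "U.S.A.": "USA"})

print(cleaned)
print(fixed)
```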

MissingValues:

This class is designed to make correcting missing values a lot more accessible.

MissingValues(df)

df: the pandas dataframe that will have corrections made to it.

check(): tells the module to start the corrections. This method returns the corrected dataframe. If you wish to get the original dataframe, use the copy variable.

Currently you are only able to use the median strategy; other methods are in the works.
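The median strategy can be sketched in a few lines of plain Python. This shows the general idea only and is not the package's code: fill_with_median is a hypothetical helper, and missing values are assumed to be represented as None.

```python
from statistics import median


def fill_with_median(values):
    """Return a new list where each None is replaced by the column median."""
    present = [v for v in values if v is not None]
    if not present:
        # Nothing to compute a median from, so return the column unchanged.
        return list(values)
    fill = median(present)
    return [fill if v is None else v for v in values]


ages = [22, None, 35, 41, None]  # hypothetical column with gaps
filled = fill_with_median(ages)

print(filled)
print(ages)  # the original list is left untouched
```

Returning a new list instead of changing the input mirrors the idea of keeping the original data available for comparison.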

StringToDateTime:

This class is designed to make converting strings to datetime more accessible. You do this by creating an instance of the class.

StringToDateTime(df, column_names)

df: the pandas dataframe that you will work with.

column_names: use this when the columns you wish to have converted to datetime have names that are not in ["date","dates","starttime","start_time","start time"]. To use this input argument successfully you must pass in a list.

check(): call this method to tell the module to make the corrections.

resources(): gives you the links to the YouTube video about this module and to the GitHub repository.
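As a rough picture of the conversion, the sketch below parses date strings in plain Python. It is an illustration under assumptions, not the class's implementation: the list of accepted formats, the exact-match lookup on column names, and the helper names are all made up here. Only the default column names come from the description above.

```python
from datetime import datetime

# Default column names taken from the description above.
DEFAULT_DATE_COLUMNS = ["date", "dates", "starttime", "start_time", "start time"]

# Formats this sketch tries, in order (an assumption, not a package setting).
FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%Y-%m-%d %H:%M:%S"]


def parse_datetime(text):
    """Try each known format until one parses the string."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")


def convert_columns(table, column_names=None):
    """Convert the default date columns plus any extra named columns."""
    targets = DEFAULT_DATE_COLUMNS + list(column_names or [])
    converted = {}
    for name, values in table.items():
        if name in targets:
            converted[name] = [parse_datetime(v) for v in values]
        else:
            converted[name] = list(values)
    return converted


# Hypothetical table stored as a dict of columns.
table = {
    "date": ["2021-03-01", "03/15/2021"],
    "signup": ["2021-01-05", "2021-02-10 08:30:00"],
    "city": ["Hilo", "Kona"],
}
converted = convert_columns(table, column_names=["signup"])
print(converted["date"])
```

Here "signup" is converted only because it was passed in the list, which matches the role that column_names plays for columns outside the default names.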

Resources:

This class gives you, islanders, the ability to get additional resources on the module or its classes.

Download files


Source Distribution

irdatacleaning-2021.0.2.tar.gz (10.6 kB)

Built Distribution

irdatacleaning-2021.0.2-py3-none-any.whl (14.6 kB)
