A package for effortless data management.

Project description

Data Forge

Data Forge is a Python package that provides a suite of tools for managing and analyzing data. It is intended to be approachable for users at any level of expertise. This initial release includes the following features:

  • Effortless reading of datafiles with support for Pandas-compatible extensions.
  • Simplified dataframe creation by specifying column names and corresponding values.
  • Automatic generation of hexadecimal and numeric identifiers.
  • Provision of common statistical analyses for specified columns.
  • Histogram plotting functionality.
  • Easy visualization of class distribution within the dataset.
  • Data standardization capabilities.
  • Feature discretization support.
  • Creation of fold columns for conducting stratified k-fold analyses.
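
For readers new to the fold-column idea, here is a minimal sketch of what a stratified fold column looks like. It uses pandas and scikit-learn directly, not Data Forge's own API, and the helper name `add_fold_column` and the column names are hypothetical.

```python
import pandas as pd
from sklearn.model_selection import StratifiedKFold


def add_fold_column(frame, label_col, n_splits=3, seed=0):
    """Return a copy of `frame` with a 'fold' column from stratified k-fold.

    Hypothetical helper for illustration; not part of dforge.
    """
    out = frame.copy()
    out["fold"] = -1
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    fold_pos = out.columns.get_loc("fold")
    # split() yields positional (train, test) indices; every row appears in
    # exactly one test split, so the test split number becomes its fold id.
    for fold, (_, test_idx) in enumerate(skf.split(out, out[label_col])):
        out.iloc[test_idx, fold_pos] = fold
    return out


frame = pd.DataFrame({
    "value": range(12),
    "label": ["a"] * 6 + ["b"] * 6,
})
folded = add_fold_column(frame, "label")
counts = folded.groupby(["fold", "label"]).size()
print(counts)
```

Because the split is stratified, each fold keeps the same class proportions as the full table, which is what makes the fold column useful for imbalanced data.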

Components

'PDBuilder' Class

This class enables the creation and manipulation of Pandas DataFrames. It supports the following functionalities:

  • Reading data from various file formats: CSV, Excel, JSON, Parquet, Feather, and Pickle.
  • Generating unique identifiers for rows.
  • Adding new data to the DataFrame.
  • Displaying the DataFrame.
  • Retrieving column names.
  • Printing specific columns.
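
To make the file-reading and identifier features concrete, the sketch below shows one way to dispatch on a file extension and attach hexadecimal row identifiers using plain pandas and the standard library. It is an illustration under assumptions, not PDBuilder's implementation: `READERS`, `read_any`, `add_hex_ids`, and the `uid` column are hypothetical names.

```python
import tempfile
import uuid
from pathlib import Path

import pandas as pd

# Hypothetical dispatch table from file extension to pandas reader.
READERS = {
    ".csv": pd.read_csv,
    ".xlsx": pd.read_excel,
    ".json": pd.read_json,
    ".parquet": pd.read_parquet,
    ".feather": pd.read_feather,
    ".pkl": pd.read_pickle,
}


def read_any(path):
    """Load a file into a DataFrame, choosing the reader by extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"Unsupported file extension: {suffix!r}")
    return READERS[suffix](path)


def add_hex_ids(frame, column="uid"):
    """Return a copy with a random 32-character hexadecimal id per row."""
    out = frame.copy()
    out[column] = [uuid.uuid4().hex for _ in range(len(out))]
    return out


# Round-trip a small CSV through the helpers.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "sample.csv"
    pd.DataFrame({"Category": ["A", "B"], "Value": [10, 20]}).to_csv(csv_path, index=False)
    loaded = add_hex_ids(read_any(csv_path))

print(loaded)
```

Some readers need optional packages at call time (for example, an Excel or Parquet engine), so only the CSV path is exercised here.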

'PDNumPro' Class

This class extends the PDBuilder class and provides additional functionalities for numerical analysis and preprocessing. It includes the following features:

  • Computing statistics for numerical columns.
  • Plotting histograms.
  • Analyzing data balance.
  • Standardizing numerical data.
  • Reconstructing data.
  • Converting non-numeric columns to numeric values.
  • Creating folds for cross-validation.
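
As a rough illustration of standardization and reconstruction, the sketch below uses scikit-learn's `StandardScaler` directly rather than PDNumPro's methods. It assumes that "reconstructing" means undoing the standardization, which the description above does not spell out; the column names are made up.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

frame = pd.DataFrame({"Value": [10.0, 20.0, 30.0], "Weight": [1.0, 2.0, 6.0]})

scaler = StandardScaler()
# fit_transform centers each column at 0 and scales it to unit
# (population) standard deviation.
standardized = pd.DataFrame(
    scaler.fit_transform(frame), columns=frame.columns, index=frame.index
)

# inverse_transform reverses the scaling using the stored mean and scale.
reconstructed = pd.DataFrame(
    scaler.inverse_transform(standardized), columns=frame.columns, index=frame.index
)

print(standardized)
print(reconstructed)
```

Keeping the fitted scaler around is what makes the reconstruction possible, since it stores the per-column mean and scale.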

Dependencies

This toolkit requires the following dependencies:

  • pandas
  • numpy
  • matplotlib
  • scikit-learn

Installation

To install Data Forge together with its dependencies, run:

pip install dforge

Usage

You can use this toolkit in your Python projects by importing the necessary classes and functions. Here's an example of how to use the PDBuilder class:

import dforge as df

# Create a DataFrame from data
data = [[1, 'A', 10], [2, 'B', 20], [3, 'C', 30]]
columns = ['ID', 'Category', 'Value']
data_pd = df.PDBuilder(data=data, columns=columns)

# Display DataFrame
data_pd.show_dataset()

# Add new data
new_data = [[4, 'D', 40], [5, 'E', 50]]
data_pd.add_data(new_data)

# Display DataFrame with added data
data_pd.show_dataset()

Work in progress...

Stay tuned for upcoming releases, which will incorporate the following enhancements:

  • Missing data completion functionality.
  • Conversion of dataframes into datasets suitable for machine learning inputs.
  • Introduction of PDTextAnalysis for handling features related to text manipulation.

Download files

Source Distribution

dforge-2.1.5.tar.gz (22.7 kB)

Uploaded Source

Built Distribution

dforge-2.1.5-py3-none-any.whl (22.1 kB)

Uploaded Python 3
