
A package for effortless data management.

Project description

Data Forge

Data Forge is a Python package that provides a suite of tools for managing and analyzing data. It aims to be approachable for users at any level of experience. This initial release includes the following features:

  • Effortless reading of data files in any format pandas can read, selected by file extension.
  • Simplified DataFrame creation from column names and their corresponding values.
  • Automatic generation of hexadecimal and numeric identifiers.
  • Common statistical summaries for specified columns.
  • Histogram plotting.
  • Easy visualization of the class distribution within the dataset.
  • Data standardization.
  • Feature discretization.
  • Creation of fold columns for stratified k-fold cross-validation.
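To make the identifier and discretization features concrete, here is a minimal sketch in plain pandas of what they typically involve. It is not the dforge API: the helpers `add_id_columns` and `discretize` and the column names `num_id`, `hex_id`, and `Value_bin` are hypothetical, and using UUID4 values for hexadecimal IDs and equal-width bins for discretization are assumptions about how such features might work.

```python
import uuid

import pandas as pd


def add_id_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with hypothetical 'num_id' and 'hex_id' columns."""
    out = df.copy()
    # Sequential integer identifiers: 0, 1, 2, ...
    out["num_id"] = range(len(out))
    # 32-character hexadecimal identifiers taken from random UUID4 values.
    out["hex_id"] = [uuid.uuid4().hex for _ in range(len(out))]
    return out


def discretize(series: pd.Series, bins: int) -> pd.Series:
    """Equal-width binning; labels=False yields integer bin indices."""
    return pd.cut(series, bins=bins, labels=False)


frame = pd.DataFrame({"Category": ["A", "B", "C", "D"], "Value": [10, 20, 30, 40]})
frame = add_id_columns(frame)
frame["Value_bin"] = discretize(frame["Value"], bins=2)
print(frame)
```

Here `Value` spans 10 to 40, so two equal-width bins place 10 and 20 in bin 0 and 30 and 40 in bin 1.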

Components

'PDBuilder' Class

This class creates and manipulates pandas DataFrames. It supports the following operations:

  • Reading data from various file formats: CSV, Excel, JSON, Parquet, Feather, and Pickle.
  • Generating unique identifiers for rows.
  • Adding new data to the DataFrame.
  • Displaying the DataFrame.
  • Retrieving column names.
  • Printing specific columns.
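Reading several file formats usually comes down to choosing a pandas reader from the file extension. The sketch below shows that idea; it is an illustration, not how PDBuilder is necessarily implemented, and the `READERS` mapping and `read_any` helper are hypothetical names. Note that `pd.read_excel` needs an engine such as openpyxl, and `pd.read_parquet` and `pd.read_feather` need pyarrow.

```python
from __future__ import annotations

from pathlib import Path

import pandas as pd

# Hypothetical mapping from file extension to pandas reader function.
READERS = {
    ".csv": pd.read_csv,
    ".xlsx": pd.read_excel,
    ".xls": pd.read_excel,
    ".json": pd.read_json,
    ".parquet": pd.read_parquet,
    ".feather": pd.read_feather,
    ".pkl": pd.read_pickle,
    ".pickle": pd.read_pickle,
}


def read_any(path: str | Path, **kwargs) -> pd.DataFrame:
    """Read a file with the pandas reader matching its extension."""
    suffix = Path(path).suffix.lower()
    try:
        reader = READERS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file extension: {suffix!r}") from None
    # Extra keyword arguments (e.g. sep=";") are passed to the reader.
    return reader(path, **kwargs)
```

For example, `read_any("sales.csv", sep=";")` would call `pd.read_csv("sales.csv", sep=";")`, where `sales.csv` is a placeholder file name.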

'PDNumPro' Class

This class extends PDBuilder with additional tools for numerical analysis and preprocessing. Its features include:

  • Computing statistics for numerical columns.
  • Plotting histograms.
  • Analyzing data balance.
  • Standardizing numerical data.
  • Reconstructing data.
  • Converting non-numeric columns to numeric values.
  • Creating folds for cross-validation.
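Several of these preprocessing steps map onto standard pandas and scikit-learn calls. The sketch below shows one plausible version of standardization, non-numeric-to-numeric conversion, class-balance inspection, and a stratified fold column. It is not the PDNumPro API: the helpers `standardize`, `encode_categories`, and `add_fold_column`, the `fold` column name, and the defaults `n_splits=5` and `seed=42` are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler


def standardize(df, columns):
    """Rescale columns to zero mean and unit (population) variance."""
    out = df.copy()
    scaled = StandardScaler().fit_transform(out[columns].astype(float))
    for i, col in enumerate(columns):
        out[col] = scaled[:, i]
    return out


def encode_categories(df, columns):
    """Replace each value with an integer code, in order of first appearance."""
    out = df.copy()
    for col in columns:
        out[col] = pd.factorize(out[col])[0]
    return out


def add_fold_column(df, target, n_splits=5, seed=42):
    """Record, for each row, the validation fold (0..n_splits-1) it falls in."""
    out = df.copy()
    folds = np.full(len(out), -1, dtype=int)
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    # Only the length of X matters for splitting; stratification uses y.
    for fold, (_, val_idx) in enumerate(skf.split(np.zeros(len(out)), out[target])):
        folds[val_idx] = fold
    out["fold"] = folds
    return out


frame = pd.DataFrame({
    "size": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
    "color": ["red", "blue"] * 5,
    "label": ["yes", "no"] * 5,
})
print(frame["label"].value_counts(normalize=True))  # class balance
frame = standardize(frame, ["size"])
frame = encode_categories(frame, ["color"])
frame = add_fold_column(frame, target="label", n_splits=5)
print(frame)
```

Because each class has five rows and there are five folds, every fold ends up with exactly one "yes" row and one "no" row, which is the point of stratification.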

Dependencies

This toolkit requires the following dependencies:

  • pandas
  • numpy
  • matplotlib
  • scikit-learn

Installation

To install the package, run:

pip install dforge

Usage

You can use this toolkit in your Python projects by importing the necessary classes and functions. Here's an example of how to use the PDBuilder class:

import dforge as df

# Create a DataFrame from data
data = [[1, 'A', 10], [2, 'B', 20], [3, 'C', 30]]
columns = ['ID', 'Category', 'Value']
data_pd = df.PDBuilder(data=data, columns=columns)

# Display DataFrame
data_pd.show_dataset()

# Add new data
new_data = [[4, 'D', 40], [5, 'E', 50]]
data_pd.add_data(new_data)

# Display DataFrame with added data
data_pd.show_dataset()
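For comparison, the steps above can be written directly in pandas. This sketch is only a point of reference and does not describe how PDBuilder is implemented. In particular, the assumption that `add_data` appends rows and renumbers the index the way `pd.concat(..., ignore_index=True)` does is not confirmed by the package.

```python
import pandas as pd

# Build the initial DataFrame from row lists and column names.
data = [[1, "A", 10], [2, "B", 20], [3, "C", 30]]
columns = ["ID", "Category", "Value"]
frame = pd.DataFrame(data, columns=columns)
print(frame)

# Append new rows; ignore_index renumbers the index 0..n-1.
new_data = [[4, "D", 40], [5, "E", 50]]
frame = pd.concat([frame, pd.DataFrame(new_data, columns=columns)], ignore_index=True)
print(frame)
```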

Work in progress...

Stay tuned for upcoming releases, which will incorporate the following enhancements:

  • Missing data completion functionality.
  • Conversion of DataFrames into datasets suitable as machine learning inputs.
  • Introduction of PDTextAnalysis for handling features related to text manipulation.



Download files


Source Distribution

dforge-2.1.5.tar.gz (22.7 kB)


Built Distribution


dforge-2.1.5-py3-none-any.whl (22.1 kB)


File details

Details for the file dforge-2.1.5.tar.gz.

File metadata

  • Download URL: dforge-2.1.5.tar.gz
  • Upload date:
  • Size: 22.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.11.3

File hashes

Hashes for dforge-2.1.5.tar.gz
Algorithm Hash digest
SHA256 4ae9195638862deee0239c2b8ed13cac8490259e44d484cccabc92f83399fc32
MD5 6189140c2b00d7ca7f1f2d4719bd71d3
BLAKE2b-256 69e560c0d6297d83c15343ec8abfb61f80fb4e97f267259b54698bb63ef218bf


File details

Details for the file dforge-2.1.5-py3-none-any.whl.

File metadata

  • Download URL: dforge-2.1.5-py3-none-any.whl
  • Upload date:
  • Size: 22.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.11.3

File hashes

Hashes for dforge-2.1.5-py3-none-any.whl
Algorithm Hash digest
SHA256 7f0d8224cc0e48b73f014359c8f97f31fa611a17a512708cbb2a385460b8c9ba
MD5 f0c623a2c0842daf19c2e12cf66f11f3
BLAKE2b-256 6b4c801b2cbeba54b7b60a9b468d4ccede589d6be3070f75f8bc2d90c113d3de
