A package for feature extraction, hyperopt, and validation schemas

Project description

Data Science: Sales Prediction

Project Status: Completed

Overview:

This project provides a Python package, future_sales_prediction_2024, that simplifies common tasks in data science workflows. It includes tools for feature extraction, validation schema creation, hyperparameter optimization, and model training.

Methods Used

  • Feature Engineering: Automates the creation and selection of important features, including memory optimization.
  • Validation: Implements schema validation to ensure data consistency, identify missing values, and prevent duplicate records.
  • Hyperparameter Tuning: Leverages tools like hyperopt for efficient parameter search.
  • Visualization: Includes plotting tools for feature importance and error analysis.
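
As a rough illustration of the memory optimization and validation steps listed above, here is a minimal pandas sketch. The function names (downcast_numeric, validation_report) and the sample columns are hypothetical and are not taken from the package's API; the sketch only shows the general technique of downcasting numeric columns and counting missing values and duplicate keys.

```python
import numpy as np
import pandas as pd


def downcast_numeric(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with numeric columns stored in the smallest fitting dtype."""
    out = df.copy()
    for col in out.select_dtypes(include=[np.integer]).columns:
        out[col] = pd.to_numeric(out[col], downcast="integer")
    for col in out.select_dtypes(include=[np.floating]).columns:
        out[col] = pd.to_numeric(out[col], downcast="float")
    return out


def validation_report(df: pd.DataFrame, key_cols: list) -> dict:
    """Count missing cells and rows that repeat an earlier key combination."""
    return {
        "missing_values": int(df.isna().sum().sum()),
        "duplicate_keys": int(df.duplicated(subset=key_cols).sum()),
    }


# Hypothetical sample data, for illustration only.
sales = pd.DataFrame(
    {
        "shop_id": [1, 1, 2, 2],
        "item_id": [10, 10, 11, 12],
        "item_price": [99.0, 99.0, 15.5, None],
    }
)

small = downcast_numeric(sales)
report = validation_report(sales, key_cols=["shop_id", "item_id"])
print(small.dtypes)
print(report)
```

Downcasting trades numeric range and precision for memory, so it is usually worth checking value ranges before applying it to price or target columns.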

Technologies

  • Python
  • pandas, Jupyter

Data Sources:

The tools in this package are designed to work with structured datasets, such as CSV files. For example, they can handle datasets used in machine learning competitions such as those hosted on Kaggle, or any other tabular data source.

Challenges:

  • Complexity in Generalization: Making the tools generic enough to work with diverse datasets while maintaining simplicity.
  • Performance Optimization: Balancing ease of use with computational efficiency.
  • Error Handling: Ensuring clear and helpful error messages for data validation and model failures.

Conclusion:

This package is a modular and flexible solution for streamlining data science workflows. It gives data scientists and ML engineers reusable tools so that they can focus on solving domain-specific problems.

[0.1.1] - 2024-11-25

Added

  • Changed the loader function to upload files by filename.

[0.2.1] - 2024-11-26

  • Added support for Google Cloud Storage.
  • Improved deployment pipeline.
  • Bug fixes and performance improvements.

[0.2.2] - 2024-11-27

  • Bug fixes.

[0.2.3] - 2024-11-28

  • Enhanced Explainability and Error Analysis: Users can now save plots generated by the Explainability and ErrorAnalysis classes to files. The directory and filenames are customizable, and existing files with the same name are overwritten automatically.
  • Customizable Hyperparameter Tuning: Users can now fully customize the hyperparameter tuning process by defining the search space for hyperparameters, specifying the optimization algorithm and objective function, and tailoring the evaluation process to their needs.
  • FeatureImportanceLayer Enhancements: Plots for baseline and final model feature importances can now be saved directly to disk. The output directory (output_dir) and file names are customizable, and plots overwrite existing files with the same name.
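
The three customization points named in the tuning entry above (search space, optimization algorithm, objective function) can be sketched without any dependencies. The example below uses plain random search from the standard library as a stand-in for hyperopt; it is not the package's tuning API, and the names random_search and toy_objective and the parameter ranges are all hypothetical.

```python
import random
from typing import Callable, Dict, Tuple

# A search space maps each hyperparameter name to a (low, high) range.
SearchSpace = Dict[str, Tuple[float, float]]


def random_search(
    objective: Callable[[Dict[str, float]], float],
    space: SearchSpace,
    max_evals: int,
    seed: int = 0,
) -> Tuple[Dict[str, float], float]:
    """Sample uniformly from the space and keep the lowest-loss parameters."""
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(max_evals):
        params = {name: rng.uniform(low, high) for name, (low, high) in space.items()}
        loss = objective(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


def toy_objective(params: Dict[str, float]) -> float:
    """Stand-in for a validation loss; a real one would train and score a model."""
    return (params["learning_rate"] - 0.1) ** 2 + (params["subsample"] - 0.8) ** 2


space = {"learning_rate": (0.01, 0.3), "subsample": (0.5, 1.0)}
best_params, best_loss = random_search(toy_objective, space, max_evals=200)
print(best_params, best_loss)
```

Keeping the objective as a plain callable means the search routine can be swapped for a smarter optimizer without touching the code that trains and scores the model.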

[0.2.4] - 2024-11-29

  • Bug fixes.

[1.2.4] - 2024-11-29

  • Cloud Storage Integration: The data_handling.py and feature_extraction.py scripts now support loading .csv files from GCS paths. Outputs are saved to a user-specified GCS directory via the --outdir parameter.
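
To make the GCS path handling described above concrete, here is a small standard-library sketch of how a script might accept an --outdir parameter and work with gs:// paths. Only --outdir comes from the changelog; the --input flag, the helper names, the bucket name, and the file names are assumptions made for illustration, and the actual scripts may be structured differently.

```python
import argparse
from urllib.parse import urlparse


def split_gcs_path(path: str) -> tuple:
    """Split 'gs://bucket/some/blob.csv' into ('bucket', 'some/blob.csv')."""
    parsed = urlparse(path)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError(f"not a GCS path: {path!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def output_path(outdir: str, filename: str) -> str:
    """Join an output directory and a file name with exactly one slash."""
    return outdir.rstrip("/") + "/" + filename


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Illustrative CLI sketch")
    # --input is a hypothetical flag; only --outdir is named in the changelog.
    parser.add_argument("--input", required=True, help="GCS path to a .csv file")
    parser.add_argument("--outdir", required=True, help="GCS directory for outputs")
    return parser


# Example invocation with made-up paths.
args = build_parser().parse_args(
    [
        "--input", "gs://example-bucket/raw/sales_train.csv",
        "--outdir", "gs://example-bucket/processed/",
    ]
)
bucket, blob = split_gcs_path(args.input)
target = output_path(args.outdir, "features.csv")
print(bucket, blob, target)
```

Actually reading from or writing to these paths would need a GCS client library and credentials, which this sketch leaves out.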

[1.2.5] - 2024-11-29

  • Bug fixes.
