data-science-toolbox

Utility code for data science projects, covering data cleaning, ETL, EDA, NLP, visualization, feature engineering, feature selection, and model training and validation.

Project Organization


├── README.md              
├── data_science_toolbox   <- Project source code
│   │
│   ├── gists                  <- Gists of commonly used code (change to root
│   │                             directory, connect to database, profile data, etc.)
│   ├── io                     <- Code for input/output utilities
│   ├── etl                    <- For building reproducible ETL pipelines, including data
│   │                             checks and transformers
│   ├── ml                     <- Machine learning utility code (feature engineering, etc.)
│   ├── pandas                 <- Pandas related utility code
│   │   ├── analysis                  
│   │   ├── cleaning
│   │   ├── engineering
│   │   ├── text    
│   │   ├── datetime     
│   │   ├── optimization       
│   │   └── profiling   
│   ├── project_utils.py   <- For project specific utilities
│   │
│   ├── text               <- Code for dealing with text. Includes distributed loading of text corpora,
│   │                         entity statement extraction, sentiment analysis, PII removal, etc.
│   └── __init__.py        <- Makes data_science_toolbox a Python package
├── tests
├── LICENSE
├── poetry.lock
└── pyproject.toml 
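
The layout above can be checked against an installed copy of the package. The following is a minimal sketch, not part of the project itself: the helper name `list_subpackages` is hypothetical, and it assumes the distribution is installed and importable as `data_science_toolbox`. It relies only on the standard library (`importlib`, `pkgutil`). Note that `pkgutil.iter_modules` skips directories without an `__init__.py`, so the result may be shorter than the tree if some subdirectories are not regular packages.

```python
import importlib
import pkgutil


def list_subpackages(package_name):
    """Return the sorted names of a package's immediate submodules and subpackages.

    Hypothetical helper for illustration; it is not part of data_science_toolbox.
    """
    package = importlib.import_module(package_name)
    # Only packages define __path__; plain modules have nothing to iterate over.
    search_path = getattr(package, "__path__", None)
    if search_path is None:
        return []
    # iter_modules yields one entry per module or regular package found on the path.
    return sorted(info.name for info in pkgutil.iter_modules(search_path))


if __name__ == "__main__":
    # Assumption: the package is installed under the import name below.
    try:
        print(list_subpackages("data_science_toolbox"))
    except ImportError:
        print("data_science_toolbox is not installed in this environment")
```

If the package is installed and its subdirectories are regular packages, the printed names should correspond to the entries under `data_science_toolbox` in the tree; the exact output depends on the installed version.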
