datascribe: useful tools to help you document and describe data processing and modelling
datascribe has been developed to assist data scientists in documenting and describing data analysis, processing and modelling.
Vision for datascribe
- Assist users in documenting the dataset they are using.
- Assist users in documenting the preprocessing and subsequent analysis they perform on datasets.
- Provide summary text, images and tables in editable formats (markdown and Microsoft Word).
- Help users document research and analyses well so that they are transparent and repeatable.
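As a rough illustration of the kind of editable output described above, the sketch below builds a one-sentence summary and a markdown table from a small in-memory dataset. It uses only the Python standard library and is not datascribe's API: the function name `summarise_columns` and the sample records are hypothetical.

```python
from collections import Counter


def summarise_columns(rows):
    """Return (summary sentence, markdown table) for a list of dict records."""
    columns = list(rows[0].keys()) if rows else []
    lines = ["Column | Type | Missing", "---|---|---"]
    for col in columns:
        values = [row.get(col) for row in rows]
        missing = sum(v is None for v in values)
        # Report the most common Python type among the non-missing values.
        types = Counter(type(v).__name__ for v in values if v is not None)
        dtype = types.most_common(1)[0][0] if types else "unknown"
        lines.append(f"{col} | {dtype} | {missing}")
    summary = f"The dataset contains {len(rows)} rows and {len(columns)} columns."
    return summary, "\n".join(lines)


# Hypothetical sample records, used only to exercise the function.
records = [
    {"age": 34, "city": "Leeds"},
    {"age": None, "city": "York"},
    {"age": 51, "city": "Hull"},
]
summary, table = summarise_columns(records)
print(summary)
print(table)
```

Because the result is plain markdown text, it can be pasted into a report and edited by hand.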
Features:
- Creation of basic summary paragraphs with supporting markdown tables or table images to describe initial dataset information.
- Implementation of key feature engineering tools from sklearn in tandem with producing a log of steps taken to process and analyse the dataset.
- Ability to produce a workflow diagram via graphviz to help visualise the data processing and modelling workflow.
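The idea of logging each processing step and then drawing the workflow can be sketched as follows. This is a minimal illustration, not datascribe's implementation: the `StepLog` class and its methods are hypothetical, a hand-written standardisation stands in for an sklearn transformer, and the diagram is emitted as Graphviz DOT text rather than through the graphviz Python package.

```python
from statistics import mean, pstdev


class StepLog:
    """Hypothetical helper: applies steps in order and records their names."""

    def __init__(self):
        self.steps = []

    def apply(self, name, func, data):
        # Run the step, then record it so the log matches what was executed.
        result = func(data)
        self.steps.append(name)
        return result

    def to_markdown(self):
        # Numbered list of the steps, in the order they were applied.
        return "\n".join(f"{i}. {name}" for i, name in enumerate(self.steps, 1))

    def to_dot(self):
        # Chain the logged steps into a left-to-right DOT digraph.
        nodes = ["raw data"] + self.steps
        edges = [f'    "{a}" -> "{b}";' for a, b in zip(nodes, nodes[1:])]
        return "\n".join(["digraph workflow {", "    rankdir=LR;", *edges, "}"])


def drop_missing(values):
    return [v for v in values if v is not None]


def standardise(values):
    # Rescale to zero mean and unit variance (population standard deviation).
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


log = StepLog()
data = [2.0, None, 4.0, 6.0]
data = log.apply("drop missing values", drop_missing, data)
data = log.apply("standardise", standardise, data)
print(log.to_markdown())
print(log.to_dot())
```

If the DOT text is saved to a file such as `workflow.dot`, it can be rendered with the Graphviz command line, for example `dot -Tpng workflow.dot -o workflow.png`.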
How to explore datascribe
Installing the virtual environment
Details of the conda virtual environment are available here: binder/environment.yml
Open the repo in a terminal (Mac/Linux) or Anaconda Prompt (Windows) and navigate to the repository's root directory, so that the relative path binder/environment.yml resolves.
Create the environment with the following command:
conda env create -f binder/environment.yml
Activate the environment with the following command:
conda activate datascribe
It is strongly recommended that you use a conda environment to avoid dependency conflicts.
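To confirm that activation worked, one option is to inspect the `CONDA_DEFAULT_ENV` environment variable, which conda sets when an environment is activated. The snippet below is a small sketch of that check; the helper name `active_conda_env` is hypothetical and is not part of datascribe.

```python
import os


def active_conda_env(environ=os.environ):
    """Return the name of the active conda environment, or None if unset."""
    return environ.get("CONDA_DEFAULT_ENV")


env = active_conda_env()
if env == "datascribe":
    print("The datascribe environment is active.")
else:
    print(f"Active conda environment: {env!r}; run 'conda activate datascribe' first.")
```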
Dependencies
This project relies on the following external dependencies:
- Graphviz: Used for creating visualizations in the workflow.
Installing Dependencies
Graphviz
Make sure you have Graphviz installed on your system. You can download it from the official Graphviz website or install it using your package manager:
Linux (e.g., Ubuntu)
sudo apt-get install graphviz
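After installing, it can be worth checking that the Graphviz `dot` executable is visible on your PATH. The following is a minimal sketch using only the standard library; the helper name `find_graphviz` is hypothetical and is not part of datascribe.

```python
import shutil
import subprocess


def find_graphviz(executable="dot"):
    """Return the full path to the Graphviz executable, or None if not found."""
    return shutil.which(executable)


dot_path = find_graphviz()
if dot_path is None:
    print("Graphviz 'dot' was not found on PATH.")
else:
    # 'dot -V' prints a version banner; check both streams to be safe.
    result = subprocess.run([dot_path, "-V"], capture_output=True, text=True)
    print((result.stderr or result.stdout).strip())
```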
Hashes for datascribe-1.0.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | d2ec628282214bdd8de3ee5aaa22523bf5fe6aa33444df52a4c213d7ac0fbe09
MD5 | 351b73ccdad821b07391447fc6b45956
BLAKE2b-256 | 276c196758ae519cd6570623ceb60917c999372f171a34196e2f04e6c1534ced