Exploratory data analysis and presentation tool
Data Oriented Report Automator (DORA)
Overview
Welcome to DORA! This isn't just a script; it's an intelligent EDA assistant. DORA empowers you to move from a raw dataset to a comprehensive HTML report with minimal effort. It's designed to be powerful and configurable, yet simple enough for anyone to use thanks to its interactive mode.
Key Features
- Dual-Mode Operation:
  - Interactive mode: A step-by-step wizard to configure your analysis (no coding involved).
  - Configuration-driven mode: For reproducible workflows, define your analysis in a config.yaml file.
- Flexible Data Input: Supports CSV, Excel, JSON, and Parquet files.
  - Note: For Excel files, DORA will only read and analyze the first sheet.
- Data Profiling: Get an overview of your dataset's health, including missing values, descriptive statistics, and data types.
- Target-centric analysis: Generates plots that explore the relationship between your features and a specified target variable.
- HTML Reports: Generates an HTML report that's easy to share and view in any browser.
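The input formats and profiling checks listed above map naturally onto pandas. The following is a minimal sketch under stated assumptions, not DORA's actual implementation: `READERS`, `load_dataset`, and `profile_columns` are hypothetical names, and the sample data is invented for illustration. One detail that is certain: `pandas.read_excel` reads the first sheet by default, which is consistent with the note on Excel files.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical extension-to-reader table; DORA's real loader may differ.
READERS = {
    ".csv": pd.read_csv,
    ".xlsx": pd.read_excel,  # sheet_name defaults to 0, i.e. the first sheet
    ".xls": pd.read_excel,
    ".json": pd.read_json,
    ".parquet": pd.read_parquet,
}


def load_dataset(path):
    """Pick a pandas reader based on the file extension."""
    suffix = Path(path).suffix.lower()
    reader = READERS.get(suffix)
    if reader is None:
        raise ValueError(f"Unsupported file type: {suffix!r}")
    return reader(path)


def profile_columns(df):
    """Return one row per column: dtype, missing count, missing %, unique values."""
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "missing": df.isna().sum(),
            "missing_pct": (df.isna().mean() * 100).round(1),
            "unique": df.nunique(),  # NaN values are not counted as a category
        }
    )


# Invented sample data, written to a temporary CSV so the loader can be exercised.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "sample.csv"
    csv_path.write_text(
        "age,smoker,charges\n"
        "19,yes,16884.92\n"
        "33,no,\n"
        ",no,4449.46\n"
        "45,,21984.47\n"
    )
    df = load_dataset(csv_path)

summary = profile_columns(df)
print(summary)        # per-column health overview
print(df.describe())  # descriptive statistics for numerical columns
```

Reading Excel or Parquet files this way additionally requires an engine such as openpyxl or pyarrow to be installed.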
User Guide
Get started with DORA in two steps: install Poetry, then clone the repository and install its dependencies.
Step 1: Install Poetry
# For Windows (PowerShell)
(Invoke-WebRequest -Uri https://install.python-poetry.org -UseBasicParsing).Content | py -
# For Linux or macOS (Terminal)
curl -sSL https://install.python-poetry.org | python3 -
Step 2: Clone the repository and install dependencies
# Clone the repository
git clone https://github.com/Asifdotexe/DORA.git
cd DORA
# Install all dependencies using Poetry
poetry install
# Developers only: the --with dev flag also installs development tools such as pylint
poetry install --with dev
Quick Start (Interactive Mode)
This is the easiest way to run DORA. The interactive wizard will guide you through the entire process.
cd src/dora
poetry run python main.py
You will be prompted to:
- Enter the path to your data file (CSV, Excel, JSON, or Parquet).
- Specify an output directory.
- (Optionally) select a target variable.
- Choose which analysis steps to perform.
At the end, the wizard asks whether you want to save your choices to a config.yaml file for next time.
Advanced Usage (Config-Driven Mode)
For reproducible results or to integrate DORA into a larger workflow, the configuration-driven mode is ideal.
a. Create a config.yaml file:
# --- Input/Output Settings ---
input_file: 'data/insurance.csv'
output_dir: 'output/insurance_report'
report_title: 'Exploratory Data Analysis of Insurance Premiums'

# --- Dataset Settings ---
target_variable: 'charges'

# --- Analysis Pipeline ---
# Define the steps to run. The tool will execute them in this order.
analysis_pipeline:
  - profile:
      # Generate detailed data profile (missing values, cardinality, stats).
      # No extra parameters needed.
      enabled: true
  - univariate:
      # Generate plots for individual columns.
      enabled: true
      plot_types:
        # Can be 'histogram', 'boxplot'
        numerical: ['histogram', 'boxplot']
        # Can be 'barplot'
        categorical: ['barplot']
  - bivariate:
      # Analyze relationships between two variables.
      enabled: true
      # If true, focuses on plotting features against the target_variable.
      # If false, would require more specific pairs to be defined (more advanced).
      target_centric: true
  - multivariate:
      # Analyze relationships among three or more variables.
      enabled: true
      # Specify columns for the correlation heatmap.
      # If empty or not provided, uses all numerical columns.
      correlation_cols: ['age', 'bmi', 'children', 'charges']
b. Run DORA with the config file:
cd src/dora
poetry run python main.py --config config.yaml
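To make the shape of analysis_pipeline concrete, here is a minimal sketch of how a list of single-key step mappings could be walked in order. It is an illustration, not DORA's actual code: `enabled_steps` is a hypothetical helper, the `config` dict is what a YAML parser such as PyYAML's `yaml.safe_load` would be expected to return for the example file, and treating a missing `enabled` key as "disabled" is an assumption.

```python
# Parsed form of the example config.yaml (assumed to come from a YAML parser).
config = {
    "input_file": "data/insurance.csv",
    "output_dir": "output/insurance_report",
    "report_title": "Exploratory Data Analysis of Insurance Premiums",
    "target_variable": "charges",
    "analysis_pipeline": [
        {"profile": {"enabled": True}},
        {
            "univariate": {
                "enabled": True,
                "plot_types": {
                    "numerical": ["histogram", "boxplot"],
                    "categorical": ["barplot"],
                },
            }
        },
        {"bivariate": {"enabled": True, "target_centric": True}},
        {
            "multivariate": {
                "enabled": True,
                "correlation_cols": ["age", "bmi", "children", "charges"],
            }
        },
    ],
}


def enabled_steps(cfg):
    """Return (step_name, options) pairs for enabled steps, in file order."""
    steps = []
    for entry in cfg.get("analysis_pipeline", []):
        # Each list item is expected to be a mapping with exactly one key.
        if len(entry) != 1:
            raise ValueError(f"Expected one step per list item, got: {entry!r}")
        ((name, options),) = entry.items()
        options = options or {}  # a step written with no options parses as None
        # Assumption: a step without an explicit 'enabled' key is skipped.
        if options.get("enabled", False):
            steps.append((name, options))
    return steps


for name, options in enabled_steps(config):
    print(f"would run step: {name} with options {options}")
```

Because the pipeline is a list rather than a mapping, the order in the file is preserved, which matches the comment in the config that steps execute in the order given.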
Viewing the Output
After the analysis is complete, you will find a new folder at your specified output path containing:
- eda_report.html: Your final, shareable report. Open it in any browser.
- charts/: A sub-folder with all the generated plots saved as individual image files.
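The output layout described above can be checked from a script. The sketch below uses only the Python standard library; `summarize_output` and `open_report` are hypothetical helpers, and the demo folder and the chart file name `age_histogram.png` are invented for illustration (the actual chart names and image format are not specified here).

```python
import tempfile
import webbrowser
from pathlib import Path


def summarize_output(output_dir):
    """Report whether eda_report.html exists and list the files in charts/."""
    out = Path(output_dir)
    report = out / "eda_report.html"
    charts_dir = out / "charts"
    charts = []
    if charts_dir.is_dir():
        charts = sorted(p.name for p in charts_dir.iterdir() if p.is_file())
    return {
        "report_exists": report.is_file(),
        "report_uri": report.absolute().as_uri(),
        "charts": charts,
    }


def open_report(output_dir):
    """Open the report in the default browser, if it exists."""
    info = summarize_output(output_dir)
    if not info["report_exists"]:
        raise FileNotFoundError(f"No eda_report.html found in {output_dir}")
    return webbrowser.open(info["report_uri"])


# Demo against a fake output folder that mimics the described layout.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "insurance_report"
    (out / "charts").mkdir(parents=True)
    (out / "eda_report.html").write_text("<html></html>")
    (out / "charts" / "age_histogram.png").write_bytes(b"")  # invented name
    result = summarize_output(out)

print(result["report_exists"], result["charts"])
```

After a real run, calling `open_report("output/insurance_report")` would open the report for the example configuration's output path.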
Developer Onboarding
Interested in contributing to DORA? Awesome! Here’s how to get set up.
1. Setting Up the Development Environment
If you ran poetry install --with dev during setup, all development dependencies (such as pytest and pylint) are already installed. If not, run that command now.
2. Running Linters and Formatters
We use black for formatting, isort for sorting imports, and pylint for linting. We recommend setting up pre-commit hooks to automate this process.
# Install the pre-commit hooks (run this once)
poetry run pre-commit install
# Now, your code will be automatically checked and formatted every time you make a commit
To run the checks manually:
# Format code with Black and isort
poetry run black .
poetry run isort .
# Run the linter
poetry run pylint src/dora
3. How to Contribute
- Fork the repository.
- Create a new branch (git checkout -b feature/my-new-feature).
- Make your changes and add tests for them.
- Ensure all tests and pre-commit checks pass.
- Push to your branch and submit a Pull Request.
License
This project is licensed under the MIT License. See the LICENSE file for details.
Happy analyzing with DORA! 🎉