
Exploratory data analysis and presentation tool


Data Oriented Report Automator (DORA)


An interactive command-line tool to automate Exploratory Data Analysis (EDA) and generate beautiful, insightful reports in seconds.

Overview

Welcome to DORA! This isn't just a script; it's an intelligent EDA assistant. DORA empowers you to move from a raw dataset to a comprehensive HTML report with minimal effort. It's designed to be powerful and configurable, yet simple enough for anyone to use thanks to its interactive mode.

Key Features

  • Dual-Mode Operation:
    • Interactive mode: A step-by-step wizard to configure your analysis (no coding involved).
    • Configuration-driven mode: For reproducible workflows, define your analysis in a config.yaml file.
  • Flexible Data Input: Supports CSV, Excel, JSON, and Parquet files.
    • Note: For Excel files, DORA will only read and analyze the first sheet.
  • Data Profiling: Get an overview of your dataset's health, including missing values, descriptive statistics, and data types.
  • Target-centric analysis: Generates plots that explore the relationship between your features and a specified target variable.
  • HTML Reports: Generates an HTML report that's easy to share and view in any browser.
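
To make the Data Profiling feature concrete, here is a minimal sketch of the kind of per-column summary it describes (missing values, inferred type, basic statistics). It is not DORA's implementation: the function name profile_rows, the rule for deciding whether a column is numerical, and the sample values are all assumptions made for illustration, and only the Python standard library is used.

```python
import csv
import io
import statistics


def profile_rows(rows):
    """Summarise each column: missing count, inferred type, basic stats."""
    # Regroup row dictionaries into one list of raw string values per column.
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)

    report = {}
    for name, values in columns.items():
        present = [v for v in values if v not in ("", None)]
        summary = {"missing": len(values) - len(present)}
        try:
            # Assumption: a column is numerical if every present value parses as a float.
            numbers = [float(v) for v in present]
        except ValueError:
            summary["type"] = "categorical"
            summary["unique"] = len(set(present))
        else:
            summary["type"] = "numerical"
            if numbers:
                summary["mean"] = statistics.mean(numbers)
                summary["min"] = min(numbers)
                summary["max"] = max(numbers)
        report[name] = summary
    return report


# Invented sample data with one missing value in "age" and one in "charges".
SAMPLE = "age,smoker,charges\n19,yes,1600.5\n,no,1700.5\n31,no,\n"
profile = profile_rows(csv.DictReader(io.StringIO(SAMPLE)))
for column, summary in profile.items():
    print(column, summary)
```

A real profile would typically also report cardinality and quantiles; the sketch only shows the shape of the summary.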

User Guide

Get started with DORA in two steps: install Poetry, then clone the repository and install its dependencies.

Step 1

# For Windows (PowerShell)
(Invoke-WebRequest -Uri https://install.python-poetry.org -UseBasicParsing).Content | py -

# For Linux or macOS (Terminal)
curl -sSL https://install.python-poetry.org | python3 -

Step 2

# Clone the repository
git clone https://github.com/Asifdotexe/DORA.git
cd DORA

# Install all dependencies using Poetry
poetry install

# Developers only: the --with dev flag also installs development tools such as pylint.
poetry install --with dev

Quick Start (Interactive Mode)

This is the easiest way to run DORA. The interactive wizard will guide you through the entire process.

cd src/dora
poetry run python main.py

You will be prompted to:

  • Enter the path to your dataset (for example, a CSV file).
  • Specify an output directory.
  • (Optionally) select a target variable.
  • Choose which analysis steps to perform.

At the end, the wizard asks whether you want to save your choices to a config.yaml file for next time.

Advanced Usage (Config-Driven Mode)

For reproducible results or to integrate DORA into a larger workflow, the configuration-driven mode is ideal.

a. Create a config.yaml file:

# --- Input/Output Settings ---
input_file: 'data/insurance.csv'
output_dir: 'output/insurance_report'
report_title: 'Exploratory Data Analysis of Insurance Premiums'

# --- Dataset Settings ---
target_variable: 'charges'

# --- Analysis Pipeline ---
# Define the steps to run. The tool will execute them in this order.
analysis_pipeline:
  - profile:
      # Generate detailed data profile (missing values, cardinality, stats).
      # No extra parameters needed.
      enabled: true

  - univariate:
      # Generate plots for individual columns.
      enabled: true
      plot_types:
        # Can be 'histogram', 'boxplot'
        numerical: ['histogram', 'boxplot']
        # Can be 'barplot'
        categorical: ['barplot']

  - bivariate:
      # Analyze relationships between two variables.
      enabled: true
      # If true, focuses on plotting features against the target_variable.
      # If false, would require more specific pairs to be defined (more advanced).
      target_centric: true

  - multivariate:
      # Analyze relationships among three or more variables.
      enabled: true
      # Specify columns for the correlation heatmap.
      # If empty or not provided, uses all numerical columns.
      correlation_cols: ['age', 'bmi', 'children', 'charges']
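
As a rough illustration of how the analysis_pipeline list above is structured, the sketch below mirrors the same config as a Python dictionary (the shape a YAML loader would typically return) and walks the steps in declared order. The helper enabled_steps, the KNOWN_STEPS set (taken only from the four steps in this example), and the choice to treat a missing enabled key as disabled are assumptions, not documented DORA behavior.

```python
# Python mirror of the config.yaml example; values are copied from it.
config = {
    "input_file": "data/insurance.csv",
    "output_dir": "output/insurance_report",
    "report_title": "Exploratory Data Analysis of Insurance Premiums",
    "target_variable": "charges",
    "analysis_pipeline": [
        {"profile": {"enabled": True}},
        {"univariate": {
            "enabled": True,
            "plot_types": {
                "numerical": ["histogram", "boxplot"],
                "categorical": ["barplot"],
            },
        }},
        {"bivariate": {"enabled": True, "target_centric": True}},
        {"multivariate": {
            "enabled": True,
            "correlation_cols": ["age", "bmi", "children", "charges"],
        }},
    ],
}

# Assumption: only the step names that appear in the example are treated as valid.
KNOWN_STEPS = {"profile", "univariate", "bivariate", "multivariate"}


def enabled_steps(cfg):
    """Return the names of enabled steps in declared order; raise on malformed entries."""
    steps = []
    for entry in cfg.get("analysis_pipeline", []):
        # Each list item is expected to be a mapping with exactly one key: the step name.
        if not isinstance(entry, dict) or len(entry) != 1:
            raise ValueError(f"pipeline entry must be a single-key mapping: {entry!r}")
        name, options = next(iter(entry.items()))
        if name not in KNOWN_STEPS:
            raise ValueError(f"unknown pipeline step: {name!r}")
        # Assumption: a step without an 'enabled' key is skipped.
        if (options or {}).get("enabled", False):
            steps.append(name)
    return steps


print(enabled_steps(config))
```

Because the pipeline is a list rather than a mapping, the order written in the file is preserved, which matches the comment that steps execute in the order given.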

b. Run DORA with the config file:

cd src/dora
poetry run python main.py --config config.yaml
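
When DORA is one step in a larger workflow, the command above can be launched from a Python script. The sketch below only wraps the documented invocation with subprocess; the helper names build_dora_command and run_dora are hypothetical, and relying on a non-zero exit code to signal failure is an assumption about DORA rather than something the project states.

```python
import subprocess
from pathlib import Path


def build_dora_command(config_path=None):
    """Build the documented command line; without a config path it starts interactive mode."""
    command = ["poetry", "run", "python", "main.py"]
    if config_path is not None:
        command += ["--config", str(config_path)]
    return command


def run_dora(repo_root, config_path):
    """Run DORA from src/dora inside a cloned repository.

    check=True raises CalledProcessError if the process exits with a non-zero status.
    """
    workdir = Path(repo_root) / "src" / "dora"
    return subprocess.run(build_dora_command(config_path), cwd=workdir, check=True)


print(build_dora_command("config.yaml"))
```

A relative config path is resolved against src/dora here, the same directory the documented command is run from.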

Viewing the Output

After the analysis is complete, you will find a new folder at your specified output path containing:

  • eda_report.html: Your final, shareable report. Open it in any browser.
  • charts/: A sub-folder with all the generated plots saved as individual image files.
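
The output layout described above can be checked from a script, for example at the end of an automated run. This is a small sketch using only the standard library; the helper names are hypothetical, and the chart file name example.png is invented, since the actual file names and image format are not specified here.

```python
import tempfile
import webbrowser
from pathlib import Path


def summarize_output(output_dir):
    """Return the report path and the sorted chart file names found under output_dir."""
    output_dir = Path(output_dir)
    report = output_dir / "eda_report.html"
    if not report.is_file():
        raise FileNotFoundError(f"no report found at {report}")
    charts_dir = output_dir / "charts"
    charts = []
    if charts_dir.is_dir():
        charts = sorted(p.name for p in charts_dir.iterdir() if p.is_file())
    return report, charts


def open_report(output_dir):
    """Open the report in the system's default browser."""
    report, _ = summarize_output(output_dir)
    webbrowser.open(report.resolve().as_uri())


# Demonstration against a throwaway directory that imitates the documented layout.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    (out / "charts").mkdir()
    (out / "eda_report.html").write_text("<html></html>")
    (out / "charts" / "example.png").write_bytes(b"")
    report, charts = summarize_output(out)
    print(report.name, charts)
```

In practice you would pass the output_dir from your config (for example 'output/insurance_report') to summarize_output or open_report.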

Developer Onboarding

Interested in contributing to DORA? Awesome! Here’s how to get set up.

1. Setting Up the Development Environment

If you ran poetry install --with dev during setup, the development dependencies (such as pytest and pylint) are already installed. Otherwise, run that command now.

2. Running Linters and Formatters

We use black for formatting, isort for sorting imports, and pylint for linting. We recommend setting up pre-commit hooks to automate this process.

# Install the pre-commit hooks (run this once)
poetry run pre-commit install

# Now, your code will be automatically checked and formatted every time you make a commit

To run the checks manually:

# Format code with Black and isort
poetry run black .
poetry run isort .

# Run the linter
poetry run pylint src/dora

3. How to Contribute

  • Fork the repository.
  • Create a new branch (git checkout -b feature/my-new-feature).
  • Make your changes and add tests for them.
  • Ensure all tests and pre-commit checks pass.
  • Push to your branch and submit a Pull Request.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Happy analyzing with DORA! 🎉
