
# DataRoom

DataRoom is a powerful and easy-to-use Python library for data processing, cleaning, analysis, machine learning, and optimization. It provides an intuitive API with well-structured classes and functions, making it simple to work with data in various formats, including CSV, JSON, Excel, databases, and APIs.

## Features

- **Data Ingestion**: Load data from multiple sources, including CSV, JSON, Excel, databases, and APIs.
- **Data Cleaning**: Handle missing values, normalize data, encode categorical variables, and detect outliers.
- **Data Exploration**: Generate descriptive statistics, correlation matrices, and interactive plots.
- **Data Pipelines**: Automate data transformation and preprocessing.
- **Machine Learning Integration**: Train and evaluate classification and regression models.
- **Optimization**: Parallel processing and memory optimization for large datasets.

---

## Installation

You can install DataRoom using pip:

```bash
pip install dataroom
```

## Usage

### 1. Data Ingestion

#### Load Data from Different Sources

```python
from dataroom import DataIngestor

ingestor = DataIngestor()
df_csv = ingestor.from_csv("data.csv")
df_json = ingestor.from_json("data.json")
df_excel = ingestor.from_excel("data.xlsx")
df_sql = ingestor.from_sql("sqlite:///database.db", "SELECT * FROM users")
df_api = ingestor.from_api("https://api.example.com/data")

print(df_csv.head())
```

### 2. Data Cleaning

#### Handle Missing Values

```python
from dataroom import DataCleaner

cleaner = DataCleaner()
df_clean = cleaner.handle_missing(df_csv, strategy="mean")  # Fill missing values with column mean
print(df_clean.head())
```
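DataRoom's internals aren't shown here, but `strategy="mean"` presumably replaces each missing entry with the mean of that column's observed values. A minimal stdlib sketch of that idea, using a hypothetical `fill_missing_with_mean` helper over a single column:

```python
import math

def fill_missing_with_mean(column):
    """Hypothetical helper: replace None/NaN entries with the mean
    of the observed (non-missing) values in the column."""
    def missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))
    observed = [v for v in column if not missing(v)]
    mean = sum(observed) / len(observed)
    return [mean if missing(v) else v for v in column]

print(fill_missing_with_mean([1.0, None, 3.0, 4.0]))  # [1.0, 2.666..., 3.0, 4.0]
```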

#### Encode Categorical Data

```python
df_encoded = cleaner.encode(df_clean, encoding_type="onehot")
print(df_encoded.head())
```
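For intuition, one-hot encoding maps each categorical value to a 0/1 indicator vector with one position per distinct category. A rough stdlib illustration (the `one_hot` helper below is hypothetical, not part of DataRoom's API):

```python
def one_hot(values):
    """Encode a list of categorical values as 0/1 indicator vectors.

    Returns (categories, rows): the sorted category list and, for each
    input value, a vector with a 1 in that value's category position."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows

cats, rows = one_hot(["red", "green", "red"])
print(cats)  # ['green', 'red']
print(rows)  # [[0, 1], [1, 0], [0, 1]]
```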

#### Detect and Remove Outliers

```python
df_no_outliers = cleaner.detect_outliers(df_encoded, method="iqr")
print(df_no_outliers.head())
```
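The `method="iqr"` option presumably applies the standard interquartile-range rule: values outside `[Q1 - 1.5*IQR, Q3 + 1.5*IQR]` are treated as outliers. A stdlib sketch of that rule on a single column (these helper names are illustrative, not DataRoom's):

```python
import statistics

def iqr_bounds(values):
    """Compute the 1.5 * IQR fences used for outlier detection."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def drop_outliers(values):
    """Keep only values inside the IQR fences, preserving order."""
    low, high = iqr_bounds(values)
    return [v for v in values if low <= v <= high]

print(drop_outliers([10, 12, 11, 13, 12, 100]))  # the 100 is dropped
```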

### 3. Data Exploration

#### Generate Summary Statistics

```python
from dataroom import DataExplorer

explorer = DataExplorer()
print(explorer.describe(df_no_outliers))
```

#### Plot Data

```python
explorer.plot(df_no_outliers, kind="hist")  # Histogram
```

#### Generate a Data Profile Report

```python
profile_report = explorer.profile(df_no_outliers)
print(profile_report)
```

### 4. Data Pipeline

#### Automate Data Processing

```python
from dataroom import DataPipeline

pipeline = DataPipeline()
pipeline.add_step(cleaner.normalize, method="minmax")
pipeline.add_step(lambda data: cleaner.handle_missing(data, strategy="median"))

df_pipeline = pipeline.run(df_no_outliers)
print(df_pipeline.head())
```
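The `method="minmax"` normalization step presumably rescales each column linearly into [0, 1] via `(x - min) / (max - min)`. A self-contained sketch of that formula (the helper below is illustrative, not DataRoom's API):

```python
def minmax_normalize(values):
    """Scale values linearly into [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no range information; map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

print(minmax_normalize([2, 4, 6, 10]))  # [0.0, 0.25, 0.5, 1.0]
```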

### 5. Machine Learning Integration

#### Train a Classification Model

```python
from dataroom import DataML

ml_module = DataML()
df_pipeline["target"] = [0, 1, 0, 1, 0, 1]  # Example labels; assumes the frame has six rows

model, score = ml_module.train_model(df_pipeline, target="target", model_type="classification")
print("Model Accuracy:", score)
```

#### Make Predictions

```python
predictions = ml_module.predict(model, df_pipeline.drop(columns=["target"]))
print(predictions)
```

#### Auto Feature Selection

```python
selected_features = ml_module.auto_feature_selection(df_pipeline, target="target", method="correlation")
print("Selected Features:", selected_features)
```

### 6. Optimization

#### Parallel Processing

```python
from dataroom import DataOptimizer

optimizer = DataOptimizer()

def process_function(x):
    return x * 2

parallel_result = optimizer.parallel_process(process_function, [1, 2, 3, 4, 5])
print(parallel_result)
```
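How `parallel_process` is implemented isn't documented; a common pattern it likely resembles is an order-preserving parallel map from the standard library's `concurrent.futures`. A minimal sketch (a thread pool keeps the example simple; CPU-bound work would typically use `ProcessPoolExecutor` instead):

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_map(func, items, max_workers=4):
    """Apply func to each item concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(func, items))

print(parallel_map(lambda x: x * 2, [1, 2, 3, 4, 5]))  # [2, 4, 6, 8, 10]
```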

#### Optimize Memory Usage

```python
df_optimized = optimizer.optimize_memory(df_pipeline)
print(df_optimized.info())
```
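A common way such memory optimizers work (though DataRoom's approach isn't documented) is downcasting each numeric column to the narrowest type that can represent its range. A stdlib sketch of just the range check, with a hypothetical `smallest_int_dtype` helper:

```python
INT_RANGES = {
    "int8": (-2**7, 2**7 - 1),
    "int16": (-2**15, 2**15 - 1),
    "int32": (-2**31, 2**31 - 1),
    "int64": (-2**63, 2**63 - 1),
}

def smallest_int_dtype(values):
    """Pick the narrowest signed integer type that holds every value.

    Relies on dict insertion order to try the smallest types first."""
    lo, hi = min(values), max(values)
    for name, (tmin, tmax) in INT_RANGES.items():
        if tmin <= lo and hi <= tmax:
            return name
    raise OverflowError("values exceed int64 range")

print(smallest_int_dtype([0, 1, 200_000]))  # int32
```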

Coded by Mohammad Taha Gorji

## License

This project is licensed under the MIT License.
