
# DataRoom

DataRoom is a powerful and easy-to-use Python library for data processing, cleaning, analysis, machine learning, and optimization. It provides an intuitive API with well-structured classes and functions, making it simple to work with data in various formats, including CSV, JSON, Excel, databases, and APIs.

## Features

- **Data Ingestion**: Load data from multiple sources, including CSV, JSON, Excel, databases, and APIs.
- **Data Cleaning**: Handle missing values, normalize data, encode categorical variables, and detect outliers.
- **Data Exploration**: Generate descriptive statistics, correlation matrices, and interactive plots.
- **Data Pipelines**: Automate data transformation and preprocessing.
- **Machine Learning Integration**: Train and evaluate classification and regression models.
- **Optimization**: Parallel processing and memory optimization for large datasets.

---

## Installation

You can install DataRoom using pip:

```bash
pip install dataroom
```

## Usage

### 1. Data Ingestion

#### Load Data from Different Sources

```python
from dataroom import DataIngestor

ingestor = DataIngestor()
df_csv = ingestor.from_csv("data.csv")
df_json = ingestor.from_json("data.json")
df_excel = ingestor.from_excel("data.xlsx")
df_sql = ingestor.from_sql("sqlite:///database.db", "SELECT * FROM users")
df_api = ingestor.from_api("https://api.example.com/data")

print(df_csv.head())
```

### 2. Data Cleaning

#### Handle Missing Values

```python
from dataroom import DataCleaner

cleaner = DataCleaner()
df_clean = cleaner.handle_missing(df_csv, strategy="mean")  # Fill missing values with column mean
print(df_clean.head())
```
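DataRoom's internals aren't shown here, but `strategy="mean"` presumably replaces each missing entry with the mean of that column's observed values. A minimal stdlib sketch of that idea, using a hypothetical `fill_missing_with_mean` helper over a single column:

```python
import math

def fill_missing_with_mean(column):
    """Hypothetical helper: replace None/NaN entries with the mean
    of the observed (non-missing) values in the column."""
    def missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))
    observed = [v for v in column if not missing(v)]
    mean = sum(observed) / len(observed)
    return [mean if missing(v) else v for v in column]

print(fill_missing_with_mean([1.0, None, 3.0, 4.0]))  # [1.0, 2.666..., 3.0, 4.0]
```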

#### Encode Categorical Data

```python
df_encoded = cleaner.encode(df_clean, encoding_type="onehot")
print(df_encoded.head())
```
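For intuition, one-hot encoding maps each categorical value to a 0/1 indicator vector with one position per distinct category. A rough stdlib illustration (the `one_hot` helper below is hypothetical, not part of DataRoom's API):

```python
def one_hot(values):
    """Encode a list of categorical values as 0/1 indicator vectors.

    Returns (categories, rows): the sorted category list and, for each
    input value, a vector with a 1 in that value's category position."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows

cats, rows = one_hot(["red", "green", "red"])
print(cats)  # ['green', 'red']
print(rows)  # [[0, 1], [1, 0], [0, 1]]
```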

#### Detect and Remove Outliers

```python
df_no_outliers = cleaner.detect_outliers(df_encoded, method="iqr")
print(df_no_outliers.head())
```
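The `method="iqr"` option presumably applies the standard interquartile-range rule: values outside `[Q1 - 1.5*IQR, Q3 + 1.5*IQR]` are treated as outliers. A stdlib sketch of that rule on a single column (these helper names are illustrative, not DataRoom's):

```python
import statistics

def iqr_bounds(values):
    """Compute the 1.5 * IQR fences used for outlier detection."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def drop_outliers(values):
    """Keep only values inside the IQR fences, preserving order."""
    low, high = iqr_bounds(values)
    return [v for v in values if low <= v <= high]

print(drop_outliers([10, 12, 11, 13, 12, 100]))  # the 100 is dropped
```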

### 3. Data Exploration

#### Generate Summary Statistics

```python
from dataroom import DataExplorer

explorer = DataExplorer()
print(explorer.describe(df_no_outliers))
```

#### Plot Data

```python
explorer.plot(df_no_outliers, kind="hist")  # Histogram
```

#### Generate a Data Profile Report

```python
profile_report = explorer.profile(df_no_outliers)
print(profile_report)
```

### 4. Data Pipeline

#### Automate Data Processing

```python
from dataroom import DataPipeline

pipeline = DataPipeline()
pipeline.add_step(cleaner.normalize, method="minmax")
pipeline.add_step(lambda data: cleaner.handle_missing(data, strategy="median"))

df_pipeline = pipeline.run(df_no_outliers)
print(df_pipeline.head())
```
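The `method="minmax"` normalization step presumably rescales each column linearly into [0, 1] via `(x - min) / (max - min)`. A self-contained sketch of that formula (the helper below is illustrative, not DataRoom's API):

```python
def minmax_normalize(values):
    """Scale values linearly into [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no range information; map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

print(minmax_normalize([2, 4, 6, 10]))  # [0.0, 0.25, 0.5, 1.0]
```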

### 5. Machine Learning Integration

#### Train a Classification Model

```python
from dataroom import DataML

ml_module = DataML()
df_pipeline["target"] = [0, 1, 0, 1, 0, 1]  # Example labels; assumes the frame has six rows

model, score = ml_module.train_model(df_pipeline, target="target", model_type="classification")
print("Model Accuracy:", score)
```

#### Make Predictions

```python
predictions = ml_module.predict(model, df_pipeline.drop(columns=["target"]))
print(predictions)
```

#### Auto Feature Selection

```python
selected_features = ml_module.auto_feature_selection(df_pipeline, target="target", method="correlation")
print("Selected Features:", selected_features)
```

### 6. Optimization

#### Parallel Processing

```python
from dataroom import DataOptimizer

optimizer = DataOptimizer()

def process_function(x):
    return x * 2

parallel_result = optimizer.parallel_process(process_function, [1, 2, 3, 4, 5])
print(parallel_result)
```
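How `parallel_process` is implemented isn't documented; a common pattern it likely resembles is an order-preserving parallel map from the standard library's `concurrent.futures`. A minimal sketch (a thread pool keeps the example simple; CPU-bound work would typically use `ProcessPoolExecutor` instead):

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_map(func, items, max_workers=4):
    """Apply func to each item concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(func, items))

print(parallel_map(lambda x: x * 2, [1, 2, 3, 4, 5]))  # [2, 4, 6, 8, 10]
```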

#### Optimize Memory Usage

```python
df_optimized = optimizer.optimize_memory(df_pipeline)
print(df_optimized.info())
```
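A common way such memory optimizers work (though DataRoom's approach isn't documented) is downcasting each numeric column to the narrowest type that can represent its range. A stdlib sketch of just the range check, with a hypothetical `smallest_int_dtype` helper:

```python
INT_RANGES = {
    "int8": (-2**7, 2**7 - 1),
    "int16": (-2**15, 2**15 - 1),
    "int32": (-2**31, 2**31 - 1),
    "int64": (-2**63, 2**63 - 1),
}

def smallest_int_dtype(values):
    """Pick the narrowest signed integer type that holds every value.

    Relies on dict insertion order to try the smallest types first."""
    lo, hi = min(values), max(values)
    for name, (tmin, tmax) in INT_RANGES.items():
        if tmin <= lo and hi <= tmax:
            return name
    raise OverflowError("values exceed int64 range")

print(smallest_int_dtype([0, 1, 200_000]))  # int32
```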

Coded by Mohammad Taha Gorji

## License

This project is licensed under the MIT License.
