# DataRoom

DataRoom is a Python library for data processing, cleaning, analysis, machine learning, and optimization. Its API is organized into a handful of classes, one per task, and it loads data from CSV, JSON, and Excel files, databases, and web APIs.

## Features

- **Data Ingestion**: Load data from multiple sources, including CSV, JSON, Excel, databases, and APIs.
- **Data Cleaning**: Handle missing values, normalize data, encode categorical variables, and detect outliers.
- **Data Exploration**: Generate descriptive statistics, correlation matrices, and interactive plots.
- **Data Pipelines**: Automate data transformation and preprocessing.
- **Machine Learning Integration**: Train and evaluate classification and regression models.
- **Optimization**: Parallel processing and memory optimization for large datasets.

---

## Installation

You can install DataRoom using pip:

```bash
pip install dataroom
```

## Usage

### 1. Data Ingestion

#### Load Data from Different Sources

```python
from dataroom import DataIngestor

ingestor = DataIngestor()
df_csv = ingestor.from_csv("data.csv")
df_json = ingestor.from_json("data.json")
df_excel = ingestor.from_excel("data.xlsx")
df_sql = ingestor.from_sql("sqlite:///database.db", "SELECT * FROM users")
df_api = ingestor.from_api("https://api.example.com/data")

print(df_csv.head())
```

### 2. Data Cleaning

#### Handle Missing Values

```python
from dataroom import DataCleaner

cleaner = DataCleaner()
df_clean = cleaner.handle_missing(df_csv, strategy="mean")  # Fill missing values with column mean
print(df_clean.head())
```
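
To make the `strategy="mean"` option concrete, here is a minimal standard-library sketch of mean imputation on a single column. It is not DataRoom's implementation: the helper `fill_missing_with_mean` is a hypothetical name, and it assumes missing values are represented as `None`.

```python
from statistics import mean


def fill_missing_with_mean(values):
    """Replace None entries with the mean of the observed values.

    Hypothetical helper for illustration; not part of DataRoom.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)  # nothing to average, so return the column unchanged
    column_mean = mean(observed)
    return [column_mean if v is None else v for v in values]


ages = [20, None, 40, None, 30]
filled = fill_missing_with_mean(ages)
print(filled)
```

Each gap is filled with the average of the three observed ages, so the column mean is unchanged by the imputation.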

#### Encode Categorical Data

```python
df_encoded = cleaner.encode(df_clean, encoding_type="onehot")
print(df_encoded.head())
```
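
For readers unfamiliar with one-hot encoding, the sketch below shows the general idea on one categorical column. It is an illustration only: `one_hot` and the `column_category` naming scheme are assumptions, and the columns DataRoom actually produces may be named or ordered differently.

```python
def one_hot(column_name, values):
    """Return one 0/1 indicator column per distinct category.

    Hypothetical helper for illustration; not part of DataRoom.
    """
    categories = sorted(set(values))
    return {
        f"{column_name}_{category}": [int(v == category) for v in values]
        for category in categories
    }


encoded = one_hot("color", ["red", "blue", "red", "green"])
for name, column in encoded.items():
    print(name, column)
```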

#### Detect and Remove Outliers

```python
df_no_outliers = cleaner.detect_outliers(df_encoded, method="iqr")
print(df_no_outliers.head())
```
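
The `"iqr"` method name refers to the interquartile-range rule. A minimal sketch of that rule follows, assuming the conventional 1.5 × IQR fences; the helper `remove_outliers_iqr` and its `k` parameter are hypothetical, and DataRoom may compute quartiles or thresholds differently.

```python
from statistics import quantiles


def remove_outliers_iqr(values, k=1.5):
    """Keep only values inside [Q1 - k*IQR, Q3 + k*IQR].

    Hypothetical helper for illustration; not part of DataRoom.
    """
    q1, _median, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


data = [10, 12, 11, 13, 12, 11, 95]
cleaned = remove_outliers_iqr(data)
print(cleaned)
```

In this sample only the value 95 falls outside the fences, so it is the only one dropped.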

### 3. Data Exploration

#### Generate Summary Statistics

```python
from dataroom import DataExplorer

explorer = DataExplorer()
print(explorer.describe(df_no_outliers))
```

#### Plot Data

```python
explorer.plot(df_no_outliers, kind="hist")  # Histogram
```

#### Generate a Data Profile Report

```python
profile_report = explorer.profile(df_no_outliers)
print(profile_report)
```

### 4. Data Pipeline

#### Automate Data Processing

```python
from dataroom import DataPipeline

pipeline = DataPipeline()
# Steps run in the order they are added
pipeline.add_step(cleaner.normalize, method="minmax")
pipeline.add_step(lambda data: cleaner.handle_missing(data, strategy="median"))

df_pipeline = pipeline.run(df_no_outliers)
print(df_pipeline.head())
```
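
The pipeline pattern used above can be sketched in a few lines of plain Python. This is an assumption about the general shape of such a class, not DataRoom's source: `SimplePipeline` and `scale` are hypothetical stand-ins, and the sketch operates on a list rather than a DataFrame.

```python
class SimplePipeline:
    """Hypothetical stand-in for a step-based pipeline; not DataRoom's DataPipeline."""

    def __init__(self):
        self._steps = []

    def add_step(self, func, **kwargs):
        # Store the callable with the keyword arguments to pass on each run
        self._steps.append((func, kwargs))
        return self

    def run(self, data):
        # Feed the output of each step into the next, in insertion order
        for func, kwargs in self._steps:
            data = func(data, **kwargs)
        return data


def scale(values, method="minmax"):
    """Min-max scale a list of numbers into the range [0, 1]."""
    if method != "minmax":
        raise ValueError(f"unsupported method: {method}")
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # avoid dividing by zero on constant data
    return [(v - lo) / (hi - lo) for v in values]


pipeline = SimplePipeline()
pipeline.add_step(scale, method="minmax")
pipeline.add_step(lambda data: [round(v, 2) for v in data])

result = pipeline.run([5, 10, 15, 25])
print(result)
```

Storing keyword arguments alongside each callable is what lets a step be registered as `add_step(func, method="minmax")` without wrapping it in a lambda.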

### 5. Machine Learning Integration

#### Train a Classification Model

```python
from dataroom import DataML

ml_module = DataML()
# Example labels only: the list must contain one value per row of df_pipeline
df_pipeline["target"] = [0, 1, 0, 1, 0, 1]

model, score = ml_module.train_model(df_pipeline, target="target", model_type="classification")
print("Model Accuracy:", score)
```

#### Make Predictions

```python
predictions = ml_module.predict(model, df_pipeline.drop(columns=["target"]))
print(predictions)
```

#### Auto Feature Selection

```python
selected_features = ml_module.auto_feature_selection(df_pipeline, target="target", method="correlation")
print("Selected Features:", selected_features)
```
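
One common reading of `method="correlation"` is to keep the features whose Pearson correlation with the target is strong in absolute value. The sketch below illustrates that reading only; the helpers `pearson` and `select_by_correlation` and the `threshold` of 0.5 are assumptions, and DataRoom's actual selection rule is not documented here.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (spread_x * spread_y)


def select_by_correlation(features, target, threshold=0.5):
    """Return names of features with |correlation to target| >= threshold.

    Hypothetical helper for illustration; not part of DataRoom.
    """
    return [
        name
        for name, column in features.items()
        if abs(pearson(column, target)) >= threshold
    ]


features = {
    "signal": [1, 2, 3, 4, 5, 6],
    "noise": [3, 1, 3, 1, 3, 1],
    "constant": [7, 7, 7, 7, 7, 7],
}
target = [0, 0, 0, 1, 1, 1]

selected = select_by_correlation(features, target)
print("Selected:", selected)
```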

### 6. Optimization

#### Parallel Processing

```python
from dataroom import DataOptimizer

optimizer = DataOptimizer()

def process_function(x):
    return x * 2

parallel_result = optimizer.parallel_process(process_function, [1, 2, 3, 4, 5])
print(parallel_result)
```
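
A comparable parallel map can be written with the standard library's `concurrent.futures`. The sketch below uses a thread pool so that it runs anywhere without extra setup; `parallel_map` and `max_workers` are hypothetical names, and whether DataRoom uses threads or processes internally is not stated in this README.

```python
from concurrent.futures import ThreadPoolExecutor


def process_function(x):
    return x * 2


def parallel_map(func, items, max_workers=4):
    """Apply func to every item using a pool of worker threads.

    Hypothetical helper for illustration; not part of DataRoom.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in the same order as the inputs
        return list(executor.map(func, items))


parallel_result = parallel_map(process_function, [1, 2, 3, 4, 5])
print(parallel_result)
```

For CPU-bound functions, `ProcessPoolExecutor` has the same interface but requires a picklable function and, on some platforms, an `if __name__ == "__main__":` guard.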

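#### Optimize Memory Usage

```python
df_optimized = optimizer.optimize_memory(df_pipeline)
# If this is a pandas DataFrame, info() prints its report itself and returns None,
# so it is called directly rather than wrapped in print()
df_optimized.info()
```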
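
Memory optimization of tabular data usually means storing each column in the narrowest type that can hold its values. The sketch below shows that idea with the standard library's `array` module; `downcast_ints` is a hypothetical helper, and this README does not say which techniques `optimize_memory` applies.

```python
from array import array


def downcast_ints(values):
    """Store integers in the narrowest signed array type that can hold them.

    Hypothetical helper for illustration; not part of DataRoom.
    """
    for typecode in ("b", "h", "i", "q"):  # progressively wider signed integers
        try:
            return array(typecode, values)
        except OverflowError:
            continue  # a value did not fit, so try the next wider type
    raise OverflowError("values do not fit in a 64-bit signed integer")


small = downcast_ints([1, 2, 3])
wide = downcast_ints([1, 2, 3_000_000_000])

print(small.typecode, small.itemsize, "byte(s) per value")
print(wide.typecode, wide.itemsize, "byte(s) per value")
```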

Coded by Mohammad Taha Gorji

## License

This project is licensed under the MIT License.
