# DataRoom
DataRoom is a Python library for data processing, cleaning, analysis, machine learning, and optimization. Its API is organized into six classes (`DataIngestor`, `DataCleaner`, `DataExplorer`, `DataPipeline`, `DataML`, and `DataOptimizer`) for working with data from CSV, JSON, and Excel files, databases, and APIs.
## Features
- **Data Ingestion**: Load data from multiple sources, including CSV, JSON, Excel, databases, and APIs.
- **Data Cleaning**: Handle missing values, normalize data, encode categorical variables, and detect outliers.
- **Data Exploration**: Generate descriptive statistics, correlation matrices, and interactive plots.
- **Data Pipelines**: Automate data transformation and preprocessing.
- **Machine Learning Integration**: Train and evaluate classification and regression models.
- **Optimization**: Parallel processing and memory optimization for large datasets.
---
## Installation
You can install DataRoom using pip:
```bash
pip install dataroom
```

## Usage
### 1. Data Ingestion

#### Load Data from Different Sources

```python
from dataroom import DataIngestor

ingestor = DataIngestor()
df_csv = ingestor.from_csv("data.csv")
df_json = ingestor.from_json("data.json")
df_excel = ingestor.from_excel("data.xlsx")
df_sql = ingestor.from_sql("sqlite:///database.db", "SELECT * FROM users")
df_api = ingestor.from_api("https://api.example.com/data")
print(df_csv.head())
```
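To give a feel for what loading tabular data involves, here is a minimal sketch using only the Python standard library. It is not DataRoom's implementation: the helpers `load_csv_rows` and `load_json_rows` are hypothetical, and the in-memory strings are made-up stand-ins for `data.csv` and `data.json`.

```python
import csv
import io
import json


def load_csv_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def load_json_rows(text):
    """Parse a JSON array of objects into a list of dicts."""
    rows = json.loads(text)
    if not isinstance(rows, list):
        raise ValueError("expected a JSON array of records")
    return rows


# Illustrative in-memory stand-ins for data.csv and data.json.
csv_text = "name,age\nAda,36\nLinus,54\n"
json_text = '[{"name": "Ada", "age": 36}, {"name": "Linus", "age": 54}]'

csv_rows = load_csv_rows(csv_text)
json_rows = load_json_rows(json_text)

print(csv_rows[0])   # CSV values arrive as strings
print(json_rows[0])  # JSON values keep their parsed types
```

One practical difference shows up here: CSV carries no type information, so numeric columns need converting after loading, whereas JSON numbers are already numbers.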
### 2. Data Cleaning

#### Handle Missing Values

```python
from dataroom import DataCleaner

cleaner = DataCleaner()
df_clean = cleaner.handle_missing(df_csv, strategy="mean")  # Fill missing values with the column mean
print(df_clean.head())
```
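The `strategy="mean"` option fills each gap with the mean of the values that are present. A minimal standard-library sketch of that idea follows; `fill_missing_with_mean` is a hypothetical helper, not part of DataRoom, and it represents missing values as `None`.

```python
from statistics import mean


def fill_missing_with_mean(values):
    """Replace None entries with the mean of the non-missing entries."""
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("cannot compute a mean: every value is missing")
    fill = mean(present)
    return [fill if v is None else v for v in values]


ages = [30, None, 40, 50, None]  # illustrative column with two gaps
filled = fill_missing_with_mean(ages)
print(filled)
```

Swapping `mean` for `statistics.median` gives the median strategy, which is less sensitive to extreme values.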
#### Encode Categorical Data

```python
df_encoded = cleaner.encode(df_clean, encoding_type="onehot")
print(df_encoded.head())
```
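One-hot encoding turns a categorical column into one 0/1 indicator column per category. The sketch below shows the transformation on a plain list; `one_hot` is a hypothetical helper for illustration and says nothing about how `cleaner.encode` is implemented.

```python
def one_hot(values):
    """Return (categories, rows); each row has a single 1 marking its category."""
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1
        rows.append(row)
    return categories, rows


colors = ["red", "green", "red", "blue"]  # illustrative categorical column
categories, encoded = one_hot(colors)

print(categories)
for row in encoded:
    print(row)
```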
#### Detect and Remove Outliers

```python
df_no_outliers = cleaner.detect_outliers(df_encoded, method="iqr")
print(df_no_outliers.head())
```
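The interquartile-range (IQR) rule flags values that fall more than `k` times the IQR below the first quartile or above the third. The sketch below uses the conventional `k = 1.5`; whether DataRoom uses the same multiplier or quartile method is an assumption, and `iqr_bounds` and `split_outliers` are hypothetical helpers.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return the (low, high) fences of the IQR rule."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def split_outliers(values, k=1.5):
    """Separate values inside the fences from those outside."""
    low, high = iqr_bounds(values, k)
    kept = [v for v in values if low <= v <= high]
    outliers = [v for v in values if v < low or v > high]
    return kept, outliers


readings = [10, 12, 11, 13, 12, 11, 95]  # illustrative data with one extreme value
kept, outliers = split_outliers(readings)
print("kept:", kept)
print("outliers:", outliers)
```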
### 3. Data Exploration

#### Generate Summary Statistics

```python
from dataroom import DataExplorer

explorer = DataExplorer()
print(explorer.describe(df_no_outliers))
```
#### Plot Data

```python
explorer.plot(df_no_outliers, kind="hist")  # Histogram
```
#### Generate a Data Profile Report

```python
profile_report = explorer.profile(df_no_outliers)
print(profile_report)
```
### 4. Data Pipeline

#### Automate Data Processing

```python
from dataroom import DataPipeline

pipeline = DataPipeline()
pipeline.add_step(cleaner.normalize, method="minmax")
pipeline.add_step(lambda data: cleaner.handle_missing(data, strategy="median"))
df_pipeline = pipeline.run(df_no_outliers)
print(df_pipeline.head())
```
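The pipeline pattern used here, registering steps and then running them in order, can be sketched in a few lines. `MiniPipeline`, `min_max`, and `scale` below are hypothetical stand-ins written for illustration; they mirror the `add_step(func, **kwargs)` call shape from the example but are not DataRoom's code.

```python
from functools import partial


class MiniPipeline:
    """Applies registered steps to the data in the order they were added."""

    def __init__(self):
        self.steps = []

    def add_step(self, func, **kwargs):
        # Bind keyword arguments now so run() can call each step with data only.
        self.steps.append(partial(func, **kwargs))
        return self

    def run(self, data):
        for step in self.steps:
            data = step(data)
        return data


def min_max(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def scale(values, factor=1.0):
    """Multiply every value by a constant factor."""
    return [v * factor for v in values]


mini = MiniPipeline()
mini.add_step(min_max)
mini.add_step(scale, factor=100)
result = mini.run([10, 20, 30, 50])
print(result)
```

Because each step receives the previous step's output, the order in which steps are added changes the result.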
### 5. Machine Learning Integration

#### Train a Classification Model

```python
from dataroom import DataML

ml_module = DataML()
# Example labels; the list length must match the number of rows in df_pipeline
df_pipeline["target"] = [0, 1, 0, 1, 0, 1]
model, score = ml_module.train_model(df_pipeline, target="target", model_type="classification")
print("Model Accuracy:", score)
```
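The example prints `score` as the model accuracy. The source does not say how `train_model` computes it or whether it scores on held-out data, so the sketch below only shows the usual definition of accuracy: the fraction of predictions that match the true labels. The `accuracy` helper and the label lists are illustrative assumptions.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions equal to the corresponding true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot score an empty set of labels")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


y_true = [0, 1, 0, 1, 0, 1]  # illustrative true labels
y_pred = [0, 1, 1, 1, 0, 0]  # illustrative predictions, two of them wrong
acc = accuracy(y_true, y_pred)
print(f"accuracy: {acc:.3f}")
```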
#### Make Predictions

```python
predictions = ml_module.predict(model, df_pipeline.drop(columns=["target"]))
print(predictions)
```
#### Auto Feature Selection

```python
selected_features = ml_module.auto_feature_selection(df_pipeline, target="target", method="correlation")
print("Selected Features:", selected_features)
```
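Correlation-based feature selection typically keeps the features whose correlation with the target is strong in absolute value. The sketch below uses Pearson correlation and a threshold of 0.5; both choices are assumptions, since the source does not describe what `method="correlation"` does internally. `pearson` and `select_by_correlation` are hypothetical helpers.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # treat a constant column as uncorrelated
    return cov / sqrt(var_x * var_y)


def select_by_correlation(columns, target, threshold=0.5):
    """Keep column names whose |correlation| with the target meets the threshold."""
    return [
        name
        for name, values in columns.items()
        if abs(pearson(values, target)) >= threshold
    ]


# Illustrative feature columns and target.
columns = {
    "signal": [1, 2, 3, 4, 5, 6],
    "inverse": [6, 5, 4, 3, 2, 1],
    "constant": [7, 7, 7, 7, 7, 7],
    "noise": [3, 1, 4, 1, 5, 2],
}
target = [1, 2, 3, 4, 5, 6]

selected = select_by_correlation(columns, target)
print("selected:", selected)
```

Taking the absolute value matters: a strongly negative correlation, as in the `inverse` column, is just as informative as a strongly positive one.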
### 6. Optimization

#### Parallel Processing

```python
from dataroom import DataOptimizer

optimizer = DataOptimizer()


def process_function(x):
    return x * 2


parallel_result = optimizer.parallel_process(process_function, [1, 2, 3, 4, 5])
print(parallel_result)
```
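For comparison, the same map-over-items pattern can be written with the standard library's `concurrent.futures`. This sketch uses a thread pool for simplicity; whether `parallel_process` uses threads or processes is not stated in the source, and `parallel_map` is a hypothetical helper.

```python
from concurrent.futures import ThreadPoolExecutor


def double(x):
    return x * 2


def parallel_map(func, items, max_workers=4):
    """Apply func to every item on a pool of worker threads, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(func, items))


doubled = parallel_map(double, [1, 2, 3, 4, 5])
print(doubled)  # [2, 4, 6, 8, 10]
```

Threads suit I/O-bound work such as network calls. For CPU-bound functions in CPython, `ProcessPoolExecutor` is the usual choice, at the cost of requiring picklable functions and arguments.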
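#### Optimize Memory Usage

```python
df_optimized = optimizer.optimize_memory(df_pipeline)
# Assuming a pandas DataFrame: info() prints its report itself and returns None,
# so it is called directly rather than wrapped in print()
df_optimized.info()
```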
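The source does not describe how `optimize_memory` works. A common technique for this kind of optimization is downcasting: storing each numeric column in the narrowest type that can hold all of its values. The sketch below illustrates the idea with the standard library's `array` module; `smallest_int_array` is a hypothetical helper and the data is made up.

```python
from array import array

# Signed integer typecodes, narrowest first.
SIGNED_TYPECODES = "bhiq"


def smallest_int_array(values):
    """Store integers in the narrowest signed array type that fits every value."""
    for code in SIGNED_TYPECODES:
        bits = array(code).itemsize * 8
        low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
        if all(low <= v <= high for v in values):
            return array(code, values)
    raise OverflowError("values do not fit in the widest signed integer type")


values = [3, 120, -7, 45]        # illustrative small integers
wide = array("q", values)        # 64-bit storage
compact = smallest_int_array(values)

print(wide.itemsize * len(wide), "bytes ->", compact.itemsize * len(compact), "bytes")
```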
Coded by Mohammad Taha Gorji

## License
This project is licensed under the MIT License.
## File details

### dataroom-1.0.0.tar.gz

- Size: 4.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.1

| Algorithm | Hash digest |
|---|---|
| SHA256 | `50b82af725e73f1cc28144ce337e65db6dac73b6b8f97e54f67dddb7c3c1ca1f` |
| MD5 | `a71ff9f9cffc686837a093f2a5314e50` |
| BLAKE2b-256 | `e7513367190445b9a568a68bbc2317af017bbba37f8110c24c1e800cd2b44fef` |

### dataroom-1.0.0-py3-none-any.whl

- Size: 4.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.1

| Algorithm | Hash digest |
|---|---|
| SHA256 | `3c220263de1fd6fea31bdd54764ae374d64304eef3257f773700c6e657301a33` |
| MD5 | `c8d4e621bae077c41f3efd5c1109ba78` |
| BLAKE2b-256 | `26893806b180570e3a17c6f4c9b0d34db26d00b5ec43586f9f29ff698e2107b3` |