Hossam Data Helper
Hossam is a comprehensive helper library for data analysis, visualization, and statistical processing.
It was developed for the machine learning and data analysis courses at ITWILL and is used in instructor Lee Kwang-Ho's lectures.
Features
- Rich visualization: 25+ visualization functions built on Seaborn/Matplotlib
- Statistical analysis: statistical tools for regression, classification, and time series analysis
- Sample data: load training datasets instantly
- Data preprocessing: missing-value handling, outlier detection, scaling, and more
- Easy to use: an intuitive API that supports rapid prototyping
- Optimized for teaching: designed specifically for data analysis education
Installation
Install from PyPI (recommended)
pip install hossam
Install the development version
git clone https://github.com/leekh4232/hossam-data.git
cd hossam-data
pip install -e .
Requirements
- Python 3.8 or later
- pandas, numpy, matplotlib, seaborn, and others (installed automatically)
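Before installing, it can help to confirm that the interpreter meets the stated minimum and to see which of the listed dependencies are already present. The following is a minimal sketch that uses only the standard library. The helper name `check_environment` is hypothetical, and the package list covers only the four dependencies named above.

```python
import sys
from importlib.util import find_spec

# Minimum Python version stated in the requirements above.
MIN_PYTHON = (3, 8)

# Import names of the dependencies named above (the full list is longer).
CORE_PACKAGES = ("pandas", "numpy", "matplotlib", "seaborn")


def check_environment(packages=CORE_PACKAGES, min_python=MIN_PYTHON):
    """Hypothetical helper: return (python_ok, missing) without importing the packages."""
    python_ok = sys.version_info[:2] >= min_python
    # find_spec returns None when a top-level module cannot be located.
    missing = [name for name in packages if find_spec(name) is None]
    return python_ok, missing


python_ok, missing = check_environment()
print("Python version OK:", python_ok)
print("Missing packages:", missing or "none")
```

Anything reported as missing should be pulled in automatically by `pip install hossam`, so this check is mainly useful for diagnosing a broken environment.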
Quick Start
Check the version
import hossam
print(hossam.__version__) # 0.3.0
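When importing the package is undesirable (for example in a quick environment check), the installed version can also be read from package metadata. This sketch assumes the distribution name is `hossam`, as in the `pip install` command above. `installed_version` is a hypothetical helper, not part of the library.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Hypothetical helper: return the installed version string, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the version string if the package is installed, otherwise None.
print(installed_version("hossam"))
```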
Load sample data
from hossam import load_data, load_info
# List the available datasets
datasets = load_info()
print(datasets)
# Search by keyword
ad_datasets = load_info(search="AD")
# Load a dataset
df = load_data('AD_SALES')
print(df.head())
Simple visualization
from hossam import plot as hs_plot
import pandas as pd
import numpy as np
# Create sample data
df = pd.DataFrame({
    'x': np.random.randn(100),
    'y': np.random.randn(100),
    'category': np.random.choice(['A', 'B', 'C'], 100)
})
# Draw a scatter plot
hs_plot.hs_scatterplot(df=df, xname='x', yname='y', hue='category', palette='Set1')
# Draw a box plot
hs_plot.hs_boxplot(df=df, xname='category', yname='x', palette='pastel')
# Draw a KDE plot
hs_plot.hs_kdeplot(df=df, xname='x', hue='category', fill=True, fill_alpha=0.3)
Key Features
1. Data loading
Sample datasets for learning can be loaded quickly.
from hossam import load_data, load_info
# List all datasets
all_datasets = load_info()
# Search by keyword
search_results = load_info(search="regression")
# Load data
df = load_data('DATASET_NAME')
Main datasets (examples):
- AD_SALES: advertising spend and sales data
- Various other regression, classification, and time series datasets
2. Visualization module (hossam.plot)
Basic plots
Line plot
from hossam import plot as hs_plot
hs_plot.hs_lineplot(
    df=df,
    xname='time',
    yname='value',
    hue='category',
    marker='o',
    palette='Set1'
)
Scatter plot
hs_plot.hs_scatterplot(
    df=df,
    xname='x',
    yname='y',
    hue='group',
    palette='husl'
)
Histogram
hs_plot.hs_histplot(
    df=df,
    xname='value',
    hue='category',
    bins=30,
    kde=True,
    palette='Set2'
)
Distribution plots
Box plot
hs_plot.hs_boxplot(
    df=df,
    xname='category',
    yname='value',
    orient='v',
    palette='pastel'
)
Violin plot
hs_plot.hs_violinplot(
    df=df,
    xname='category',
    yname='value',
    palette='muted'
)
KDE plot (Kernel Density Estimation)
# One-dimensional KDE
hs_plot.hs_kdeplot(
    df=df,
    xname='value',
    hue='category',
    fill=True,
    fill_alpha=0.3,
    palette='Set1'
)
# Two-dimensional KDE
hs_plot.hs_kdeplot(
    df=df,
    xname='x',
    yname='y',
    palette='coolwarm'
)
Statistical plots
Scatter plot with regression line (Regression Plot)
hs_plot.hs_regplot(
    df=df,
    xname='x',
    yname='y',
    palette='red'
)
Linear model plot (LM Plot)
hs_plot.hs_lmplot(
    df=df,
    xname='x',
    yname='y',
    hue='category'
)
Residual plot
from sklearn.linear_model import LinearRegression
# Train the model (X_train, y_train, X_test, and y_test are assumed to be defined)
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
# Residual plot
hs_plot.hs_residplot(
    y=y_test,
    y_pred=y_pred,
    lowess=True,  # LOWESS smoothing
    mse=True      # show the MSE range
)
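A residual plot is built from the differences between observed and predicted values, and the MSE summarizes their spread. The sketch below computes these quantities in plain Python. It makes no claim about how `hs_residplot` draws its MSE range, and `residual_summary` is a hypothetical helper name.

```python
from math import sqrt


def residual_summary(y_true, y_pred):
    """Hypothetical helper: return (residuals, mse, rmse) for paired values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Residual = observed value minus predicted value.
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    # Mean squared error: the average of the squared residuals.
    mse = sum(r * r for r in residuals) / len(residuals)
    return residuals, mse, sqrt(mse)


y_true = [3.0, 5.0, 7.0, 9.0]
y_hat = [2.5, 5.5, 7.0, 10.0]
residuals, mse, rmse = residual_summary(y_true, y_hat)
print(residuals)  # [0.5, -0.5, 0.0, -1.0]
print(mse)        # 0.375
```

Residuals that scatter evenly around zero with no visible trend suggest the linear model is adequate; a curve in the LOWESS line suggests a missing nonlinear term.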
Q-Q plot (Quantile-Quantile Plot)
residuals = y_test - y_pred
hs_plot.hs_qqplot(y_pred=residuals)
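To make the idea behind a Q-Q plot concrete, the sketch below pairs the sorted, standardized residuals with the quantiles a standard normal distribution would produce at the same ranks. It uses only the standard library. The plotting positions `(i - 0.5) / n` are one common choice and an assumption here; `hs_qqplot` may use a different convention, and `qq_points` is a hypothetical name.

```python
from statistics import NormalDist, mean, stdev


def qq_points(values):
    """Hypothetical helper: return (theoretical, observed) quantiles vs. a normal."""
    ordered = sorted(values)
    n = len(ordered)
    mu, sigma = mean(ordered), stdev(ordered)
    std_normal = NormalDist()
    # Plotting positions (i - 0.5) / n keep probabilities strictly inside (0, 1).
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    # Standardize the observed values so both axes share the same scale.
    observed = [(v - mu) / sigma for v in ordered]
    return theoretical, observed


sample_residuals = [-1.2, 0.3, 0.8, -0.4, 0.1, 1.5, -0.9, 0.2]
theoretical, observed = qq_points(sample_residuals)
for t, o in zip(theoretical, observed):
    print(f"{t:+.3f}  {o:+.3f}")
```

If the residuals are approximately normal, the printed pairs lie close to the line where both columns are equal.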
Confusion matrix
# y_test and y_pred must hold class labels (for example from a classifier)
hs_plot.hs_confusion_matrix(
    y=y_test,
    y_pred=y_pred,
    cmap='Blues'
)
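A confusion matrix counts how often each true class was predicted as each class, so it needs class labels rather than the continuous predictions of a regression model. The following plain-Python sketch shows the counting step only. The function name and the row/column layout (rows are true labels, columns are predicted labels) are assumptions for illustration.

```python
def confusion_matrix(y_true, y_pred, labels=None):
    """Hypothetical helper: rows are true labels, columns are predicted labels."""
    if labels is None:
        labels = sorted(set(y_true) | set(y_pred))
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[index[true_label]][index[predicted_label]] += 1
    return labels, matrix


y_true = ["cat", "dog", "dog", "cat", "bird", "dog"]
y_hat = ["cat", "dog", "cat", "cat", "bird", "bird"]
labels, matrix = confusion_matrix(y_true, y_hat)

# Accuracy is the share of counts on the diagonal.
correct = sum(matrix[i][i] for i in range(len(labels)))
accuracy = correct / len(y_true)
print(labels)  # ['bird', 'cat', 'dog']
print(matrix)  # [[1, 0, 0], [0, 2, 0], [1, 1, 1]]
```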
Multivariate analysis
Pair plot
hs_plot.hs_pairplot(
    df=df,
    diag_kind='kde',
    hue='category',
    palette='Set1'
)
Joint distribution plot (Joint Plot)
hs_plot.hs_jointplot(
    df=df,
    xname='x',
    yname='y',
    palette='viridis'
)
Heatmap
# Correlation matrix (numeric columns only, so text columns such as 'category' are skipped)
corr_matrix = df.select_dtypes(include='number').corr()
hs_plot.hs_heatmap(
    data=corr_matrix,
    palette='coolwarm'
)
Advanced visualization
Convex hull scatter plot
hs_plot.hs_convex_hull(
    data=df,
    xname='x',
    yname='y',
    hue='cluster',
    palette='Set1'
)
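For readers unfamiliar with the term, a convex hull is the smallest convex polygon that encloses a set of points; drawing one per cluster outlines each group. The sketch below computes hulls with Andrew's monotone chain algorithm in plain Python. It illustrates the geometry only and is not taken from the library; `convex_hull` and `hulls_by_group` are hypothetical names.

```python
from collections import defaultdict


def cross(o, a, b):
    """Z-component of the cross product of vectors OA and OB."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop the last vertex while it would make a clockwise or straight turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def hulls_by_group(points, labels):
    """Compute one hull per group label, as a hue-style grouping would."""
    grouped = defaultdict(list)
    for point, label in zip(points, labels):
        grouped[label].append(point)
    return {label: convex_hull(pts) for label, pts in grouped.items()}


points = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1), (1, 0), (5, 5), (7, 5), (6, 8)]
clusters = ["a", "a", "a", "a", "a", "a", "b", "b", "b"]
hulls = hulls_by_group(points, clusters)
print(hulls["a"])  # [(0, 0), (2, 0), (2, 2), (0, 2)]
```

Interior points such as `(1, 1)` and points on an edge such as `(1, 0)` are dropped, leaving only the corner vertices of each outline.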
100% stacked bar chart
hs_plot.hs_stackplot(
    df=df,
    xname='category',
    hue='subcategory',
    palette='Pastel1'
)
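In a 100% stacked bar chart, each bar is normalized so that its segments sum to 100, which makes proportions comparable across categories of different sizes. The sketch below shows that normalization in plain Python; with pandas, `pandas.crosstab(..., normalize='index')` yields the same proportions as fractions. The helper name and the sample rows are hypothetical.

```python
from collections import Counter, defaultdict


def stacked_percentages(rows, x_key, hue_key):
    """Hypothetical helper: percentage of each hue value within each x value."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[x_key]][row[hue_key]] += 1
    result = {}
    for x_value, counter in counts.items():
        total = sum(counter.values())
        # Normalize within the bar so its segments sum to 100.
        result[x_value] = {hue: 100.0 * n / total for hue, n in counter.items()}
    return result


rows = [
    {"category": "A", "subcategory": "x"},
    {"category": "A", "subcategory": "x"},
    {"category": "A", "subcategory": "x"},
    {"category": "A", "subcategory": "y"},
    {"category": "B", "subcategory": "x"},
    {"category": "B", "subcategory": "y"},
]
shares = stacked_percentages(rows, "category", "subcategory")
print(shares["A"])  # {'x': 75.0, 'y': 25.0}
print(shares["B"])  # {'x': 50.0, 'y': 50.0}
```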
Box plot with p-value annotations
hs_plot.hs_pvalue1_anotation(
    data=df,
    target='value',
    hue='group',
    pairs=[('A', 'B'), ('B', 'C')],
    test='t-test_ind',
    text_format='star'
)
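With a star text format, each compared pair is labeled with asterisks instead of the raw p-value. The mapping below is a sketch of a widely used convention. The exact thresholds applied by the library are not stated in this document, so treat the cut-offs and the name `p_to_stars` as assumptions.

```python
# Assumed thresholds (a common convention, not confirmed for this library).
STAR_THRESHOLDS = [(1e-4, "****"), (1e-3, "***"), (1e-2, "**"), (5e-2, "*")]


def p_to_stars(p_value):
    """Hypothetical helper: map a p-value to a significance label."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    for limit, stars in STAR_THRESHOLDS:
        if p_value <= limit:
            return stars
    return "ns"  # not significant at the 5% level


for p in (0.2, 0.03, 0.005, 0.0005, 0.00001):
    print(p, p_to_stars(p))
```

More stars indicate a smaller p-value; `ns` marks pairs whose difference is not significant under these thresholds.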
Distribution by class
hs_plot.hs_distribution_by_class(
    data=df,
    xnames=['feature1', 'feature2'],
    hue='target',
    type='kde',
    fill=True,
    palette='Set1'
)
Scatter plot by class
hs_plot.hs_scatter_by_class(
    data=df,
    group=[['x', 'y'], ['x', 'z']],
    hue='target',
    outline=True,  # draw the convex hull
    palette='husl'
)
Common parameters
All visualization functions support the following common parameters:
- width: canvas width in pixels (default: 1280)
- height: canvas height in pixels (default: 720)
- dpi: resolution (default: 200)
- palette: color palette ('Set1', 'Set2', 'pastel', 'husl', 'coolwarm', etc.)
- ax: accepts an external Axes object
- callback: callback function for post-processing the Axes
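Matplotlib sizes figures in inches, so pixel dimensions have to be divided by the DPI at some point. The sketch below shows that arithmetic. That the library performs exactly this conversion internally is an assumption, and `figsize_from_pixels` is a hypothetical helper.

```python
def figsize_from_pixels(width_px, height_px, dpi):
    """Hypothetical helper: convert pixel dimensions to a (width, height) in inches."""
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    # pixels = inches * dpi, so inches = pixels / dpi
    return width_px / dpi, height_px / dpi


default_size = figsize_from_pixels(1280, 720, 200)
large_size = figsize_from_pixels(1920, 1080, 300)
print(default_size)  # (6.4, 3.6)
print(large_size)    # (6.4, 3.6)
```

Both settings give the same physical size in inches. Under this assumption, raising `width`, `height`, and `dpi` together increases pixel density without changing how large text and markers appear relative to the canvas.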
Canvas size example
# Large, high-resolution chart
hs_plot.hs_scatterplot(
    df=df,
    xname='x',
    yname='y',
    width=1920,
    height=1080,
    dpi=300
)
External Axes example
import matplotlib.pyplot as plt
fig, axes = plt.subplots(2, 2, figsize=(12, 10))
hs_plot.hs_boxplot(df=df, xname='cat', yname='val', ax=axes[0, 0])
hs_plot.hs_violinplot(df=df, xname='cat', yname='val', ax=axes[0, 1])
hs_plot.hs_histplot(df=df, xname='val', ax=axes[1, 0])
hs_plot.hs_kdeplot(df=df, xname='val', ax=axes[1, 1])
plt.tight_layout()
plt.show()
Callback function example
def custom_style(ax):
    ax.set_title('Custom title', fontsize=16, fontweight='bold')
    ax.set_xlabel('X-axis label', fontsize=12)
    ax.set_ylabel('Y-axis label', fontsize=12)
    ax.grid(True, alpha=0.3, linestyle='--')
hs_plot.hs_scatterplot(
    df=df,
    xname='x',
    yname='y',
    callback=custom_style
)
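The `ax` and `callback` parameters follow a common wrapper pattern: draw on the Axes the caller supplies (or create one), then hand the Axes back to the caller's function for final styling. The sketch below reproduces that pattern with plain Matplotlib. `draw_scatter` is a hypothetical stand-in and does not show the library's actual implementation.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt


def draw_scatter(x, y, ax=None, callback=None):
    """Hypothetical wrapper showing the ax/callback pattern (not hossam's code)."""
    if ax is None:
        # No Axes supplied: create a new figure and Axes.
        _, ax = plt.subplots()
    ax.scatter(x, y)
    if callback is not None:
        # Let the caller adjust titles, labels, grid, and so on.
        callback(ax)
    return ax


def add_title(ax):
    ax.set_title("Custom title")
    ax.grid(True, alpha=0.3, linestyle="--")


# Case 1: the wrapper creates its own Axes and applies the callback.
ax_own = draw_scatter([1, 2, 3], [2, 4, 5], callback=add_title)

# Case 2: the caller supplies an Axes from an existing subplot grid.
fig, axes = plt.subplots(1, 2)
ax_given = draw_scatter([1, 2, 3], [5, 4, 2], ax=axes[1])
```

Running the callback after drawing means the caller's settings are applied last and are not overwritten by the wrapper's defaults.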
3. Analysis module (hossam.analysis)
Provides statistical functions for data analysis.
from hossam import analysis as hs_analysis
# Descriptive statistics
# Regression analysis helpers
# Classification performance evaluation
# Time series analysis
# And more (see the detailed documentation)
4. Preprocessing module (hossam.prep)
Utilities for data preprocessing and cleaning.
from hossam import prep as hs_prep
# Missing-value handling
# Outlier detection and removal
# Scaling and encoding
# And more (see the detailed documentation)
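The function names in `hossam.prep` are not listed here, so the sketch below illustrates the three steps (missing-value handling, outlier detection, scaling) with the standard library instead. Median imputation, the 1.5 × IQR rule, and min-max scaling are common choices picked for illustration; all helper names are hypothetical.

```python
from statistics import median, quantiles


def fill_missing_with_median(values):
    """Replace None entries with the median of the observed values."""
    present = [v for v in values if v is not None]
    fill = median(present)
    return [fill if v is None else v for v in values]


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def min_max_scale(values):
    """Scale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


raw = [10.0, 12.0, None, 11.0, 13.0, 95.0, 12.0]
filled = fill_missing_with_median(raw)
outliers = iqr_outliers(filled)
cleaned = [v for v in filled if v not in outliers]
scaled = min_max_scale(cleaned)
print(filled)    # [10.0, 12.0, 12.0, 11.0, 13.0, 95.0, 12.0]
print(outliers)  # [95.0]
```

Removing outliers before scaling matters here: with 95.0 left in, min-max scaling would squeeze every other value into a narrow band near zero.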
5. Utility module (hossam.util)
Provides miscellaneous convenience functions.
from hossam import util as hs_util
# Various helper functions
# Data conversion
# File I/O support
# And more (see the detailed documentation)
Dependencies
Hossam uses the following libraries:
Core dependencies
- pandas: data processing and analysis
- numpy: numerical computation
- matplotlib: basic visualization
- seaborn: statistical visualization
Statistics and machine learning
- scipy: scientific computing and statistics
- scikit-learn: machine learning algorithms
- statsmodels: statistical modeling
- pingouin: statistical analysis
Other
- tqdm: progress bars
- tabulate: tabular output
- requests: HTTP requests
- openpyxl, xlrd: Excel file support
- statannotations: statistical annotations
- joblib: serialization and parallel processing
All dependencies are installed automatically when you run pip install hossam.
Use Cases
Teaching
# Quickly demonstrate a visualization in class
from hossam import load_data, plot as hs_plot
df = load_data('SAMPLE_DATA')
hs_plot.hs_pairplot(df=df, hue='target', palette='Set1')
Data exploration
# Quick EDA (exploratory data analysis)
from hossam import plot as hs_plot
# Check distributions
hs_plot.hs_distribution_by_class(
    data=df,
    hue='target',
    type='histkde'
)
# Check correlations (numeric columns only)
hs_plot.hs_heatmap(data=df.select_dtypes(include='number').corr(), palette='coolwarm')
# Check relationships between features
hs_plot.hs_scatter_by_class(
    data=df,
    hue='target',
    outline=True
)
Model evaluation
from sklearn.linear_model import LinearRegression
from hossam import plot as hs_plot
# Train the model (X_train, y_train, X_test, and y_test are assumed to be defined)
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
# Residual analysis
hs_plot.hs_residplot(y=y_test, y_pred=y_pred, lowess=True, mse=True)
# Normality check
hs_plot.hs_qqplot(y_pred=y_test - y_pred)
License
This project is distributed under the MIT License.
See the LICENSE file for details.
Author
Lee Kwang-Ho
- Instructor at ITWILL
- Specializes in machine learning and data analysis education
- Email: leekh4232@gmail.com
- Blog: https://blog.hossam.kr/
- GitHub: https://github.com/leekh4232
- Youtube: https://www.youtube.com/@hossam-codingclub
Acknowledgements
This library was developed for the data analysis courses held at ITWILL.
I hope it helps students with their learning.
Support and Contact
- Issue reports: GitHub Issues
- Email: leekh4232@gmail.com
Happy Data Analysis!