Skip to main content

A comprehensive machine learning toolkit for data analysis, preprocessing, modeling, and evaluation.

Project description

FreeAeon-ML

FreeAeon-ML 是一个一站式的 Python 机器学习工具包,封装了常用的机器学习流程模块,包括数据探索分析、数据预处理、特征选择、模型训练(分类、回归、聚类、时间序列)、模型评估和可视化,旨在帮助研究者和工程师高效构建、训练和评估机器学习模型。更多


🚀 特性功能

  • 📊 数据探索与统计分析:正态性检验、分布拟合、相关性分析等
  • 🧹 数据预处理:标准化、异常值处理、Box-Cox 变换、分箱等
  • 🔍 特征选择:信息图谱、方差分析、PCA 降维、Granger 因果检验等
  • 🧠 模型训练支持
    • 分类模型:DT, RF, SVM, ANN, GLM, Naive Bayes, GBM, XGBoosting,...
    • 回归模型:RF, ANN, GLM, GBM, XGBoosting,...
    • 聚类模型:GaussianMixture,KMeans,AffinityPropagation,AgglomerativeClustering,Birch,MeanShift,OPTICS,...
    • 时间序列模型:ARIMA分解与预测等
  • 📈 模型评估:评估指标自动输出、特征重要性排序、ROC等曲线绘制
  • 💾 模型保存与加载
  • 🧬 样本均衡与增强:SMOTE平衡采样、经典采样、自动切分等
  • 📊 可视化支持:热力图、等高线、桑基图、序列图等
  • ⚙️ H2O 引擎集成:支持GPU,支持分布式,支持多客户端并发等

🧪 更多


📦 安装方式

pip install FreeAeon-ML

✅ 环境依赖

  • Python >= 3.8
  • Java Runtime Environment (JRE) 8+
  • 主要依赖库:
    • numpy, pandas, matplotlib, seaborn
    • scipy, scikit-learn, statsmodels
    • h2o

📌 注意:必须安装 Java 环境! FreeAeon-ML 使用 H2O 平台进行部分模型训练,需确保系统已安装 Java:

java -version

若未安装,请参考以下方式:


🧪 快速示例

import numpy as np
import pandas as pd
from FreeAeonML.FADataPreprocess import CFADataPreprocess
from FreeAeonML.FASample import CFASample
from FreeAeonML.FAModelClassify import CFAModelClassify
from h2o.estimators import H2ORandomForestEstimator
import h2o

#初始化,如果是WSL,注释掉h2o.init(),使用h2o.connect()
h2o.init(nthreads=-1,verbose=False)
#h2o.connect(ip=ip,port=port)

# 随机生成样本(有5个特征,2个分类,分类标签字段为"y")
df_sample = CFASample.get_random_classification(1000, n_feature=5, n_class=2)
print(df_sample)

# 划分为训练集和测试集(默认80%为训练样本,20%为测试样本)
df_train, df_test = CFASample.split_dataset(df_sample)

# 使用自带的一组模型进行训练
model = CFAModelClassify(models=None)

# 如需要指定的模型进行训练,请按照以下格式指定模型
#model = CFAModelClassify(models={"rf": H2ORandomForestEstimator()})

# 训练模型(df_train为训练样本,其中y字段为标签字段)。
model.train(df_train, y_column="y")

# 使用模型进行预测(df_test为测试样本,其中y字段为标签字段)。
df_pred = model.predict(df_test, y_column="y")
print(df_pred)

# 统计模型的各项性能指标
df_eval = model.evaluate(df_test, y_column="y")
print(df_eval)

📚 使用文档

完整使用说明、详细参数介绍及进阶示例请参考:


📁 模块说明

模块名 描述
FADataEDA 探索性数据分析
FADataPreprocess 数据预处理(标准化、异常值检测等)
FAFeatureSelect 特征选择(信息图、PCA、因果性检验等)
FAModelClassify 分类模型训练封装
FAModelRegression 回归模型训练封装
FAModelCluster 聚类模型训练封装
FAModelSeries 时间序列建模(自动ARIMA)
FAEvaluation 模型评估与指标输出
FAVisualize 可视化模块(热图、桑基图、等高线等)
FASample 样本生成与增强

🧪 测试脚本示例

测试脚本位于 tests/ 目录,支持以下演示:

  • demo_Sample.py:样本生成与增强测试
  • demo_DataEDA.py:数据分析演示
  • demo_DataPreprocess.py:预处理功能测试
  • demo_FeatureSelect.py:特征选择测试
  • demo_ModelClassify.py:分类模型演示
  • demo_ModelRegression.py:回归模型演示
  • demo_ModelCluster.py:聚类模型演示
  • demo_ModelSeries.py:时间序列建模演示
  • demo_Evaluation.py:模型性能评估
  • demo_Visualize.py:图形可视化测试

运行示例:

  • demo_Sample.py:样本生成与增强测试

    python tests/demo_Sample.py
    
  • demo_DataEDA.py:数据分析演示

    python tests/demo_DataEDA.py
    
  • demo_DataPreprocess.py:预处理功能测试

    python tests/demo_DataPreprocess.py
    
  • demo_FeatureSelect.py:特征选择测试

    python tests/demo_FeatureSelect.py
    
  • demo_ModelClassify.py:分类模型演示

    python tests/demo_ModelClassify.py
    
  • demo_ModelRegression.py:回归模型演示

    python tests/demo_ModelRegression.py
    
  • demo_ModelCluster.py:聚类模型演示

    python tests/demo_ModelCluster.py
    
  • demo_ModelSeries.py:时间序列建模演示

    python tests/demo_ModelSeries.py
    
  • demo_Evaluation.py:模型性能评估

    python tests/demo_Evaluation.py
    
  • demo_Visualize.py:图形可视化测试

    python tests/demo_Visualize.py
    

📄在Window的WSL运行

WSL 下推荐单节点模式(-flatfile /dev/null -nthreads 2),避免网络多节点探测失败

1️⃣ 手工运行h2o服务

java -jar ./site-packages/h2o/backend/bin/h2o.jar -ip 127.0.0.1 -port 54321 -flatfile /dev/null -nthreads 2

(假设h2o.jar文件在目录中./site-packages/h2o/backend/bin/)

2️⃣ 修改demo代码中的连接方式

修改代码,将h2o.init(nthreads=-1,verbose=False) 改成h2o.connect(ip="127.0.0.1",port=54321)

h2o.init(nthreads=-1,verbose=False) --> h2o.connect(ip="127.0.0.1",port=54321)


📄 License

FreeAeon-ML is released under the MIT License.
© 2025 FreeAeon Contributors


🤝 欢迎贡献

欢迎 PR、Issue 与建议!请确保代码规范、清晰,附带测试。


✍️ Author

Jim Xie
📧 E-Mail: jim.xie.cn@outlook.com, xiewenwei@sina.com
🔗 GitHub: https://github.com/jim-xie-cn/FreeAeon-ML

Yin Jie

📧 E-Mail: yinjiejspi@163.com

Cindy Ma

📧 E-Mail: 453303661@qq.com

Wenjing Zhang

📧 E-Mail: 634676988@qq.com

Danny Zhang

📧 E-Mail: zhyzxsw@126.com


🧠 Citation

If you use this project in academic work, please cite it as:

Jim Xie, FreeAeon-ML: A comprehensive machine learning toolkit for data analysis, preprocessing, modeling, and evaluation., 2025.
GitHub Repository: https://github.com/jim-xie-cn/FreeAeon-ML

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

freeaeon_ml-0.3.5.tar.gz (28.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

freeaeon_ml-0.3.5-py3-none-any.whl (35.9 kB view details)

Uploaded Python 3

File details

Details for the file freeaeon_ml-0.3.5.tar.gz.

File metadata

  • Download URL: freeaeon_ml-0.3.5.tar.gz
  • Upload date:
  • Size: 28.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.18

File hashes

Hashes for freeaeon_ml-0.3.5.tar.gz
Algorithm Hash digest
SHA256 f6217b9066c1caeab13552f1d62966717a1e017f2bb2a423c3d6813c5d96d65c
MD5 a4c6ea8f03eb1c24b21af63415c8c9fa
BLAKE2b-256 a22611bbfc87839d12a4a93ed59dbe33653b5157955f180eb6ada6899c92d426

See more details on using hashes here.

File details

Details for the file freeaeon_ml-0.3.5-py3-none-any.whl.

File metadata

  • Download URL: freeaeon_ml-0.3.5-py3-none-any.whl
  • Upload date:
  • Size: 35.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.18

File hashes

Hashes for freeaeon_ml-0.3.5-py3-none-any.whl
Algorithm Hash digest
SHA256 2373fbcd1582584311e10f89182df20dd68cb6a24c85bf042d8934232e2b47b6
MD5 363fe6b24df245b1b75c7d1a8f7ae7e0
BLAKE2b-256 43915c1b91ce0f04d40b136693ca179201cf4a15c2e7a92587ba90b0f937ba0f

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page