Skip to main content

Atom is a feature engineering tool.

Project description

Atom-特征库工具

shields_version shields_license shields_author shiedls_python

atomsymbol

介绍

  • Atom是一种特征管理工具,以数据和算子作为基本概念,数据为基础数据用于训练特征和构建特征;算子为基于固定一个或多个数据集进行新特征生产的流程,可以是一个简单直接计算函数,也可以是一个复杂的算法模型,还可以是算法模型和直接计算相结合的组合体。atom的特色是对由数据衍生的算子进行了数据关联、统一管理,并直接提供了服务功能,使得每个算子可以直接实现在线实时计算特征,为主体算法模型服务,提高模型精度。

安装

  • Atom采用Python开发,得益于Python良好的社区环境,安装支持Pythonic风格的各种管理器。
$ pip install atom-0.1-xxxxxxxxxxxx.whl

快速指南

  • 首先使用atomctl命令行工具进行工作空间设置和初始化操作。然后分别启动元数据服务和atom主服务(两个服务未支持后台开启)。

  • 以下是atomctl命令行示例:

	$ atomctl set --workspace 'D:\Workspace\JiYuan\Atom\Demo\test'

	$ atomctl init

	$ atomctl metadata-service 

	$ atomctl start-service 
  • 然后就是使用python脚本进行atom的数据和算子操作。主要包括数据和算子的注册、查询、删除三个基本操作以及算子的加载操作

  • 以下是atom主程脚本代码示例:

	from atom.scheduler import *

	### 加载Atom调度器
	atom = AtomScheduler(mode='delay')


	### register-data测试
	df = pd.read_csv(r"D:\Workspace\JiYuan\WindPowerForecast\LSTM\demo\merge_data_GDTYUAN_ec.csv")
	atom.data_register(tag='test',
	                   belong='first',
	                   object_name='merge_data_GDTYUAN_ec_1',
	                   data_object=df,
	                   remarks='this is a test data!')

	### register-operator测试
	### 即时模式----装饰器方式
	# @atom.operator_register(tag='test',
	#                         belong='first',
	#                         object_name='test_function_a',
	#                         remarks='this is a test operator!')
	def test_function(a,b):
	    c = a + b * 2 + 1
	    return c
	# tmp_a = test_function(1,2)
	### 及时模式----函数方式
	# tmp_func = atom.operator_register(tag='test',
	#                         belong='first',
	#                         object_name='test_function_b',
	#                         remarks='this is a test operator!')(test_function)
	# tmp_b = tmp_func(3,4) 
	### 延时模式
	atom.operator_register(tag='test',
	                       belong='first',
	                       object_name='test_function_a', ## cc
	                       operator_object=test_function,
	                       remarks='this is a test operator!')


	# ### data-remove测试
	# atom.data_remove(tag='test',object_name='merge_data_GDTYUAN_ec_00')


	### operator-remove测试
	# atom.operator_remove(tag='test',object_name='test_function_cc')
	 
	                       
	### data-query测试
	data_view_df = atom.data_query(tag='test')
	print(data_view_df)


	### operator-query测试
	operator_view_df = atom.operator_query(tag='test')
	print(operator_view_df)


	### data-modify测试
	atom.data_modify()


	### operator-modify测试
	atom.operator_modify()


	### data-load测试
	data_load_df = atom.data_load(tag='test',object_name='merge_data_GDTYUAN_ec_1')
	print(data_load_df)


	### operator-load测试
	operator_load_a = atom.operator_load(tag='test',object_name='test_function_a')
	print(operator_load_a(10,20))
	# print(test_function(**{'a':10,'b':20})) ### 字典参数传递
  • 最后是算子在线计算服务的使用。当一个算子注册到atom后,他就自动获得了在线计算服务的功能。

  • 表单数据格式如下:

	### 该表单数据仅以python为例展示
	post_form = {
    "tag": "test",
    "object_name": "test_function_a",
	"data_json": {"a":78,"b":9}
	}

设计

  • WEBUI webui
  • DATAUI dataui
  • 使用工厂模式,解耦计算、存储和通信所使用的第三方工具
  • 设计数据和算子两个基本概念,扩展了特征工程工具的适用范围,一个算子不仅可以直接计算,还可以是复杂算法模型,覆盖特征工程的特征挖掘和关键指标计算管理。
  • 算子一次注册即拥有长效计算服务功能。
  • 算子基于不同数据实现了版本管理,更具有实际意义
  • 技术列表
    • 工厂模式
    • MinIO
    • Bootstrap5
    • SQLite3
    • RabbitMQ
    • FastAPI

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

shihua-atom-0.1.1.tar.gz (9.2 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

shihua_atom-0.1.1-py3-none-any.whl (9.3 MB view details)

Uploaded Python 3

File details

Details for the file shihua-atom-0.1.1.tar.gz.

File metadata

  • Download URL: shihua-atom-0.1.1.tar.gz
  • Upload date:
  • Size: 9.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.12

File hashes

Hashes for shihua-atom-0.1.1.tar.gz
Algorithm Hash digest
SHA256 047393db21f4b34f24767f623280a67ea2fb5644992d413ccaa015802ebf5cc1
MD5 e7d1847543c752a71e77c2710bbec2f1
BLAKE2b-256 24569798f6bc1479b13bde1e87386efe232fc281b27db05902285d16a27a236a

See more details on using hashes here.

File details

Details for the file shihua_atom-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: shihua_atom-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 9.3 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.12

File hashes

Hashes for shihua_atom-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 d2670d160d70075a178f2a3b15abbe60b8f0576b678721d5f472ff7386a3bee5
MD5 624604addd0c1c4f2812653e8fda847f
BLAKE2b-256 66b90cb728bd3567d07547994041efd2397b1f99223b9977576684d2c1a12613

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page