Process data as a stream.

Project description

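Process data as a stream using generators, coroutines, and similar constructs.

Filter and deduplicate data with compositions of multiple functions, convert data columns, and run calculations over the data.

Run quality control (QC) checks on the data to make sure the final output matches expectations.

Write the data to CSV and other outputs.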
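The steps above (filter, deduplicate, convert a column, QC check, write CSV) can be sketched with plain generators and the standard csv module. This is an illustration of the general technique only, not the file-stream API: the helper names keep, dedupe, convert, check and run_pipeline, and the columns code and price, are hypothetical.

```python
import csv
import io


def keep(rows, predicate):
    """Yield only the rows for which predicate(row) is true."""
    for row in rows:
        if predicate(row):
            yield row


def dedupe(rows, key):
    """Yield a row only the first time its key is seen."""
    seen = set()
    for row in rows:
        k = key(row)
        if k not in seen:
            seen.add(k)
            yield row


def convert(rows, column, func):
    """Yield a copy of each row with one column transformed by func."""
    for row in rows:
        new_row = dict(row)
        new_row[column] = func(row[column])
        yield new_row


def check(rows, rule, message):
    """Pass rows through unchanged; raise if a row violates the QC rule."""
    for row in rows:
        if not rule(row):
            raise ValueError(f"{message}: {row!r}")
        yield row


def run_pipeline(source, sink):
    """Stream rows from source to sink one at a time; return rows written."""
    rows = csv.DictReader(source)
    rows = keep(rows, lambda r: r["price"] != "")
    rows = dedupe(rows, key=lambda r: r["code"])
    rows = convert(rows, "price", float)
    rows = check(rows, lambda r: r["price"] > 0, "non-positive price")
    writer = csv.DictWriter(sink, fieldnames=["code", "price"])
    writer.writeheader()
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count


# Hypothetical sample data: one duplicate row and one row with a missing price.
sample = io.StringIO("code,price\nA,1.5\nA,1.5\nB,\nC,2\n")
out = io.StringIO()
written = run_pipeline(sample, out)
```

Because every stage is a generator, only one row is held in memory at a time, regardless of the size of the input.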

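How it works and key features

The implementation relies mainly on generators: each class pulls data from upstream with a for loop and hands data to downstream with yield. By overriding the or rule (the __or__ method in Python), components are combined with the | operator.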
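A minimal sketch of that idea follows. It is not the package's actual source: the class names Node, Source, Where and Collect are hypothetical, and only the output() method name is borrowed from the usage example in this README.

```python
class Node:
    """Base pipeline stage: pulls items from `upstream` and yields them on."""

    def __init__(self):
        self.upstream = None

    def __or__(self, other):
        # `a | b` registers a as b's upstream and returns b,
        # so a chain reads left to right and evaluates to its last stage.
        other.upstream = self
        return other

    def __iter__(self):
        raise NotImplementedError


class Source(Node):
    """First stage: yields items from any iterable."""

    def __init__(self, items):
        super().__init__()
        self.items = items

    def __iter__(self):
        yield from self.items


class Where(Node):
    """Middle stage: passes on only items that satisfy the predicate."""

    def __init__(self, predicate):
        super().__init__()
        self.predicate = predicate

    def __iter__(self):
        for item in self.upstream:
            if self.predicate(item):
                yield item


class Collect(Node):
    """Last stage: drains the pipeline into a list."""

    def output(self):
        return list(self.upstream)


pipeline = Source(range(10)) | Where(lambda n: n % 2 == 0) | Collect()
result = pipeline.output()  # [0, 2, 4, 6, 8]
```

Nothing runs until the last stage is asked for output; each item is then pulled through the whole chain before the next one is read.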

Coroutines are also supported, for building data nodes such as broadcast and routing nodes.
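One way such nodes can be built is with generator-based coroutines that receive items through send(). The sketch below assumes that style (the README does not say which kind of coroutine the package uses), and the names coroutine, collector, broadcast and router are hypothetical.

```python
import functools


def coroutine(func):
    """Prime a generator-based coroutine so it can accept send() right away."""
    @functools.wraps(func)
    def start(*args, **kwargs):
        gen = func(*args, **kwargs)
        next(gen)  # advance to the first `yield`
        return gen
    return start


@coroutine
def collector(bucket):
    """Sink node: append every received item to `bucket`."""
    while True:
        item = yield
        bucket.append(item)


@coroutine
def broadcast(targets):
    """Broadcast node: forward every received item to all targets."""
    while True:
        item = yield
        for target in targets:
            target.send(item)


@coroutine
def router(choose, routes):
    """Routing node: forward each item to the single target chosen by `choose`."""
    while True:
        item = yield
        routes[choose(item)].send(item)


evens, odds, everything = [], [], []
parity = router(
    lambda n: "even" if n % 2 == 0 else "odd",
    {"even": collector(evens), "odd": collector(odds)},
)
fan_out = broadcast([parity, collector(everything)])
for n in range(5):
    fan_out.send(n)
```

Unlike the pull-based generator chain, data here is pushed from the source, which is what lets one item reach several downstream nodes.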

Features:
  • Highly extensible.

  • Low memory footprint.

  • Depends only on Python's built-in packages.

Related project

The overall approach is largely based on this project: https://github.com/sandabuliu/python-stream

Installation

$ pip install file-stream

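Usage

Read data from a CSV file, filter it by a condition, and print the result to the screen.

# CsvReader, Filter and ScreenOutput are provided by the file-stream package.
# Read a tab-separated, GBK-encoded text file.
reader = CsvReader('/home/hetao/Data/p5w/数据分析/IPO_RoadShow.txt', delimiter='\t', encoding='gbk')
# Keep the rows for which the predicate returns True (this one keeps every row).
fit = Filter(lambda x: True)
# Print the rows to the screen.
writer = ScreenOutput(end='\r')
# Chain the components with |, then run the pipeline.
p = reader | fit | writer
p.output()

See the test folder for more examples.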

Download files

Source Distribution

file_stream-0.2.4.tar.gz (9.7 kB view details)

Uploaded Source

Built Distribution

file_stream-0.2.4-py3-none-any.whl (13.0 kB view details)

Uploaded Python 3

File details

Details for the file file_stream-0.2.4.tar.gz.

File metadata

  • Download URL: file_stream-0.2.4.tar.gz
  • Size: 9.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/36.5.0.post20170921 requests-toolbelt/0.9.1 tqdm/4.50.2 CPython/3.6.3

File hashes

Hashes for file_stream-0.2.4.tar.gz
Algorithm Hash digest
SHA256 92aaec15786a4bd5517e78130b4e6bc1ab8353b971d95fb32d91e49a074f2d67
MD5 60c49170acb4f9fb99786ffcc4959aa2
BLAKE2b-256 545751a6ee27f14db59e25d51090508dd92bea2af3db483097ec45155343bf55

File details

Details for the file file_stream-0.2.4-py3-none-any.whl.

File metadata

  • Download URL: file_stream-0.2.4-py3-none-any.whl
  • Size: 13.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/36.5.0.post20170921 requests-toolbelt/0.9.1 tqdm/4.50.2 CPython/3.6.3

File hashes

Hashes for file_stream-0.2.4-py3-none-any.whl
Algorithm Hash digest
SHA256 bd021141c61f63790b6bd0f7c98f1bc1c1bfedfb81bf7c11daea608b9b8e74c1
MD5 4ab71c67335c658f46bfedbfe118b011
BLAKE2b-256 d465d8e3acd29808ff032cc62fe6bf3b49ffa906fba00b30744bd059c58913e7
