Process data as a stream.
Project description
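Process data as a stream using generators, coroutines, and similar tools.
Filter and deduplicate data with a combination of multiple functions, transform data columns, and run computations on the data.
Run quality control (QC) checks on the data to make sure the final output matches expectations.
Write the data to CSV and other formats.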
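The steps listed above can be sketched with plain generators from the standard library. This is only an illustration of the idea, not the file-stream API: the function names `keep_if`, `dedupe`, `transform`, `qc_check`, and `run_pipeline`, the row-count QC rule, and the sample rows are all hypothetical.

```python
import csv
import io


def keep_if(rows, *predicates):
    """Yield only the rows that satisfy every predicate."""
    for row in rows:
        if all(pred(row) for pred in predicates):
            yield row


def dedupe(rows, key):
    """Yield a row only the first time its key is seen."""
    seen = set()
    for row in rows:
        k = key(row)
        if k not in seen:
            seen.add(k)
            yield row


def transform(rows, column, func):
    """Yield copies of the rows with one column converted by func."""
    for row in rows:
        yield {**row, column: func(row[column])}


def qc_check(rows, expected_count):
    """Pass rows through, then fail if the total count is unexpected."""
    count = 0
    for row in rows:
        count += 1
        yield row
    if count != expected_count:
        raise ValueError(f"QC failed: expected {expected_count} rows, got {count}")


def run_pipeline(raw_rows, expected_count):
    """Chain the stages lazily and return the result as CSV text."""
    rows = keep_if(raw_rows, lambda r: r["code"], lambda r: r["price"] != "")
    rows = dedupe(rows, key=lambda r: r["code"])
    rows = transform(rows, "price", float)
    rows = qc_check(rows, expected_count)

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["code", "price"])
    writer.writeheader()
    writer.writerows(rows)  # rows are pulled through all stages one at a time
    return buffer.getvalue()


# Hypothetical sample data.
sample = [
    {"code": "A", "price": "1.5"},
    {"code": "A", "price": "1.5"},  # duplicate, dropped by dedupe
    {"code": "", "price": "2.0"},   # empty code, dropped by keep_if
    {"code": "B", "price": "3"},
]
csv_text = run_pipeline(sample, expected_count=2)
```

Because every stage is a generator, only one row is held in memory at a time until it reaches the writer, and the QC stage raises only after the whole stream has passed through it.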
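Principles and features
The implementation is built mainly on generators: each class pulls data from its upstream with a for loop and hands data to its downstream with yield. The components are chained with the | operator by overriding the or rule (Python's __or__ method).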
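The mechanism described here can be sketched in a few lines. The classes below (`Node`, `Source`, `Keep`, `Apply`, `Collect`) are hypothetical and are not the classes shipped with file-stream; they only show how overriding `__or__` lets `|` link generator-based stages.

```python
class Node:
    """Base pipeline node: iterating a node pulls data from its upstream."""

    def __init__(self):
        self.upstream = None

    def __or__(self, other):
        # `a | b` makes a the upstream of b and returns b,
        # so `a | b | c` keeps extending the chain to the right.
        other.upstream = self
        return other


class Source(Node):
    """Head of the pipeline: yields items from any iterable."""

    def __init__(self, items):
        super().__init__()
        self.items = items

    def __iter__(self):
        yield from self.items


class Keep(Node):
    """Passes on only the items for which predicate(item) is true."""

    def __init__(self, predicate):
        super().__init__()
        self.predicate = predicate

    def __iter__(self):
        for item in self.upstream:
            if self.predicate(item):
                yield item


class Apply(Node):
    """Passes on func(item) for every upstream item."""

    def __init__(self, func):
        super().__init__()
        self.func = func

    def __iter__(self):
        for item in self.upstream:
            yield self.func(item)


class Collect(Node):
    """Tail of the pipeline: drains the upstream into a list."""

    def output(self):
        return list(self.upstream)


pipeline = Source(range(10)) | Keep(lambda x: x % 2 == 0) | Apply(lambda x: x * x) | Collect()
result = pipeline.output()  # [0, 4, 16, 36, 64]
```

Nothing runs until `output()` starts iterating, at which point each stage pulls one item at a time from the stage before it.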
Coroutines are also supported, for building data nodes such as broadcast and routing nodes.
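As a rough sketch of what broadcast and routing nodes can look like, the example below uses generator-based coroutines driven by `send()`. It assumes that style of coroutine; the names `coroutine`, `collect`, `broadcast`, and `route` are hypothetical, and the library's own node classes may be organized differently.

```python
import functools


def coroutine(func):
    """Prime a generator-based coroutine so it is ready to receive send()."""
    @functools.wraps(func)
    def start(*args, **kwargs):
        gen = func(*args, **kwargs)
        next(gen)  # advance to the first `yield`
        return gen
    return start


@coroutine
def collect(bucket):
    """Sink: append every received item to bucket."""
    while True:
        item = yield
        bucket.append(item)


@coroutine
def broadcast(targets):
    """Send every received item to all targets."""
    while True:
        item = yield
        for target in targets:
            target.send(item)


@coroutine
def route(choose, targets):
    """Send each item to the single target selected by choose(item)."""
    while True:
        item = yield
        targets[choose(item)].send(item)


evens, odds, everything = [], [], []
router = route(
    lambda n: "even" if n % 2 == 0 else "odd",
    {"even": collect(evens), "odd": collect(odds)},
)
fanout = broadcast([router, collect(everything)])

for n in range(5):
    fanout.send(n)
# evens == [0, 2, 4], odds == [1, 3], everything == [0, 1, 2, 3, 4]
```

Unlike the pull-based generator stages, these nodes are push-based: the producer calls `send()`, which is what makes one-to-many fan-out straightforward.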
- Features:
Highly extensible.
Low memory footprint.
Depends only on the Python standard library.
Reference project
The overall design mainly follows this project: https://github.com/sandabuliu/python-stream.
Installation
$ pip install file-stream
Usage
Read data from a CSV file, filter it by a condition, and print the result to the screen.
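# Read a tab-delimited, GBK-encoded text file
reader = CsvReader('/home/hetao/Data/p5w/数据分析/IPO_RoadShow.txt', delimiter='\t', encoding='gbk')
# Filter whose predicate accepts every row
fit = Filter(lambda x: True)
writer = ScreenOutput(end='\r')
# Chain the components with | and run the pipeline
p = reader | fit | writer
p.output()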
See the test folder for more examples.
Download files
Source Distribution
file_stream-0.2.4.tar.gz (9.7 kB)
Built Distribution
file_stream-0.2.4-py3-none-any.whl (13.0 kB)
File details
Details for the file file_stream-0.2.4.tar.gz.
File metadata
- Download URL: file_stream-0.2.4.tar.gz
- Upload date:
- Size: 9.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/36.5.0.post20170921 requests-toolbelt/0.9.1 tqdm/4.50.2 CPython/3.6.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 92aaec15786a4bd5517e78130b4e6bc1ab8353b971d95fb32d91e49a074f2d67
MD5 | 60c49170acb4f9fb99786ffcc4959aa2
BLAKE2b-256 | 545751a6ee27f14db59e25d51090508dd92bea2af3db483097ec45155343bf55
File details
Details for the file file_stream-0.2.4-py3-none-any.whl.
File metadata
- Download URL: file_stream-0.2.4-py3-none-any.whl
- Upload date:
- Size: 13.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/36.5.0.post20170921 requests-toolbelt/0.9.1 tqdm/4.50.2 CPython/3.6.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | bd021141c61f63790b6bd0f7c98f1bc1c1bfedfb81bf7c11daea608b9b8e74c1
MD5 | 4ab71c67335c658f46bfedbfe118b011
BLAKE2b-256 | d465d8e3acd29808ff032cc62fe6bf3b49ffa906fba00b30744bd059c58913e7